Any opinions, findings, conclusions, or recommendations expressed in this dissertation are those of the authors and do not necessarily reflect the views of UKDiss.com.

Optical Music Recognition System

Info: 5301 words (21 pages) Dissertation
Published: 11th Dec 2019

Tagged: MusicTechnology

CHAPTER 1

 

INTRODUCTION

  1.1 An overview of the need for optical music recognition

 

For centuries, composers have used paper as a medium for turning their musical compositions into permanent works. Every region has its own distinct ideas and theory of music, and an equally distinctive way of representing that music on paper. In Western music, the standardized notation called Common Music Notation (CMN) is widely used. The representation of music on paper is called a music score or music sheet, and it is a useful tool for novices and experts alike. Digitizing music sheets will not only help preserve old and antique scores but also help new artists compose music that can be readily printed and published. A system that aims at digitizing scores is called an Optical Music Recognition (OMR) system: a computer system that can ‘read’ printed music [2]. Systems that can read and comprehend the written word have existed for some years now, and they have led to greater availability of digitized content, thus preserving old novels and manuscripts.

Fig 1.1.  An image of a music sheet.

OMR has been a focus of research since the 1960s, but a perfect system is still far from realization. The first paper on OMR was published by researchers at MIT in the 1960s, led by Pruslin. Blostein and Baird published an overview of OMR systems developed and researched between 1966 and 1992, and Bainbridge and Bell published a generic framework for optical music recognition systems. Since then, researchers have continued to look for ways to improve existing OMR systems and to build a single system capable of performing every function expected of an ideal one.

Researchers have produced different systems over the years, and quite a few open-source programs are available in addition to some expensive professional software. Ideally, an OMR system should allow novices to listen to a piece of music from the music sheet for practice, and also allow musicians to translate their music directly into a music sheet [2].

Researchers have proposed many ways for music recognition to proceed, and the majority of these methods start with the detection and removal of staff lines. Staff lines form a large portion of the music sheet; they are sets of five parallel lines running across the sheet. They are important because the position of a note on them indicates its pitch. They also help identify the key of the piece, that is, the scale from which the fundamental notes of the melody, chords and bassline are derived (for example, in the key of C major the notes are drawn from the C major scale: C, D, E, F, G, A, B). At the same time, staff lines are a major hindrance when trying to extract individual notes and other music symbols from the sheet. Unlike optical character recognition, where a fixed set of characters is matched, music notes and other symbols have different meanings depending on their position on the staff lines. So, although the staff lines are an obstacle to symbol extraction, they have to be detected first and subsequently removed in order to extract the music notes and symbols.

Although a handful of methods perform OMR without removing the staff lines, as mentioned in [11], most systems start the process with staff line detection and removal.

  1.2 History of OMR research

 

OMR has been of interest to researchers since the 1960s, and the field has developed considerably since then.

The research field of OMR began with Pruslin and Prerau and has since undergone many important advances. Several surveys and summaries have been presented to the scientific community: Kassler reviewed two of the first dissertations on OMR, Blostein and Baird published an overview of OMR systems developed between 1966 and 1992, Bainbridge and Bell published a generic framework for OMR (subsequently adopted by many researchers in this field), and both Homenda and Rebelo et al. presented pattern recognition studies applied to music notation. Jones et al. presented a study in music imaging, which included digitization, recognition and restoration, and also provided a detailed list of OMR hardware and software together with an evaluation of three OMR systems [11].

Access to low-cost flatbed digitizers during the late 1980s contributed to an expansion of OMR research activities. Several commercial OMR software packages have appeared, but none performs satisfactorily in terms of precision and robustness, in particular for handwritten music scores. Even the most advanced recognition products, including Notescan in Nightingale, Midiscan in Finale, Photoscore in Sibelius, and others such as Smartscore and Sharpeye, cannot identify all musical symbols. Furthermore, these products focus primarily on the recognition of typeset and printed music documents; while they can produce quite good results for these documents, they do not perform well on handwritten music. The two-dimensional structure of musical notation, revealed by the presence of the staff lines, together with the existence of several combined symbols organized around the note heads, makes the OMR task highly complex [11].

Nowadays there are libraries, such as the International Music Score Library Project (IMSLP), whose documents are indexed not only by the usual search parameters but also by their melody. Documents exist in these formats mainly thanks to people who transcribe them manually or, more recently, thanks to image processing techniques that compile the information automatically [12].

Fig 1.2. OMR architecture as described by Ana Rebelo, Arthur Capela and colleagues.

The work described in [12] plans to develop methods that detect handwritten musical symbols using artificial intelligence. At a later stage, it plans to develop a database, a web server and a website. The digitized original documents, their digital counterparts saved as MusicXML, and the metadata entered by the user will be stored in the repository. The web server will be responsible for processing the search modules and for the optical recognition of the scores. It will also act as the interface between the repository and the website. Finally, the website will provide the interface with the user and manage the whole system [12].

Three other methods were submitted by Christoph Dalitz and Andreas Kitzig from the Niederrhein University of Applied Sciences, Institute for Pattern Recognition (iPattern), Krefeld, Germany [9]. The first one considers long horizontal runs as candidates. In order to work with music scores with some small rotation or curvature, this method uses a pre-processing step that rotates the image, aligning vertical strips based on their projection correlation.
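The core idea of the first method, treating long horizontal black runs as staffline candidates, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation by Dalitz and Kitzig: the function name `long_horizontal_runs` and the parameter `min_length` are hypothetical, the image is a list of rows in which 1 marks a black pixel, and the rotation pre-processing step is left out.

```python
def long_horizontal_runs(image, min_length):
    """Return (row, start_col, length) for each horizontal black run of at least min_length.

    Assumption: `image` is a list of rows; 1 is a black (ink) pixel, 0 is white.
    """
    runs = []
    for r, row in enumerate(image):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the right edge is closed.
        for c, pixel in enumerate(row + [0]):
            if pixel == 1 and start is None:
                start = c
            elif pixel != 1 and start is not None:
                if c - start >= min_length:
                    runs.append((r, start, c - start))
                start = None
    return runs


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # staff line spanning the full width
    [0, 0, 1, 1, 0, 0, 0, 0],  # short run, e.g. part of a note head
    [0, 1, 1, 1, 1, 1, 1, 0],  # slightly shorter staff line
]
candidates = long_horizontal_runs(page, min_length=6)
print(candidates)
```

Only the two long runs are reported as candidates; the short run that belongs to a symbol is ignored.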

The second method computes the staff height by detecting the most frequent vertical run length. Stafflines are detected by vertically thinning long horizontal runs whose average black percentage is above a threshold (this step limits the detection of curved stafflines). Then, after these horizontal lines are grouped based on the staff height and distance, the staffs are connected as sub-graphs. Finally, the staff height is used to remove all vertical runs shorter than 2 * staff height around the stafflines.
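The run-length estimate used by the second method can be illustrated with a short Python sketch. It assumes, as is common in staff detection, that the most frequent black vertical run approximates the staffline height and the most frequent white vertical run approximates the space between lines. The helper names and the synthetic test image are hypothetical, and 1 is taken to mean a black pixel.

```python
from collections import Counter


def vertical_run_lengths(image, value):
    """Collect the lengths of all vertical runs of `value` (1 = black, 0 = white)."""
    lengths = []
    n_rows, n_cols = len(image), len(image[0])
    for c in range(n_cols):
        run = 0
        for r in range(n_rows):
            if image[r][c] == value:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:  # close a run that reaches the bottom edge
            lengths.append(run)
    return lengths


def most_common_run(image, value):
    """Most frequent vertical run length of `value` in the image."""
    return Counter(vertical_run_lengths(image, value)).most_common(1)[0][0]


# Synthetic staff: five lines, each 2 pixels thick, separated by 3 white rows.
staff_image = []
for _ in range(5):
    staff_image += [[1] * 6, [1] * 6, [0] * 6, [0] * 6, [0] * 6]

line_height = most_common_run(staff_image, 1)
space_height = most_common_run(staff_image, 0)
print(line_height, space_height)
```

On a real score the symbols add many other run lengths, but staffline runs usually dominate the histogram, which is why the mode is a reasonable estimator.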

The third method submitted by this group of authors is based on the construction of a graph comprising the branching and corner points of the skeleton image. Using heuristic rules, vertical and horizontal links are created in the graph, revealing the stafflines. Since the horizontal linking is based on extrapolation, the method fails for heavily curved stafflines.

All of these methods were developed to work with binary images, since the proposed database itself comprises only binary images. Besides the stable path method, which is upgraded in this work to handle grayscale images, two other methods could be adapted to grayscale: the first and second methods developed by Christoph Dalitz and Andreas Kitzig. The first method starts by considering long horizontal runs and then rotates the image in order to align vertical strips based on their projection correlation. To work in the grayscale domain, it could follow an approach similar to the one used in Robust Staffline Thickness and Distance Estimation in Gray-Level Music Scores. That means applying the existing method to the binary images generated by all possible thresholds and then, instead of searching for long horizontal lines, detecting small horizontal segments and later checking whether some combinations of small segments form a long horizontal line. Once the candidates are detected, the second part of the method can work in the same way [12].

For the second method, the approach could be similar as regards the use of the binary images generated by all thresholds, but instead of searching for long horizontal lines it has to search for small vertical lines that meet the conditions defined in the binary mode. In a way, this is similar to the Staff Pixel Detection algorithm, but with different rules [12].
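The step of generating one binary image per threshold can be sketched as follows. This is a minimal Python illustration, not code from [12]: the function names are hypothetical, gray values are assumed to lie in 0 to 255 with dark ink at the low end, and only three thresholds are tried to keep the output short.

```python
def binarize(gray, threshold):
    """Pixels darker than `threshold` become ink (1); all others become background (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def binarize_all_thresholds(gray, thresholds=range(1, 256)):
    """Return a dict mapping each candidate threshold to its binary image."""
    return {t: binarize(gray, t) for t in thresholds}


gray = [
    [250, 240, 245],
    [ 40, 120,  35],  # a faint staff line: the middle pixel is lighter than its neighbours
    [255, 250, 248],
]
layers = binarize_all_thresholds(gray, thresholds=(64, 128, 192))
for t, layer in layers.items():
    print(t, layer[1])
```

At the lowest threshold the staff line is broken into two small segments, which is the situation the segment-combining step described above is meant to handle.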

In recent years, researchers have been developing OMR systems that can be integrated into mobile platforms to help musicians transcribe their music wherever they are. A few mobile applications already exist, but most of these are paid applications and lack the functionality offered by older software made for desktop computers. In a few years, researchers may succeed in creating a complete OMR system for mobile devices that is capable of performing all the functions envisioned by the people who started the initial research in this field.

  1.3 Music Theory

Music theory considers how aural phenomena apply in music. Musical sounds are represented by characters called notes, which are named after the first seven letters of the alphabet: A, B, C, D, E, F, G. A tone is a periodic sound characterized by duration, pitch, intensity and timbre. A note in music is more complex than a musical tone: it is the sum of simple tones with harmonically related frequencies.

Music notation, or musical notation, is any system used to visually represent aurally perceived music, played with instruments or sung by the human voice, through written, printed or otherwise produced symbols. These range from ancient symbols cut into stone, made in clay tablets or written with a pen on papyrus, parchment or manuscript paper, to modern symbols printed using a printing press (ca. 1400s), a computer printer (ca. 1980s) or other printing or copying technology. Types and methods of notation have varied between cultures and throughout history, and much information about ancient music notation is fragmentary. Even within the same period, such as the 2010s, different styles of music and different cultures use different notation methods; for example, for professional classical music performers, sheet music using staves and note heads is the most common way of notating music, but for professional session musicians, the Nashville Number System is the main method.

The notes are written on five parallel lines called the staff. The names of the notes are determined by the clef, which is placed at the commencement of the staff. The two clefs required for the pianoforte are the Treble Clef and the Bass Clef. As a rule, two staffs are required for the pianoforte; they are joined by a brace.
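How the clef fixes the note names can be made concrete with a small Python sketch. It relies on the standard convention that the bottom line of the treble staff is E4 and the bottom line of the bass staff is G2; the function name `note_at` and the position numbering (0 for the bottom line, 1 for the first space, and so on) are assumptions made for this illustration.

```python
LETTERS = "CDEFGAB"

# Letter and octave of the bottom staff line for each clef (standard convention).
BOTTOM_LINE = {"treble": ("E", 4), "bass": ("G", 2)}


def note_at(clef, position):
    """Name of the note `position` steps above the bottom line of the staff.

    Position 0 is the bottom line, 1 the first space, 2 the second line, and so on.
    Negative positions and positions above 8 fall on ledger lines.
    """
    letter, octave = BOTTOM_LINE[clef]
    steps = LETTERS.index(letter) + position
    # Octave numbers change at C, so count whole groups of seven letters from C.
    return f"{LETTERS[steps % 7]}{octave + steps // 7}"


treble_lines = [note_at("treble", p) for p in (0, 2, 4, 6, 8)]
bass_lines = [note_at("bass", p) for p in (0, 2, 4, 6, 8)]
print(treble_lines)
print(bass_lines)
```

The same vertical position therefore yields a different note depending on the clef, which is why an OMR system has to recognise the clef before it can name any note.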

Fig 1.3. Stafflines and clefs in a music sheet.

A music score or sheet is used by musicians to represent music in such a way that other musicians can read and interpret it as the composer intended. Common Music Notation (CMN) is the most widely followed notation of music in the Western world. Music from other cultures differs from Western music both in the way it is interpreted and in the way it is represented. In Western music, instruments are conventionally tuned to A440, which means that the note A in the fourth octave is tuned to 440 Hz.

Apart from the notes, a music sheet contains various musical symbols that add meaning to the notes. These symbols indicate the time signature of the piece (the number of beats per bar), its key signature (the sharps or flats that establish its key) and other directions, such as whether a note is loud (forte) or soft (piano).
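The A440 reference can be turned into frequencies for other notes. The sketch below assumes twelve-tone equal temperament and the MIDI note numbering in which A4 is note 69; neither assumption is stated in the text above, and the function names are hypothetical. Under these assumptions the frequency of note number n is 440 * 2^((n - 69) / 12).

```python
# Semitone offset of each natural note above C within one octave.
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(letter, octave, accidental=0):
    """MIDI note number, using the convention that C4 (middle C) is 60.

    `accidental` is +1 for a sharp and -1 for a flat.
    """
    return 12 * (octave + 1) + SEMITONE[letter] + accidental


def frequency(letter, octave, accidental=0, a4=440.0):
    """Equal-tempered frequency in Hz, relative to A4 = `a4` (an assumption)."""
    return a4 * 2 ** ((midi_number(letter, octave, accidental) - 69) / 12)


print(midi_number("A", 4), frequency("A", 4))
print(round(frequency("C", 4), 2))
```

Going up one octave doubles the frequency, so A5 comes out at 880 Hz.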

Fig 1.4. A typical music sheet.

In short, a music score is a collection of notes and various other symbols represented on the staff lines. Figure 1.4 shows a typical music score or sheet. The notes and symbols are all placed on the staff lines, except for the symbols that give the expression or dynamics of the piece, such as ‘ff’ (fortissimo in Italian, meaning ‘very loud’) and ‘p’ (piano, meaning ‘soft’) in the figure above.


Fig 1.5. A pianoforte keyboard with the names of the keys.

The keyboard of the piano (earlier called the pianoforte) consists of black and white keys. The white keys play the natural notes, while the black keys play the flats and sharps. The black keys are arranged in alternating groups of two and three, which makes it easy to identify the names of the white keys. Figure 1.5 shows a piano keyboard along with the names of the keys and their representation on music sheets.

Apart from the notes there are various symbols and terms written on the sheet to add meaning to the notes. Table 1.1 is a summary of some of the most common music notations [11].

Table 1.1. Common musical symbols and their description

Symbol: Description
Staff: An arrangement of parallel lines with spaces.
Treble, Alto and Bass clef: The names of the notes are determined by the clef, which is placed at the commencement of the staff.
Sharp, Natural and Flat: These signs are used before a note to indicate a change in pitch.
Beams: Used to connect notes in note groups; they demonstrate the metrical and the rhythmic divisions.
Staccato, Staccatissimo, Tenuto, Marcato, Accent, etc.: Symbols for special or exaggerated stress upon any beat or a portion of a beat.
Quarter, Eighth, Sixteenth, Thirty-second and Sixty-fourth note: The note head indicates the time duration of the note, while the flag indicates the relative time of that note with respect to a closed note (the one-beat note called the crotchet).
Rests: These symbols indicate a period of silence in music. There are specific rests for different kinds of notes, such as quarter and eighth.
Ties and slurs: Ties join the time values of two notes of identical pitch. Slurs affect note groups as entities, indicating that the notes are to be played in one physical stroke without a break between them.
Mordent and Turn: Ornament symbols that modify the pitch pattern of individual notes.
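The relative durations in the table can be written down as a small lookup. The Python sketch below assumes that the quarter note (crotchet) receives one beat, as in the table, and that each additional flag halves the duration; the dictionary and function names are hypothetical.

```python
from fractions import Fraction

# Beats per note type when the quarter note (crotchet) gets one beat.
BEATS = {
    "quarter": Fraction(1),
    "eighth": Fraction(1, 2),
    "sixteenth": Fraction(1, 4),
    "thirty-second": Fraction(1, 8),
    "sixty-fourth": Fraction(1, 16),
}


def total_beats(note_types):
    """Sum the durations, in beats, of a sequence of note types."""
    return sum((BEATS[name] for name in note_types), Fraction(0))


bar = ["quarter", "eighth", "eighth",
       "sixteenth", "sixteenth", "sixteenth", "sixteenth", "quarter"]
print(total_beats(bar))
```

A recogniser can use such a sum as a sanity check: if the notes found in a bar do not add up to the beats required by the time signature, a symbol was probably missed or misread.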

There are many terms and symbols used in a music sheet, each adding meaning to the piece of music. Every term and symbol must therefore be properly separated from the music sheet and recognised so that the music is reproduced as intended by the composer.

A music score is a great tool for novice musicians to learn the nuances of music. On the other hand, the ability to read a music score takes quite a while to master. With this in mind, researchers came up with the idea of creating a system that can play music directly from a music score and that also allows musicians to translate their original music into digitized music scores.

  1.4 Uses of OMR

 

Initially, OMR was thought to be useful only for preserving music sheets, but other uses for the system soon emerged. Music comes in two forms: written and performed [2]. Music written on music sheets is interpreted and performed by musicians. Music can also be composed on an instrument first and then written down.

Researchers have developed OMR systems that can support both forms of music. These systems are useful to novice and professional musicians alike. For novice musicians, an OMR system can serve as a learning tool. Music notation takes time to learn and understand, so for novices the OMR system could read a music sheet directly and play the generated MIDI file, enabling the musician to learn the piece without having to read the sheet. Professional musicians can compose music and have the system transcribe it for them. Researchers are working towards an OMR system that can be used with both forms of music.

Another potential use of OMR systems is the transposition of music. Any piece of music is composed in a certain key, indicated by its key signature; the key determines the scale, and hence the set of notes and chords, used in that piece. Transposition is the process of changing the key of the music. Done manually, this process is somewhat complex, especially for amateurs. Researchers have developed OMR systems that are used solely for reading and transposing music. This function is very useful for musicians who would like to change the key quickly and accurately.
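Once the notes have been recognised, transposition reduces to shifting every pitch by the same number of semitones. The Python sketch below is a simplified illustration: the names are hypothetical, notes are given as (name, octave) pairs, and results are always spelled with sharps, whereas a real system would choose sharps or flats according to the target key signature.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def transpose(notes, semitones):
    """Shift each (name, octave) pair by `semitones`, spelling results with sharps."""
    result = []
    for name, octave in notes:
        absolute = octave * 12 + NOTE_NAMES.index(name) + semitones
        result.append((NOTE_NAMES[absolute % 12], absolute // 12))
    return result


melody_in_c = [("C", 4), ("E", 4), ("G", 4), ("C", 5)]
melody_in_d = transpose(melody_in_c, 2)  # C major up a whole tone to D major
print(melody_in_d)
```

Every interval between the notes is preserved, so the melody sounds the same, only higher.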

OMR greatly simplifies the task of musical data acquisition; however, we are not arguing that OMR should replace musical keyboard entry. In the particular circumstance where the music is already available in printed form, OMR can substantially accelerate the process of musical data acquisition. OMR, therefore, is an attractive supplement. Not only does the technique reduce the chance of human error in the transcription, but it can also capture much of the “extra” information that keyboard entry requires the user to laboriously add after the notes have been played.

The most likely scenario is one where OMR is used to process the majority of symbols on the page, followed by an editing stage using a standard music editor, where the musical and computer keyboards are used to correct mistakes and omissions. This has the added benefit of greatly reducing the musical keyboard skills required. Of course, OMR is not perfect, and the editing can take up a significant part of the time needed for OMR-based data entry; however, OMR systems are steadily improving, and the number of music-based projects utilising OMR is steadily increasing (Carter, 1992a). Given the vast body of printed music, OMR could radically reform computer applications in music.


Fig 1.6. Forms of music.

Figure 1.6 shows the two forms of music, written and performed. To help clarify the limits of musical data acquisition by computer, let us consider the larger context of music representation. Written music is converted to performed music by interpretation, and performed music is converted to written music by transcription. Both are non-trivial operations. A performer may study a written work for months before finally presenting their interpretation. The performer will draw upon a range of information, such as knowledge about the composer and the technical limits of the instrument, as well as considering the mood and feeling of the work. To transcribe an imagined performance into written form, a composer reverses this process, deciding what notation would indicate the desired effect. A similar process is undertaken when a recording that has never before existed in written form, such as a jazz improvisation or a folk tune, is transcribed.

Only limited success has been achieved by computer techniques that imitate these human processes. A computer application that requires translation from written music to an audio equivalent, or vice versa, will inevitably suffer from a loss of information. Such a situation occurs in synthesiser keyboard data entry, and explains the strong reliance on post-editing. Conversely, having a computer play exactly what is written in a score can lead to a mechanical-sounding rendition. Better results are achievable if the computer music application stays on one side of the diagram in Figure 1.6: a MIDI keyboard can be very expressive in sequencing work, and an OMR system is ideal for editing the written page [2].

An effective and robust OMR system for printed and handwritten music scores can provide several advantages to the scientific community: (1) an automated, time-saving input method that transforms paper-based music scores into a machine-readable symbolic format for use by music software, (2) translation into other notations, for instance Braille, (3) better access to music, (4) new functionalities and capabilities with interactive multimedia technologies, for instance the association of scores with video excerpts, (5) playback, musical analysis, reprinting, editing and digital archiving, and (6) preservation of cultural heritage.

Thus, over the years, researchers have discovered quite a few novel functions that an OMR system could perform that could help musicians everywhere to create new music with ease.

  1.5 Optical Music Recognition

An Optical Music Recognition system is one that is capable of reading and understanding typeset, printed or handwritten music sheets. OMR is similar to Optical Character Recognition (OCR) in that it is a system that comprehends the written word or, in this case, written music. Researchers came up with the idea of developing OMR systems with the intention of preserving old handwritten music sheets that were on the verge of being lost to the effects of ageing and decay. But as research in the field increased, many researchers realized that such a system can do more than just digitize and preserve the music: it can also play it. OMR systems that can understand the meaning of the musical notes and symbols in the music sheet were therefore developed.

In general, any OMR system consists of the following blocks:

Fig 1.7. Block diagram of general OMR system.

A photograph or scanned image of the music sheet serves as the input to an OMR system. The sheet can be either typeset or handwritten. Image preprocessing prepares the image for the subsequent steps. It usually entails binarization of the image and, in case the image is tilted, deskewing. Noise removal, deblurring and the removal of white speckles are a few other preprocessing techniques applied at this stage. Staff line detection and removal is a vital step in OMR systems. Stafflines, as mentioned earlier, are five parallel black lines running across the breadth of a music sheet, and all the notes and musical symbols are placed on them. To perform OMR, one must be able to separate the music symbols and notes in order to identify them and to understand the meaning of each symbol. The stafflines make this process of individually identifying the symbols very difficult. Thus, the accurate detection and removal of stafflines is essential for the proper recognition of the music sheet. Many algorithms have been proposed for the detection and removal of staff lines, each with its own merits and flaws.
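One of the simplest ways to locate stafflines in a binarized and deskewed image is a horizontal projection: rows that are almost entirely black are taken to be staffline rows. The Python sketch below illustrates that idea only; it is not the algorithm evaluated in this work, the function names and the `min_fill` threshold are hypothetical, and 1 is assumed to mean a black pixel.

```python
def staff_rows(binary, min_fill=0.6):
    """Indices of rows whose fraction of ink pixels (1s) is at least `min_fill`."""
    width = len(binary[0])
    return [r for r, row in enumerate(binary) if sum(row) / width >= min_fill]


def group_consecutive(rows):
    """Merge adjacent row indices into (first_row, last_row) line segments."""
    groups = []
    for r in rows:
        if groups and r == groups[-1][1] + 1:
            groups[-1] = (groups[-1][0], r)
        else:
            groups.append((r, r))
    return groups


# Synthetic sheet: five one-pixel stafflines plus a small blob standing in for a note head.
width = 10
sheet = [[0] * width for _ in range(17)]
for r in (2, 5, 8, 11, 14):
    sheet[r] = [1] * width
sheet[6][3:5] = [1, 1]
sheet[7][3:5] = [1, 1]

lines = group_consecutive(staff_rows(sheet))
print(lines)
```

The projection approach breaks down when the page is skewed or the lines are curved, which is one reason the preprocessing steps described above matter.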

After the removal of stafflines the next step is the recognition of musical objects. The musical symbols and objects are identified and named. The notes are identified based on their position on the stafflines. The interpretation of musical symbols and notes is the next block in an OMR system. The meaning of the musical objects is determined from a library of sorts, containing the symbols and their meaning. Once all the symbols are correctly named, the MusicXML or MIDI file is generated.

  1.6 Performance parameters

 

The International Conference on Document Analysis and Recognition (ICDAR) 2011 Music Scores Competition: Staff Removal and Writer Identification [9] set parameters for measuring the robustness and efficiency of the competing algorithms, namely error rate, precision, recall and F-measure. The competition prepared an extensive data set of images, along with their ground truth images, that can be used to test algorithms and compute these parameters. The data set includes images with various distortions, such as curvature, rotation and white speckles, so that algorithms can be tested under different conditions.

The performance parameters are as follows:
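A minimal Python sketch of how these four measures are conventionally computed from pixel counts is given here. It assumes that staff pixels are the positive class and that the error rate is the share of misclassified pixels among all classified pixels; the function name and the argument names `tp`, `fp`, `fn` and `tn` are hypothetical and do not appear in the competition description.

```python
def staff_removal_metrics(tp, fp, fn, tn):
    """Percent precision, recall, F-measure and error rate for staff-pixel detection.

    tp: staff pixels detected as staff      fp: non-staff pixels detected as staff
    fn: staff pixels that were missed       tn: non-staff pixels left untouched
    """
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    error_rate = 100.0 * (fp + fn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, error_rate


precision, recall, f_measure, error_rate = staff_removal_metrics(tp=90, fp=10, fn=30, tn=870)
print(precision, recall, round(f_measure, 2), error_rate)
```

The harmonic mean used for the F-measure is the same quantity as 1 / (0.5 / precision + 0.5 / recall).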

        for j=1:col-1
            staff(i,j)=BW(i,j);
        end
    end
end
figure, imshow(staff), title('Stafflines');

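symbol=BW;

% LTH Algorithm: in every column, erase each vertical black run
% (pixel value 0) that is no taller than the staffline height plus 2.
% A while loop is used for i because changes made to a for-loop index
% inside the loop body are discarded at the next iteration.
for j=1:col
    i=1;
    while i<=row
        if BW(i,j)==0
            flag=i;                      % first row of the black run
            while i<=row && BW(i,j)==0
                i=i+1;
            end
            count=i-flag;                % length of the run
            if count<=height+2
                symbol(flag:i-1,j)=1;    % paint the short run white
            end
        else
            i=i+1;
        end
    end
end
figure, imshow(symbol), title('Stafflines removed');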

% Calculation of number of classified and misclassified staff and non-staff pixels
norop=ground|staff;
notop=~norop;            % 1 where ground and staff are both black (0)
norop2=idsymbol|symbol;
notop2=~norop2;          % 1 where idsymbol and symbol are both black (0)
sp=0;                    % black pixels in ground
nsp=0;                   % black pixels in idsymbol
totsp=0;                 % black pixels in staff
totnsp=0;                % black pixels in symbol
corrsp=0;                % staff pixels that agree with ground
corrnsp=0;               % symbol pixels that agree with idsymbol
for i=1:col
    for j=1:row
        if ground(j,i)==0
            sp=sp+1;
        end
        if idsymbol(j,i)==0
            nsp=nsp+1;
        end
        if staff(j,i)==0
            totsp=totsp+1;
        end
        if symbol(j,i)==0
            totnsp=totnsp+1;
        end
        if notop(j,i)==1
            corrsp=corrsp+1;
        end
        if notop2(j,i)==1
            corrnsp=corrnsp+1;
        end
    end
end
missp=abs(totsp-corrsp);      % misclassified staff pixels
misnsp=abs(totnsp-corrnsp);   % misclassified non-staff pixels

% Calculation of the performance parameters for LTH algorithm
prec1=100*(corrsp/(corrsp+misnsp));
recall1=100*(corrsp/(corrsp+missp));
ER1=100*((missp+misnsp)/(sp+nsp));
F1=1/((0.5/prec1)+(0.5/recall1));

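nsymbol=BW;
nstaff=staff;

% ALTH Algorithm: as LTH, but a pixel of a short vertical run is erased
% only if it was also detected as a staffline pixel.
for j=1:col
    i=1;
    while i<=row
        if BW(i,j)==0
            flag=i;                      % first row of the black run
            while i<=row && BW(i,j)==0
                i=i+1;
            end
            count=i-flag;                % length of the run
            if count<=height+2
                for f=flag:i-1
                    if staff(f,j)==0
                        nsymbol(f,j)=1;
                    end
                end
            end
        else
            i=i+1;
        end
    end
end
figure, imshow(nsymbol), title('Stafflines removed second time');

% Staff pixels that are still black in the symbol image are dropped
% from the staff image.
for j=1:col
    for i=1:row
        if nsymbol(i,j)==0 && staff(i,j)==0
            nstaff(i,j)=1;
        end
    end
end
figure, imshow(nstaff), title('Stafflines second time');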

% Calculation of number of classified and misclassified staff and non-staff pixels
norop=ground|nstaff;
notop=~norop;            % 1 where ground and nstaff are both black (0)
norop2=idsymbol|nsymbol;
notop2=~norop2;          % 1 where idsymbol and nsymbol are both black (0)
sp=0;                    % black pixels in ground
nsp=0;                   % black pixels in idsymbol
totsp=0;                 % black pixels in nstaff
totnsp=0;                % black pixels in nsymbol
corrsp=0;                % staff pixels that agree with ground
corrnsp=0;               % symbol pixels that agree with idsymbol
for i=1:col
    for j=1:row
        if ground(j,i)==0
            sp=sp+1;
        end
        if idsymbol(j,i)==0
            nsp=nsp+1;
        end
        if nstaff(j,i)==0
            totsp=totsp+1;
        end
        if nsymbol(j,i)==0
            totnsp=totnsp+1;
        end
        if notop(j,i)==1
            corrsp=corrsp+1;
        end
        if notop2(j,i)==1
            corrnsp=corrnsp+1;
        end
    end
end
missp=abs(totsp-corrsp);      % misclassified staff pixels
misnsp=abs(totnsp-corrnsp);   % misclassified non-staff pixels

% Calculation of the performance parameters for ALTH algorithm
prec2=100*(corrsp/(corrsp+misnsp));
recall2=100*(corrsp/(corrsp+missp));
ER2=100*((missp+misnsp)/(sp+nsp));
F2=1/((0.5/prec2)+(0.5/recall2));
