Software that reads a printed (or handwritten) page of music and converts it into digital notation is hugely complex, and the field is still in its infancy, but it is a real boon for the people who need it. Not having to re-enter every note by hand saves an enormous amount of time. It’s been a long journey from the idea of integrating Optical Music Recognition (OMR) into QuickStave to actually having the feature implemented.

I started by stating the principles that would govern how the system should work:

  1. OMR should be free. If there is already free software available that does OMR, then we should not be charging for it. We want to make an already existing process easier for you.
  2. OMR should be fast (or at least not painfully slow). We are called QuickStave for a reason.
  3. OMR should be available on any device. You shouldn’t have to install an application to use OMR; it should work straight in the browser.

Optical Music Recognition in QuickStave

The decision to make it free severely limited my options. I didn’t want the code to run server side, because every scan would then incur a financial cost. Running client side limits everything to the resources available on whatever device runs it, whether that is a mobile phone or a high-end gaming PC.

I started by analysing what was currently available (both commercial and free), and tried to plot a route forward.

The first big contender for our engine was Audiveris. This is a Java project released under the GNU Affero General Public Licence (AGPL), and it has an installable application for scanning. The software is a little confusing, but it is fast, established and best-in-class. The only problem (haha) was that it didn’t run in a browser. So my first experiment was to use CheerpJ to get it running in the browser via WASM, with the code in a worker thread.

It worked!

It also took 170 seconds to run on a single page. There was seemingly no way forward. Even if I found a way forward, I could not see how to get round the AGPL licence… I don’t want to be releasing all my code publicly.
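For anyone curious what the worker-thread part of that experiment involves, here is a minimal sketch of handing a page to a worker and getting a result back. It is not QuickStave's code: the message shapes (`ScanRequest`, `ScanResult`), the `scanInWorker` helper and the MusicXML output field are all hypothetical, and `WorkerLike` is just the small slice of the browser's `Worker` interface the sketch needs.

```typescript
// The slice of the browser Worker interface used here, declared locally so
// the sketch also type-checks outside a browser.
interface WorkerLike {
  postMessage(message: unknown): void;
  onmessage: ((event: { data: unknown }) => void) | null;
}

// Hypothetical message formats, invented for this illustration.
interface ScanRequest {
  id: number;
  width: number;
  height: number;
  pixels: Uint8Array; // greyscale page image, one byte per pixel
}

interface ScanResult {
  id: number;
  musicXml: string;
  elapsedMs: number;
}

let nextRequestId = 1;

// Sends one page to the worker and calls onDone when the matching reply
// arrives. Replies carrying a different id are ignored. Because it assigns
// worker.onmessage, this sketch supports one scan at a time per worker.
function scanInWorker(
  worker: WorkerLike,
  page: Omit<ScanRequest, "id">,
  onDone: (result: ScanResult) => void
): number {
  const id = nextRequestId++;
  worker.onmessage = (event) => {
    const result = event.data as ScanResult;
    if (result.id === id) {
      onDone(result);
    }
  };
  const request: ScanRequest = { id, ...page };
  worker.postMessage(request);
  return id;
}
```

In a browser, a real `Worker` could be passed in place of `WorkerLike`, with the recognition code living in the worker script. The point of the pattern is that the page stays responsive while the slow recognition step runs elsewhere; it does nothing to make the recognition itself faster.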

So I sat on it for a little while.

When I revisited the idea, I found the Oemer project, released under the MIT licence, a much more permissive licence that allows free use of the code. This is lovely cutting-edge stuff, using neural networks to evaluate the scores, and it runs nicely in a browser! Everything looked great… except it took 340 seconds to scan a page on the CPU, and 100 seconds on an RTX 3070 GPU. That is still too long to be a viable solution. Also (maybe this was a personal issue) it wasn’t great at detecting grand staves, so it tended to render everything as one long line.

Again, I felt like I was approaching a dead end, but the GitHub project for Oemer mentions another project called homr, which is again released under the GNU Affero General Public Licence.

Well, I had to try it.

And it is great. It is fast (15 seconds for a single page) and gives absolutely fantastic results. It also mentions a follow-up project called Andromr, which is an Android app for OMR. This is an ongoing project, so if anybody wants to help them, I’m sure they would be happy for you to get in touch.
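To put the timings quoted so far side by side, here is a small sketch that expresses each one as a multiple of homr's 15 seconds. The figures are the ones from this post; the `slowdownVersus` helper is made up for the illustration. I haven't said above whether every figure came from the same hardware, so treat the ratios as rough.

```typescript
// Per-page scan times in seconds, as quoted in this post.
const secondsPerPage: Record<string, number> = {
  "Audiveris via CheerpJ": 170,
  "Oemer (CPU)": 340,
  "Oemer (RTX 3070 GPU)": 100,
  "homr": 15,
};

// Returns each time divided by the baseline's time, rounded to one decimal.
function slowdownVersus(
  baseline: string,
  times: Record<string, number>
): Record<string, number> {
  const base = times[baseline];
  if (base === undefined || base <= 0) {
    throw new Error("Unknown or non-positive baseline: " + baseline);
  }
  const ratios: Record<string, number> = {};
  for (const name of Object.keys(times)) {
    ratios[name] = Math.round((times[name] / base) * 10) / 10;
  }
  return ratios;
}

const ratios = slowdownVersus("homr", secondsPerPage);
for (const name of Object.keys(ratios)) {
  console.log(name + ": " + ratios[name] + "x");
}
```

On these numbers, the slowest option takes over twenty times as long per page as homr, and even the GPU run takes more than six times as long.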

But again, there’s the AGPL licence… so I don’t want to be using this software directly. But I can learn from it. And I can write a decoder in TypeScript using the ideas from the base library. And then I can also write a new training application, download the same open-source datasets used to train homr, and then kick off my own training.

And screw up several times.

And kick off more training.

So now, long story short, OMR is available in QuickStave, and you can access it by pressing import on the “Scores” screen. This is an early iteration, and an early release, but I’m especially proud of this feature, and I hope you make good use of it.

— John Quick