Music source separation is the task of extracting separate instrument tracks, or stems, from a song’s final mix. This is an incredibly hard problem, often compared to getting the eggs back out of an omelet.
While researchers haven’t yet fully solved the source separation problem, recent advances in AI are bringing us closer to the point where the results are usable for practical purposes. If you’re curious about how this stuff works, take a look at some of the algorithms out there: Spleeter, Open-Unmix, and Demucs, to name just three.
Many of these systems are open source and available for you to play with, as long as you’re comfortable with Python scripting and that kind of thing. Some of us, though, would like music separation to be available through a graphical user interface. It would be great if we could, in a few clicks, extract audio stems from a video clip and isolate specific instruments during playback.
And that’s what Chronotron brings you in Release 159. To implement music separation, Chronotron relies on Spleeter’s 2-stem and 4-stem configurations. Huge thumbs up, by the way, to the team at Deezer who open-sourced the algorithm and models!
For the output stage, Chronotron takes a different approach from most other tools out there. Instead of producing a separate audio file for each stem, it relies on the ability of MP4 media containers to hold multiple audio streams, and creates a single output file (.mp4 for video, .m4a for audio) with all extracted stems embedded in it. The app also lets you choose whether to include the original audio track in the final output.
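Chronotron’s own multiplexing code isn’t shown here, so as a rough illustration of the single-file idea, here is a Python sketch that builds an ffmpeg command line placing the original audio and each stem in one MP4 as separate audio streams. It assumes ffmpeg is the muxer, which is not necessarily what Chronotron uses; the function name `build_mux_command` and the file names are hypothetical. Each stream is also tagged with a `title` metadata entry, though how (or whether) a given player displays that title varies.

```python
import shlex


def build_mux_command(
    original: str,
    stems: dict[str, str],
    output: str,
    include_original: bool = True,
) -> list[str]:
    """Build an ffmpeg argument list that muxes stems as extra audio streams.

    Hypothetical helper: input 0 is the original clip, inputs 1..N are the
    stem audio files, in the order given by `stems` (name -> file path).
    """
    cmd = ["ffmpeg", "-y", "-i", original]
    for path in stems.values():
        cmd += ["-i", path]

    # Keep the original video stream, if any ('?' makes this map optional).
    cmd += ["-map", "0:v?"]

    titles = []
    if include_original:
        # First audio stream of the original clip becomes output audio stream 0.
        cmd += ["-map", "0:a:0"]
        titles.append("Original")
    for input_index, name in enumerate(stems, start=1):
        # First audio stream of each stem file becomes the next output stream.
        cmd += ["-map", f"{input_index}:a:0"]
        titles.append(name)

    # Copy video untouched; encode every audio stream as AAC for MP4/M4A.
    cmd += ["-c:v", "copy", "-c:a", "aac"]

    # Tag each output audio stream with a human-readable title.
    for stream_index, title in enumerate(titles):
        cmd += [f"-metadata:s:a:{stream_index}", f"title={title}"]

    cmd.append(output)
    return cmd


if __name__ == "__main__":
    cmd = build_mux_command(
        "clip.mp4",
        {"Vocals": "vocals.wav", "Accompaniment": "accompaniment.wav"},
        "clip_separated.mp4",
    )
    print(shlex.join(cmd))
```

Bundling everything into one container keeps the stems time-aligned with each other and with the video, at the cost of requiring a player that can switch audio streams.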
You can enjoy the resulting file on any player that supports switching audio streams! Most video players have this feature, typically so you can switch between movie languages, and your smart TV may well have it too.
Chronotron, too, has supported multiple audio streams since Release 155. You can select the current stream in the Audio panel, next to the Channel selection controls, as depicted below. Note that the stream selector only appears when the media clip contains more than one audio track.
That’s about all there is to it. I hope you’ll enjoy source separation in Chronotron as much as I enjoyed implementing it.
As cool as this feature is, however, it does have some limitations you should be aware of.
For starters, the AI models still need to perform more consistently across a wider variety of audio material, so you may find that separation doesn’t work well in many cases. To make things even harder, most music today is distributed in lossy compressed formats, MP3 being a good example, which discard sounds that can’t normally be heard in the final mix, such as weaker notes masked by more prominent ones. This poses an additional challenge for widespread adoption, since commonly available music sources are far from being studio masters.
Other limitations are related to the implementation itself. For example, as of this writing, the source separation feature is only available on the x64 CPU architecture. This means that if you have an older 32-bit x86 system, or even a brand new ARM64 one, you’re out of luck. Also, the current version of the app processes the audio data entirely in memory, which requires a lot of RAM; that’s why Chronotron won’t let you separate clips longer than 10 minutes.
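To get a feel for why an in-memory design caps clip length, here is a back-of-the-envelope sketch in Python. It assumes 44.1 kHz stereo audio stored as 32-bit floats and counts only the raw sample buffers for the input and each output stem; the separation model’s own working memory (spectrograms, network activations) comes on top of this and is not modelled. The helper names are hypothetical, and the figures are illustrative assumptions, not measurements of Chronotron.

```python
def pcm_buffer_bytes(
    duration_s: float,
    sample_rate: int = 44_100,   # assumed CD-quality sample rate
    channels: int = 2,           # assumed stereo
    bytes_per_sample: int = 4,   # assumed 32-bit float samples
) -> int:
    """Bytes needed to hold uncompressed PCM audio of the given duration."""
    return int(duration_s * sample_rate) * channels * bytes_per_sample


def separation_buffer_bytes(duration_s: float, stems: int, include_input: bool = True) -> int:
    """Raw buffer memory for the decoded input plus one buffer per output stem."""
    buffers = stems + (1 if include_input else 0)
    return buffers * pcm_buffer_bytes(duration_s)


if __name__ == "__main__":
    for minutes in (3, 10):
        total = separation_buffer_bytes(minutes * 60, stems=4)
        print(f"{minutes} min, 4 stems: {total / 1e6:.1f} MB")
```

Under these assumptions, a 3-minute song needs about 317.5 MB of raw buffers for a 4-stem split, while a 10-minute clip needs about 1058.4 MB before the model allocates anything of its own, which shows how quickly memory use grows with clip length.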
Finally, I would have liked to label the audio streams in the output file as “Vocals”, “Drums”, and so on, to make them easier to identify during playback. Unfortunately, as of this writing, the technology Chronotron uses for multiplexing – a fancy word for merging – audio and video streams doesn’t support labelling them.
Update: Starting in Release 160, audio streams are properly labelled! Because the app assigns track names while generating the output file, only newly separated tracks will be labelled, so you may need to run separation again if having proper stream names is important to you.
Some of these limitations will hopefully fade away in future releases, so stay tuned!