Week 3: Two Sides of the Same Coin

Published on 01.24.2020 in [rc]

Hello again. Its been really nice taking the time on Fridays to try to write and collect my thoughts on what I've been doing over the course of the week, looking back on the previous weeks, and looking forward into the future.

At the end of last week, I spent some time trying to diagram, to my understanding, the world of audio ML. Here's a photo of what I whiteboarded:

What I started to realize is that the field of audio ML has two distinct "sides": analysis and synthesis. This shouldn't be too surpiring to me. In my Audio Signal Processing class, we're always talking about analysis first, and then synthesis (which is usually the inverse of the analysis process).

This lead me to thinking that what how I should spend my time at RC. Maybe I should spend my first six weeks deeply working on analysis, which in this case would be classification tasks. Then, I could spend my last six weeks looking at synthesis, which would be the task of genrating sounds. This was what I was thinking about doing before coming to RC, and this approach seemed like a good way to see the entire field of audio ML.

I was worried though. Would I actually come out with something subtantial if I split my time like this? My larger goal at RC was to become a dramatically better programmer, and I began to think that maybe becoming more of an expert at one side of the coin would actually prove to be a better use of my time here.

The answer I came to was also driven by the fact that I would be a resident at Pioneer Works right after my time at RC, so I would be spending an addtional 12 weeks somewhere else where I could be more "creative" and "artistic" in my approach. So, that lead me to decide (for now) that I would dedicate the rest of my time at RC fully and deeply understanding the analysis side of audio ML, and build out a tool kit for real-time audio classification that I could use in the field. Yay!

With that in mind, I'm continuing my investigation into bird sound classification, with the intention of making a real-time audio classification app that lets you identify birds in the field, along with other environmental sounds (which would end up being an upgrade to my Whisp app), and speech as well. I don't necessarly have to get to all of these things at RC, but I can build out the scaffolding/framework to do this, and use bird sounds as my first, deeply investigated dataset. I think I will also have the time to fold in environmental sounds as well, as its something I've done before, and maybe even sneak in speech as a stretch goal.

All of this is being facilitated with my involvement with fastai's audio library. I'm proud to say that my first pull request was merged into the library this week! This makes for my first open source contribution :D

I had a lot of great conversations with people in the audio ML space this week, including Yotam Mann and Vincent Lostanlen. Both have been super supportive in my work and have made themselves available to help out where they can. In particular, Vincent pointed me to a lot of great research around bird sound classification, including BirdNet. I wasn't able to find their dataset, but it lead me to the BirdCLEF dataset. Vincent said it was weakly labeled, with no onset/offset times for the bird sounds, so it might require a lot of work to get going. We shall see!

Otherwise, this week was also good for my Audio Signal Processing class. We learned about the Sinusoid model and how to do things like spectral peak finding, sine tracking, and additive synthesis.

In algorithms, a lot of time this week was spent on trees, including binary trees, binary search trees, and self-balancing trees like AVL trees. In the Algorithms Study Group we also spend a lot of time looking at Floyd's cycle finding algorithm, quicksort, and graph traversal algorithms like depth-first search, breadth-first search, and Dijkstra's shortest path finding algorithm.