Flights of Fancy

Published on 05.26.2020 in [software]

For the past year or so, I’ve turned my ears to the underwater rumblings, industrial gnashings, and overhead zoomings that make up the sonic environment surrounding the Newtown Creek, a body of water that separates Brooklyn and Queens and is one of the most polluted Superfund sites in the country. The creek is infamous for the Greenpoint Oil Spill, in which somewhere between 17 and 30 million gallons of oil and petroleum products seeped into the creekbed over the span of decades, a leak discovered only in 1978 by a Coast Guard patrol helicopter. Since the creek was designated a federal Superfund site in 2010, many environmental remediation projects have aimed to clean the water and the surrounding ecosystem. Still, the area remains environmentally compromised by industrialization, continued oil pollution from nearby refineries, combined sewer overflow events that regularly dump human waste into the creek, and toxic runoff from the cars and trucks that cross the busy streets, bridges, and highways passing over and through the area.

An aerial photograph of Newtown Creek cutting between Brooklyn and Queens
Another view of Newtown Creek with Manhattan in the background

Despite this, the area piques my curiosity both as a harbinger of the climate crisis at our doorstep and as a potential stage for learning how we might coexist with such a present-future. Sonically, I’m drawn to the whooshing of cars passing overhead on the Long Island Expressway, the stochastic bubbling of aeration systems meant to re-oxygenate the murky waters, and the resilient wildlife that still makes this once-vibrant marshland home. If you’re lucky, you may catch a glimpse of a stray egret searching for food among patches of sawgrass planted by ecological restoration projects. Crabs, jellyfish, and the occasional seal still swim beneath the creek’s calm surface. Closer to the nearby FedEx distribution facility, the cacophonous calls of birds suggest an area that isn’t so devoid of wildlife after all.

A photograph of the FedEx distribution facility

That is, until you realize that these bird calls are not made by living creatures. Rather, they’re a mix of artificial, pre-recorded bird calls meant to keep actual birds away from the inside of the distribution facility so they won’t nest there or disrupt the continual churn of capitalism and industry. Having heard these calls, I was left wondering: What birds are these calls meant to be? (World-renowned birder Laura Erickson believes they might be American Robins or European Blackbirds, along with some kestrels.) More interestingly, what birds were these calls meant to keep away? Are those “pest” birds even around anymore? If not, are these artificial bird calls singing out to phantom birds that no longer exist?

Curious about the poetic implications of these “Birds of FedEx,” and in keeping with my desire to learn more about audio machine learning during my time at the Recurse Center, I set out to make a bird sound classifier that could identify birds in the Newtown Creek based on their calls. I had built an environmental sound classifier before, so this project was a way to gain more experience with audio scene classification. I was also interested in how identifying bird species could help environmental remediation projects that need to identify and keep counts of animal species across a wide area, something that microphone arrays and sensor networks make possible. Finally, for my own creative and artistic investigations, I thought a bird sound classifier could aid my acoustic explorations of the Newtown Creek.

A photograph of a man holding a cellphone near the shoreline of Newtown Creek

I started by finding a suitable bird sound dataset to train on. The largest and most comprehensive one I could find was the BirdCLEF dataset, which comprises over 36,000 recordings of bird calls from 1,500 species native to Central and South America. Building on the work done in the BirdCLEF baseline system, I organized the recordings into folders named after each bird species, so that the folder names could later serve as class labels during training.
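As a rough illustration of the folder-per-species layout, here is a minimal Python sketch. It assumes the dataset's metadata has already been parsed into (filename, species) pairs; the function name `organize_by_species` and the label normalization are hypothetical, and the BirdCLEF baseline's own code may do this differently.

```python
import shutil
from pathlib import Path


def organize_by_species(records, src_dir, dst_dir):
    """Copy each recording into dst_dir/<species>/.

    `records` is an iterable of (filename, species_name) pairs, assumed to have
    been parsed from the dataset's metadata beforehand (format not shown here).
    Returns a dict mapping each folder label to the number of files copied.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    counts = {}
    for filename, species in records:
        # Folder names double as class labels, so normalize them lightly.
        label = species.strip().replace(" ", "_")
        target = dst_dir / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_dir / filename, target / filename)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Because each folder name doubles as a class label, many image-classification tools can infer labels directly from this directory structure, with no separate label file.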

A screenshot of a computer terminal printing out the names of various species of birds

The next big hurdle was finding the bird calls within the recordings. Each recording lasts anywhere from a few seconds to a few minutes, with no annotation of where in the recording the bird calls occur. (This is known as a weak-label problem, common in audio event classification, where the onset and offset times of the event in question are unknown.) I wrote code to scan through each recording and cut it into one-second segments. I then applied a short-time Fourier transform to each segment to generate a spectrogram, and used a heuristic to decide whether the segment contained a “chirp” (which I assumed to be a viable bird call). Finally, I saved each spectrogram in a folder named after the corresponding bird species. The result was thousands of bird call spectrograms, nested inside their respective species folders, which I used to train a neural network. This animated GIF shows spectrograms generated for the Golden-Capped Parakeet (Aratinga auricapillus).
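The segmentation step can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes a mono signal already loaded as a float array, uses a Hann-windowed STFT with illustrative settings (`n_fft=512`, `hop=128`), and swaps in a simple stand-in heuristic, the hypothetical `looks_like_chirp`, which flags a segment when one frame's energy above 1 kHz spikes well above the segment's median. The 1 kHz cutoff and 4x ratio are guesses, not values from the original heuristic.

```python
import numpy as np


def split_into_segments(signal, sample_rate, seg_seconds=1.0):
    """Cut a mono signal into non-overlapping fixed-length segments.

    Leftover samples shorter than one segment are dropped.
    """
    seg_len = int(sample_rate * seg_seconds)
    n_segs = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]


def stft_magnitude(segment, n_fft=512, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Returns an array of shape (n_fft // 2 + 1, n_frames): frequency bins x time frames.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack([segment[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def looks_like_chirp(spec, sample_rate, n_fft=512, fmin=1000.0, ratio=4.0):
    """Hypothetical stand-in heuristic for spotting a bird call.

    Flags the segment when the loudest frame's energy above `fmin` Hz exceeds
    `ratio` times the median frame energy in that band, i.e. a short burst
    stands out from the segment's background.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    band = spec[freqs >= fmin]               # keep only bins at or above fmin
    frame_energy = (band ** 2).sum(axis=0)   # energy per time frame
    return bool(frame_energy.max() > ratio * (np.median(frame_energy) + 1e-12))


def chirp_spectrograms(signal, sample_rate):
    """Yield the spectrogram of every one-second segment that passes the heuristic."""
    for seg in split_into_segments(signal, sample_rate):
        spec = stft_magnitude(seg)
        if looks_like_chirp(spec, sample_rate):
            yield spec
```

Comparing the peak frame against the median, rather than against an absolute threshold, makes the check insensitive to recording level, and it rejects steady sounds such as a hum whose energy stays flat across the segment. Spectrograms that pass would then be written, for example as images, into the matching species folder.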

An animated spectrogram of birdsong audio

After preprocessing all of this audio data, the only thing left to do was to train a neural network to classify the bird calls. I opted to use fast.ai, a wonderful library that serves as an easy-to-use API on top of PyTorch and handles much of the boilerplate involved in setting up and training a neural network. Since I was training on spectrograms (images that represent the frequency content of a signal), I used a convolutional neural network pretrained on ImageNet. This technique, known as transfer learning, reuses most of a pretrained network that can already recognize visual patterns like curves and edges; the spectrogram training data was used only to “fine-tune” the last layer of the network. After many hours of training, the network was able to recognize bird calls in my validation set with acceptable accuracy, confirming that the idea was viable.
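The core move of transfer learning, keeping the pretrained layers fixed and fitting only the final classification layer, can be shown in a toy NumPy sketch. Here a fixed random projection followed by ReLU (`frozen_features`) stands in for the pretrained ImageNet body, an assumption made purely for illustration, and the hypothetical `train_head` fits a softmax classifier on top with plain gradient descent. This is not fast.ai's API; in fast.ai, a learner built on a pretrained CNN starts with its body frozen in much the same spirit.

```python
import numpy as np


def frozen_features(x, W_body):
    """Stand-in for a pretrained CNN body: a fixed projection followed by ReLU.

    W_body is never updated, mirroring frozen pretrained layers.
    """
    return np.maximum(x @ W_body, 0.0)


def train_head(features, labels, n_classes, lr=0.01, epochs=300):
    """Fit only the final linear layer (softmax regression) on frozen features."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # d(mean cross-entropy)/d(logits)
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(features, W, b):
    """Return the index of the highest-scoring class for each row."""
    return np.argmax(features @ W + b, axis=1)
```

Because only the head's weights receive gradient updates, training is cheap and needs far fewer labeled examples than learning every layer from scratch, which suits a dataset of thousands rather than millions of spectrograms.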

A screenshot of a Jupyter notebook showing a machine learning model in training
Another screenshot of a Jupyter notebook, this time showing the model classifying audio files into bird species.

The work continues to this day, and I’m looking for more bird call field recordings specific to the Newtown Creek to use as training data for my network. I also collaborated with artist Kelly Heaton, using my classifier to categorize the bird-like sounds made by her electronic bird sculptures. I’m still out and about at the Newtown Creek these days, so if you see someone with headphones over their ears and a microphone in hand, call out a hoot. I might turn and look in your direction, hoping to hear something new and exciting happening in the area.