Whisp

Published on 04.01.2019 in [software]

Spectrograms: cat, fireworks, sea waves, sirens

Whisp - An Environmental Sound Classifier

Whisp is an environmental sound classifier that can be used to identify sounds around you. In its current form, Whisp will classify 5 second sounds with 87.25% accuracy across a range of 50 categories, based on the ESC-50 dataset. You can also record sounds in the field to get another perspective on what is happening in your sonic environment.

You can try the app here!

Trying the "Record your sound" feature on your computer might not get very satisfying results because, well, most of us are on a computer in pretty sonically uninteresting places. Definitely give it a shot on your mobile device when you're out and about, surrounded by more interesting environmental sounds :)

Introduction

As someone who has spent a lot of time recording and listening to sounds, I've always dreamed of building a generalized sound classifier.

I'm finding my interests moving more towards research in audio event recognition, so Whisp is a first attempt to dive into that world.

There are a number of applications I've wanted to use one for.

To those ends, I built an environmental sound classifier using the ESC-50 dataset and the fastai library.

In this write-up I will walk through the steps to create the classifier, and share hints and insights along the way that I picked up from the fastai course on deep learning.

If you want to skip ahead, feel free to check out the Whisp repo on GitHub.

Dataset

The data I'm using comes from the ESC-50 (Environmental Sound Classification) Dataset.

This dataset provides a labeled collection of 2000 environmental audio recordings. Each recording is 5 seconds long, and the dataset is organized into 50 categories, with 40 examples per category.

Before training the model, it's useful to spend some time getting familiar with the data in order to see what we are working with.
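
For example, a quick look at the metadata gives us the category breakdown. This is just a sketch assuming the standard ESC-50 repo layout with its meta/esc50.csv file:

    import pandas as pd

    # ESC-50 ships with a metadata CSV describing every clip
    meta = pd.read_csv('ESC-50-master/meta/esc50.csv')

    print(meta['category'].nunique())              # 50 categories
    print(meta['category'].value_counts().head())  # 40 clips per category
    print(meta[['filename', 'category']].head())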

In particular, we are going to train our model not on the audio files themselves, but on images generated from the audio files. Specifically, we will generate spectrograms from the audio files and train on them with a deep learning neural net that has been pre-trained on images.

For more information on how I generated the spectrograms from the audio files, check out my spectrogram generator notebook.

One thing to note is that with the spectrogram images, I was able to get better accuracy by creating square images rather than rectangles, so that training takes the entire spectrogram into account rather than just parts of it.
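
The notebook has the full details, but the core idea looks roughly like the sketch below, using librosa and matplotlib. The function name and figure size are illustrative, not the exact notebook code:

    import numpy as np
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    def save_square_spectrogram(audio_path, image_path, size_inches=4):
        # Load the 5-second clip at its native sample rate
        y, sr = librosa.load(audio_path, sr=None)

        # Mel-scaled spectrogram, converted from power to dB
        S = librosa.feature.melspectrogram(y=y, sr=sr)
        S_db = librosa.power_to_db(S, ref=np.max)

        # Render as a square image with no axes or padding,
        # so training sees the entire spectrogram
        fig = plt.figure(figsize=(size_inches, size_inches))
        plt.axis('off')
        librosa.display.specshow(S_db, sr=sr)
        plt.savefig(image_path, bbox_inches='tight', pad_inches=0)
        plt.close(fig)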

Training

To train the model, we are going to use a resnet34, use the learning rate finder to pick a good learning rate, and train twice over 10 epochs.
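
In fastai (v1, which Whisp was built with), that looks roughly like the sketch below. The data path, transforms, and learning rates are illustrative assumptions, not the exact training code:

    from fastai.vision import *

    # Spectrogram images arranged in one folder per category
    path = Path('data/spectrograms')  # hypothetical location
    data = ImageDataBunch.from_folder(path, valid_pct=0.2,
                                      ds_tfms=get_transforms(do_flip=False),
                                      size=224).normalize(imagenet_stats)

    learn = cnn_learner(data, models.resnet34, metrics=error_rate)

    # Find a reasonable learning rate before training
    learn.lr_find()
    learn.recorder.plot()

    # Train the head, then unfreeze and train the whole network
    learn.fit_one_cycle(10)
    learn.unfreeze()
    learn.fit_one_cycle(10, max_lr=slice(1e-5, 1e-4))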

From the fastai forums, I was able to get a general sense of when I'm overfitting or underfitting.

Training loss > valid loss = underfitting
Training loss < valid loss = overfitting
Training loss ~ valid loss = just about right

epoch train_loss valid_loss error_rate
1 1.063904 1.055990 0.325000
2 1.036396 2.332567 0.562500
3 1.049258 1.470638 0.387500
4 1.032500 1.107848 0.337500
5 0.924266 1.392631 0.417500
6 0.768478 0.623403 0.212500
7 0.596911 0.535597 0.165000
8 0.446205 0.462682 0.160000
9 0.325181 0.419656 0.135000
10 0.251277 0.402070 0.127500

Nice! That gets us an error rate of 0.127500, or an accuracy of 87.25%!

There is a bit of overfitting going on (though Jeremy Howard would think it's ok), but still, really great results!

Here is our confusion matrix, which looks pretty good.

Whisp Confusion Matrix
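
If you want to reproduce this kind of plot, fastai's ClassificationInterpretation makes it straightforward:

    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)

    # Which pairs of categories get mixed up the most?
    interp.most_confused(min_val=3)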

Future Paths Forward

I'd like to train this model on Google's AudioSet data.

I'd also like to explore more data augmentation methods, as described in Salamon and Bello's paper on environmental sound classification.

References

ESC-50 Dataset

ESC: Dataset for Environmental Sound Classification

Audio Classification using FastAI and On-the-Fly Frequency Transforms

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

Environmental Sound Classification with Convolutional Neural Networks

Music Genre Classification

Labocine

Published on 05.30.2017 in [software]

Labocine is a new platform for films from the science new wave. In the iOS app, you can browse Labocine's monthly ISSUES, a special selection of exclusive science films, and read about the scientists and filmmakers leading the science new wave in SPOTLIGHTS.

In App

In action

Format No. 1

Published on 05.29.2017 in [software]

Format No. 1 is a novel optical sound experience that consists of an iPhone application and visual scores. For this project I developed an iOS application that turns the iPhone into an optical sound device, along with visual scores / installations.

Transmissions

Published on 05.29.2017 in [software]

Transmissions is an iOS application that allows an audience to participate in a musical group performance.

Format No. 2

Published on 05.29.2017 in [software]

Format No. 2 is a novel optical sound experience that consists of an iPhone application and visual scores. For this project I developed an iOS application that uses computer vision algorithms to recognize circles and plays a soundscape depending on the size and location of the circles it sees.
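
The app itself runs on iOS, but the gist of the circle detection is the classic Hough transform. Here's a rough Python/OpenCV sketch of the idea, with all parameters illustrative:

    import cv2
    import numpy as np

    # Read a frame of the visual score and look for circles
    frame = cv2.imread('score.png')
    gray = cv2.medianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 5)

    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=40,
                               param1=100, param2=30,
                               minRadius=10, maxRadius=150)

    if circles is not None:
        for x, y, r in np.round(circles[0]).astype(int):
            # Map each circle's position and size to sound parameters
            print(f"circle at ({x}, {y}) with radius {r}")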

Transient

Published on 05.29.2017 in [software]

Transient allows you to record sounds and quickly upload them to a map so you can listen to the world around you!

Simply hold down the record button and let go. When you're done, go to the map view and see your sound where you recorded it! You can also listen to sounds other people have recorded.