Whisp is an environmental sound classifier that can be used to identify sounds around you. In its current form, Whisp will classify 5-second sounds with 87.25% accuracy across a range of 50 categories, based on the ESC-50 dataset. You can also record sounds in the field to get another perspective on what is happening in your sonic environment.
You can try the app here! It works on desktop Firefox and, on mobile, only in Safari on iOS (Chrome currently has some issues that keep microphone recording from working, sorry!).
Trying the "Record your sound" feature on your computer might not get very satisfying results because, well, most of us are on a computer in pretty sonically uninteresting places. Definitely give it a shot on your mobile device when you're out and about, surrounded by more interesting environmental sounds :)
As someone who has spent a lot of time recording and listening to sounds, the idea of a generalized sound classifier has always been a dream of mine to build.
I'm interested in creating technologies that change our relationship to the sounds in our environment. Or, put another way, I like creating sound technologies that change our relationship to our environment and the world at large.
I'm finding my interests moving more towards research in audio event recognition, so Whisp is a first attempt to dive into that world.
Some applications that I've wanted to use one for include:
A tag suggester for field recordings
An augmented reality app that identifies sounds around you for the hard-of-hearing community
A tool for sound artists for analyzing audio events in your surroundings
To those ends, I built an environmental sound classifier using the ESC-50 dataset and the fastai library.
In this write-up I will walk through the steps to create the classifier, as well as drop hints and insights along the way that I picked up from the fastai course on deep learning.
This dataset provides a labeled collection of 2000 environmental audio recordings. Each recording is 5 seconds long, and is organized into 50 categories, with 40 examples per category.
Before training the model, it's useful to spend some time getting familiar with the data in order to see what we are working with.
In particular, we are going to train our model not with the audio files, but with images generated from the audio files. Specifically, we will be generating spectrograms from the audio files and using them to train a deep neural net that has been pre-trained on images.
One thing to note is that with spectrogram images, I was able to get better accuracy by creating square images rather than rectangles, so that the training would take into account the entire spectrogram rather than just parts.
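To make the audio-to-image step concrete, here is a minimal, dependency-free sketch of a magnitude spectrogram computed with a short-time Fourier transform. It is an illustration only, not the code behind Whisp: the function name, the frame size, and the way the hop length is chosen are all assumptions, and a real pipeline would use an optimized FFT library rather than this naive DFT. Setting the number of time frames equal to the number of frequency bins is one possible way to end up with a square image that covers the whole clip.

```python
import cmath
import math


def square_spectrogram(samples, frame_size=64):
    """Return a magnitude spectrogram as rows of frequency bins by time frames.

    The hop length is derived so that the number of time frames equals the
    number of frequency bins, which yields a square image (an assumption
    made for this sketch).
    """
    if frame_size < 2:
        raise ValueError("frame_size must be at least 2")
    n_bins = frame_size // 2 + 1   # non-negative frequencies of a real signal
    n_frames = n_bins              # force a square result
    hop = (len(samples) - frame_size) // (n_frames - 1)
    if hop < 1:
        raise ValueError("signal too short for a square spectrogram")

    # Periodic Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    # Precomputed DFT factors, one row per frequency bin.
    factors = [[cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)]
               for k in range(n_bins)]

    spec = [[0.0] * n_frames for _ in range(n_bins)]
    for t in range(n_frames):
        start = t * hop
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        for k in range(n_bins):
            spec[k][t] = abs(sum(x * f for x, f in zip(frame, factors[k])))
    return spec


# Toy input: a pure tone that sits exactly on frequency bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(2000)]
spec = square_spectrogram(tone, frame_size=64)
print(len(spec), "x", len(spec[0]))
```

With this toy tone the result is a 33 x 33 grid, and the strongest value in every time frame sits in bin 8. Each row could then be mapped to pixel intensities to produce the image that the network is trained on.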
To train the model, we are going to use a resnet34, use our learning rate finder, and train twice over 10 epochs.
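The recipe above could look roughly like the following sketch. It assumes fastai v1 (the version that provides cnn_learner and fit_one_cycle) and a folder of spectrogram images with one sub-folder per category. The function name, the 20% random validation split, the image size, and the learning-rate range for the second round are all placeholders, and reading "train twice" as one frozen round followed by one unfrozen round is an assumption too.

```python
def train_two_rounds(spectrogram_dir, epochs=10):
    """Sketch: resnet34 on spectrogram images, trained in two rounds (fastai v1)."""
    # Imported lazily so this file can be loaded without fastai installed.
    from fastai.vision import ImageDataBunch, cnn_learner, imagenet_stats, models
    from fastai.metrics import error_rate

    # One sub-folder per category; hold out 20% for validation (assumed split).
    data = ImageDataBunch.from_folder(
        spectrogram_dir, valid_pct=0.2, size=224
    ).normalize(imagenet_stats)

    learn = cnn_learner(data, models.resnet34, metrics=error_rate)

    # Round 1: only the new head is trained; the pre-trained body stays frozen.
    learn.lr_find()
    learn.recorder.plot()          # inspect the curve to choose a learning rate
    learn.fit_one_cycle(epochs)

    # Round 2: unfreeze everything and fine-tune with smaller learning rates.
    learn.unfreeze()
    learn.lr_find()
    learn.recorder.plot()
    learn.fit_one_cycle(epochs, max_lr=slice(1e-6, 1e-4))  # placeholder range
    return learn
```

The learning-rate range in the second round should be read off the learning rate finder plot rather than copied from here.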
From the fastai forums, I was able to get a general sense of when I'm overfitting or underfitting.
Training loss > valid loss = underfitting
Training loss < valid loss = overfitting
Training loss ~ valid loss = just about right
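This rule of thumb is easy to turn into a small helper for reading a training log. The sketch below is only an illustration: the function name and the 10% tolerance used for "about equal" are assumptions, and the losses are assumed to be positive values such as cross-entropy.

```python
def diagnose_fit(train_loss, valid_loss, tolerance=0.1):
    """Apply the rule of thumb; `tolerance` is the relative gap treated as equal."""
    gap = train_loss - valid_loss
    if abs(gap) <= tolerance * max(train_loss, valid_loss):
        return "just about right"
    return "underfitting" if gap > 0 else "overfitting"


# Made-up loss values, one pair per case.
print(diagnose_fit(1.20, 0.80))
print(diagnose_fit(0.20, 0.60))
print(diagnose_fit(0.50, 0.52))
```

It is a heuristic for a single epoch; the trend of both losses over several epochs says more than any one pair of numbers.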
Nice! That gets us an error rate of 0.127500, or an accuracy of 87.25%!
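The two numbers are the same result seen from opposite sides, since accuracy is one minus the error rate. Here is a small worked example; the 400-clip validation set (20% of the 2000 clips) is a hypothetical size used only to show what the rate would mean in clips, not a figure from the training run.

```python
error_rate = 0.127500

# Accuracy is the complement of the error rate.
accuracy = 1.0 - error_rate
print(f"{accuracy:.2%}")  # 87.25%

# Hypothetical validation size, assumed here for illustration only.
n_valid = 400
misclassified = round(error_rate * n_valid)
print(misclassified, "of", n_valid, "clips misclassified")
```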
Here is our confusion matrix which looks pretty good.
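A confusion matrix is a count of (actual, predicted) label pairs, and its off-diagonal cells are the mistakes. The sketch below shows the idea in plain Python; the function names and the toy labels are made up for illustration and are not Whisp's validation results.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count every (actual, predicted) pair; equal pairs are correct guesses."""
    return Counter(zip(actual, predicted))


def most_confused(actual, predicted, top=3):
    """Return the most frequent mistakes as ((actual, predicted), count)."""
    counts = confusion_counts(actual, predicted)
    mistakes = [(pair, n) for pair, n in counts.items() if pair[0] != pair[1]]
    return sorted(mistakes, key=lambda item: (-item[1], item[0]))[:top]


# Toy labels, invented for this example.
actual = ["engine", "engine", "engine", "wind", "wind", "rain"]
predicted = ["washing_machine", "washing_machine", "engine", "wind", "rain", "rain"]
worst = most_confused(actual, predicted)
print(worst)
```

Listing the largest off-diagonal cells this way is a quick check on which categories the model mixes up most often.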
Testing in the Field
I've been taking Whisp with me out on field recording expeditions around the Newtown Creek.
One night with Mitch Waxman, I took an early version of Whisp and made field recordings around the Dutch Kills area of the Newtown Creek, and down near Blissville. I extracted 3 sounds from the recordings that I knew would show up in the ESC-50 dataset categories.
Whisp classified this sound as a washing machine with 69% confidence, which... isn't exactly correct. But hey, a washing machine does sound a lot like an engine when it's running, right? I can understand the ambiguity. Whisp had 18% confidence that it was a helicopter, and 5% confidence that it was an engine (of some sort).
Whisp classified this sound as a thunderstorm with 97% confidence, and thunderstorms are usually pretty windy! The next highest confidence score was wind, at 7%.
Finally, Whisp classified this sound as a car horn with about 99% confidence. Given that the dataset doesn't have "train horn" as a category, we can live with this being close enough ;)
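Ranked confidence scores like the ones above are what you get when a classifier's raw scores are turned into probabilities and the top few are listed. The following is a minimal sketch that assumes a softmax over the raw scores; how Whisp computes its confidences is not stated here, and the function name, labels, and scores are invented for illustration.

```python
import math


def top_confidences(scores, labels, k=3):
    """Softmax the raw scores and return the k most likely (label, probability) pairs."""
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up raw scores for four categories.
labels = ["washing_machine", "helicopter", "engine", "wind"]
scores = [3.0, 1.5, 0.5, -1.0]
for label, prob in top_confidences(scores, labels):
    print(f"{label}: {prob:.0%}")
```

Showing the runners-up alongside the top guess is what makes near misses such as engine versus washing machine visible.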
Hunter's Point Park (Hunter's Point South Park Extension) - mouth of Newtown Creek
I recently took Whisp out with Taiwanese sound artist Ping Sheng Wu to test it in the field.
We saw a group of birds off in the distance.
Whisp was able to hear and classify their chirping!
We tried getting some water sounds, but most of it came back as wind, as that was the dominant sound out there. Sea waves did come back though, but with a low 3% confidence rating.
On our walk back to the train station, we found a fan and decided to try Whisp'ing it.
Whisp thought it was a vacuum cleaner, which, like the example above of the engine that sounded like a washing machine, isn't too far off. It also thought it could have been a washing machine and plain old wind.
Future Paths Forward
I'd like to train this model on Google's AudioSet data.
I'm also interested in exploring more data augmentation methods as described in Salamon and Bello's paper on Environmental Sound Classification.
One idea that I'd love to explore is that of a "sound homonym". For example, there are a lot of sounds that sound similar to each other, and that the classifier gets wrong while still landing pretty close (washing machine vs. engine, for example). I wonder what it would look like to play around with sound homonyms for performance.
The other thing that I'm interested in is the "distance" between sounds. For example, the classifier gives you the "closest" prediction it thinks the sound is. You could imagine that the prediction that is the least close is the furthest away. It would be interesting to push this idea further and think about how different sounds are more or less distant from each other. What would it mean for a sound to be the opposite of another sound? Or the most different sound?
Because of the number of audio files, this post is best experienced using Firefox!
Quiet Music, Weak Sounds
Early one summer morning in Kyoto, I took this photo along the banks of the Kamo River.
On the left bank, a woman greeted the morning with outstretched arms. Up above, three birds circled over the water. Kyoto, already a sleepy city, was still waking up. In its stillness, all I had was the quiet of the city's early dawn around me. From that vantage point, Kyoto's deep morning silence stretched far into the horizon, up the Kamo River valley, and into the mountains hidden behind the clouds.
Since that summer in 2012 I have been searching for quiet sounds all around me. Sometimes these are literally quiet sounds — inaudible because of their low volume compared to the bigger, louder sounds around them. But often times these quiet sounds are not so quiet at all, and instead are quiet because of our relationship to them. They are sounds that we don't pay attention to. They take up less space. They are often "overheard" (analogous to "overlooked") because they are not usually sounds we focus on. They are effectively inaudible, used here in a similar way one might use the term "invisible". They are muted and minuscule, diminutive and shrunken, minor and pathetic. They have no wants and cause no trouble. They are sounds pushed off to the side and forgotten, overshadowed by the larger, familiar and more heroic sounds in the environment that people instantly recognize, are drawn towards, and quickly reference when describing a place.
Some examples of quiet sounds that I've come across include:
The reverberations of street life transduced through a hollow pole
The piercing pitching of neon signs
The rhythmic knocking inside of cross-walk buttons
The brushing of ripples against a lake's shore
The soft patterings of light February snow
Most of my work for the past five years has been shifting people's attention to those sounds, in an attempt to broaden our understanding of the world around us. Through this deeper understanding, we can create new, original, and more personal relationships to our environment through the discovery of the delicate, poetic, and ephemeral sounds around us — if only we took the time to listen.
My time in Japan, especially as an InterLab Researcher at YCAM, taught me a lot about listening in new ways, and I knew I would one day return to Kyoto to get to know the city and its sounds deeply.
The following year, I contacted Tetsuya Umeda, a sound artist from the Kansai area, about the possibility of doing listening tours in Kyoto to explore its sounds, and he advised me to get in touch with Social Kitchen, a community arts center in the city.
It would take a couple of years before I could find a way to work with Social Kitchen, and in 2015 I was introduced to the Asian Cultural Council, who ended up supporting my time in Kyoto through a fellowship grant.
I got in touch with Social Kitchen again in 2016, and they introduced me to Eisuke Yanagisawa as someone with whom I should collaborate during my time in Kyoto.
Five years after I took that photo on the Kamo River, I was able to return to Kyoto and embark on a four-week-long residency to explore its sounds through a series of workshops and field work research expeditions, titled Quiet Music, Weak Sounds.
Before I arrived, Social Kitchen, Eisuke and I came up with the following program:
Quiet Music, Weak Sounds is a collaboration between sound artists Johann Diedrick and Eisuke Yanagisawa to discover, amplify, and share the subtle sounds in Kyoto, Japan.
Over the course of four weeks, Diedrick and Yanagisawa will explore Kyoto’s soundscape with custom microphones, amplifiers and field recorders.
Informed by their findings, the two will host a series of workshops, teaching members of the community how to build and use their own sonic investigation tools.
They will turn participants into acoustic explorers and take them on explorations of Kyoto to find, record, edit, and present their own found sounds.
Afterward they will construct Aeolian harps with the participants and introduce the harp's sounds to Kyoto’s Kamo River path.
Finally, the two will present their findings to the community at large, in the form of a talk and reception party.
After finally arriving in Kyoto in April 2017, Eisuke and I met and began our collaboration together.
Mobile Listening Kit Workshop
The first event hosted at Social Kitchen was a Mobile Listening Kit workshop. The workshop introduced participants to the world of sound art and provided techniques for making tools to create these experiences. This included the fabrication of a mobile listening kit and a contact microphone for use in installations, performances, and scientific research.
The mobile listening kits are portable amplifiers that can be used to hear quiet sounds in your environment. They consist of an input for different kinds of microphones — in the workshop we built and used contact microphones. You can adjust the volume of the input sound with a volume knob, and hear vibrations on surfaces through headphones or speakers. The kits are used to focus in on sounds that normally can't be heard because of their volume, and are designed to be portable for everyday use and exploration.
Most of the participants had never built any kind of electronic device before, and the workshop involved a lot of soldering and hands-on fabrication. It was important for me to have people actually build these kits, instead of using pre-built ones, because I think it is important to teach people how to teach themselves. Only by learning how to teach myself was I able to do and make the things I can today. I think it is critically important for artists to learn how their tools work and function, so that they can modify them for their own creative purposes.
Field Recording Workshop
The next day, Eisuke and I hosted a field recording workshop. In this workshop we gave participants the opportunity to find and record sounds outside with their mobile listening kits, field recorders, and different kinds of microphones. We didn't give much instruction on what to listen for, other than to try to discover sounds in places where they least expected them. In this way, the workshop encouraged listeners to reimagine their sonic environment by playfully exploring the world through their ears.
The workshop began with a quick lecture on how to use microphones and field recorders for recording sound.
Soon after we went to Goryou Shrine, a Shinto shrine just a short walk from Social Kitchen. We spent most of the afternoon at the shrine exploring its cracks, surfaces, and hidden spaces.
After spending two hours recording sounds, we came back and did a short lecture on editing field recordings. At the end of the workshop, participants presented their recordings to each other, which prompted lively questions and discussion.
Aeolian Harp Workshop
In the final workshop, we built Aeolian harps, a type of string instrument played by the wind.
Aeolian harps are objects of mystique because of the quality of the sound they produce and how that sound is made. They can range in look and form, but in general they look like simplified harps or guitars, with a hollow wooden body, usually with a sound hole, and a number of strings stretched across. Instead of plucking or bowing the harp, you can place it in the vicinity of a moderate, consistent gust of wind, and as the wind vibrates the strings, the harp produces a ghostly, haunting sound — seemingly out of thin air.
We were both really excited to host the workshop because we knew it would be a beautiful demonstration of how one can collaborate with the environment to produce sound, instead of treating sounds in the environment as a resource to be extracted as we had done in the two previous workshops. We knew that producing sounds from the harp would be difficult for a number of reasons, not the least of which was that we had no control over the wind on the day of the workshop. Conceptually this worked in our favor, because it meant that participants had to concentrate hard to produce and hear the sounds from the harp. They wouldn't be able to get the immediate satisfaction of making sounds like you would with an electric guitar, drum set, or computer. Instead, they had to be very patient and work with the environment to orient the harp in such a way that when a gust of wind blew their way, the harp would sound. Each sound was to be precious. The participants had to wait in anticipation, excitement, and yes, frustration, for each sound to come. Our hope was for them to ultimately develop a new kind of appreciation for weak, quiet sounds that can be just as fleeting as the wind.
Before starting this workshop, Eisuke and I traveled to Osaka to visit Kosuke Nakagawa, an expert at building string instruments including Aeolian harps. At his studio he showed us his instruments and walked us through how to build an Aeolian harp for our workshop.
Here is a video of our prototype in action:
Back at Social Kitchen, we built our Aeolian harps together.
When we were done, we brought their natural singing sound to the Kamo River. As expected, it was difficult to get the harps to sing. Walking around the river, we searched for the best place to find ideal wind conditions. Participants readjusted and realigned their harps in order to find the best position. In the process, they developed a consciousness around wind speed, path, and direction in the surrounding environment. And soon enough, the sounds came.
You can hear a sample of what the Aeolian harp sounded like here:
During my last week in Kyoto, Eisuke and I were able to spend two days doing our own field work. We were both interested in quiet sounds, but from two different perspectives. I was interested in sounds that were quiet both in their actual volume and in their general level of recognition - sounds that lack audibility (analogous to visibility). Eisuke is interested both in sounds that reside outside of our human hearing range (mostly ultrasonic sounds), and sounds that also lack "hear"-ability because of how remote they are (he studies highland gong music from Vietnam). We picked two sites in the city noted for their quietness and sonic diversity.
The first place we went was already very familiar to me - the Kamo River. When we arrived, we found ropes installed over parts of the river that were designed to deter birds from eating fish that were swimming upstream to spawn during the spring season. The ropes would vibrate with the wind and produce a very low-frequency sound that we could record with our contact mics.
Here is what one of the ropes sounded like:
Eisuke made a similar recording as well:
I also recorded some sounds from the surface of the water with my mobile listening kit.
Eisuke also recorded sounds from underneath the Kamo River with his hydrophone.
We also found a nearby pipe that captured and reverberated the sounds of the river.
Eisuke was able to stick a mic in the pipe and record some of the sounds inside.
The next day we traveled to Katsura and Kamikatsura, located near the mountains northwest of central Kyoto. There we recorded the sounds of the Hankyu Line, the Katsura River and a nearby bamboo forest.
Against the fence you can hear the roaring and rumbling of passing bikes, cars and trains.
Closer to the mountains, we visited Jizo-in Temple.
In this temple, you could hear the sounds of birds in the bamboo...
and the sound of two flowers rubbing against bamboo while swaying in the wind.
Further up we recorded the sounds of a small waterfall near the Katsura River. I recorded some sounds with my mobile listening kit.
Eisuke recorded similar sounds from the same river with his parabolic microphone.
He also captured the sounds of the rustling bamboo...
...and these incredibly physical sounds of large bamboo shoots cracking and snapping.
In my final week in Kyoto, we hosted a reception at Social Kitchen to present our past work and our collaboration together.
At the end of the reception we did a live performance of our field recordings.
May Peace Prevail on Earth
Over the past few years, I have been documenting my explorations of weak sounds through short recordings with my mobile listening kit and photos taken with disposable cameras. That project, titled It Is Impossible To Know About Earth, So We Must Hear Her Voice In Our Own Way, is still ongoing. During this residency, however, I decided to try documenting my recording situations with drawings as well. As part of the reception, I showed a selection of these drawings in a tiny exhibition titled May Peace Prevail on Earth.
Four weeks may not sound like a lot of time for a residency, and it isn't. With that in mind, Eisuke and I designed an incredibly packed itinerary, with most of our activities happening over the course of my last two weeks.
A part of me is still deciding on whether or not it was a good idea to plan as much as I did for my time in Kyoto. On one hand, I was only going to be there for a short amount of time, so I thought it would be best to pack in as many events and activities as possible. On the other hand, I didn't have as much time to wander and explore as I wanted to. No doubt I was able to really feel like I sank into Kyoto, but it would have been nice to have had more idle time to let my mind drift.
I also, intentionally or not, decided not to do as much material preparation for my workshops before I arrived. A lot of this was circumstantial, as I was traveling from another conference/workshop and probably couldn't have really brought all the materials I needed in the first place. Either way, one of the challenges that I set up for myself was answering the following questions:
What would it be like to be an artist in Kyoto?
Would I be able to find the materials I need to produce the kinds of work that I want?
Would I feel happy, inspired and able to live out my fullest artistic life here in this city?
I can say with confidence that I was able to pull off most of what I set out to do during my time there.
Looking back at my time at Social Kitchen, I feel like I developed more confidence in my artistic practice. I more firmly know what I like to do, and, maybe more importantly, what I don't like to do. For example, I know now that I am less interested in making works that are meant to be consumed on a screen. Instead I want to make more works that get people to stand up, move around, and interact with each other and the sounds around them. My mobile listening kits were always an extension of this desire, and the Aeolian harp workshop, which was so delightful to me, seems to be a continuation of that trajectory.
Even more so, I feel like I'm moving a bit away from sound recordings in general and more into physical sound environments that can be manipulated and played with. I'll probably still be interested in documenting my "sonic experiences", but as my interest in drawing makes apparent, how I choose to document these experiences will constantly change and evolve.
One thing I am excited to do is improve my workshops. Having done so many now, I feel confident in facilitating them. I am already looking to improve my workshops by creating pictorial instructional guides that can be understood and enjoyed by anyone regardless of language. It would be much more time efficient and helpful if I can provide participants an instructional guide that they can go off with and use, allowing me to occasionally hop in when they need specific assistance.
One thing I am curious to work on more is self-sustained sound installations that use solar power to drive speakers for amplification and natural sources of energy (wind, water) for sound activation. I am working on two new works, one for this year's Megapolis Festival and another for a group show at Little Berlin (both in Philadelphia), that have me working through these ideas and with these materials.
During the Everything Without a “Real” is False: APA (aka Xuan Ye) Artists-in-Residence @ Being Generation & JØ (aka Johann Diedrick) @ Friends-at-Home Micro-Residency (2015.5.30), the two impromptu music performances Der Stromausfall / Cloudbursting happened as a result of an unexpected electrical outage and a fortuitous return of power on Dundas West, Toronto. Der Stromausfall was improvised completely without wall power. In Cloudbursting, the instrument (eventually the performance) allows the performer to use a one-second vocal recording as a sound source for a tactile interface. When touching two contacts, thereby closing a circuit, the vocal sample is played back in a fluttering staccato of pitches based around a pentatonic scale. This allows for a wildly expressive sonic palette that is driven through touch between oneself and others. When the circuit is used in reverse, a group can form a connection together that results in silence. As they start to let go of each other, melodic voices shower down.
Labocine is a new platform for films from the science new wave. In the iOS app, you can browse Labocine's monthly ISSUES for a special selection of exclusive science films every month and read about the scientists and filmmakers leading the science new wave in SPOTLIGHTS.
Format No. 1 is a novel optical sound experience that consists of an iPhone application and visual scores. For this project I developed an iOS application that turns the iPhone into an optical sound device, along with visual scores / installations.
Format No. 2 is a novel optical sound experience that consists of an iPhone application and visual scores. For this project I developed an iOS application. The iPhone application uses computer vision algorithms to recognize circles and plays a soundscape depending on the size and location of the circles it sees.
The Good Vibrations Mobile Listening Device allows users to tap into the least audible sounds of a city. With the use of a custom handmade contact microphone, the user can tune in to subtle acoustic vibrations in the environment and explore the city's cracks and surfaces. A field guide for urban listening directs users to acoustic 'points of interest.'
The mobile listening kits are custom-designed, featuring hand-made audio amplifier circuits inside orange fanny packs for hands-free usability.
Each mobile listening kit comes with a contact microphone and instructional guide.
Harvester is a hand-held, portable live sampler that lets you make music with everyday sounds. With the instrument, you are able to capture sounds around you (your voice, another musical instrument, environmental noises etc.). The instrument provides an interface that lets you play back the sampled sound based around a musical, pentatonic scale. This allows for a wildly expressive sonic palette that can be used for musical performance and sound art installations.
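One way to play a captured sample back around a pentatonic scale is to change its playback rate by the frequency ratio of each scale degree. The sketch below illustrates that idea in software; it is not a description of how Harvester itself is built. The choice of a major pentatonic scale, the equal-tempered ratios, and the function names are all assumptions.

```python
MAJOR_PENTATONIC = (0, 2, 4, 7, 9)  # semitone offsets within one octave


def pentatonic_rates(octaves=1):
    """Playback-rate multipliers for each scale degree (equal temperament)."""
    return [2 ** ((12 * octave + step) / 12)
            for octave in range(octaves)
            for step in MAJOR_PENTATONIC]


def repitch(samples, rate):
    """Resample with linear interpolation; rate > 1 raises the pitch and shortens the sound."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += rate
    return out


rates = pentatonic_rates()
sample = [0, 1, 2, 3, 4]              # stand-in for a recorded sound
octave_up = repitch(sample, 2.0)      # doubling the rate halves the length
print([round(r, 3) for r in rates])
print(octave_up)
```

Triggering repitch with a different entry from rates for each key or contact would give five notes per octave that sound consonant in any combination, which is a common reason for choosing a pentatonic scale.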
In this workshop we will build Aeolian Harps, legendary instruments played by the wind. We will bring their natural singing sound to the Kamo River and develop a consciousness around wind speed, path, and direction in the surrounding environment.
This workshop is for artists and researchers wanting to become more familiar with the potential of sound. Our focus will be on expanding our understanding of sound in an environment. This requires a basic vocabulary for talking about sound as a material, as well as the ability to make and use tools for investigating and manipulating sound for creative purposes.
Our frame of reference will be Good Vibrations, a mobile listening kit that allows “acoustic explorers” to find abandoned sounds in their environment. By using hand-made microphones and amplifiers, listeners can tune into the subtle vibrations that usually go unnoticed. The project encourages listeners to reimagine their sonic environment by playfully exploring the world through their ears.
The workshop will introduce participants to the world of sound art, while providing techniques for making tools for creating these experiences. This will include the fabrication of a hand-made microphone and amplifier for use in installations, performances, and scientific research. The goal of the workshop is to take these tools into the field and use them for artistic investigation and public engagement.
In this 2-hour workshop, we will take a handful of wires and resistors and create cacophonous bleeps and bloops. We will work through three basic circuits: making simple square wave oscillators, elevating to 8-bit-sounding tones, and graduating to a simple sequencer that you can use for live performances, public installations and any other sound-related projects. This hands-on workshop will provide tips and tricks for making your own instruments or interactive noise machines, with time to experiment and customize as you learn about electronics and sound. Prepare to get noisy and loud.
You don’t need to have any electronics experience to participate! Materials and instructions will be included, and I guarantee that anyone will be able to walk out of this workshop with something pleasantly squealing and shrieking.
I recently came across an article written by Sakiko Sugawa about my sound art
workshop at Reverse Space, and its relation to my upcoming residency at her
art and culture space, Social Kitchen. Japanese and English translations below!
Before at Hanare, I was contacted by sound artist Johann Diedrick, and he shared
with me his workshop. He is a sound artist/programmer, and his recent workshop
was around "personal sounds" — specifically, "small" sounds in one's daily life.
These sounds are collected with a kit he made, consisting of a microphone,
amplifier and earphones. Listeners bring the kits outside to find sounds that we
don't usually associate with everyday life, as well as sounds that are barely
audible. The theme of the workshop was around gathering and collecting these
sounds, and building the kits to do that. In the near future at Social Kitchen,
Johann will work together with a Japanese researcher/sound artist, beginning by
teaching workshops and studying these sounds. It is for these reasons that I
went to observe his workshop.
His workshop reminded me of Hanare member Kumi-san, who says: "Social Kitchen
wants to be a place for quiet sounds, weak sounds". Hanare was once featured
in a Taiwanese art magazine, and those words come directly from her feature in
the magazine. Kumi-san is an experimental musician, and Social Kitchen is also a
place to meet to hear "quiet music", through our sponsorship and hosting.
"Sounds without flashy harmony or popular music cliches...", these things are
good. Sounds like this are important, and having a space for them is equally
important. This is what I think Kumi-san really means. Since then, these words
are often on my mind... to make a place that preserves and showcases sounds like
For example, what is a weak sound? Disregarded, unimportant...sounds displaced
to the outer fringes...sounds that are not useful...sounds that are not
judged as being beautiful... Are quietness and weakness things we choose? When
collecting sounds, how do we make them more important? There are already sounds
with elevated volume that people are able to hear. In contrast, there are
smaller sounds that few people listen to carefully, along with few people
helping to hear and investigate these sounds. Potential access to these sounds
by making them audible through a device can offer a path to hear, explore and
investigate these sounds.