Flights of Fancy

Published on 05.26.2020 in [software]

For the past year or so, I’ve turned my ears to the underwater rumblings, industrial gnashings, and overhead zoomings that make up the sonic environment surrounding the Newtown Creek, a body of water that separates Brooklyn and Queens and is one of the most polluted Superfund sites in the country. The creek is infamous for the Greenpoint Oil Spill, where somewhere around 17 to 30 million gallons of oil and petroleum products seeped into the creekbed over the span of decades, only to be discovered by a Coast Guard patrol helicopter in 1978. Since the creek was designated a federal Superfund site in 2010, many environmental remediation projects in the area have aimed to clean the water and surrounding ecosystem. Still, the area remains environmentally compromised due to industrialization, continued oil pollution from nearby refineries, combined sewage overflow events that regularly dump human waste into the creek, and toxic runoff from cars and trucks that drive across the busy streets, bridges, and highways that pass over and through the area.

An aerial photograph of Newtown Creek cutting between Brooklyn and Queens
Another view of Newtown Creek with Manhattan in the background

Despite this, the area piques my curiosity as both a harbinger of the climate crisis at our doorsteps and also as a potential stage for how we might learn to coexist with such a present-future. Sonically I’m drawn to the whooshing of cars that pass above on the Long Island Expressway, the stochastic bubbling of aeration systems meant to re-oxygenate the murky waters, and the resilient wildlife that still makes this once vibrant marshland home. If you’re lucky you may catch a glimpse of a stray egret searching for food among patches of sawgrass planted by ecological restoration projects. Crabs, jellyfish, and the occasional seal still swim below the creek’s still surface. Closer to the nearby FedEx distribution facility, cacophonous calls of birds suggest an area that isn’t so devoid of wildlife after all.

A photograph of the FedEx distribution facility

That is, until you realize that these bird calls are not made by living creatures. Rather, they’re a mix of artificial, pre-recorded birds that are meant to keep actual birds away from the inside of the distribution facility so they won’t nest or cause any unnecessary disruption to the continual churn of capitalism and industry. Having heard these calls, I was left wondering: What birds are these calls meant to be? (World-renowned birder Laura Erickson believes they might be American Robin or European Blackbird and some kestrels.) More interestingly, what birds were these calls meant to keep away? Are those “pest” birds even around anymore? If not, are these artificial bird calls singing out to phantom birds that no longer exist?

Curious about the poetic implications of these “Birds of FedEx,” and in keeping with my desire to learn more about audio machine learning during my time at the Recurse Center, I set out to make a bird sound classifier that could be used to identify birds in the Newtown Creek based on their calls. I had built an environmental sound classifier before, so the project was meant to get me more experience with audio scene classification. I was also interested in how identifying bird species could be useful for environmental remediation projects that require identifying and maintaining counts of animal species across a wide area, something that microphone arrays and sensor networks make possible. Finally, for my own creative and artistic investigations, I thought a bird sound classifier could help in my acoustic explorations of the Newtown Creek.

A photograph of a man holding a cellphone near the shoreline of Newtown Creek

I started by finding a suitable bird sound dataset to train on. The largest and most comprehensive one I could find was the BirdCLEF dataset, which comprises over 36,000 recordings of bird calls across 1,500 species native to Central and South America. Based on the work done in the BirdCLEF baseline system, I was able to organize the recordings into folders named after each of the bird species to be later used for classification training.

A screenshot of a computer terminal printing out the names of various species of birds

The next big hurdle was finding the bird calls in the recordings. Each recording lasts a few seconds to a few minutes, with no documentation of where in the recording the bird calls are (this is known as a weak-label problem, common in audio event classification where the onset and offset times of the event in question are unknown). I wrote code to scan through each of the recordings and cut them into one-second segments. From there, I applied a short-time Fourier transform to each segment to generate a spectrogram. Then, by using a heuristic, I was able to determine whether or not a segment contains a “chirp” (which I assumed to be a viable bird call). From there, I saved each spectrogram in a folder named after the corresponding bird species. The result was thousands of bird call spectrograms, nested inside their respective bird species folder, which I used for training a neural network. This animated GIF shows spectrograms generated for the Golden-Capped Parakeet (Aratinga auricapillus).
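As a rough illustration of the segment-then-transform step (not the project's actual code), here is a minimal NumPy sketch. The sample rate, frame size, hop size, and the helper names `one_second_segments` and `magnitude_spectrogram` are all assumptions made for this example, and a synthetic tone stands in for a real recording. The chirp heuristic is left out.

```python
import numpy as np

def one_second_segments(signal, sample_rate):
    """Split a 1-D signal into non-overlapping one-second chunks (drops the tail)."""
    n_segments = len(signal) // sample_rate
    return [signal[i * sample_rate:(i + 1) * sample_rate] for i in range(n_segments)]

def magnitude_spectrogram(segment, frame_size=512, hop_size=128):
    """Short-time Fourier transform: window each frame, FFT it, keep magnitudes."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(segment) - frame_size) // hop_size
    frames = np.stack([
        segment[i * hop_size:i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies: frame_size // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, time_frames)

# Synthetic stand-in for a recording: 3.5 seconds of a 2 kHz tone at 16 kHz.
sample_rate = 16000
t = np.arange(int(3.5 * sample_rate)) / sample_rate
recording = np.sin(2 * np.pi * 2000 * t)

segments = one_second_segments(recording, sample_rate)
spectrograms = [magnitude_spectrogram(s) for s in segments]
```

Each spectrogram is a 2-D array with frequency on one axis and time on the other, which is what makes it possible to treat the audio as an image later on.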

An animated spectrogram of birdsong audio

After preprocessing all of this audio data, the only thing left to do was to train a neural network to classify the bird calls. I opted to use fastai, a wonderful library that serves as an easy-to-use API on top of PyTorch, and does a lot of the boilerplate work for you in setting up and training a neural network. Since I was going to be training on spectrograms (images that represent the frequency content of a signal), I used a convolutional neural network pretrained on ImageNet. I used a technique known as transfer learning, which reuses most of a pre-trained network that’s already able to recognize visual patterns like curves and edges. The spectrogram training data was only used to “fine-tune” the last layer of the network. After many hours of training, the neural network was able to recognize bird calls from my validation set to an acceptable accuracy, validating this project and idea.
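To make the "only fine-tune the last layer" idea concrete, here is a toy NumPy sketch. It is not the fastai training code: a fixed random projection stands in for the pretrained convolutional body, the data is synthetic, and every name and hyperparameter here is an assumption for illustration. The point is only that the body's weights never change while the head learns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained body: a fixed projection followed by ReLU.
# In the real setup this role is played by the ImageNet-pretrained layers.
body_weights = rng.normal(size=(64, 16)) / np.sqrt(64)
body_snapshot = body_weights.copy()

def body(x):
    return np.maximum(x @ body_weights, 0.0)

# Toy "spectrogram" data: 200 flattened inputs with 64 values each, 3 classes.
n_classes = 3
labels = rng.integers(0, n_classes, size=200)
class_means = rng.normal(size=(n_classes, 64))
inputs = class_means[labels] + 0.5 * rng.normal(size=(200, 64))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    return -np.log(probs[np.arange(len(y)), y]).mean()

# Only the head (the last layer) receives gradient updates.
head = np.zeros((16, n_classes))
features = body(inputs)  # the frozen body never changes, so compute features once
one_hot = np.eye(n_classes)[labels]

initial_loss = cross_entropy(softmax(features @ head), labels)
for _ in range(300):
    probs = softmax(features @ head)
    grad = features.T @ (probs - one_hot) / len(labels)
    head -= 0.02 * grad
final_loss = cross_entropy(softmax(features @ head), labels)
```

Unfreezing would mean letting gradients flow into `body_weights` as well, which is slower and needs more care with learning rates.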

A screenshot of a Jupyter notebook showing a machine learning model in training
Another screenshot of a Jupyter notebook, this time showing the model classifying audio files into bird species.

The work continues to this day, and I’m working to find more bird call field recordings specific to the Newtown Creek to use as training data for my network. I also collaborated with artist Kelly Heaton to use my classifier for categorizing bird-like sounds made by her electronic bird sculptures. I’m still out and about the Newtown Creek these days, so if you see someone, ears covered by headphones and microphone in hand, call out a hoot. I might turn and look in your direction, hoping to hear new and exciting things happening in the area.

Week 12: Offset

Published on 03.27.2020 in [rc]

This was my last week in batch at the Recurse Center, and what an experience this has been :). I'm so thankful for the community and what we were able to do these past two weeks in making remote RC work so well. I'm grateful for everyone at RC, including the faculty, everyone in batch, and the supportive alumni community. As we say here, we "never graduate", so this isn't the end of my time at RC, just the end of a long beginning on-boarding process into the RC community.

I spent most of my programming time this week working on my real-time audio classifier mobile app in Expo/React Native. Most of that time went into understanding the capabilities of Expo's expo-av library. Audio data isn't easily accessible, so I had to figure out how to save an audio file and load it back as binary data. I was able to get that far, and wanted to try drawing the audio signal to the screen. Unfortunately I had a lot of difficulty with this, because most documentation I came across doesn't use stateless functional components, so I had trouble converting those examples to this more modern React Native paradigm.

Here are some drawing/2d canvas references I came across:

Canvas drawing in React Native:

I also did some work getting my JS development environment set up (installing ctags/JS plugins for vim) so I could code a bit more comfortably.

Ctags / JS development:

This was also my last week watching video lectures for MIT's 6.006 course, Introduction to Algorithms! It was such a solid class, I highly recommend it. I'm looking forward to studying more during my time at Pioneer Works, now with this foundational material under my belt :)

Alas, this is my last write-up for my time at RC. It's been real! I'm looking forward to staying involved with the RC community. Never graduate!

Week 11: Buffer

Published on 03.20.2020 in [rc]

This was our first week of Remote RC. I spent most of the week adjusting to this new normal, trying to create some positive habits while working from home (still getting up at 7am and going for a bike ride, stopping work around 7pm). There were some highs and lows this week - I think everything is day-to-day right now. All said, I did accomplish a few good things this week.

I made some progress on Whisp v2, and I'm at a point now where I can finally get audio data out of Expo's AV library (albeit kind of crudely). Hopefully by the end of next week I can show off something interesting like doing an STFT with that data. Follow along on the Whisp app repo.

Here are some links on some on-going research on getting audio data in Expo/React Native:

Research on Whisp V2:
Building an instrument tuner

Base 64 to Binary

base64 string to ArrayBuffer

audio to base64

Read binary file as base64



Linear PCM Format Setting

expo-av Audio class

expo-av FileSystem class

Core Audio Format


JS ArrayBuffer class
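The links above trace a path from a recorded file to raw samples: read the file as a base64 string, decode it to bytes, then interpret the bytes as linear PCM. Here is a minimal sketch of that chain in Python (the app itself is JavaScript, so this only illustrates the idea). It assumes a 16-bit PCM WAV file, which may not be what Expo actually writes on a given platform, and the function name is hypothetical.

```python
import base64
import io
import struct
import wave

def pcm_samples_from_base64_wav(b64_string):
    """Decode a base64-encoded WAV file into a list of signed 16-bit samples."""
    raw_bytes = base64.b64decode(b64_string)
    with wave.open(io.BytesIO(raw_bytes), "rb") as wav_file:
        assert wav_file.getsampwidth() == 2, "this sketch only handles 16-bit PCM"
        n_values = wav_file.getnframes() * wav_file.getnchannels()
        frames = wav_file.readframes(wav_file.getnframes())
    # "<h" = little-endian signed 16-bit integers, the layout WAV uses.
    return list(struct.unpack("<%dh" % n_values, frames))

# Build a tiny mono WAV in memory to stand in for a file recorded on the phone.
original = [0, 1000, -1000, 32767, -32768]
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(44100)
    wav_file.writeframes(struct.pack("<%dh" % len(original), *original))
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")

decoded = pcm_samples_from_base64_wav(encoded)
```

Once the samples are available as plain numbers, they can be fed to an STFT like any other signal.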

Otherwise this week I started up fastai's 4th version of Deep Learning for Coders, while working on finishing up part 2 of their previous version. I'm kind of behind but trying to keep up!

This week I also watched more of MIT's 6.006 - mostly on Dynamic Programming this week. A lot of it I think will only make sense with practice, so I'm excited to start doing more dynamic programming problems and go back to these videos as reference in the future.

Next week is my last week at RC :( I'm hoping to just keep coding and working on my projects up until the last day. Maybe I'll try streaming on Twitch? Heh. We'll see. By next week I hope to 1) Finish the 6.006 lectures! 2) Be caught up on fastai lessons 3) Have STFT working in JS

Week 10: Parabolic

Published on 03.13.2020 in [rc]

Whew. Quite a week. With all the news and everything going on with the global pandemic, it's been a bit of an unproductive, chaotic week. I do want to just write at least something about what I did, just to keep these posts going...

This week I have been extending my work on Flights of Fancy in collaboration with Kelly Heaton to take a recording of her electronic bird sculpture and run it through my bird sound classifier to see what bird species it predicts.

The work is now up online here: Deep Fake Birdsong

We started by taking a recording of Kelly's electronic bird sculptures:

Then I used my Flights of Fancy software to extract spectrograms from that recording, and ran them against my classifier to see what birds it predicts these sounds came from. Here are some results:

We also produced a bar chart showing the number of times a species was detected over the 122 segments.

We submitted the project to Ars Electronica, Hackaday, and Project 61. We'll see how this project develops!

Otherwise, I didn't get much else done programming wise. I did do my first competitive coding contest though! I got one right 8-)

I also started two new study groups at Recurse Center - one on Computational Audio and one on Machine Learning.

For the Computational Audio Study Group, one RC member showed off how to build an in-browser tuner, and another showed off this neat in-browser DSP language called SOUL.

For the Machine Learning Study Group, we are all collectively reading a paper on Independent Component Analysis, and I'm individually going to read about Autopool.

With that, for next week I plan on continuing on with 6.006, fastai lessons, implementing STFT in JS/React Native for the Computational Audio Study Group, and doing some more audio ML/reading of papers for the Machine Learning Study Group.

Week 9: Unsounding

Published on 03.06.2020 in [rc]

This week started out with a continuation of last week's Un/Sounding the Relational City conference. On Monday I got to see a wonderful talk put on by Cathy Van Eck, and then in the evening there was a performance by Jenn Grossman, Viola Yip, Cathy Van Eck, and Keiko Uenishi.

Back at RC, I got back to work on my real-time audio classifier mobile app. One of the things I need to work on is generating spectrograms in my app. I opted to use React Native in order to have it work cross-platform, so for now I'm working on implementing some essential DSP algorithms in JavaScript (which is not really a programming language I use often or feel very proficient in). I'm hoping just by doing this project I'll get a lot better at DSP, JavaScript, React Native, and eventually doing real-time audio ML on embedded/mobile devices. Quite a tall order / high mountain to climb, but I know it's going to feel great once I get there :)

For now I'm researching how to perform the FFT in JavaScript. Here are some of my current research notes:

DFT in Python from the ASPMA course

numpy's FFT module

FFT as implemented in librosa (which essentially uses numpy's fft module)


Nayuki's post on how to implement the DFT

Nayuki's post on how to implement the FFT

3Blue1Brown's visual introduction to the Fourier transform

Intuitive Understanding of the Fourier Transform and FFTs
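For reference, here is what the two algorithms in these notes look like side by side, written in plain Python so the structure is easy to port to JavaScript later. This is a textbook sketch (a naive O(N^2) DFT and a recursive radix-2 Cooley-Tukey FFT), not code taken from any of the linked resources, and the FFT here only accepts power-of-two lengths.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform, O(N^2)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(N log N). len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result

signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
slow = dft(signal)
fast = fft(signal)
```

Both functions return the same complex spectrum up to floating-point error. The FFT just gets there by splitting the problem in half at every level.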

I didn't get very far with it this week, but having done some initial research I feel good about moving forward with it more next week (especially now with more free time now that ASPMA is done!)

On the audio ML front, I've gotten back in touch with the fastai audio library team and I'm looking forward to contributing to something starting next week. For now, I put my bird sound classifier up on GitHub, to share with others: Flights of Fancy

As always, I kept up with 6.006 again this week as well as the fastai pt 2 video lectures. Looking forward to continuing that work next week!

Week 8: Inflection Point

Published on 02.28.2020 in [rc]

This week felt like more of a continuation of last week. Instead of much coding, I was coming down from my talk at the Experiential Music Hackathon, and spent most of my time/mental energy on my talk for Localhost and my performance at the Un/Sounding the Relational City conference - both of which went really well!

Animated GIF of spectrograms generated from the calls of the golden-capped parakeet

Batch of spectrograms generated from calls of the Golden-capped Parakeet, found in Brazil and Paraguay and currently threatened by habitat loss

This week felt like the first time in a long time where I could take a breath and try to sit back without heads-down working. I'm guessing this signals a move from me being aggressively inward facing to a space where I'm a bit more loose and relaxed while at RC, open to new things and finding time to work in a looser, less structured way. We shall see...

This week I finished the Audio Signal Processing for Music Applications course on Coursera, which was pretty amazing. I feel like I got exposed to a lot of fundamental audio signal processing concepts, as well as had the opportunity to practice them programmatically. Moving forward with my real-time audio classifier, I'll definitely need to implement short-time Fourier transforms and log-mel-spectrograms, so I feel like with this course, I now have the tools to not only implement these algorithms but to deeply understand them from a theoretical point of view, as well as how and why they are used.


It also turns out that the MTG group worked on Vocaloid!

In 6.006 news, we learned about shortest path algorithms this week - from a more general point of view and with Dijkstra's algorithm.
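As a reminder of how Dijkstra's algorithm works, here is a compact version using a binary heap. This is a standard textbook formulation rather than code from the lecture, and the graph is a made-up example. It assumes all edge weights are non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source in a graph with non-negative edge weights.

    graph maps each node to a list of (neighbor, weight) pairs.
    """
    distances = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances

graph = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 7)],
    "b": [("d", 3)],
    "d": [],
}
distances = dijkstra(graph, "a")
```

Here the direct edge from `a` to `b` costs 4, but going through `c` costs only 3, so the algorithm settles on the detour.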


What I thought was a beautiful diagram about graphs


Using StarCraft early-game rush build orders as an example shortest path problem

My plan for the following week is to bring my real-time audio classifier app back to the forefront and use all the programming I learned in ASPMA to help implement any of the audio signal processing I come across to finish the project.

Week 7: Cycle

Published on 02.21.2020 in [rc]

This week saw a lot less coding and a lot more synthesizing. I spent most of my week preparing two talks that I will be giving.

The first is at the Experimental Music Hackathon.

Music Hackathon Flyer

At this event, I'm going to be talking about how attuning our hearing to environmental sounds can inspire new ways of music making.

The second talk is at Localhost, a series of monthly technical talks in NYC, open to the public, and given by members of the Recurse Center community. I will talk about using fastai’s new audio library to train a neural network to recognize bird sounds around the Newtown Creek, an ecologically compromised estuary between Brooklyn and Queens.

I was able to clean my notebook up to show off how training works, from getting the dataset to performing inference! It should end up being a pretty great presentation :)

Flights of Fancy

I'm also preparing for a performance at the Un/Sounding the Relational City conference at NYU, where I will be performing Cerulean Waters with Ethan Edwards.

Unsounding the Relational City Poster

Here is a video of us performing a version of it earlier last year at H0L0:

Because of all of that, I haven't done much coding this week. I did do a fair amount of coding to clean up my bird classifier notebook, which I'm now calling Flights of Fancy (which is also the name of my Localhost talk :D).

Otherwise, I kept up with my video lectures, watching week 9 of fastai's deep learning course, week 9 of ASPMA, and Lectures 13 and 14 from MIT's 6.006 course.


Sound Collections

Sound Similarity

Sound Clustering

The first half of next week should have me in the same headspace, and then starting on Tuesday I'll be back to coding: working on the fastai audio library, training more model examples, and working on my real-time audio classifier mobile app.

Hope to see you out at some of my upcoming events! ✌🏿

Week 6: Midpoint

Published on 02.14.2020 in [rc]

Today is the end of my sixth week at Recurse Center, and the halfway point of my 12-week batch. It's 9am, and I'm sitting here by myself on the 5th floor, feeling exhausted but in a good way, tired yet full of energy, unsure about how I feel about the week but knowing that I've accomplished a lot and still have an equivalent amount of time left to push myself to do more.

This week I finished week 8 of ASPMA, which was really interesting and was actually the kind of material I was hoping to learn. We learned about sound transformations, and how we could take a lot of the fundamental ideas and models from previous weeks (short-time Fourier transforms and the harmonic plus residual / stochastic model) in order to make some really compelling transforms. Here are some screenshots of some of them.

I'm looking forward to taking some of these techniques with me to Pioneer Works and seeing how I can incorporate them when generating new soundscapes.

I also finished Part 1 of the fastai Deep Learning course for a second time! It was great getting a second pass at the material, as it really does require multiple viewings because of the density. Something I noted to look into is how to use (W)GANs to create new soundscapes from (environmental/industrial) noise. I think there is a lot of rich material here, especially in thinking about the poetics of taking environmental "noise" and turning it into a more "desirable" state. I'm really excited to now know about some approaches that I can play with at Pioneer Works!

One of my goals for the week was to build out a prototype of a mobile app that shows your camera view, and also lets you record and playback audio. I'm happy that I was able to achieve that goal this week! Using Expo, I was able to get the camera view up and running in no time.

With the help of another Recurser, we were able to take Expo's audio recording example and refactor it to work in my current app.

I'm not sure specifically where to go next with this (maybe making spectrograms on the phone?), but I feel like this was a great first step and gives me confidence in moving forward with this project.

I hit a plateau with my bird sound classifier this week and kind of stalled out on it... I spent most of the week training, and tried running a 20+ hour training job overnight that ended up not completing. Lesson learned: If you can (and I can!) run something in smaller, incremental cycles of epochs, do that! My sneaking suspicion, after talking with another Recurser, is that the Python process managing my training was doing a poor job managing memory, causing the RAM on Mercer (one of our GPU-enabled cluster computers) to slowly fill over time and not get released, which in turn caused a swap process on the machine to constantly go back and forth moving memory to and from the hard disk. I'm running my training again with a smaller number of epochs (10 instead of 30), which I think is much safer and should always finish if I run it overnight. I almost went home in defeat but took some time to do some non-coding things and felt better in the end :)

Me going down to visit Mercer to say "You're doing a great job...keep going!!"

So close and yet so far...

I think I hit a point with this project where I should try reaching out for advice on how to move forward, so I'm going to be posting my notebook to the fastai forums to see what others think. Next week I need to get my head out of the weeds, step back and tie it up at this point, in order to have a nice completed version of this project to share for my upcoming Localhost presentation.

I think my goals for next week will be tidying up my bird classifier project and demonstrating it doing inference on recordings from the dataset, and then from recordings of birds found in the Newtown Creek (ideally with my own recordings). All of this should be in the service of preparing for my Localhost talk. I think with that done, I'll be in a better place to try training models on other audio datasets. I'd ideally like to also find time to pair with others on my mobile app, which is already at a good place. As always, I'll be continuing ASPMA, 6.006 lectures, and now Part 2 of the fastai Deep Learning lectures!

Week 5: A Local Maxima

Published on 02.07.2020 in [rc]

This week felt a bit chaotic, but maybe the good kind? I feel like I had a few small victories, and reached a new plateau from which I can start to look outward and see what I want to accomplish next. A local maxima, if you will.

Over the weekend I attended NEMISIG, a regional conference for music informatics/audio ML at Smith College. I feel like I got a lot of good information and contacts through it, and it was a really valuable experience that I'm still unpacking.

My kind of humor, only to be found in a liberal Northeastern small college town

Poster session for the conference

Poster on Few-shot Sound Event Detection

Poster on wave2shave - drum physical modeling using scatter transformations

Vincent Lostanlen giving us a whirlwind crash course into scatter transformations and wavelets

One of my RC compatriots put me on to Olivier Messiaen's Catalogue d'oiseaux, relevant to my bird sound research

This week I finally got to training my bird sounds! After spending last week creating my spectrograms, I was able to move everything over from our cluster machine with the largest amount of space (broome) to a GPU-enabled machine for training (mercer). Afterwards, I looked at some of the new fastai tutorial notebooks to put together the training pipeline necessary to train with my spectrograms.

As of writing, I was able to train my model down to a <30% error rate, which is really great compared to the literature I read before, which was much higher (closer to 50%).

I still don't understand some of the metrics used in the evaluations in papers, so I'm going to dedicate some time to understanding them better in order to better understand my own training metrics. For example, the paper written about this dataset, Recognizing Birds from Sound - The 2018 BirdCLEF Baseline System, says their "best single model achieves a multi-label MLRAP of 0.535 on our full local validation set including background species after 70 epochs". I'm not really sure how to calculate that and how that even relates to my single-label classification method, so it's definitely something to dig into.
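One possible starting point, assuming MLRAP here means the mean of label ranking average precision (LRAP) over all samples: for every true label of a sample, take the fraction of labels ranked at or above it that are also true, average over that sample's true labels, then average over samples. The sketch below is a plain-Python version of that definition. The function name and the toy data are illustrative, and whether the paper computes it exactly this way is an assumption.

```python
def label_ranking_average_precision(y_true, y_score):
    """Mean over samples of the average precision at each true label's rank."""
    per_sample = []
    for truth, scores in zip(y_true, y_score):
        relevant = [i for i, flag in enumerate(truth) if flag]
        if not relevant:
            # Convention chosen for this sketch: nothing to rank counts as perfect.
            per_sample.append(1.0)
            continue
        precision_sum = 0.0
        for i in relevant:
            # Rank of label i = how many labels score at least as high as it.
            rank = sum(1 for s in scores if s >= scores[i])
            # How many of those are true labels (including label i itself).
            relevant_at_or_above = sum(1 for j in relevant if scores[j] >= scores[i])
            precision_sum += relevant_at_or_above / rank
        per_sample.append(precision_sum / len(relevant))
    return sum(per_sample) / len(per_sample)

# Two samples, three classes, one true label each.
y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
score = label_ranking_average_precision(y_true, y_score)
```

With exactly one true label per sample, as in single-label classification, each sample's value reduces to 1 / rank of the correct class. In the example the correct classes are ranked 2nd and 3rd, giving (1/2 + 1/3) / 2. That also suggests the number is not directly comparable to a plain error rate.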

I am using transfer learning to train on my dataset with a ResNet34 architecture pretrained on ImageNet, which is definitely why I'm getting such good results. After doing some more testing, I should retrain the whole model a bit by unfreezing it, and then train it specifically on recordings of birds from the Newtown Creek. Only then will I have a classifier that will work on those specific species of birds.

Starting next week I want to train another neural network based on the Freesound General-Purpose Audio Tagging Challenge on Kaggle, which uses the FSDKaggle2018 dataset found on Zenodo. In doing all of this, I think it's going to be important to figure out a good way to pick out relevant parts of audio signals for training. This goes back to the "eventness" paper I was talking about in my last post, and as I see that weakly labeled data is a perennial problem in audio classification tasks, it might end up being an area that I can focus on and try to offer some novel solutions.

All of this work is helping lead me to making my real-time audio classifier mobile app, which I started whiteboarding this week.

Whiteboarding a real-time audio classifier

Next week I want to do some preliminary research and maybe just get something deployed on my phone that shows the camera feed, with it maybe recording and playing back sound just to make sure that works. That would be a really good first step! I want to reach out to MARL at NYU because I know I saw a real-time classification demo they made with their SONYC project. It would be nice to get some insights from them on how to tackle this problem, and what challenges I might face along the way.

I also finished up Week 7 of ASPMA, where we looked at different models for analyzing and reconstructing residual parts of a signal not captured by the sinusoid/harmonic model, specifically with a stochastic model. It was pretty interesting and it has been nice seeing how all of these transformations and models are coming together to allow us to do some pretty sophisticated stuff.

Harmonic plus residual analysis

Harmonic plus stochastic analysis

Doing some short-time Fourier transform analysis on a vibraphone sample

I had some breakthroughs with algorithm questions, especially around binary trees. For the Algorithms Study Group, I presented a way to solve the question of finding the maximum depth of a binary tree in a way that could be used as a template for solving other binary tree problems. It felt nice to feel like I was making some progress around the topic!
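For anyone curious, the usual recursive shape of that solution looks something like the sketch below. This is a generic version, not a transcription of the whiteboard, and the `Node` class is a hypothetical minimal tree type. The template is: handle the empty tree, recurse on both children, then combine the two results.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def max_depth(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if node is None:
        return 0  # base case: an empty tree has depth 0
    # Recurse on both subtrees, then combine: one level for this node
    # plus the deeper of the two children.
    return 1 + max(max_depth(node.left), max_depth(node.right))

#       1
#      / \
#     2   3
#    /
#   4
tree = Node(1, Node(2, Node(4)), Node(3))
depth = max_depth(tree)
```

Swapping out the combine step (for example summing instead of taking the max) gives node counts, and similar small changes cover many other binary tree questions.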

My 9am morning routine, watching MIT 6.006 lectures

Me whiteboarding out a solution to finding the maximum depth of a binary tree

Me presenting my solution to the Algorithms Study Group

Week 4: Chirps

Published on 01.31.2020 in [rc]

This week at RC I focused on preparing my bird sound dataset for training next week. I decided to go with the LifeCLEF 2017 dataset, which "contains 36,496 recordings covering 1500 species of central and south America (the largest bioacoustic dataset in the literature)". Much of my week was spent reimplementing a spectrogram generation pipeline from the BirdCLEF baseline system, which used this same dataset.

The pipeline goes through all 1500 classes of bird species, and for each recording, the pipeline creates one-second spectrograms across the entire recording (with a 0.25 second overlap between each generated spectrogram). From these spectrograms, a signal-to-noise ratio is produced to determine whether or not the spectrogram contains a meaningful signal, which we use to decide if it contains a bird vocalization of that species.

If the signal-to-noise ratio is above a certain threshold, we save that spectrogram in a folder for that bird species. If not, we save that noisy spectrogram to be used later for generalizing during training.
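The windowing and routing logic can be sketched roughly as follows. This is not the baseline system's code: `crude_snr` is a hypothetical stand-in score (peak energy over median energy) rather than the signal-to-noise estimate the real pipeline uses, the threshold is arbitrary, and the spectrograms are synthetic.

```python
import numpy as np

def segment_starts(n_samples, sample_rate, length_s=1.0, overlap_s=0.25):
    """Start indices of fixed-length windows that overlap by overlap_s seconds."""
    length = int(length_s * sample_rate)
    hop = int((length_s - overlap_s) * sample_rate)
    return list(range(0, n_samples - length + 1, hop))

def crude_snr(spectrogram):
    """Hypothetical stand-in score: peak energy relative to the median energy."""
    power = spectrogram ** 2
    return float(power.max() / (np.median(power) + 1e-12))

def route(spectrogram, threshold):
    """Decide which folder a spectrogram would be saved to."""
    return "species" if crude_snr(spectrogram) > threshold else "noise"

# A four-second recording at 16 kHz yields windows starting every 0.75 seconds.
starts = segment_starts(n_samples=4 * 16000, sample_rate=16000)

# Synthetic spectrograms: flat background, and background plus a loud patch.
rng = np.random.default_rng(0)
noise_only = rng.uniform(0.9, 1.1, size=(64, 64))
with_call = noise_only.copy()
with_call[20:30, 10:40] = 10.0  # localized energy standing in for a bird call

folders = [route(noise_only, threshold=10.0), route(with_call, threshold=10.0)]
```

A single global threshold like this is easy to reason about. It is also easy to fool with loud non-bird sounds.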

This whole process takes an estimated six hours to run, and it results in about 50,000+ spectrograms across the 1500 classes of bird species.

My intuitive feeling about this process is that it is a bit heavy-handed, susceptible to inaccuracy, and not very efficient. However, I can understand the approach and ultimately it does get the job done. I do think this paper on Eventness (a concept for audio event detection that takes the idea of "objectness" from computer vision and applies it to detecting audio events in spectrograms) proposes a more nuanced way to pull out meaningful sonic "events" in a recording. It might be something worth incorporating in another pass on this system.

I'm happy I achieved my goal for the week of generating the spectrograms from the recordings! I have been full of doubts though about how this fits into my larger goals. I think this week I focused on the "trees" and not the "forest", so to speak, and maybe what I'm feeling is getting a bit lost in the forest. With most of the data processing out of the way, I'm excited to pull out a bit and think more about the context of what I'm working on and how it fits into my overall goals.

For instance, generating all of these spectrograms with this pipeline has moved me away from contributing to the fastai audio library. If I had wanted to keep down that path, I would have had to really work to not cut up the spectrograms in the way that I did, and instead come up with a way to generate the onset/offset times of the bird vocalizations from each recording and do the on-the-fly spectrogram generation with the built-in fastai audio library tools. I think that doing the actual learning task is what I want to be focusing on though, so maybe that makes it ok that I reimplemented another way of doing it, because it serves my end goal of diving deeper into the learning part of classification (it is something I would love to go back and dig deeper into though).

With that observation in mind, I think that focusing on training a neural network on these bird vocalization spectrograms next week will get me back into contributing to the library and focusing on the things I want to learn. I don't think this particular bird classification project will lead me to fixing the batch display issue, for example. I think that's okay, and maybe it will just be something I circle back to later on when trying to do other classification tasks with the other datasets I'm interested in.

I think going back to my larger goals at RC, I want to create a real-time sound classification application that can be used to classify different kinds of sounds. I've made a model for environmental sounds, and now I'm tackling bird vocalizations. I want to look at other environmental sound datasets, and if I have the time, I want to do speech as well. I think having this app as a "wrapper" application that lets you do, in general, real-time classification with any model you import will give me the room to do training on many different datasets, giving me more opportunities to dig into fastai v2, the audio library, convolutional neural networks, real-time on-device machine learning, short-time Fourier transforms, and classification in general.

So for now, my goals are to finish up the bird classification system, get it on device, and then make more models to get better at using deep learning for sound classification and understanding what it takes to do real-time on-device machine learning.

Week 3: Two Sides of the Same Coin

Published on 01.24.2020 in [rc]

Hello again. It's been really nice taking the time on Fridays to try to write and collect my thoughts on what I've been doing over the course of the week, looking back on the previous weeks, and looking forward into the future.

At the end of last week, I spent some time trying to diagram, to my understanding, the world of audio ML. Here's a photo of what I whiteboarded:

What I started to realize is that the field of audio ML has two distinct "sides": analysis and synthesis. This shouldn't be too surprising to me. In my Audio Signal Processing class, we're always talking about analysis first, and then synthesis (which is usually the inverse of the analysis process).

This led me to think about how I should spend my time at RC. Maybe I should spend my first six weeks deeply working on analysis, which in this case would be classification tasks. Then, I could spend my last six weeks looking at synthesis, which would be the task of generating sounds. This was what I was thinking about doing before coming to RC, and this approach seemed like a good way to see the entire field of audio ML.

I was worried though. Would I actually come out with something substantial if I split my time like this? My larger goal at RC was to become a dramatically better programmer, and I began to think that maybe becoming more of an expert at one side of the coin would actually prove to be a better use of my time here.

The answer I came to was also driven by the fact that I would be a resident at Pioneer Works right after my time at RC, so I would be spending an additional 12 weeks somewhere else where I could be more "creative" and "artistic" in my approach. So, that led me to decide (for now) that I would dedicate the rest of my time at RC to fully and deeply understanding the analysis side of audio ML, and to building out a tool kit for real-time audio classification that I could use in the field. Yay!

With that in mind, I'm continuing my investigation into bird sound classification, with the intention of making a real-time audio classification app that lets you identify birds in the field, along with other environmental sounds (which would end up being an upgrade to my Whisp app), and speech as well. I don't necessarily have to get to all of these things at RC, but I can build out the scaffolding/framework to do this, and use bird sounds as my first, deeply investigated dataset. I think I will also have the time to fold in environmental sounds, as it's something I've done before, and maybe even sneak in speech as a stretch goal.

All of this is being facilitated with my involvement with fastai's audio library. I'm proud to say that my first pull request was merged into the library this week! This makes for my first open source contribution :D

I had a lot of great conversations with people in the audio ML space this week, including Yotam Mann and Vincent Lostanlen. Both have been super supportive of my work and have made themselves available to help out where they can. In particular, Vincent pointed me to a lot of great research around bird sound classification, including BirdNet. I wasn't able to find their dataset, but it led me to the BirdCLEF dataset. Vincent said it was weakly labeled, with no onset/offset times for the bird sounds, so it might require a lot of work to get going. We shall see!

Otherwise, this week was also good for my Audio Signal Processing class. We learned about the Sinusoid model and how to do things like spectral peak finding, sine tracking, and additive synthesis.

In algorithms, a lot of time this week was spent on trees, including binary trees, binary search trees, and self-balancing trees like AVL trees. In the Algorithms Study Group we also spent a lot of time looking at Floyd's cycle-finding algorithm, quicksort, and graph traversal algorithms like depth-first search, breadth-first search, and Dijkstra's shortest-path algorithm.

Week 2: Noise to Signal

Published on 01.17.2020 in [rc]

Hello! This week I really felt like I made a lot of progress towards my goals. A lot of things came together in a really great way, and I can start to see how my overall approach to RC and what I'm studying is informing each other and interleaving in ways that I wanted it to.

This week I started by getting a lot of video lectures out of the way on Sunday, including week 4 of fastai's deep learning course and ASPMA. That really set me up well to focus on programming for most of the week, instead of burning most of my time with lectures and fueling my anxiety that I'm not programming/making enough.

I also decided this week to try not to context switch as much. For now, I'm still trying to spend the mornings working on algorithms, but now I'll alternate days, focusing one day on ASPMA/audio signal processing and the next on audio ML/fastai. I think it worked out really well this week, and made me feel less anxious to rush through something so I could switch to another related but contextually different task. So for this week I did ASPMA work on Monday and Wednesday, and audio ML work on Tuesday and Thursday. I found it successful, so I'm going to try it again next week!

This week I feel like I made a lot of progress on the audio ML front, combining some of the stuff I've been learning about in the ASPMA course with the work I've been doing with fastai's new audio library. In the ASPMA course, we learned about the short-time Fourier transform and how it's used to generate spectrograms. I was able to use some of that knowledge to try to make a real-time spectrogram generator from the microphone. It didn't turn out super well, and it's something I want to master, so I think I'll take another crack at it next week.
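As a reference point for how the short-time Fourier transform produces a spectrogram, here is a minimal NumPy sketch. It works on an array that is already in memory rather than on a live microphone stream, and the function name, frame size, and hop size are illustrative choices, not values from the ASPMA course or the fastai audio library.

```python
import numpy as np

def stft_magnitude(signal, frame_size=1024, hop_size=256):
    """Magnitude spectrogram with shape (frequency bins, frames)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop_size : i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; transpose so rows are frequencies, columns are time.
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate   # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)        # a 1 kHz sine wave
spec = stft_magnitude(tone)

peak_bin = spec.mean(axis=1).argmax()
peak_hz = peak_bin * sample_rate / 1024    # 1000.0 for this tone
```

A real-time version would apply the same window-and-FFT step to each new block of samples as it arrives from the microphone.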

Earlier in the week, I met with Marko Stamenovic, an RC alum who works professionally on audio ML at Bose. We had an amazing conversation about audio ML, some of the current topics in the field, areas to check out related to my interests, and what it would be like to work professionally in that field.

We talked about a lot of topics that I need to go back and check out, including:

For audio generation, Marko pointed me to:
- WaveNet (DeepMind)
- FFT Net (Adobe, Justin Salamon)
- LPCNet (Mozilla)

He suggested first trying to generate sine waves, then speech, then field recordings with these architectures.

Marko also told me to really focus on the STFT, as it's a fundamental algorithm in audio ML. He also mentioned that being able to do deployed real-time audio ML on the phone is very in-demand, so that might be something I try to refocus on while at RC.

This week I was also able to finish my PR on fastai's audio library. The task at hand was creating a test to make sure spectrograms generated with the library always returned right-side up. I was able to use some of the skills I learned in the ASPMA class, specifically around generating an audio signal, to create a test case that builds a simple 5 Hz signal, generates a spectrogram from it, and checks that the highest energy bin in that spectrogram is at the bottom. This was such a great moment where everything felt like it came together, and I can only imagine that this will happen more and more :)
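The idea behind that kind of test can be sketched with plain NumPy. This is not the test from the pull request and it does not call the fastai audio library; the helper names, sample rate, and frame sizes are assumptions, and "bottom" is taken here to mean row 0 of an array whose rows run from low to high frequency.

```python
import numpy as np

def power_spectrogram(signal, frame_size, hop_size):
    """Power spectrogram, rows ordered from lowest to highest frequency."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([
        signal[i * hop_size : i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

def loudest_row(spec):
    """Index of the frequency row carrying the most total energy."""
    return int(spec.sum(axis=1).argmax())

sample_rate = 1000
t = np.arange(2 * sample_rate) / sample_rate   # two seconds of samples
low_tone = np.sin(2 * np.pi * 5 * t)           # 5 Hz test signal
spec = power_spectrogram(low_tone, frame_size=1000, hop_size=500)

# Right-side up: the energy of a very low tone sits in the lowest rows.
assert loudest_row(spec) < spec.shape[0] // 10
# An upside-down spectrogram would put that energy near the last row instead.
assert loudest_row(np.flipud(spec)) > spec.shape[0] * 9 // 10
```

With a 1000-sample frame at 1000 Hz, each row spans 1 Hz, so the 5 Hz tone lands in row 5 of 501.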

Finally, I did more MIT 6.006 lectures on algorithms. This week was sorting, including insertion sort, merge sort and heap sort. I particularly love heap sort! I also gave a small presentation on merge sort at RC as part of our Algorithms Study Group, which forced me to really dig into merge sort and understand how it works, including writing out its recursion tree. I love forcing myself into situations that guarantee I'll have to really focus and deeply understand something so that I can present it to others. I hope to do it more in the future.
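For reference, a standard top-down merge sort looks like the sketch below (a textbook version, not the code from the presentation). Each call splits the list in half, which is what gives the recursion tree about log2(n) levels, and the merges on each level touch every element once, for O(n log n) work overall.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort."""
    # Base case: a list of zero or one items is already sorted.
    if len(items) <= 1:
        return list(items)

    # Split in half and sort each half recursively.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves by repeatedly taking the smaller head.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps equal items in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; the remainder of the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

result = merge_sort([5, 2, 4, 7, 1, 3, 2, 6])
```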

For now, I think everything is moving well. I do want to realign what I'm working towards, and try to keep the bigger goals in mind of making something that generates sound. I do think though that the listening part of this is just as important, so I want to think about how to combine the two, because I do think they are both two sides of the same coin. I'll spend a bit more time thinking about that today and I'll hopefully have some idea forward before setting my goals for next week.

Week 1: Hello, RC!

Published on 01.10.2020 in [rc]

Hello! If you are reading this, welcome! This is my attempt to be a better (technical) writer, starting with writing about my programming life at the Recurse Center. For more about me, please visit my personal website. For a quick intro, I make installations, performances, and sculptures that let you explore the world through your ears. I surface the vibratory histories of past interactions inscribed in material and embedded in space, peeling back sonic layers to reveal hidden memories and untold stories. I share my tools and techniques with others through listening tours, workshops, and open source hardware/software. During my time at RC, I want to dive deep into the world of machine listening, computational audio, and programmatic sound. To do that, I'm splitting my time, 2/3 of which will be spent on audio ML and audio signal processing. The other 1/3 of my time will be spent on getting a better foundation in computer science, algorithms, and data structures. In the following post, I'll write about my experience with those areas, and pepper in some observations along the way that I've had since being here!

On the audio ML side of things, this week I dove into fastai's new version 2 of their library, specifically so I could start working on their new audio extension! I'm really excited to contribute to this extension, as this will be the first time I've really contributed to open source. The current team seems incredibly nice and smart, so I'm really looking forward to working with them. The first thing I did was get version 2 of fastai and fastcore set up on my Paperspace machine, but then I realized that I could/should get it set up on RC's Heap cluster! This took a bit to get working, but it was pretty smooth to get everything set up, so now I feel ready to start working with it. My first project idea was to build a bird classifier, using examples of birds found around the Newtown Creek. I was able to put together a test dataset from recordings I downloaded on I did want to start training this week, but I think that's going to have to happen next week. This week I also finished up to week 3 of the fastai DL lectures, so that was good progress. Next week I'll tackle week 4 and use the rest of the week to actually code something.

On the audio signal processing side of things, I was able to finish week 3 of the Audio Signal Processing for Musical Applications course on Coursera, which I've really been enjoying. Week 1 and 2's homework assignments were pretty easy and straightforward, but this week's homework assignment was way more difficult! I didn't expect it to take as much time as it did, and I did have to cut some corners at the end and look at someone else's example to finish it. It wasn't the most ideal situation, and I now know going into next week to anticipate needing to spend more time with the assignments.

Finally, on the algos side, I finished Lectures 1 and 2 of Introduction to Algorithms 6.006 from MIT Open Courseware. I tried a couple of LeetCode questions related to those lectures as well. I need to find a way to make sure I actually code things related to that course, instead of just simply watching the videos. My approach has been 1) Watch a video 2) Do a couple of problems related to that, all before lunch. I think if I can get into a good flow for this, I'll be doing just fine.

Over the course of my first week, I've already had my ups and downs. One thing has been being overambitious in what I can get done in a day. I'm already spending 9am-7pm at RC, and I still have the feeling that I can't get everything done. I'm going to have to be ok with not getting everything done that I've set out to do each day.

I had a nice check-in with one of the faculty members about algo studying and project management. Two takeaways were: 1) Don't spend all your time at RC grinding on algorithm studying/cramming videos. Do some, but don't spend the entire day doing it. And 2) Once you feel like you know enough of what you need to get started on a project, start! Let the project drive what you need to learn.

One of the things I think I should start doing is create a list of goals for the week on Sunday night, and then let that drive what I should be focusing on for the week, making sure I've planned out enough time and space during the week to realistically make those goals happen, knowing that I want to leave space for serendipity while at RC.

Going forward with RC, I made a list of projects I want to work on. I'm categorizing them as "Small/Known" (as in I already know how to do them or have a clear path to making them real) and "Big/Ambitious" (as in I'm not quite sure where to start and they will take a longer time to do).

For now that list looks like:

Small / Known

Big / Ambitious

For next week, I want to:

Week 4 of fastai
Week 4 of ASPMA
Lecture 3 and 4 of MIT 6.006

Make bird classifier
Make Shepard tone sound generator
More LeetCode problems


Published on 05.29.2019 in [software]

Cat Spectrograms
Fireworks Spectrograms
Sea Waves Spectrograms
Sirens Spectrograms

Whisp - An Environmental Sound Classifier

Whisp is an environmental sound classifier that can be used to identify sounds around you. In its current form, Whisp will classify 5-second sounds with 87.25% accuracy across a range of 50 categories, based on the ESC-50 dataset. You can also record sounds in the field to get another perspective on what is happening in your sonic environment.

You can try the app here! It works on desktop Firefox and, on mobile, only in Safari on iOS (Chrome has some issues that keep microphone recording from working right now, sorry!).

Trying the "Record your sound" feature on your computer might not get very satisfying results because, well, most of us are on a computer in pretty sonically uninteresting places. Definitely give it a shot on your mobile device when you're out and about, surrounded by more interesting environmental sounds :)


As someone who has spent a lot of time recording and listening to sounds, the idea of a generalized sound classifier has always been a dream of mine to build.

I'm interested in creating technologies that change our relationship to the sounds in our environment. Or, put another way, I like creating sound technologies that change our relationship to our environment and the world at large.

I'm finding my interests moving more towards research in audio event recognition, so Whisp is a first attempt to dive into that world.

Some applications that I've wanted to use one for include:

To those ends, I built an environmental sound classifier using the ESC-50 dataset and the fastai library.

In this write up I will walk through the steps to create the classifier, as well as drop hints and insights along the way that I picked up from the fastai course on deep learning.

If you want to skip ahead, feel free to check out the Whisp repo on Github.


The data I'm using comes from the ESC-50 (Environmental Sound Classification) Dataset.

This dataset provides a labeled collection of 2000 environmental audio recordings. Each recording is 5 seconds long, and is organized into 50 categories, with 40 examples per category.

Before training the model, it's useful to spend some time getting familiar with the data in order to see what we are working with.

In particular, we are going to train our model not with the audio files, but with images generated from the audio files. Specifically, we will be generating spectrograms from the audio files and using them to train a deep learning neural net that has been pre-trained on images.

For more information on how I generated the spectrograms from the audio files, check out my spectrogram generator notebook.

One thing to note is that with spectrogram images, I was able to get better accuracy by creating square images rather than rectangles, so that the training would take into account the entire spectrogram rather than just parts.


To train the model, we are going to use a resnet34, use our learning rate finder, and train twice over 10 epochs.

From the fastai forums, I was able to get a general sense of when I'm overfitting or underfitting.

Training loss > valid loss = underfitting
Training loss < valid loss = overfitting
Training loss ~ valid loss = just about right

epoch train_loss valid_loss error_rate
1 1.063904 1.055990 0.325000
2 1.036396 2.332567 0.562500
3 1.049258 1.470638 0.387500
4 1.032500 1.107848 0.337500
5 0.924266 1.392631 0.417500
6 0.768478 0.623403 0.212500
7 0.596911 0.535597 0.165000
8 0.446205 0.462682 0.160000
9 0.325181 0.419656 0.135000
10 0.251277 0.402070 0.127500

Nice! That gets us an error rate of 0.127500, or 87.25% accuracy!

There is a bit of overfitting going on (Jeremy Howard would think it's ok), but still, really great results!
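The arithmetic and the rule of thumb above can be checked against the final epoch of the table with a short sketch. The `diagnose` helper and its `tolerance` value are hypothetical; the forum heuristic does not say how close "just about right" has to be.

```python
# (epoch, train_loss, valid_loss, error_rate), copied from the table above.
epochs = [
    (1, 1.063904, 1.055990, 0.325000),
    (2, 1.036396, 2.332567, 0.562500),
    (3, 1.049258, 1.470638, 0.387500),
    (4, 1.032500, 1.107848, 0.337500),
    (5, 0.924266, 1.392631, 0.417500),
    (6, 0.768478, 0.623403, 0.212500),
    (7, 0.596911, 0.535597, 0.165000),
    (8, 0.446205, 0.462682, 0.160000),
    (9, 0.325181, 0.419656, 0.135000),
    (10, 0.251277, 0.402070, 0.127500),
]

def diagnose(train_loss, valid_loss, tolerance=0.05):
    """Apply the train-vs-valid loss rule of thumb; tolerance is an assumed cutoff."""
    if abs(train_loss - valid_loss) <= tolerance:
        return "about right"
    return "underfitting" if train_loss > valid_loss else "overfitting"

_, train_loss, valid_loss, error_rate = epochs[-1]
accuracy = 1 - error_rate
summary = f"accuracy: {accuracy:.2%}, fit: {diagnose(train_loss, valid_loss)}"
print(summary)  # accuracy: 87.25%, fit: overfitting
```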

Here is our confusion matrix which looks pretty good.

Whisp Confusion Matrix

Testing in the Field

I've been taking Whisp with me out on field recording expeditions around the Newtown Creek.

Dutch Kills

One night with Mitch Waxman, I took an early version of Whisp and made field recordings around the Dutch Kills area of the Newtown Creek, and down near Blissville. I extracted 3 sounds from the recordings that I knew would show up in the ESC-50 dataset categories.

Train Engine

Whisp classified this sound as a washing machine with 69% confidence, which... isn't exactly correct. But hey, a washing machine does sound a lot like an engine when it's running, right? I can understand the ambiguity. Whisp had 18% confidence that it was a helicopter, and 5% confidence that it was an engine (of some sort).


Whisp classified this sound as a thunderstorm with 97% confidence, and thunderstorms are usually pretty windy! The next highest confidence score was wind, with 7% confidence.

Train horn

Finally, Whisp classified this sound as a car horn with about 99% confidence. Given that the dataset doesn't have "train horn" as a category, we can live with this being close enough ;)

Hunter's Point Park (Hunter's Point South Park Extension) - mouth of Newtown Creek

I recently took Whisp out into the field with Taiwanese sound artist Ping Sheng Wu to test it.

We saw a group of birds off in the distance.

Whisp was able to hear and classify their chirping!

We tried getting some water sounds, but most of it came back as wind, as that was the dominant sound out there. Sea waves did come back though, but with a low 3% confidence rating.

On our walk back to the train station, we found a fan and decided to try Whisp'ing it.

Whisp thought it was a vacuum cleaner, which, like the example above of the engine that sounded like a washing machine, isn't too far off. It also thought it could have been a washing machine or plain old wind.

Testing in the wild

Since releasing Whisp I've taken it out with me to try to classify sounds around me, which it does a really good job at!

Here are some examples of it classifying sounds like:


Sea Waves

Ambulance Siren



Future Paths Forward

I'd like to train this model on Google's AudioSet data.

I'm also interested in exploring more data augmentation methods as described in Salamon and Bello's paper on Environmental Sound Classification.

One idea that I'd love to explore is that of a "sound homonym". For example, there are a lot of sounds that sound similar to each other, which the classifier gets wrong but only narrowly (washing machine vs. engine, for example). I wonder what it would look like to play around with sound homonyms for performance.

The other thing that I'm interested in is the "distance" between sounds. For example, the classifier gives you the "closest" prediction it thinks the sound is. You could imagine that the prediction that is the least close is the furthest away. It would be interesting to push this idea further and think about how different sounds are more or less distant from each other. What would it mean for a sound to be the opposite of another sound? Or the most different sound?
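One way to start playing with this idea is to treat the classifier's per-class probabilities for a sound as a vector and measure how far apart two such vectors are. The sketch below is only a thought experiment: the labels are a small subset of categories, every probability is made up for illustration rather than taken from Whisp's output, and cosine distance is just one possible choice of measure.

```python
import numpy as np

labels = ["washing_machine", "helicopter", "engine", "wind", "sea_waves"]

# Hypothetical prediction vectors for two different sounds (each sums to 1).
sound_a = np.array([0.69, 0.18, 0.05, 0.05, 0.03])
sound_b = np.array([0.30, 0.05, 0.10, 0.50, 0.05])

def closest_and_farthest(probs):
    """Return the most likely and least likely label for one sound."""
    order = np.argsort(probs)
    return labels[order[-1]], labels[order[0]]

def cosine_distance(a, b):
    """0 means the two prediction vectors point the same way; larger means more different."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

closest, farthest = closest_and_farthest(sound_a)
distance = cosine_distance(sound_a, sound_b)
```

Under this framing, a "sound homonym" would be a pair of sounds from different categories with a small distance, and the "opposite" of a sound would be the one with the largest.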


ESC-50 Dataset

ESC: Dataset for Environmental Sound Classification

Audio Classification using FastAI and On-the-Fly Frequency Transforms

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

Environmental Sound Classification with Convolutional Neural Networks

Music Genre Classification

Published on 06.01.2017 in [residencies]

Because of the amount of audio files, this post is best experienced using Firefox!

Quiet Music, Weak Sounds


Early one summer morning in Kyoto, I took this photo along the banks of the Kamo River.

On the left bank, a woman greeted the morning with outstretched arms. Up above, three birds circled over the water. Kyoto, already a sleepy city, was still waking up. In its stillness, all I had was the quiet of the city's early dawn around me. From that vantage point, Kyoto's deep morning silence stretched far into the horizon, up the Kamo River valley, and into the mountains hidden behind the clouds.

Since that summer in 2012 I have been searching for quiet sounds all around me. Sometimes these are literally quiet sounds — inaudible because of their low volume compared to the bigger, louder sounds around them. But often times these quiet sounds are not so quiet at all, and instead are quiet because of our relationship to them. They are sounds that we don't pay attention to. They take up less space. They are often "overheard" (analogous to "overlooked") because they are not usually sounds we focus on. They are effectively inaudible, used here in a similar way one might use the term "invisible". They are muted and minuscule, diminutive and shrunken, minor and pathetic. They have no wants and cause no trouble. They are sounds pushed off to the side and forgotten, overshadowed by the larger, familiar and more heroic sounds in the environment that people instantly recognize, are drawn towards, and quickly reference when describing a place.

Some examples of quiet sounds that I've come across include:

The reverberations of street life transduced through a hollow pole

The piercing pitching of neon signs

The rhythmic knocking inside of cross-walk buttons

The brushing of ripples against a lake's shore

The soft patterings of light February snow

Most of my work for the past five years has been shifting people's attention to those sounds, in an attempt to broaden our understanding of the world around us. Through this deeper understanding, we can create new, original, and more personal relationships to our environment through the discovery of the delicate, poetic, and ephemeral sounds around us — if only we took the time to listen.

My time in Japan, especially as an InterLab Researcher at YCAM, taught me a lot about listening in new ways, and I knew I would one day return to Kyoto to get to know the city and the sounds within it more deeply.

The following year, I contacted Tetsuya Umeda, a sound artist from the Kansai area, about the possibility of doing listening tours in Kyoto to explore its sounds, and he advised me to get in touch with Social Kitchen, a community arts center in the city.

It would take a couple of years before I could find a way to work with Social Kitchen, and in 2015 I was introduced to the Asian Cultural Council, who ended up supporting my time in Kyoto through a fellowship grant.

I got in touch with Social Kitchen again in 2016, and they introduced me to Eisuke Yanagisawa as someone with whom I should collaborate during my time in Kyoto.

Five years after I took that photo on the Kamo River, I was able to return to Kyoto and embark on a four-week-long residency to explore its sounds through a series of workshops and field work research expeditions, titled Quiet Music, Weak Sounds.

Before I arrived, Social Kitchen, Eisuke and I came up with the following program:

Quiet Music, Weak Sounds is a collaboration between sound artists Johann Diedrick and Eisuke Yanagisawa to discover, amplify, and share the subtle sounds in Kyoto, Japan.

Over the course of four weeks, Diedrick and Yanagisawa will explore Kyoto’s soundscape with custom microphones, amplifiers and field recorders.

Informed by their findings, the two will host a series of workshops, teaching members of the community how to build and use their own sonic investigation tools.

They will turn participants into acoustic explorers and take them on explorations of Kyoto to find, record, edit, and present their own found sounds.

Afterward they will construct Aeolian harps with the participants and introduce the harp's sounds to Kyoto’s Kamo River path.

Finally, the two will present their findings to the community at large, in the form of a talk and reception party.

After finally arriving in Kyoto in April 2017, Eisuke and I met and began our collaboration together.

Mobile Listening Kit Workshop

The first event hosted at Social Kitchen was a Mobile Listening Kit workshop. The workshop introduced participants to the world of sound art and provided techniques for making tools to create these experiences. This included the fabrication of a mobile listening kit and a contact microphone for use in installations, performances, and scientific research.

The mobile listening kits are portable amplifiers that can be used to hear quiet sounds in your environment. They consist of an input for different kinds of microphones — in the workshop we built and used contact microphones. You can adjust the volume of the input sound with a volume knob, and hear vibrations on surfaces through headphones or speakers. The kits are used to focus in on sounds that normally can't be heard because of their volume, and are designed to be portable for everyday use and exploration.

Most of the participants had never built any kind of electronic device before, and the workshop involved a lot of soldering and hands-on fabrication. It was important for me to have people actually build these kits, instead of using pre-built ones, because I think it is important to teach people how to teach themselves. Only by learning how to teach myself was I able to do and make the things I can today. I think it is critically important for artists to learn how their tools work and function, so that they can modify them for their own creative purposes.

Field Recording Workshop

The next day, Eisuke and I hosted a field recording workshop. In this workshop we gave participants the opportunity to find and record sounds outside with their mobile listening kits, field recorders, and different kinds of microphones. We didn't give much instruction on what to listen for, except only to try to discover sounds in places where they least expected. In this way, the workshop encouraged listeners to reimagine their sonic environment by playfully exploring the world through their ears.

The workshop began with a quick lecture on how to use microphones and field recorders for recording sound.

Soon after we went to Goryou Shrine, a Shinto shrine just a short walk from Social Kitchen. We spent most of the afternoon at the shrine exploring its cracks, surfaces, and hidden spaces.

After spending two hours recording sounds, we came back and did a short lecture on editing field recordings. At the end of the workshop, participants presented their recordings to each other, which prompted lively questions and discussion.

Aeolian Harp Workshop

In the final workshop, we built Aeolian harps, a type of string instrument played by the wind.

Aeolian harps are objects of mystique because of the quality of the sound they produce and how that sound is made. They can range in look and form, but in general they look like simplified harps or guitars, with a hollow wooden body, usually with a sound hole, and a number of strings stretched across. Instead of plucking or bowing the harp, you can place it in the vicinity of a moderate, consistent gust of wind, and as the wind vibrates the strings, the harp produces a ghostly, haunting sound — seemingly out of thin air.

We were both really excited to host the workshop because we knew it would be a beautiful demonstration of how one can collaborate with the environment to produce sound, instead of treating sounds in the environment as a resource to be extracted as we had done in the two previous workshops. We knew that producing sounds from the harp would be difficult for a number of reasons, not least of which was that we didn't have any control over the wind on the day of the workshop. Conceptually this worked in our favor, because it meant that participants had to concentrate hard to produce and hear the sounds from the harp. They wouldn't be able to get the immediate satisfaction of making sounds like you would with an electric guitar, drum set, or computer. Instead, they had to be very patient and work with the environment to orient the harp in such a way that when a gust of wind blew their way, the harp would sound. Each sound was to be precious. The participants had to wait in anticipation, excitement, and yes, frustration, for each sound to come. Our hope was for them to ultimately develop a new kind of appreciation for weak, quiet sounds that can be just as fleeting as the wind.

Before starting this workshop, Eisuke and I traveled to Osaka to visit Kosuke Nakagawa, an expert at building string instruments including Aeolian harps. At his studio he showed us his instruments and walked us through how to build an Aeolian harp for our workshop.

Here is a video of our prototype in action:

Back at Social Kitchen, we built our Aeolian harps together.

When we were done, we brought their natural singing sound to the Kamo River. As expected, it was difficult to get the harps to sing. Walking around the river, we searched for the best place to find ideal wind conditions. Participants readjusted and realigned their harps in order to find the best position. In the process, they developed a consciousness around wind speed, path, and direction in the surrounding environment. And soon enough, the sounds came.

You can hear a sample of what the Aeolian harp sounded like here:

Field Work

During my last week in Kyoto, Eisuke and I were able to spend two days doing our own field work. We were both interested in quiet sounds, but from two different perspectives. I was interested in sounds that were quiet both in their actual volume and in their general level of recognition - sounds that lack audibility (analogous to visibility). Eisuke is interested both in sounds that reside outside of our human hearing range (mostly ultrasonic sounds), and sounds that also lack "hear"-ability because of how remote they are (he studies highland gong music from Vietnam). We picked two sites in the city noted for their quietness and sonic diversity.

The first place we went was already very familiar to me - the Kamo River. When we arrived, we found ropes installed over parts of the river that were designed to deter birds from eating fish that were swimming upstream to spawn during the spring season. The ropes would vibrate with the wind and produce a very low-frequency sound that we could record with our contact mics.

Here is what one of the ropes sounded like:

Eisuke made a similar recording as well:

I also recorded some sounds from the surface of the water with my mobile listening kit.

Eisuke also recorded sounds from underneath the Kamo River with his hydrophone.

We also found a nearby pipe that captured and reverberated the sounds of the river.

Eisuke was able to stick a mic in the pipe and record some of the sounds inside.

The next day we traveled to Katsura and Kamikatsura, located near the mountains northwest of central Kyoto. There we recorded the sounds of the Hankyu Line, the Katsura River and a nearby bamboo forest.

Against the fence you can hear the roaring and rumbling of passing bikes, cars and trains.

Closer to the mountains, we visited Jizo-in Temple.

In this temple, you could hear the sounds of birds in the bamboo...

and the sound of two flowers rubbing against bamboo while swaying in the wind.

Further up we recorded the sounds of a small waterfall near the Katsura River. I recorded some sounds with my mobile listening kit.

Eisuke recorded similar sounds from the same river with his parabolic microphone.

He also captured the sounds of the rustling bamboo...

...and these incredibly physical sounds of large bamboo shoots cracking and snapping.


In my final week in Kyoto, we hosted a reception at Social Kitchen to present our past work and our collaboration together.

At the end of the reception we did a live performance of our field recordings.

May Peace Prevail on Earth

Over the past few years, I have been documenting my explorations of weak sounds through short recordings with my mobile listening kit and photos taken with disposable cameras. That project, titled It Is Impossible To Know About Earth, So We Must Hear Her Voice In Our Own Way, is still ongoing. During this residency, however, I decided to try documenting my recording situations with drawings as well. As part of the reception, I showed a selection of these drawings in a tiny exhibition titled May Peace Prevail on Earth.


Four weeks may not sound like a lot of time for a residency, and it isn't. With that in mind, Eisuke and I designed an incredibly packed itinerary, with most of our activities happening over the course of my last two weeks.

A part of me is still deciding whether or not it was a good idea to plan as much as I did for my time in Kyoto. On one hand, I was only going to be there for a short amount of time, so I thought it would be best to pack in as many events and activities as possible. On the other hand, I didn't have as much time to wander and explore as I wanted to. No doubt I was able to feel like I had really sunk into Kyoto, but it would have been nice to have had more idle time to let my mind drift.

I also, intentionally or not, decided not to do as much material preparation for my workshops before I arrived. A lot of this was circumstantial, as I was traveling from another conference/workshop and probably couldn't have really brought all the materials I needed in the first place. Either way, one of the challenges that I set up for myself was answering the following questions:

What would it be like to be an artist in Kyoto?

Would I be able to find the materials I need to produce the kinds of work that I want?

Would I feel happy, inspired and able to live out my fullest artistic life here in this city?

I can say with confidence that I was able to pull off most of what I set out to do during my time there.

Looking back at my time at Social Kitchen, I feel like I developed more confidence in my artistic practice. I more firmly know what I like to do, and, maybe more importantly, what I don't like to do. For example, I know now that I am less interested in making works that are meant to be consumed on a screen. Instead I want to make more works that get people to stand up, move around, and interact with each other and the sounds around them. My mobile listening kits were always an extension of this desire, and the Aeolian harp workshop, which was so delightful to me, seems to be a continuation of that trajectory.

Even more so, I feel like I'm moving a bit away from sound recordings in general and more into physical sound environments that can be manipulated and played with. I'll probably still be interested in documenting my "sonic experiences", but as my interest in drawing makes apparent, how I choose to document these experiences will constantly change and evolve.

One thing I am excited to do is improve my workshops. Having done so many now, I feel confident in facilitating them. I am already looking to improve my workshops by creating pictorial instructional guides that can be understood and enjoyed by anyone regardless of language. It would be much more time efficient and helpful if I can provide participants an instructional guide that they can go off with and use, allowing me to occasionally hop in when they need specific assistance.

One thing I am curious to work on more is self-sustained sound installations that use solar power to drive speakers for amplification and natural sources of energy (wind, water) for sound activation. I am working on two new works, one for this year's Megapolis Festival and another for a group show at Little Berlin (both in Philadelphia), that have me working through these ideas and with these materials.


I would like to thank the Asian Cultural Council for their financial and institutional support.

I would also like to thank Social Kitchen, especially Sakiko Sugawa, Makato Hamagami, Asuka Okajima, Kumi Wakao, Yuh Wakao, and Shingo Yamasaki.

Finally I would like to thank my collaborator Eisuke Yanagisawa.

Thank you Mehan Jayasuriya for helping to edit this post!


Wire Magazine

Kansai Art Beat



Social Kitchen

Der Stromausfall / Cloudbursting

Published on 05.30.2017 in [music]

During the Everything Without a “Real” is False: APA (aka Xuan Ye) Artists-in-Residence @ Being Generation & JØ (aka Johann Diedrick) @ Friends-at-Home Micro-Residency (2015.5.30), the two impromptu music performances Der Stromausfall / Cloudbursting happened as a result of an unexpected electrical outage and the fortuitous return of power on Dundas West, Toronto. Der Stromausfall was improvised completely without wall power. In Cloudbursting, the instrument (eventually the performance) allows the performer to use a one-second vocal recording as a sound source for a tactile interface. When the performer touches two contacts, thereby closing a circuit, the vocal sample is played back in a fluttering staccato of pitches based around a pentatonic scale. This allows for a wildly expressive sonic palette that is driven through touch between oneself and others. When the circuit is used in reverse, a group can form a connection together that results in silence. As they start to let go of each other, melodic voices shower down.


Labocine

Published on 05.30.2017 in [software]

Labocine is a new platform for films from the science new wave. In the iOS app, you can browse Labocine's monthly ISSUES for a special selection of exclusive science films every month and read about the scientists and filmmakers leading the science new wave in SPOTLIGHTS.

In App

In action

Format No. 1

Published on 05.29.2017 in [software]

Format No. 1 is a novel optical sound experience that consists of an iPhone application and visual scores. For this project I developed an iOS application that turns the iPhone into an optical sound device, along with visual scores and installations.


Transmissions

Published on 05.29.2017 in [software]

Transmissions is an iOS application that allows an audience to participate in a musical group performance.

Format No. 2

Published on 05.29.2017 in [software]

Format No. 2 is a novel optical sound experience that consists of an iPhone application and visual scores. For this project I developed an iOS application that uses computer vision algorithms to recognize circles and plays a soundscape depending on the size and location of the circles it sees.
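
To make the idea of a soundscape driven by circle size and location more concrete, here is a minimal sketch in Python. It is not the app's actual code: the `Circle` and `Voice` types, the `circle_to_voice` function, and the specific mapping (height to pitch, horizontal position to stereo pan, radius to volume) are all assumptions made for illustration. It also assumes the circles have already been detected and normalized to the 0.0 to 1.0 range.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    """A detected circle in normalized image coordinates (hypothetical type)."""
    x: float       # horizontal center, 0.0 = left edge, 1.0 = right edge
    y: float       # vertical center, 0.0 = top edge, 1.0 = bottom edge
    radius: float  # radius as a fraction of the image width


@dataclass
class Voice:
    """Sound parameters for one circle (hypothetical type)."""
    frequency_hz: float
    pan: float     # -1.0 = hard left, 1.0 = hard right
    gain: float    # 0.0 = silent, 1.0 = full volume


def circle_to_voice(circle, low_hz=110.0, octaves=4.0):
    """Map one circle to a voice. The mapping itself is an assumption."""
    # Circles higher in the frame get a higher pitch, spread over `octaves`.
    height = 1.0 - circle.y
    frequency = low_hz * 2.0 ** (height * octaves)
    # Horizontal position becomes the stereo position.
    pan = circle.x * 2.0 - 1.0
    # Larger circles are louder, capped at full volume.
    gain = min(1.0, circle.radius * 2.0)
    return Voice(frequency, pan, gain)


if __name__ == "__main__":
    for c in [Circle(0.5, 1.0, 0.25), Circle(1.0, 0.0, 0.6)]:
        print(c, "->", circle_to_voice(c))
```

Pitch is spread exponentially rather than linearly so that equal distances on the score correspond to equal musical intervals, which is one reasonable choice among many.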


Transient

Published on 05.29.2017 in [software]

Transient allows you to record sounds and quickly upload them to a map so you can listen to the world around you!

Simply hold down the record button and let go. When you're done, go to the map view and see your sound where you recorded it! You can also listen to sounds other people have recorded.

Transients at Yami Ichi

Published on 12.01.2016 in [transients]

I just came across this article, sent to me from Miyu Hosoi.

It's from a year ago and it's pretty short, but I think it's pretty cool! Link and translation below:

Link to article: インターネットヤミ市 in NY!フォトレポート! by tadahi

TRANSIENTS by Johann Diedrick
「Send audio messages around the world wide web in virtual bottles」

オーディオメッセージをバーチャルなボトルに詰めて、world wide webに投げ込む!

Audio messages are placed inside virtual bottles and thrown into the world wide web!


I thought there was a lot of nuance in the work.


The length is determined by the different bottles, which have different prices.


After choosing your bottle, the sound is recorded and the user shakes their
phone, as if they are throwing it, and the sound is placed on a map.


Johann was also an intern at YCAM about two years ago. He is an artist and


Sound installations, which is artwork relating to sound, is the center of his

From Silos to Services

Published on 11.23.2016 in [met]

Link to my presentation at MCN 2016

Rocket Hanabi

Published on 11.13.2016 in [translations]

Rocket Hanabi by Tujiko Noriko


While I was lying on my bed
watching television,
a bomb suddenly came
and dropped down on top of me.
Ba-boom, Ba-boom, Ba-boom, Ba-boom
Such a radiant light...


From below
I start to rise, but
From the explosion
The heat starts to fill everything


From below I start to burn
Into the sky I ascend in a dance


I'll look down from below someday
The green land spreads out below
The green starts to shake and waver
The green land flutters and waves...


If I see someday that,
At a gathering on the fields
People are ascending on a ladder
Towards me, somewhere in the sky,


I'll gently wave my hand...
I'll gently wave my hand...


I am a rocket
Show me the fireworks


The tears stream down, and yet...


I am a rocket
I want to see fireworks


The tears start to overflow, and yet...

My Favorite Sound is You @ The Galallery

Published on 07.16.2016 in [myfavoritesoundisyou]

More information at The Galallery

Good Vibrations Mobile Listening Kit

Published on 07.02.2016 in [objects]

The Good Vibrations Mobile Listening Device allows users to tap into some of the least audible sounds of a city. With the use of a custom handmade contact microphone, the user can tune in to subtle acoustic vibrations in the environment and explore the city's cracks and surfaces. A field guide for urban listening directs users to acoustic 'points of interest.'

The mobile listening kits are custom-designed, featuring hand-made audio amplifier circuits inside orange fanny packs for hands-free usability.

Each mobile listening kit comes with a contact microphone and instructional guide.

Kits are available for purchase at ThinkingHz


Harvester

Published on 07.02.2016 in [objects]

Harvester is a hand-held, portable live sampler that lets you make music with everyday sounds. With the instrument, you are able to capture sounds around you (your voice, another musical instrument, environmental noises etc.). The instrument provides an interface that lets you play back the sampled sound based around a musical, pentatonic scale. This allows for a wildly expressive sonic palette that can be used for musical performance and sound art installations.

Harvester on Github
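
As a rough illustration of what playing a sample back "around a pentatonic scale" can mean, the sketch below computes playback-speed ratios for a recorded sample. It is not taken from the Harvester code: the choice of a major pentatonic scale, equal temperament, and the `playback_rate` function name are assumptions. Playing a sample at rate 2.0 raises it one octave, and each semitone corresponds to a factor of 2^(1/12).

```python
# Semitone offsets of the major pentatonic scale within one octave.
# (Assumption: the instrument could equally use a minor pentatonic.)
MAJOR_PENTATONIC = [0, 2, 4, 7, 9]


def playback_rate(step):
    """Return the playback-speed ratio for a scale step.

    Step 0 is the sample at its original pitch, step 5 is one octave up,
    and negative steps move down the scale.
    """
    octave, degree = divmod(step, len(MAJOR_PENTATONIC))
    semitones = 12 * octave + MAJOR_PENTATONIC[degree]
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # Print the rates for two octaves of the scale.
    for step in range(10):
        print(step, round(playback_rate(step), 4))
```

Because every step lands on a note of the same scale, any sequence of touches or key presses stays consonant, which is part of what makes this kind of interface forgiving to play.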


Sirens

Published on 07.02.2016 in [objects]

Sirens are inexpensive (~$3.75) sound installations. They are public,
site-specific, solar powered and easy to assemble.


Solar panel - $0.55

40106/74C14 IC - $0.50

1uf capacitor - $0.20

Photocell - $1.15

Speaker - $1.10

Magnet - $0.75

(not including spare wires, solder, soldering iron, and hot glue)

Inspired by Evan Roth's LED throwies, Laura Plana Gracia's solar sound boths
workshops, and Max Neuhaus.

Group of Sirens


Aeolian Harp Workshop

Published on 07.02.2016 in [workshops]

In this workshop we will build Aeolian Harps, legendary instruments played by the wind. We will bring their natural singing sound to the Kamo River and develop a consciousness around wind speed, path, and direction in the surrounding environment.

Naked Ear

Published on 07.02.2016 in [workshops]

This workshop is for artists and researchers wanting to become more familiar with the potential of sound. Our focus will be on expanding our understanding of sound in an environment. This requires a basic vocabulary for talking about sound as a material, as well as the ability to make and use tools for investigating and manipulating sound for creative purposes.

Our frame of reference will be Good Vibrations, a mobile listening kit that allows “acoustic explorers” to find abandoned sounds in their environment. By using hand-made microphones and amplifiers, listeners can tune into the subtle vibrations that usually go unnoticed. The project encourages listeners to reimagine their sonic environment by playfully exploring the world through their ears.

The workshop will introduce participants to the world of sound art, while providing techniques for making tools for creating these experiences. This will include the fabrication of a hand-made microphone and amplifier for use in installations, performances, and scientific research. The goal of the workshop is to take these tools into the field and use them for artistic investigation and public engagement.

Colorful Waves

Published on 07.02.2016 in [workshops]

In this 2-hour workshop, we will take a handful of wires and resistors and create cacophonous bleeps and bloops. We will work through three basic circuits, starting with simple square wave oscillators, moving up to 8-bit-sounding tones, and graduating to a simple sequencer that you can use for live performances, public installations and any other sound-related projects. This hands-on workshop will provide tips and tricks for making your own instruments or interactive noise machines, with time to experiment and customize as you learn about electronics and sound. Prepare to get noisy and loud.

You don’t need to have any electronics experience to participate! Materials and instructions will be included, and I guarantee that anyone will be able to walk out of this workshop with something pleasantly squealing and shrieking.

Quiet Music and Weak Sound || 静かな音楽、弱い音

Published on 06.28.2016 in [myfavoritesoundisyou]

I recently came across an article written by Sakiko Sugawa about my sound art
workshop at Reverse Space, and its relation to my upcoming residency at her
art and culture space, Social Kitchen. Japanese and English translations below!







Some time ago at Hanare, I was contacted by sound artist Johann Diedrick, who shared
his workshop with me. He is a sound artist/programmer, and his recent workshop
was about "personal sounds", specifically "small" sounds in one's daily life.
These sounds are collected with a kit he made, consisting of a microphone,
amplifier and earphones. Listeners bring the kits outside to find sounds that we
don't usually associate with everyday life, as well as sounds that are barely
audible. The theme of the workshop was gathering and collecting these
sounds, and building the kits to do that. In the near future at Social Kitchen,
Johann will work together with a Japanese researcher/sound artist, beginning by
teaching workshops and studying these sounds. It is for these reasons that I
went to observe his workshop.

His workshop reminded me of Hanare member Kumi-san, who says: "Social Kitchen
wants to be a place for quiet sounds, weak sounds". Hanare was once featured
in a Taiwanese art magazine, and those words come directly from her feature in
the magazine. Kumi-san is an experimental musician, and Social Kitchen is also a
place to meet to hear "quiet music", through our sponsorship and hosting.
"Sounds without flashy harmony or popular music cliches...", these things are
good. Sounds like this are important, and having a space for them is equally
important. This is what I think Kumi-san really means. Since then, these words
are often on my mind: make a place that preserves and showcases sounds like

For example, what is a weak sound? Disregarded, unimportant...sounds displaced
to the outer fringes...sounds that are not useful...sounds that are not
judged as being beautiful... Are quietness and weakness things we choose? When
collecting sounds, how do we make them more important? There are already sounds
with elevated volume that people are able to hear. In contrast, there are
smaller sounds that few people listen to carefully, along with few people
helping to hear and investigate these sounds. Potential access to these sounds
by making them audible through a device can offer a path to hear, explore and
investigate these sounds.

The original article can be found here.




見る はreg2

reg 2

Stem +られる



stem +eる







記事(きじ) article