Recurse Center - Batch 3 - Cycle 20250108-20250110 - Extension


Extension

I'm back at RC for a six-week half batch, continuing my work on building automatic speech recognition (ASR) systems from scratch for science and fun :)

During my last 12-week batch, I created Zora, an interpretable machine listening library for voice and speech. I gave a presentation on it during the final week of the batch that you can check out here. I demonstrated how you can use this library to build a spoken digit recognizer, i.e., recognizing someone saying the digits 0-9. My goal for this half batch is to build out a full-spectrum ASR system with my library that can do robust speech-to-text recognition.

To that end, I'm planning to implement a transformer-based ASR system based on the Speech-Transformer paper. This cycle was spent mostly getting myself set up, as well as participating in first-week-of-batch activities, meeting all the new Recursers, and catching up with familiar faces.

Day 1

I spent most of my day reading the Speech Transformer paper (a more... :cough:...accessible version of the paper can be found here).

Day 2

I started setting up my environment for development, which meant setting up some scaffolding for testing with pytest. I also learned that conda environments have a weird way of listing dependencies when you freeze them: some entries get an @ symbol followed by a local path to that library, which breaks running pip install -r requirements.txt. The way around this was to generate the requirements in the right format with pip list --format=freeze, i.e. pip list --format=freeze > requirements.txt
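For anyone who hits the same thing, here's a sketch of what the difference looks like (the package names and paths below are made up for illustration):

```shell
# In a conda env, `pip freeze` can emit direct references like:
#   numpy @ file:///opt/conda/conda-bld/numpy_1234/work
# which `pip install -r requirements.txt` can't resolve on another machine.
# `pip list --format=freeze` emits plain pinned versions instead:
#   numpy==1.26.4
pip list --format=freeze > requirements.txt

# Sanity check: every line should be a plain name==version pin.
grep -c "==" requirements.txt
```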

I started writing code to implement the self-attention mechanism, and afterward I had a nice coffee chat with a Recurser before attending first-week presentations.
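The core of self-attention is small enough to sketch in a few lines. Here's a minimal NumPy version of scaled dot-product self-attention; the function name, shapes, and random projections are illustrative and not taken from my actual implementation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention.

    x: (seq_len, d_model) input sequence
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    returns: (seq_len, d_k) attended output
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d_k)  # (seq_len, seq_len) similarity
    # Numerically stable softmax over the key dimension
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # weighted sum of values

# Toy usage with random weights
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))           # 5 frames, d_model=8
w_q, w_k, w_v = (rng.standard_normal((8, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```

In the full Speech-Transformer this gets extended to multi-head attention with learned projections per head, but the per-head computation is exactly this.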

Day 3

Today was spent attending the "Building Your Volitional Muscles" workshop. I actually feel really good about what I want to focus on, why I want to focus on it, and what that means for deprioritizing other things I'm curious about but that aren't as important to me right now. I know that my deep passion lies in the triangle of audio/sound/music, software engineering, and ML/deep learning. I ended up organizing everything into these buckets:

ASR

Week 1 - Transformer training

Week 2 - Finetuning

Week 3 - Mechanistic Interpretability

Week 4 - Metrics and Evals

Week 5 - Data upload / model weights

Week 6 - Present!

Stretch goals:

Filling gaps in Software Engineering knowledge

Picking up one of these books and doing something with it (either playing on Heap and/or developing some MLOps skills):

I rounded out the day with a meeting for the Heap Computer Club, which went really well. We made a very short presentation about the cluster, how to use it, and how to get more involved, which can be found here.

Things for next cycle

Along with continuing my work on implementing the Speech-Transformer paper, I want to see if I can slide in some of my other side quests here and there.

I also want to get better at vim, so I'm going to try to watch one video from Primagen's Vim As Your Editor series a day.