Recurse Center - Batch 2 - Cycle 20241024-20241026 - Interpretation


Interpretation

This cycle was spent starting mechanistic interpretability, training my automatic speech digit recognizer based on Audrey, and contemplating how to best work on and maintain the Heap machines.

Day 1

On Mechanistic Interpretability

I worked on using TransformerLens to look inside of a pretrained GPT-2 model and begin the practice of mechanistic interpretability!

Day 2

Today I worked a bit more on mechanistic interpretability, but quickly realised there was a lot more required reading I needed to do before getting through the next section, so I decided to procrastinate on reading at RC and work on Audrey a bit more instead. I'm delighted to say I got it training and working! It doesn't generalize well to my voice, so I'm going to go back and build a dataset derived just from my voice, so that my trained model works uniquely for me. It's easy to imagine a world where everyone just has their own personal weights for a model: given the architecture, you could pipe in your weights and have it work exceptionally well for you. I'm going to explore this idea more during the rest of my batch...
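The "personal weights" idea boils down to a fixed architecture plus a swappable per-user weight set. A toy numpy sketch (the two-layer net, sizes, and per-user dict are all hypothetical, just to illustrate the shape of the idea):

```python
import numpy as np

def mlp_forward(x, weights):
    """One fixed architecture: two dense layers with a ReLU in between."""
    h = np.maximum(0, x @ weights["W1"] + weights["b1"])
    return h @ weights["W2"] + weights["b2"]

def make_weights(seed, d_in=13, d_hidden=32, d_out=10):
    """Stand-in for weights trained on one person's voice (random here)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(d_in, d_hidden)) * 0.1,
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(size=(d_hidden, d_out)) * 0.1,
        "b2": np.zeros(d_out),
    }

# Same architecture, different people: just swap the weight dict.
personal_weights = {"me": make_weights(0), "you": make_weights(1)}
features = np.ones(13)  # e.g. 13 MFCC features from one audio frame
logits_me = mlp_forward(features, personal_weights["me"])
logits_you = mlp_forward(features, personal_weights["you"])
print(logits_me.shape)  # one score per spoken digit
```

The appeal is that the architecture ships once and the per-user state is just a small weight file.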

Day 3

Today I took some time to reflect and sift through the large amount of Neel Nanda content out there. As I'm working through this material on mechanistic interpretability and reverse-engineering transformers, I'm trying to organize a sequence of things to read in order to keep up.

These two feel like the first places to start / read:

Induction Circuits

Induction heads - illustrated

These are some videos that I think would be next in line to watch:

A Walkthrough of A Mathematical Framework for Transformer Circuits video

A Mathematical Framework for Transformer Circuits paper

A Walkthrough of In-Context Learning and Induction Heads video

In-context Learning and Induction Heads paper

And then these are more around context and open questions in the field - more optional, but they really help to set the stage and articulate the stakes of working on this problem:

Things for next cycle

For the rest of Chapter 1 in ARENA, we get to choose what to do next from a set of exercises. I'm interested in superposition, so I'm excited to check out resources like this one on Toy Models of Superposition.
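The setup in Toy Models of Superposition is small enough to sketch: n sparse features get compressed into m < n hidden dimensions by a matrix W, then reconstructed as ReLU(WᵀWx + b). In the paper W and b are trained; here they're random, just to show the shapes and where interference between features comes from:

```python
import numpy as np

# Toy model of superposition: 5 features squeezed through 2 hidden dims.
n_features, m_hidden = 5, 2

rng = np.random.default_rng(0)
W = rng.normal(size=(m_hidden, n_features))  # columns = feature directions
b = np.zeros(n_features)

def reconstruct(x):
    h = W @ x                           # compress into the hidden space
    return np.maximum(0, W.T @ h + b)   # reconstruct with ReLU(W^T W x + b)

x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # one sparse feature active
x_hat = reconstruct(x)
print(x_hat.shape)
```

Because the feature directions (columns of W) can't all be orthogonal in 2D, activating one feature bleeds into the reconstruction of others - that interference is the phenomenon the paper studies.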

I also want to go back and think about transformers a bit more deeply.

I had a really great conversation with two Recursers about audio classification, neural network architectures, and other things. As I'm finishing up this first pass on a simple ASR system, we were thinking about interesting challenges we could work on. One could be solving audio CAPTCHA challenges. As someone who has been on the internet for a very long time, I couldn't believe I had never encountered the audio version of CAPTCHAs before!

I'm wondering if it might be a fun challenge to build a Reinforcement Learning (RL) project that learns to solve these audio CAPTCHA challenges....something to think about for the second half of my batch when we get to RL with ARENA.

Some things I want to do for Audrey include:

Along with using my own voice, I'd like to spend some time looking at this Audio MNIST dataset.
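Whichever dataset I use, the front end will look similar: frame the waveform into short overlapping windows and take FFT magnitudes. A numpy sketch (the frame/hop values are just the typical 25 ms / 10 ms choices at 16 kHz, not anything Audrey-specific yet):

```python
import numpy as np

def spectrogram(audio, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames and take FFT magnitudes.
    At 16 kHz, frame_len=400 / hop=160 means 25 ms windows, 10 ms hops."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    frames = frames * np.hanning(frame_len)      # taper to reduce leakage
    return np.abs(np.fft.rfft(frames, axis=-1))  # [n_frames, frame_len//2 + 1]

# One second of fake 16 kHz "speech"
audio = np.random.default_rng(0).normal(size=16000)
spec = spectrogram(audio)
print(spec.shape)
```

A mel filterbank on top of this would get you to the features most digit-recognition baselines use.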

I'm also thinking about the architecture I should choose in building my transformer-based ASR system. It seems like the Conformer is what I'm looking for. Some resources include:

There are other, non-CNN architectures as well: