Recurse Center - Batch 3 - Cycle 20250112-20250114 - Stagnation


Stagnation

I had a hard time programming this cycle. I think I've been a bit intimidated by diving back into doing hard, scary things, and I've been procrastinating a bit during the day. I've found a lot more success coding at night, but I want to do better at coding during the day / pairing with others.

I did manage to do some exercises in Vim, and write a little more code towards my speech transformer.

Day 1

Day 2

Day 3

import einops
import torch as t
import torch.nn as nn
from torch import Tensor
from jaxtyping import Float


class SpeechTransformer(nn.Module):

    def __init__(self):
        super().__init__()
        self.cfg = Config()  # Config holds the model hyperparameters (d_model, d_head, ...)

    def forward(self, x):
        #TODO modify this code to return the right output shape once we determine what that is

        # Conv2d + ReLU - initial feature extraction

        # Conv2d + ReLU - more feature extraction

        # Linear - project to d_model dimension (this is where embedding happens!)

        # Reshape
        x = einops.rearrange(x, "b ts fb feature_dim -> b (ts fb) feature_dim",
                             feature_dim=self.cfg.d_model)  # flatten time and freq into one sequence axis

        # Input Encoding (Positional Encoding) - add positional information to embedded sequence

        # Attention Blocks - process the sequence
            # Layer Norm
            # Multi-Head Attention
            # Layer Norm
            # MLP

        # Layer Norm

        return x
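
For reference while filling in those TODOs, here is a minimal sketch of the conv front end and sinusoidal positional encoding the comments describe. This is a sketch under assumptions, not my actual implementation: the layer sizes (two 3x3 Conv2d with stride 2, 32 channels, d_model=256, 80 frequency bins) are placeholder values, and ConvFrontEnd / sinusoidal_positional_encoding are hypothetical names.

import math
import torch as t
import torch.nn as nn
import einops

class ConvFrontEnd(nn.Module):
    # Hypothetical front end: two strided Conv2d + ReLU layers over a
    # (batch, 1, time, freq) spectrogram, then a Linear projection to d_model.
    def __init__(self, n_freq_bins: int = 80, d_model: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
        reduced_freq = math.ceil(math.ceil(n_freq_bins / 2) / 2)  # freq axis shrinks ~4x after two stride-2 convs
        self.proj = nn.Linear(32 * reduced_freq, d_model)

    def forward(self, x: t.Tensor) -> t.Tensor:  # x: (batch, 1, time, freq)
        x = t.relu(self.conv1(x))
        x = t.relu(self.conv2(x))
        # flatten channels and frequency into one feature axis; time stays the sequence axis
        x = einops.rearrange(x, "b c ts fb -> b ts (c fb)")
        return self.proj(x)  # (batch, reduced_time, d_model)

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> t.Tensor:
    # Standard sin/cos positional encoding (assumes d_model is even); added to the embedded sequence.
    position = t.arange(seq_len).unsqueeze(1)
    div_term = t.exp(t.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = t.zeros(seq_len, d_model)
    pe[:, 0::2] = t.sin(position * div_term)
    pe[:, 1::2] = t.cos(position * div_term)
    return pe  # (seq_len, d_model)
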
class MultiHeadAttention(nn.Module):
    # Multi-head self-attention block for the "Attention Blocks" step sketched above.

    def __init__(self, cfg):
        super().__init__()
        self.cfg = cfg
        # per-head projection weights; shapes follow the einsum patterns in forward()
        self.W_Q = nn.Parameter(t.empty(cfg.n_heads, cfg.d_model, cfg.d_head))
        self.W_K = nn.Parameter(t.empty(cfg.n_heads, cfg.d_model, cfg.d_head))
        self.W_V = nn.Parameter(t.empty(cfg.n_heads, cfg.d_model, cfg.d_head))
        self.W_O = nn.Parameter(t.empty(cfg.n_heads, cfg.d_head, cfg.d_model))
        self.b_Q = nn.Parameter(t.zeros(cfg.n_heads, cfg.d_head))
        self.b_K = nn.Parameter(t.zeros(cfg.n_heads, cfg.d_head))
        self.b_V = nn.Parameter(t.zeros(cfg.n_heads, cfg.d_head))
        self.b_O = nn.Parameter(t.zeros(cfg.d_model))
        for W in (self.W_Q, self.W_K, self.W_V, self.W_O):
            nn.init.normal_(W, std=0.02)
        self.register_buffer("IGNORE", t.tensor(float("-inf")))

    def forward(
            self,
            normalized_resid_pre: Float[Tensor, "batch posn d_model"]
            ) -> Float[Tensor, "batch posn d_model"]:
        # linear maps from the residual stream to per-head queries, keys, and values


        Q = einops.einsum(normalized_resid_pre,
                          self.W_Q,
                          "b s e, n e h -> b s n h") + self.b_Q

        K = einops.einsum(normalized_resid_pre,
                          self.W_K,
                          "b s e, n e h -> b s n h") + self.b_K

        V = einops.einsum(normalized_resid_pre,
                          self.W_V,
                          "b s e, n e h -> b s n h") + self.b_V

        attn_scores = einops.einsum(Q, 
                                    K, 
                                    "batch seq_q head_index d_head, batch seq_k head_index d_head -> batch head_index seq_q seq_k"
                                    )

        scaled_attn_scores = attn_scores / (self.cfg.d_head ** 0.5)

        masked_attn_scores = self.apply_causal_mask(scaled_attn_scores)

        A = t.softmax(masked_attn_scores, dim=-1) # attention is all we need!

        z = einops.einsum(A, V, "b n sq sk, b sk n h -> b sq n h")

        result = einops.einsum(z, self.W_O, "b sq n h, n h e -> b sq e")

        return result + self.b_O

    def apply_causal_mask(self, attn_scores: Float[Tensor, "batch n_heads query_pos key_pos"]
                          ) -> Float[Tensor, "batch n_heads query_pos key_pos"]:

        # mask out future positions: every entry above the diagonal (key after query) is set to -inf before the softmax
        mask = t.triu(t.ones_like(attn_scores, dtype=t.bool), diagonal=1)
        return attn_scores.masked_fill_(mask, self.IGNORE)
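
To sanity-check the attention block, a quick shape test. The config below (d_model=64, n_heads=4, d_head=16) is a hypothetical stand-in rather than my real Config, and it reuses the imports and the MultiHeadAttention class from the listing above.

from dataclasses import dataclass

@dataclass
class TinyConfig:
    d_model: int = 64
    n_heads: int = 4
    d_head: int = 16

attn = MultiHeadAttention(TinyConfig())
x = t.randn(2, 10, 64)   # (batch, posn, d_model)
out = attn(x)
print(out.shape)         # expected: torch.Size([2, 10, 64]) - same shape as the residual stream input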

Things for next cycle

Like last cycle, I'll just continue my work on implementing the Speech-Transformer paper, and try to practice learning generously more.