Recurse Center - Batch 3 - Cycle 20250112-20250114 - Stagnation
Stagnation
I had a hard time programming this cycle. I think I've been a bit intimidated diving back into doing hard, scary things, and I've been procrastinating a bit during the day. I've found a lot more success coding at night, but I want to do better at coding during the day and pairing with others.
I did manage to do some exercises in Vim, and write a little more code towards my speech transformer.
Day 1
- Research
Day 2
- Research
Day 3
- Introduction exercises in Vim from ThePrimeagen's Vim As Your Editor series.
- I outlined the different steps necessary for implementing the speech transformer's forward pass (a rough sketch of where this is heading follows the outline):
import einops
import torch.nn as nn

class SpeechTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.cfg = Config()  # Config holds the model hyperparameters (defined elsewhere)

    def forward(self, x):
        # TODO: modify this code to return the right output shape once we determine what that is
        # Conv2d + ReLU - initial feature extraction
        # Conv2d + ReLU - more feature extraction
        # Linear - project to d_model dimension (this is where embedding happens!)
        # Reshape - flatten the conv features so each time step is a single vector
        x = einops.rearrange(x, "b c ts fb -> b ts (c fb)")  # TODO: confirm axis names/order once the conv + linear shapes are pinned down
        # Input Encoding (Positional Encoding) - add positional information to embedded sequence
        # Attention Blocks - process the sequence
        #   Layer Norm
        #   Multi-Head Attention
        #   Layer Norm
        #   MLP
        #   Layer Norm
        return x
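To keep myself oriented, here's a rough sketch of how those outlined steps might eventually get filled in. This is not my implementation yet: the Config fields, the conv kernel sizes and strides, and the use of nn.TransformerEncoder standing in for my own attention blocks are all placeholder assumptions I still need to check against the Speech-Transformer paper.

import math
from dataclasses import dataclass

import einops
import torch
import torch.nn as nn


@dataclass
class Config:
    # placeholder values; the real ones come from the paper / my data pipeline
    n_mels: int = 80        # frequency bins in the input spectrogram
    d_model: int = 256
    n_layers: int = 2
    n_heads: int = 4
    max_len: int = 2048


class SpeechEncoderSketch(nn.Module):
    """One possible shape for the forward pass outlined above (a sketch, not checked against the paper yet)."""

    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        # Conv2d + ReLU x2 - feature extraction; stride 2 downsamples both time and frequency
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Linear - project flattened conv features to d_model
        freq_after_conv = cfg.n_mels // 4  # two stride-2 convs quarter the frequency axis
        self.proj = nn.Linear(32 * freq_after_conv, cfg.d_model)
        # Positional Encoding - fixed sinusoidal table, added to the embedded sequence
        pe = torch.zeros(cfg.max_len, cfg.d_model)
        pos = torch.arange(cfg.max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, cfg.d_model, 2).float() * (-math.log(10000.0) / cfg.d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pos_enc", pe)
        # Attention Blocks - stock encoder layers stand in for my hand-rolled blocks for now
        layer = nn.TransformerEncoderLayer(cfg.d_model, cfg.n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, cfg.n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time_steps, n_mels) spectrogram
        x = einops.rearrange(x, "b ts fb -> b 1 ts fb")   # add a channel dim for Conv2d
        x = self.conv(x)                                   # (b, 32, ts', fb')
        x = einops.rearrange(x, "b c ts fb -> b ts (c fb)")
        x = self.proj(x)                                   # (b, ts', d_model)
        x = x + self.pos_enc[: x.shape[1]]                 # add positional information
        return self.blocks(x)                              # (b, ts', d_model)


if __name__ == "__main__":
    cfg = Config()
    model = SpeechEncoderSketch(cfg)
    spec = torch.randn(2, 100, cfg.n_mels)
    print(model(spec).shape)  # torch.Size([2, 25, 256])

The __main__ bit is mostly a shape check: two stride-2 convs take a (2, 100, 80) spectrogram down to 25 time steps before the d_model projection.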
- And I implemented the Attention layer, based on some past work / resources from ARENA:
import einops
import torch as t
from jaxtyping import Float
from torch import Tensor

# (Methods of my Attention module; W_Q, W_K, W_V, W_O, the biases, and IGNORE are defined in __init__, not shown here.)
def forward(
    self,
    normalized_resid_pre: Float[Tensor, "batch posn d_model"]
) -> Float[Tensor, "batch posn d_model"]:
    # linear maps from the residual stream to per-head queries, keys, and values
    Q = einops.einsum(normalized_resid_pre,
                      self.W_Q,
                      "b s e, n e h -> b s n h") + self.b_Q
    K = einops.einsum(normalized_resid_pre,
                      self.W_K,
                      "b s e, n e h -> b s n h") + self.b_K
    V = einops.einsum(normalized_resid_pre,
                      self.W_V,
                      "b s e, n e h -> b s n h") + self.b_V
    attn_scores = einops.einsum(Q,
                                K,
                                "batch seq_q head_index d_head, batch seq_k head_index d_head -> batch head_index seq_q seq_k")
    scaled_attn_scores = attn_scores / (self.cfg.d_head ** 0.5)
    masked_attn_scores = self.apply_causal_mask(scaled_attn_scores)
    A = t.softmax(masked_attn_scores, dim=-1)  # attention is all we need!
    z = einops.einsum(A, V, "b n sq sk, b sk n h -> b sq n h")
    result = einops.einsum(z, self.W_O, "b sq n h, n h e -> b sq e")
    return result + self.b_O

def apply_causal_mask(self, attn_scores: Float[Tensor, "batch n_heads query_pos key_pos"]
                      ) -> Float[Tensor, "batch n_heads query_pos key_pos"]:
    # set everything above the diagonal to IGNORE so queries can't attend to future positions
    return attn_scores.masked_fill_(t.triu(t.ones_like(attn_scores), diagonal=1) != 0, self.IGNORE)
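One small standalone check I want to remember for the causal mask: fill the upper triangle with a large negative number (the -1e5 here is an assumption, standing in for whatever self.IGNORE gets set to in __init__), softmax, and confirm that future positions get roughly zero weight while each row still sums to 1.

import torch as t

batch, n_heads, seq = 2, 4, 5
attn_scores = t.randn(batch, n_heads, seq, seq)

# same masking trick as apply_causal_mask above: everything above the diagonal gets a huge negative score
mask = t.triu(t.ones_like(attn_scores), diagonal=1) != 0
masked = attn_scores.masked_fill(mask, -1e5)

A = t.softmax(masked, dim=-1)
print(A[0, 0])        # upper triangle should be ~0
print(A.sum(dim=-1))  # every row should still sum to 1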
Things for next cycle
Like last cycle: continue working on implementing the Speech-Transformer paper, and keep trying to practice learning generously.