Attention Probing
What the Model Sees
For each Sumerian word, we extracted the attention weights from all 4 layers of our GPT. "Attends to" means the distribution over context tokens that the word looks at; "attended by" means how strongly later words look back at it.
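The two directions fall out of the same attention matrix: for token i, the "attends to" profile is row i, and the "attended by" profile is column i restricted to later tokens. A minimal single-head sketch (toy random queries/keys, not our trained model's actual weights):

```python
import numpy as np

def causal_attention(q, k):
    """Single-head causal attention weights.
    q, k: (T, d) arrays of query and key vectors for T tokens."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                # (T, T) similarity scores
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # mask future
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)      # rows sum to 1

rng = np.random.default_rng(0)
T, d = 5, 8                                       # 5 tokens, 8-dim head (illustrative sizes)
A = causal_attention(rng.normal(size=(T, d)), rng.normal(size=(T, d)))

i = 2                                             # probe the third token
attends_to = A[i, :i + 1]                         # row i: what token i looks at
attended_by = A[i + 1:, i]                        # column i below the diagonal: later tokens looking back
```

In a multi-layer model the same row/column read-off is repeated per layer (and per head), giving one "attends to" and one "attended by" profile for each of the 4 layers.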