A GPT That Thinks in Sumerian
We trained a 6.8M-parameter transformer from scratch on 66,212 Sumerian literary sentences. It can't translate. It can't chat. But it has internalized the distributional structure of the language — and its generations independently confirm our semantic findings.
Architecture
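The exact hyperparameters aren't listed here, but a ~6.8M-parameter decoder-only transformer can be sanity-checked with a simple parameter count. The configuration below (vocabulary size, model width, layer count, context length) is a hypothetical illustration in the right ballpark, not the actual model's config:

```python
def gpt_param_count(vocab_size: int, d_model: int, n_layers: int, ctx_len: int) -> int:
    """Approximate parameter count for a GPT-style decoder with tied embeddings."""
    embeddings = vocab_size * d_model + ctx_len * d_model  # token + learned position
    attention = 4 * d_model * d_model + 4 * d_model        # Q, K, V, output proj (+ biases)
    mlp = 8 * d_model * d_model + 5 * d_model              # two 4x-expansion linears (+ biases)
    layer_norms = 2 * 2 * d_model                          # two LNs per block (scale + shift)
    per_block = attention + mlp + layer_norms
    final_ln = 2 * d_model
    return embeddings + n_layers * per_block + final_ln    # LM head tied to token embedding

# Hypothetical small-GPT config; prints 6297088, i.e. roughly the 6.8M scale
print(gpt_param_count(vocab_size=12_000, d_model=256, n_layers=4, ctx_len=256))
```

At this scale the token embedding matrix is roughly half the model, which is why vocabulary size matters so much for tiny corpora.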
What It Confirms
Three independent methods — PMI co-occurrence, Word2Vec embeddings, and now autoregressive generation — converge on the same conclusions.
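As a sketch of the first method: pointwise mutual information measures how much more often two words co-occur than chance predicts, PMI(x, y) = log p(x, y) / (p(x) p(y)). The toy corpus below echoes the collocations reported later on this page (transliterations ASCII-simplified); it stands in for the real 66,212-sentence dataset:

```python
import math

def pmi(sentences, x, y):
    """Sentence-level PMI of words x and y."""
    n = len(sentences)
    px = sum(x in s for s in sentences) / n
    py = sum(y in s for s in sentences) / n
    pxy = sum(x in s and y in s for s in sentences) / n
    return math.log(pxy / (px * py))

# Toy corpus echoing the collocations discussed below
corpus = [
    ["nam-tag", "dugud", "de6"],
    ["nam-tag", "dugud", "gal"],
    ["nam-erim2", "kud"],
    ["me-lam2", "hush"],
]
print(round(pmi(corpus, "nam-tag", "dugud"), 3))  # log(2) ≈ 0.693: perfect co-occurrence
```

A positive PMI means the pair co-occurs more than chance; the Word2Vec and generation methods then test whether the same pairs cluster in embedding space and in sampled continuations.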
me-lam₂ → Terror, Not Light
me-lem₄ ḫuš
me-lem₄ ḫuš
ur-saĝ mah
ur-saĝ gal
ur-saĝ en-lil₂
ur-saĝ kur ad₆
When prompted with me-lam₂, the model generates ḫuš (fury), ni₂ su zig₃ (terror-flesh-rise), pirij (lion), and warrior imagery — never light words (zalag, babbar). It learned that me-lam₂ belongs to the terror/awe cluster, not the luminosity cluster.
nam-tag → Heavy Burden
ad gi₄
nam-tag dugud šu gi₄
nam-tag dugud ka garāš₂ zig₃
nam-tag dugud de₆
nam-tag dugud gal
nam-tag dugud šu gi₄
At every temperature, nam-tag is followed by dugud (heavy). The model learned this isn't "sin" — it's a weight that can be carried (de₆), returned (šu gi₄), or released (du₈).
nam-erim₂ → Oath-Cutting
nam-erim₂ kuḍ
ašₓ dug₄ nam-erim₂ kuḍ
nam-erim₂ kuḍ
nam-erim₂ kuḍ
The model has one overwhelming association: nam-erim₂ kuḍ — "to cut the oath." This is juridical procedure, not abstract "wickedness." It also generates munus nam-erim₂ kuḍ (woman + oath-cut), consistent with the legal context.
nam-tar → Polysemy Visible
lu₂ niĝ₂ gig niĝ₂ gig jar
lu₂ lu₂-ulu₃ dili a-na taḫ
Two completions, two meanings: nam-tar gig = the demon Namtar plus illness; nam-tar lu₂ a-na dug₄ = fate/destiny — what does man say? The model mirrors the polysemy we identified in our deep dive.
Limitations & Honesty
🔄 Repetition loops
At low temperatures (T ≤ 0.5), the model often gets stuck repeating patterns — a known failure mode of small language models overfit on limited data.
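The effect is visible in the sampling step itself: dividing logits by a temperature below 1 sharpens the distribution, so the top token dominates almost every draw and the model can lock into a cycle. A minimal sketch (the logit values are made up for illustration):

```python
import math

def softmax_with_temperature(logits, t):
    """Convert logits to probabilities, sharpened (t < 1) or flattened (t > 1)."""
    scaled = [z / t for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                     # hypothetical next-token scores
for t in (0.3, 1.0):
    probs = softmax_with_temperature(logits, t)
    # at t=0.3 nearly all mass sits on the top token; at t=1.0 it spreads out
    print(t, [round(p, 3) for p in probs])
```

With ~95% of the mass on one token at T = 0.3, a short high-probability phrase can feed back into itself indefinitely — hence the repetition loops.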
📊 Overfitting
The best checkpoint is epoch 1 (val_loss = 9.08). By epoch 10, train_loss has fallen to 0.63 but val_loss has risen to 10.46: the model memorizes more than it generalizes.
❓ Unknown tokens
Generations contain many <UNK> tokens, reflecting damaged or unreadable signs in the original tablets that entered the training data.
🧠 Pattern recall, not understanding
This model has learned statistical regularities. It doesn't "understand" Sumerian — it has internalized which words tend to follow which. That's exactly what makes it useful as a validation tool.
Training Details
Loss Curve
| Epoch | Train Loss | Val Loss | Time |
|---|---|---|---|
| 1 | 2.91 | 9.08 ★ | 3.6 min |
| 2 | 1.43 | 9.78 | 3.4 min |
| 3 | 1.14 | 10.03 | 3.7 min |
| 5 | 0.88 | 10.24 | 3.8 min |
| 10 | 0.63 | 10.46 | 3.3 min |
Val loss increases monotonically after epoch 1 (★ marks the selected checkpoint) — classic overfitting on a small corpus (348K tokens). We use the epoch-1 checkpoint for all generations.
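Checkpoint selection here is just an argmin over validation loss; the numbers below are the ones from the table, and the same loop doubles as an overfitting check:

```python
# Logged epochs from the training table above
train_loss = {1: 2.91, 2: 1.43, 3: 1.14, 5: 0.88, 10: 0.63}
val_loss   = {1: 9.08, 2: 9.78, 3: 10.03, 5: 10.24, 10: 10.46}

best_epoch = min(val_loss, key=val_loss.get)      # checkpoint with lowest val loss
vals = list(val_loss.values())
overfitting = all(a < b for a, b in zip(vals, vals[1:]))  # val loss rises every logged epoch

print(best_epoch, overfitting)  # → 1 True
```

In a training loop this becomes early stopping: save a checkpoint whenever val loss improves, and stop once it has failed to improve for a few epochs.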
Filling the Gaps
Sumerian tablets are often damaged — broken edges, worn surfaces, missing signs. A language model trained on the literary corpus can predict what's most likely in the gaps, not by "understanding" but by knowing which words statistically follow which. This could be a genuine tool for epigraphists working on fragmentary texts.
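A minimal sketch of the idea, using a bigram count model in place of the transformer (the mini-corpus is drawn from the sample generations above, ASCII-simplified; a real tool would score candidates with the full model's next-token probabilities):

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count word -> next-word transitions."""
    nxt = defaultdict(Counter)
    for s in sentences:
        for a, b in zip(s, s[1:]):
            nxt[a][b] += 1
    return nxt

def restore_gap(nxt, left_word, candidates):
    """Rank candidate restorations for a gap immediately after left_word."""
    return max(candidates, key=lambda w: nxt[left_word][w])

corpus = [
    ["nam-tag", "dugud", "shu", "gi4"],
    ["nam-tag", "dugud", "de6"],
    ["nam-tag", "dugud", "gal"],
    ["nam-erim2", "kud"],
]
model = train_bigrams(corpus)
# Tablet reads "nam-tag [...]": which restoration is most likely?
print(restore_gap(model, "nam-tag", ["dugud", "kud", "gal"]))  # → dugud
```

The output is a ranked suggestion, not a reading: an epigraphist would still weigh it against sign traces, line length, and parallels.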
Try It
⚠️ Generation runs server-side. This demo uses pre-computed samples — a live API is planned.