
A GPT That Thinks in Sumerian

We trained a 6.8M parameter transformer from scratch on 66,212 Sumerian literary sentences. It can't translate. It can't chat. But it has internalized the distributional structure of the language — and its generations independently confirm our semantic findings.

Architecture

Model:      GPT-2 style (RMSNorm, GELU, weight tying)
Parameters: 6.8M (4 layers, 4 heads, 256 dim)
Vocabulary: 14,086 word-level tokens (no BPE)
Context:    128 tokens (learned positional embeddings)
Corpus:     66K sentences (ETCSL + SumTablets literary)
Training:   ~4 minutes (1 epoch on V100, best checkpoint)
Why word-level? Sumerian is already segmented into morphological units in the transliterations. Each token is a meaningful unit — a stem, a grammatical marker, a determinative. BPE would destroy this structure.
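A minimal sketch of what word-level tokenization looks like here (illustrative, not the project's actual pipeline): each whitespace-separated transliteration unit becomes one vocabulary entry, with an <UNK> placeholder for unseen forms.

```python
# Word-level tokenization sketch (assumed details; the article's real
# preprocessing may differ). Transliterated Sumerian is already split into
# morphological units by whitespace, so the vocabulary is simply the set
# of observed units plus an <UNK> id for unseen or damaged forms.

def build_vocab(sentences, min_count=1):
    """Map each whitespace-separated unit to an integer id."""
    counts = {}
    for sent in sentences:
        for tok in sent.split():
            counts[tok] = counts.get(tok, 0) + 1
    vocab = {"<UNK>": 0}
    for tok, c in sorted(counts.items()):
        if c >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Ids for each unit; unknown units map to <UNK> (id 0)."""
    return [vocab.get(tok, vocab["<UNK>"]) for tok in sentence.split()]

corpus = ["me-lem4 huš ri", "nam-tag dugud šu gi4"]
vocab = build_vocab(corpus)
ids = encode("me-lem4 dugud zalag", vocab)  # 'zalag' is unseen → <UNK>
```

A BPE tokenizer would instead split units like me-lem₄ into sub-sign fragments, losing the one-token-one-morpheme alignment the analysis relies on.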

What It Confirms

Three independent methods — PMI co-occurrence, Word2Vec embeddings, and now autoregressive generation — converge on the same conclusions.
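For reference, the PMI method can be sketched in a few lines. This is illustrative only: it counts co-occurrence at the sentence level, while the article's actual window size and smoothing are not specified here.

```python
import math
from collections import Counter
from itertools import combinations

# Pointwise mutual information sketch: PMI(a, b) = log(p(a,b) / (p(a)p(b))),
# with probabilities estimated from sentence-level co-occurrence.
def pmi_scores(sentences):
    """PMI for every word pair that co-occurs within a sentence."""
    word_counts = Counter()
    pair_counts = Counter()
    n_sents = len(sentences)
    for sent in sentences:
        toks = set(sent.split())           # types, not repeated tokens
        word_counts.update(toks)
        pair_counts.update(frozenset(p) for p in combinations(sorted(toks), 2))
    scores = {}
    for pair, c in pair_counts.items():
        a, b = tuple(pair)
        p_ab = c / n_sents
        p_a = word_counts[a] / n_sents
        p_b = word_counts[b] / n_sents
        scores[pair] = math.log(p_ab / (p_a * p_b))
    return scores
```

Pairs like (nam-tag, dugud) that appear together more often than their individual frequencies predict get positive PMI, which is the signal the convergence argument rests on.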

me-lam₂ → Terror, Not Light

[me-lem4 ni2] →
me-lem₄ ḫuš ri
me-lem₄ ḫuš
me-lem₄ ḫuš
ur-saĝ mah
ur-saĝ gal
[me-lem4] T=1.1 →
me-lem₄ kalam uj₃ dul
ur-saĝ en-lil₂
ur-saĝ kur ad₆

When prompted with me-lam₂, the model generates ḫuš (fury), ni₂ su zig₃ (terror-flesh-rise), pirij (lion), and warrior imagery. Never light words (zalag, babbar). It learned that me-lam₂ belongs to the terror/awe cluster.
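The T values attached to the samples above are sampling temperatures. A minimal sketch of temperature sampling (the logits here are made up; the real model produces logits over its 14,086-word vocabulary):

```python
import math
import random

# Temperature sampling sketch: divide logits by T, softmax, then sample.
# Low T sharpens the distribution toward the top token; high T flattens it.
def sample_next(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):            # inverse-CDF sampling
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Lower temperatures concentrate probability mass on the most likely continuation, which is also why the T≤0.5 runs discussed under Limitations tend to fall into repetition loops.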

nam-tag → Heavy Burden

[nam-tag] T=0.5 →
nam-tag zu
ad gi₄
nam-tag dugud šu gi₄
nam-tag dugud ka garāš₂ zig₃
[nam-tag] T=0.8 →
nam-tag ka dib
nam-tag dugud de₆
nam-tag dugud gal
nam-tag dugud šu gi₄

At every temperature, nam-tag is followed by dugud (heavy). The model learned this isn't "sin" — it's a weight that can be carried (de₆), returned (šu gi₄), or released (du₈).

nam-erim₂ → Oath-Cutting

[nam-erim2] →
nam-erim₂ kuḍ
nam-erim₂ kuḍ
ašₓ dug₄ nam-erim₂ kuḍ
nam-erim₂ kuḍ
nam-erim₂ kuḍ

The model has one overwhelming association: nam-erim₂ kuḍ — "cut the oath." This is juridical procedure, not abstract "wickedness." It also generates munus nam-erim₂ kuḍ ("woman oath-cut"), showing the legal context.

nam-tar → Polysemy Visible

[nam-tar] T=0.5 →
nam-tar gig
lu₂ niĝ₂ gig niĝ₂ gig jar
[nam-tar] T=0.5 →
nam-tar lu₂ a-na dug₄
lu₂ lu₂-ulu₃ dili a-na taḫ

Two completions, two meanings: nam-tar gig points to the demon Namtar and illness; nam-tar lu₂ a-na dug₄ ("fate, what does man say?") points to fate/destiny. The model mirrors the polysemy we identified in our deep dive.

Limitations & Honesty

🔄 Repetition loops

At low temperatures (T≤0.5), the model often gets stuck repeating patterns. This is a known issue with small LMs and overfitting on limited data.

📊 Overfitting

Best checkpoint is epoch 1 (val_loss=9.08). By epoch 10, train_loss=0.63 but val_loss=10.46. The model memorizes more than it generalizes.

❓ Unknown tokens

Generations contain many <UNK> tokens, reflecting damaged or unreadable signs in the original tablets that entered the training data.

🧠 Pattern recall, not understanding

This model has learned statistical regularities. It doesn't "understand" Sumerian — it has internalized which words tend to follow which. That's exactly what makes it useful as a validation tool.

Training Details

Loss Curve

Epoch | Train Loss | Val Loss | Time
------|------------|----------|--------
1     | 2.91       | 9.08 ★   | 3.6 min
2     | 1.43       | 9.78     | 3.4 min
3     | 1.14       | 10.03    | 3.7 min
5     | 0.88       | 10.24    | 3.8 min
10    | 0.63       | 10.46    | 3.3 min

Val loss increases monotonically after epoch 1 — classic overfitting on a small corpus (348K tokens). We use the epoch 1 checkpoint for all generations.
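The checkpoint-selection rule is simply "lowest validation loss wins." A sketch, using the values from the loss-curve table:

```python
# Pick the checkpoint with the lowest validation loss. The history values
# below are the ones reported in the loss-curve table above.
def best_checkpoint(history):
    """history: list of (epoch, train_loss, val_loss) tuples."""
    return min(history, key=lambda row: row[2])

history = [
    (1, 2.91, 9.08),
    (2, 1.43, 9.78),
    (3, 1.14, 10.03),
    (5, 0.88, 10.24),
    (10, 0.63, 10.46),
]
epoch, train_loss, val_loss = best_checkpoint(history)  # epoch 1 wins
```

With validation loss rising from the very first epoch, this degenerates to "stop after epoch 1," which is exactly the checkpoint used for all generations.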

Why so small? The entire Sumerian literary corpus is ~348K tokens — roughly 1/10,000th of what modern LLMs train on. A 6.8M parameter model is already oversized for this data. We chose this deliberately: the goal isn't generalization, it's pattern internalization. The model should learn exactly what co-occurs with what in Sumerian literature.
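The 6.8M figure checks out with a back-of-the-envelope count from the architecture table (GPT-2-style block sizes assumed; biases and norm parameters are ignored, and exact numbers depend on implementation details):

```python
# Rough parameter count for the configuration above:
# 4 layers, 4 heads, 256 dim, 14,086-word vocab, 128-token context.
d, layers, vocab, ctx = 256, 4, 14086, 128

embed = vocab * d        # token embeddings (tied with the output head)
pos = ctx * d            # learned positional embeddings
attn = 4 * d * d         # Q, K, V, and output projections per layer
mlp = 2 * d * (4 * d)    # up- and down-projection per layer (4x expansion)
total = embed + pos + layers * (attn + mlp)
print(total)             # ≈ 6.8M, matching the reported figure
```

Note that more than half the parameters sit in the tied embedding matrix — a direct consequence of the large word-level vocabulary relative to the tiny model.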

Filling the Gaps

Sumerian tablets are often damaged — broken edges, worn surfaces, missing signs. A language model trained on the literary corpus can predict what's most likely in the gaps, not by "understanding" but by knowing which words statistically follow which. This could be a genuine tool for epigraphists working on fragmentary texts.
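One way to score gap candidates with an autoregressive model is to multiply the probability of each candidate given the left context by the probability of the surviving right context given the candidate. The interface below (`next_token_probs`) is a hypothetical stand-in for the trained model, not the article's actual code.

```python
# Gap-filling sketch: rank candidates for a damaged slot by
# P(candidate | left) * P(right context | left + candidate).
# `next_token_probs(ctx)` is assumed to return a dict mapping each
# possible next token to its probability given the token-list context.
def rank_gap_candidates(next_token_probs, left, right, candidates):
    """Return candidates sorted best-first for the slot left [?] right."""
    scores = {}
    for cand in candidates:
        score = next_token_probs(left).get(cand, 0.0)
        ctx = left + [cand]
        for tok in right:                      # how well does the rest fit?
            score *= next_token_probs(ctx).get(tok, 0.0)
            ctx = ctx + [tok]
        scores[cand] = score
    return sorted(candidates, key=lambda c: scores[c], reverse=True)
```

Applied to nam-tag [?] šu gi₄, this is the procedure that surfaces dugud as the top restoration in the example below.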

Lament context
nam-tag [?] šu gi₄
"The burden [?] was returned" — what adjective?
dugud (heavy): 8.9%
du₈ (release): 4.3%
sug₄ (empty): 3.4%
jar (place): 3.0%
The model's #1 prediction — dugud (heavy) — matches our finding that nam-tag's dominant collocate is "heavy." It learned this independently.
Aura context
me-lem₄ [?] guru₃ an ki
"The aura [?] bears down on heaven and earth"
dul (cover): 14.9%
an (heaven): 5.4%
kalam (land): 4.5%
ḫuš (fury): 3.8%
Top prediction: dul (to cover). me-lam₂ covers, blankets, envelops — it's a radiative force that spreads over territory. ḫuš (fury) at #4 confirms the terror cluster.
Royal context
šu-suen lugal [?] mah dug₄
"Šu-Suen, king [?] spoke majestically"
an (heaven): 22.4%
urim₂ (Ur): 4.7%
me (powers): 4.7%
en-lil₂ (Enlil): 4.0%
King of heaven, king of Ur, king of ME — all plausible royal epithets. The model generates the correct register.
Temple context
e₂ [?] me gal šu du₇
"Temple [?], great ME perfected"
du₃ (build): 4.3%
kug (pure/holy): 1.6%
me (powers): 1.6%
Temple built, temple pure — both are standard Sumerian collocations for e₂.

Try It


⚠️ Generation runs server-side. This demo uses pre-computed samples — a live API is planned.

