A GPT That Thinks in Sumerian
We trained a 6.8M-parameter transformer from scratch on 66,212 Sumerian literary sentences. It can't translate. It can't chat. But it has internalized the distributional structure of the language — and its generations independently confirm our semantic findings.
Architecture
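The exact hyperparameters aren't listed here, but a ~6.8M-parameter decoder-only transformer can be sanity-checked with a simple parameter count. The configuration below (vocabulary size, model width, layer count, context length) is a hypothetical illustration in the right ballpark, not the actual model's config:

```python
def gpt_param_count(vocab_size: int, d_model: int, n_layers: int, ctx_len: int) -> int:
    """Approximate parameter count for a GPT-style decoder with tied embeddings."""
    embeddings = vocab_size * d_model + ctx_len * d_model  # token + learned position
    attention = 4 * d_model * d_model + 4 * d_model        # Q, K, V, output proj (+ biases)
    mlp = 8 * d_model * d_model + 5 * d_model              # two 4x-expansion linears (+ biases)
    layer_norms = 2 * 2 * d_model                          # two LNs per block (scale + shift)
    per_block = attention + mlp + layer_norms
    final_ln = 2 * d_model
    return embeddings + n_layers * per_block + final_ln    # LM head tied to token embedding

# Hypothetical small-GPT config; prints 6297088, i.e. roughly the 6.8M scale
print(gpt_param_count(vocab_size=12_000, d_model=256, n_layers=4, ctx_len=256))
```

At this scale the token embedding matrix is roughly half the model, which is why vocabulary size matters so much for tiny corpora.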
What It Confirms
Three independent methods — PMI co-occurrence, Word2Vec embeddings, and now autoregressive generation — converge on the same conclusions.
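As a sketch of the first method: pointwise mutual information measures how much more often two words co-occur than chance predicts, PMI(x, y) = log p(x, y) / (p(x) p(y)). The toy corpus below echoes the collocations reported later on this page (transliterations ASCII-simplified); it stands in for the real 66,212-sentence dataset:

```python
import math

def pmi(sentences, x, y):
    """Sentence-level PMI of words x and y."""
    n = len(sentences)
    px = sum(x in s for s in sentences) / n
    py = sum(y in s for s in sentences) / n
    pxy = sum(x in s and y in s for s in sentences) / n
    return math.log(pxy / (px * py))

# Toy corpus echoing the collocations discussed below
corpus = [
    ["nam-tag", "dugud", "de6"],
    ["nam-tag", "dugud", "gal"],
    ["nam-erim2", "kud"],
    ["me-lam2", "hush"],
]
print(round(pmi(corpus, "nam-tag", "dugud"), 3))  # log(2) ≈ 0.693: perfect co-occurrence
```

A positive PMI means the pair co-occurs more than chance; the Word2Vec and generation methods then test whether the same pairs cluster in embedding space and in sampled continuations.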
me-lam₂ → Terror, Not Light
me-lem₄ ḫuš
me-lem₄ ḫuš
ur-saĝ mah
ur-saĝ gal
ur-saĝ en-lil₂
ur-saĝ kur ad₆
When prompted with me-lam₂, the model generates ḫuš (fury), ni₂ su zig₃ (terror-flesh-rise), pirij (lion), and warrior imagery — never light words (zalag, babbar). It learned that me-lam₂ belongs to the terror/awe cluster, not the luminosity cluster.
nam-tag → Heavy Burden
ad gi₄
nam-tag dugud šu gi₄
nam-tag dugud ka garāš₂ zig₃
nam-tag dugud de₆
nam-tag dugud gal
nam-tag dugud šu gi₄
At every temperature, nam-tag is followed by dugud (heavy). The model learned this isn't "sin" — it's a weight that can be carried (de₆), returned (šu gi₄), or released (du₈).
nam-erim₂ → Oath-Cutting
nam-erim₂ kuḍ
ašₓ dug₄ nam-erim₂ kuḍ
nam-erim₂ kuḍ
nam-erim₂ kuḍ
The model has one overwhelming association: nam-erim₂ kuḍ — "to cut the oath." This is juridical procedure, not abstract "wickedness." It also generates munus nam-erim₂ kuḍ (woman + oath-cut), consistent with the legal context.
nam-tar → Polysemy Visible
lu₂ niĝ₂ gig niĝ₂ gig jar
lu₂ lu₂-ulu₃ dili a-na taḫ
Two completions, two meanings: nam-tar gig = the demon Namtar plus illness; nam-tar lu₂ a-na dug₄ = fate/destiny — what does man say? The model mirrors the polysemy we identified in our deep dive.
Limitations & Honesty
🔄 Repetition loops
At low temperatures (T ≤ 0.5), the model often gets stuck repeating patterns — a known failure mode of small language models overfit on limited data.
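The effect is visible in the sampling step itself: dividing logits by a temperature below 1 sharpens the distribution, so the top token dominates almost every draw and the model can lock into a cycle. A minimal sketch (the logit values are made up for illustration):

```python
import math

def softmax_with_temperature(logits, t):
    """Convert logits to probabilities, sharpened (t < 1) or flattened (t > 1)."""
    scaled = [z / t for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                     # hypothetical next-token scores
for t in (0.3, 1.0):
    probs = softmax_with_temperature(logits, t)
    # at t=0.3 nearly all mass sits on the top token; at t=1.0 it spreads out
    print(t, [round(p, 3) for p in probs])
```

With ~95% of the mass on one token at T = 0.3, a short high-probability phrase can feed back into itself indefinitely — hence the repetition loops.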
📊 Overfitting
The best checkpoint is epoch 1 (val_loss = 9.08). By epoch 10, train_loss has fallen to 0.63 but val_loss has risen to 10.46: the model memorizes more than it generalizes.
❓ Unknown tokens
Generations contain many <UNK> tokens, reflecting damaged or unreadable signs in the original tablets that entered the training data.
🧠 Pattern recall, not understanding
This model has learned statistical regularities. It doesn't "understand" Sumerian — it has internalized which words tend to follow which. That's exactly what makes it useful as a validation tool.
Training Details
Loss Curve
| Epoch | Train Loss | Val Loss | Time |
|---|---|---|---|
| 1 | 2.91 | 9.08 ★ | 3.6 min |
| 2 | 1.43 | 9.78 | 3.4 min |
| 3 | 1.14 | 10.03 | 3.7 min |
| 5 | 0.88 | 10.24 | 3.8 min |
| 10 | 0.63 | 10.46 | 3.3 min |
Val loss increases monotonically after epoch 1 (★ marks the selected checkpoint) — classic overfitting on a small corpus (348K tokens). We use the epoch-1 checkpoint for all generations.
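Checkpoint selection here is just an argmin over validation loss; the numbers below are the ones from the table, and the same loop doubles as an overfitting check:

```python
# Logged epochs from the training table above
train_loss = {1: 2.91, 2: 1.43, 3: 1.14, 5: 0.88, 10: 0.63}
val_loss   = {1: 9.08, 2: 9.78, 3: 10.03, 5: 10.24, 10: 10.46}

best_epoch = min(val_loss, key=val_loss.get)      # checkpoint with lowest val loss
vals = list(val_loss.values())
overfitting = all(a < b for a, b in zip(vals, vals[1:]))  # val loss rises every logged epoch

print(best_epoch, overfitting)  # → 1 True
```

In a training loop this becomes early stopping: save a checkpoint whenever val loss improves, and stop once it has failed to improve for a few epochs.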
Filling the Gaps
Sumerian tablets are often damaged — broken edges, worn surfaces, missing signs. A language model trained on the literary corpus can predict what's most likely in the gaps, not by "understanding" but by knowing which words statistically follow which. This could be a genuine tool for epigraphists working on fragmentary texts.
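A minimal sketch of the idea, using a bigram count model in place of the transformer (the mini-corpus is drawn from the sample generations above, ASCII-simplified; a real tool would score candidates with the full model's next-token probabilities):

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count word -> next-word transitions."""
    nxt = defaultdict(Counter)
    for s in sentences:
        for a, b in zip(s, s[1:]):
            nxt[a][b] += 1
    return nxt

def restore_gap(nxt, left_word, candidates):
    """Rank candidate restorations for a gap immediately after left_word."""
    return max(candidates, key=lambda w: nxt[left_word][w])

corpus = [
    ["nam-tag", "dugud", "shu", "gi4"],
    ["nam-tag", "dugud", "de6"],
    ["nam-tag", "dugud", "gal"],
    ["nam-erim2", "kud"],
]
model = train_bigrams(corpus)
# Tablet reads "nam-tag [...]": which restoration is most likely?
print(restore_gap(model, "nam-tag", ["dugud", "kud", "gal"]))  # → dugud
```

The output is a ranked suggestion, not a reading: an epigraphist would still weigh it against sign traces, line length, and parallels.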
Try It
⚠️ Generation runs server-side. This demo uses pre-computed samples — a live API is planned.