Research Lab
The ME Project
Computational re-analysis of Sumerian literary texts — bypassing Akkadian-mediated translations to recover original meanings through distributional semantics.
Key Findings
ME ≠ "Divine Decree"
Distributional analysis of 1,584 occurrences across 394 literary texts suggests ME behaves more like an operational parameter — scalar, manipulable, storable, transferable. The evidence is suggestive but not conclusive: ME's verbal profile isn't unique among abstract nouns.
ME-LAM₂ ≠ Light
The "radiance" of ME (melammu) is not in the light semantic cluster. Its nearest neighbors are ni₂ (terror), dul (to cover), and izi (fire). It's a radiative emanation that causes physical reactions — closer to an energy field than to brightness.
NAM-ERIM₂ ≠ "Wickedness"
Conventionally translated as "wickedness" or "evil," nam-erim₂'s nearest embedding neighbor is Ištaran (god of justice), followed by di (judgment) and ka-aš (oath). It's a juridical concept — oath-violation, not moral evil.
NAM-TAG ≠ "Sin"
Conventionally translated as "sin" or "transgression." In the corpus, though, nam-tag is heavy (dugud, 23%), releasable (du₈, 16%), and universal: "never was a child without nam-tag born from its mother." That profile is closer to karmic weight than to moral failing.
INANA = Holy, Not Warrior
Attention probing reveals Inana's dominant trait: kug (pure/holy) at 0.96 attention weight — nearly saturated. The warrior and sexual narratives are present but secondary. The statistical texture of the corpus says: Inana is first and foremost ritually pure.
NAM-LUGAL = Physical Insignia
Kingship in Sumerian isn't abstract virtue or divine mandate. The model attends to gu-za (throne, 0.10), aga (crown, 0.06), barag (dais, 0.05). NAM-LUGAL is a set of transferable objects — whoever holds the insignia holds the kingship.
Method
Corpus Assembly
394 ETCSL literary texts + 1,000 SumTablets literary + 72,873 ETCSANS annotated + 82,452 SumTablets. Master corpus: 526,030 sentences, 5M tokens, 194K unique forms. Literary subset: 66K sentences, vocabulary of 8,868 words.
Distributional Analysis
Pointwise mutual information (PMI) co-occurrence matrices, morphological decomposition, and frequency analysis. No Akkadian translations were consulted; the aim is to let the Sumerian speak for itself.
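As a rough illustration of the PMI step, the sketch below scores word pairs by how much more often they share a sentence than chance would predict. It assumes co-occurrence is counted once per sentence and uses log base 2; the project's actual window, smoothing, and thresholds are not stated here. The name `pmi_table`, its `min_count` parameter, and the placeholder tokens x, y, z, w are illustrative only and are not corpus data.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences, min_count=1):
    """Sentence-level PMI for unordered word pairs.

    Assumption: a pair co-occurs when both words appear in the same
    sentence, counted at most once per sentence.
    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
    """
    word_counts = Counter()
    pair_counts = Counter()
    n_sentences = 0
    for sentence in sentences:
        types = sorted(set(sentence))  # sorted so each pair has one key
        if not types:
            continue
        n_sentences += 1
        word_counts.update(types)
        pair_counts.update(combinations(types, 2))

    table = {}
    for (a, b), joint in pair_counts.items():
        if joint < min_count:
            continue  # skip rare pairs, whose PMI is unreliable
        # (joint/n) / ((count_a/n) * (count_b/n)) simplifies to the ratio below
        ratio = joint * n_sentences / (word_counts[a] * word_counts[b])
        table[(a, b)] = math.log2(ratio)
    return table


# Placeholder tokens only; these are not Sumerian corpus lines.
toy = [["x", "y"], ["x", "y"], ["z", "w"], ["x", "z"]]
for pair, score in sorted(pmi_table(toy).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Positive scores mark pairs that co-occur more often than their individual frequencies predict; negative scores mark pairs that avoid each other. Raising `min_count` is one simple way to suppress the inflated PMI values that rare words tend to produce.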
Word Embeddings
Skip-gram Word2Vec (100 dimensions, window=5) trained on the combined literary corpus: 66,212 sentences with a vocabulary of 8,868 words. Reveals semantic clusters invisible to close reading.
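To make the `window=5` setting concrete, the sketch below lists the (center, context) training pairs a skip-gram model would see for one sentence. It is a simplified illustration, not the project's training code: `skipgram_pairs` is a hypothetical helper, and common Word2Vec implementations additionally sample a smaller window at random for each center word, which is omitted here.

```python
def skipgram_pairs(sentence, window=5):
    """Return (center, context) pairs for one tokenized sentence.

    Every token within `window` positions on either side of the center
    counts as context. Pairs never cross sentence boundaries.
    """
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, sentence[j]))
    return pairs


# Placeholder tokens, not corpus data.
print(skipgram_pairs(["a", "b", "c"], window=1))
```

Because the model learns to predict context from center, words that appear in similar windows end up with similar vectors, which is what makes the nearest-neighbor comparisons possible.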
Visualization
t-SNE and UMAP dimensionality reduction map the full semantic space into 2D. Color-coded by category: ME, NAM- compounds, divine names, light terms, spatial terms.
Language Model
6.8M-parameter GPT-2-style transformer (4 layers, 4 heads, 256d) trained from scratch on 348K literary tokens. Generates Sumerian text, predicts missing words in damaged tablets, and provides independent validation of distributional findings.
Attention Probing
Extract and analyze attention weights from all 16 heads (4 layers × 4 heads) of the trained GPT across 60+ terms. This shows what the model has learned to associate with each word, giving a third line of evidence that can confirm or challenge the distributional findings.
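One way such probing could be aggregated is sketched below. It assumes the attention weights are already extracted as a nested list indexed `attn[layer][head][query_pos][key_pos]`, that the quantity of interest is what the target term attends to, and that layers and heads are pooled by a plain mean. None of these choices is confirmed by the description above, and `attended_tokens` is a hypothetical name.

```python
from collections import defaultdict


def attended_tokens(tokens, attn, target):
    """Mean attention from each `target` position to every other token.

    tokens: list of token strings for one sequence
    attn:   attn[layer][head][query_pos][key_pos] attention weights
    Weights are averaged over all layers and heads, then over all
    occurrences of `target`. Self-attention (key == query) is skipped.
    """
    n_heads_total = sum(len(layer) for layer in attn)
    totals = defaultdict(float)
    occurrences = 0
    for q, token in enumerate(tokens):
        if token != target:
            continue
        occurrences += 1
        for layer in attn:
            for head in layer:
                for k, weight in enumerate(head[q]):
                    if k != q:
                        totals[tokens[k]] += weight / n_heads_total
    if occurrences == 0:
        return {}
    return {tok: total / occurrences for tok, total in totals.items()}


# Made-up weights for a 3-token sequence, 1 layer, 2 heads (illustration only).
toy_tokens = ["x", "y", "z"]
toy_attn = [[
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]],
    [[1.0, 0.0, 0.0], [0.1, 0.9, 0.0], [0.6, 0.2, 0.2]],
]]
print(attended_tokens(toy_tokens, toy_attn, "z"))
```

In a causal model each position can only attend to earlier positions, so this direction of probing captures left context only. Averaging over heads also hides head-specific patterns, so per-head tables may be worth inspecting separately.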
Tools
Word Explorer
Search 8,868 words — semantic neighbors, similarity scores, usage examples.
Language Model
6.8M param GPT — generates Sumerian text and predicts missing words in damaged tablets.
Attention Probing
What the neural network sees — explore attention patterns across 60+ terms.
Constellation Map
Navigate 500+ words as a force-directed galaxy. Click any star to explore its connections.
Corpus Browser
Dictionary of 2,945 words with glosses, POS, collocations, and usage in 394 texts.
Text Reader
Inana's Descent with computational annotations. Hover any term for conventional vs. distributional reading.
Semantic Map
8,868 Sumerian words projected into 2D. Each dot is a word; proximity = semantic similarity. Use the search box to find specific terms. Key terms are highlighted by default.
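For Word2Vec-style embeddings, "semantic similarity" is usually measured as cosine similarity in the original high-dimensional space, which a 2D projection only approximates. The sketch below shows a nearest-neighbor lookup under that assumption. The `vectors` dictionary, the function names, and the two-dimensional toy vectors are illustrative and are not the project's data or API.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, k=3):
    """Return the k words most similar to `word`, best first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2D vectors with placeholder names, for illustration only.
toy_vectors = {"x": [1.0, 0.0], "y": [0.9, 0.1], "z": [0.0, 1.0], "w": [-1.0, 0.0]}
print(nearest_neighbors("x", toy_vectors, k=2))
```

Checking a term's neighbors in the full space, and not only its position on the map, guards against reading too much into distances that the projection may have distorted.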
Translation Distortions
The Akkadian translations consistently convert operational Sumerian concepts into static ones. The dynamism — scalar, manipulable, transferable — is systematically flattened.