AI-Assisted Fan Translation Projects
A source-language audit of our English-to-Japanese fan translation — 160 things the English intermediate quietly lost
The Thaumaturge is a Polish game. Made by Fool's Theory in Bielsko-Biała, published by 11 bit studios in Warsaw, set in 1905 Warsaw, written first in Polish. The English text players see, which we also used as the source for our Japanese fan translation, is itself a translation.
That makes our JP a translation of a translation. Two-step localization. After the mod shipped, we wondered: how much got lost between Polish and English that we then carried forward into Japanese without ever noticing? So we ran 2,678 of the highest-stakes lines (codex entries, journals, readables, items, character labels, ability descriptions) through a structured comparison: the English source, our current Japanese rendering, and the original Polish, side by side. Sonnet did the reading; we kept anything it flagged.
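As a rough illustration of the side-by-side setup, here is a minimal sketch of joining three language tables into (en, jp, pl) triples. It assumes each language is available as a dictionary keyed by a shared line ID; the ID prefixes, the `Triple` type, and `build_triples` are hypothetical names for illustration, not the project's actual tooling or the game's real string format.

```python
from dataclasses import dataclass

# Hypothetical ID prefixes standing in for the high-stakes categories named above
# (codex, journals, readables, items, character labels, ability descriptions).
HIGH_STAKES_PREFIXES = ("codex.", "journal.", "readable.", "item.", "label.", "ability.")


@dataclass(frozen=True)
class Triple:
    line_id: str
    en: str
    jp: str
    pl: str


def build_triples(en, jp, pl, prefixes=HIGH_STAKES_PREFIXES):
    """Join three {line_id: text} tables on line_id, keeping only high-stakes lines."""
    shared = en.keys() & jp.keys() & pl.keys()  # IDs present in all three languages
    return [
        Triple(line_id, en[line_id], jp[line_id], pl[line_id])
        for line_id in sorted(shared)
        if line_id.startswith(prefixes)  # str.startswith accepts a tuple of prefixes
    ]


# The label row reuses the Aspirant example from this writeup; the dialogue row
# is a placeholder that the prefix filter should drop.
en = {"label.aspirant": "Inspector", "dialogue.0001": "(placeholder)"}
jp = {"label.aspirant": "監察官", "dialogue.0001": "(placeholder)"}
pl = {"label.aspirant": "Aspirant", "dialogue.0001": "(placeholder)"}
triples = build_triples(en, jp, pl)
```

Joining on the intersection of IDs means a line missing from any one language is silently skipped, so in practice one would probably also want to log the IDs that fail to match.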
The result: 160 findings. Most were small. Some were not.
The findings clustered into recognizable categories — the same kinds of losses kept appearing in different lines.
The most interesting category. 24 findings, often hidden in items, codex headers, and readables. Polish writing is dense with allusions to Polish history, literature, and underground culture — and an English translator has to either reproduce the allusion (hard, often impossible) or smooth it over. Most got smoothed.
Other cultural-reference fixes in this category include: pączki (Polish jam-filled fried pastries) where English used "doughnuts" (American ring donuts — wrong cultural object); wędliny (Polish charcuterie) flattened to "cooked meats"; mizeria (the specific Polish cucumber-in-sour-cream dish) reduced to "cucumber salad"; mordownia (criminal slang for an underground brutal fighting den, lit. "slaughterhouse") softened to "Secret Fighting Ring"; nauczycielki ("female teachers") replaced in English with "waitresses" in a satirical line about purifying language — losing the early-20th-century Polish women-in-academia subtext; Stara Baśń (Kraszewski's foundational 1876 historical novel) rendered as a generic "Ancient Tale"; Meta, Seta i Lorneta (a three-part Polish vodka-bar pun) replaced with the unrelated English idiom "Lock, Stock, and Barrel," then transliterated character-for-character into Japanese as the meaningless ロック・ストック・アンド・バレル. Eleven of these cultural fixes were applied; the rest are flagged for human review.
19 findings. The most concrete: lines or phrases the English version simply omitted, mistranslated, or restructured in ways that altered meaning. Two stand out.
Other factual fixes: an in-game organisation called "WTA" in Polish appearing as "WAS" in English (we kept the Polish; the English wordplay — "the WAS is no more" — doesn't carry into Japanese anyway, so authenticity won); Młody ("young") rendered as 幼少期 ("childhood") for the teenage forms of Wiktor and Abaurycy — we corrected to 若き ("young") to match the actual age range; the Polish patriotic motto "Za wolność Waszą i Naszą" ("For your freedom and ours") had its word order reversed in JP, breaking the historical allusion; "Kurjer Codzienny" (Daily Courier, a real Warsaw newspaper of the period) became the generic "Polish Gazette" in English. Eleven structural fixes were applied.
Two findings, both perfect inversions.
27 findings — the largest single category. Polish has period-specific terms for occupations and ranks that English approximated with generic equivalents, often inflating or deflating the social class.
Other rank/role corrections: Sanitariusz (military medic) had been rendered as 医者 ("doctor/physician") and was corrected to 衛生兵; Aspirant (junior police rank) had become "Inspector" / 監察官 and was rolled back to 警察候補生; Subiekt (period shop assistant) had become 事務員 ("office clerk") and was corrected to 店員; Porządkowy (event usher) had become 当番兵 ("military orderly") and was corrected to 係員; Szmugler (general smuggler) had become 密造酒造り ("moonshiner"; the English had narrowed the term unfairly to alcohol); Szeptucha (Slavic folk healer who whispers spells) had become 賢者の女 ("wise woman") and was corrected to ささやき女; Absztyfikant (romantic suitor) had become 崇拝者 (religious devotee) and was corrected to 求愛者; Mundurowy (uniformed person) had become 民兵 ("militia") and was corrected to 制服警官. Twenty-four role/rank corrections were applied.
Polish has grammatical gender; English mostly doesn't; Japanese has no grammatical gender either, though it can mark gender lexically (客 versus 女客). Routing the text through English as a pivot discards information the source had. 11 findings, 2 applied as outright corrections.
The other 9 findings in this category are female occupational labels (Klientka 客 → 女客, Handlarka 商人 → 女商人, Imprezowiczka パーティー参加者 → 女性パーティー参加者, etc.) where the Polish feminine ending got dropped twice — once into English-neutral, then into Japanese-neutral. We deferred most for review since "client / trader / partygoer" are arguably fine when the character has a portrait that tells you the gender visually.
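The per-category figures quoted so far can be tallied to show how much of the 160 they account for. The sketch below uses only numbers stated in this writeup. The category labels are shorthand chosen for this example, and `None` marks a count the text does not give.

```python
# (found, applied) per category, copied from the figures stated in this writeup.
# The labels are shorthand for this sketch. None means the applied count is not stated.
CATEGORIES = {
    "cultural references": (24, 11),
    "factual / structural": (19, 11),
    "inversions": (2, None),
    "rank / role": (27, 24),
    "grammatical gender": (11, 2),
}
TOTAL_FINDINGS = 160


def summarize(categories, total):
    """Return per-category rows plus the counts covered and not covered by them."""
    rows = []
    for name, (found, applied) in categories.items():
        share = round(100 * found / total, 1)  # percent of all findings
        rows.append((name, found, applied, share))
    listed = sum(found for found, _ in categories.values())
    return rows, listed, total - listed


rows, listed, unlisted = summarize(CATEGORIES, TOTAL_FINDINGS)
applied_known = sum(applied for _, _, applied, _ in rows if applied is not None)

for name, found, applied, share in rows:
    applied_text = "not stated" if applied is None else str(applied)
    print(f"{name}: {found} found ({share}%), applied: {applied_text}")
print(f"covered by these categories: {listed} of {TOTAL_FINDINGS}; not broken down here: {unlisted}")
```

On these figures the five categories cover 83 of the 160 findings, and the applied counts that are stated add up to 48.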
Not every Polish-vs-Japanese divergence is an error. Several patterns are deliberate localization decisions that we chose to preserve.
Polish street names rendered phonetically. The Chinese localization translated Bednarska Street by meaning ("Carpenter Street", 木匠街). We chose phonetic katakana (ベドナルスカ通り) to preserve the Polish texture: players hear the same foreign rhythm Polish readers hear. The Chinese choice is also valid, just different in philosophy.
WTA over WAS — the wordplay we couldn't carry. The English localization changed the in-game organisation's acronym from Polish WTA (Warszawskie Towarzystwo Antythaumaturgiczne) to English WAS (Warsaw Anti-thaumaturge Society) for a deliberate reason: the codex line "The WAS is no more" becomes a tense pun ("the was is no more"). Clever. But that pun doesn't carry into Japanese any more than into Polish, and authenticity to the original Polish naming felt right. So our JP says WTA. (We respect the EN team's craft — we just couldn't preserve the joke.)
Some allusions were too deep to carry. The Sienkiewicz motto, the Kraszewski novel reference, the Protocols-of-Zion parody — we noted them all, but only "fixed" the ones where a Japanese reader would gain something concrete from the change. Calling out a 19th-century Polish literary motto in passing doesn't help if the reader has never heard of Sienkiewicz. We rendered "心の糧となる" because it conveys the feeling; we did not add a footnote.
Three things, mostly.
One: pivot-language translation has a measurable error floor. We translated 30,000 lines through a careful pipeline with character voice profiles, glossary discipline, and cross-checks — and English itself still carried 160 silent losses into our JP that we had no way to detect from the English side. If you're translating from any language via English, the English layer is contributing its own systematic distortion, and you won't see it without going to the source.
Two: the losses concentrate. They don't scatter randomly — they cluster in specific categories. Cultural references, specialist vocabulary (rank/role/profession), grammatical features the pivot language lacks (gender), and idioms with subtle inversions. Once you know the categories, you can audit them deliberately.
Three: an LLM doing structured comparison against the source language is genuinely useful for this. No one on this project speaks Polish at a native level; running structured (en, jp, pl) triples through Sonnet with a tight rubric ("flag only clear semantic errors, not stylistic differences") surfaced findings we would never have caught. The model isn't replacing a translator. It's performing the specific job of "second reader who knows three languages and is paid by the hour to find drift."
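For readers who want to try the same kind of audit, here is a minimal sketch of the two pieces around the model call: building the rubric prompt for one triple, and deciding whether a reply counts as a finding. Only the quoted rubric sentence comes from this writeup. The JSON reply shape, the rest of the prompt wording, and the function names are assumptions for illustration, and the call to the model itself is deliberately left out.

```python
import json

# Only the "Flag only clear semantic errors..." sentence is taken from the writeup.
# The remaining wording and the JSON reply format are assumptions for this sketch.
RUBRIC = (
    "You are comparing three versions of one game text line.\n"
    "Polish (pl) is the original; English (en) was translated from Polish; "
    "Japanese (jp) was translated from English.\n"
    "Flag only clear semantic errors, not stylistic differences.\n"
    'Reply with JSON: {"flag": true or false, "category": "...", "issue": "..."}'
)


def build_prompt(line_id, en, jp, pl):
    """Combine the rubric with one (en, jp, pl) triple serialized as JSON."""
    payload = json.dumps(
        {"id": line_id, "en": en, "jp": jp, "pl": pl},
        ensure_ascii=False,  # keep Polish and Japanese characters readable
    )
    return f"{RUBRIC}\n\n{payload}"


def parse_finding(reply_text):
    """Return the reply as a dict if it is valid JSON with flag == true, else None."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return None  # malformed replies are treated as "no finding"
    if isinstance(data, dict) and data.get("flag") is True:
        return data
    return None


prompt = build_prompt("label.aspirant", "Inspector", "監察官", "Aspirant")
finding = parse_finding('{"flag": true, "category": "rank/role", "issue": "rank inflated"}')
```

Treating malformed replies as "no finding" keeps the loop simple but can hide real problems, so a stricter version would probably queue those lines for a retry or a manual look.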
Postscript — should we do this for other languages? Probably yes for German (the German cold-cuts category, the Tsar's German court, the Lutheran characters) and Russian (the Tsar himself, Rasputin, the Okhrana — though most Russian content is preserved as code-switched fragments and is therefore safer). Not Spanish, French, Yiddish — too little surface area to justify. We'll see.
We applied 55 findings to the mod (live in v1.2.4, the current release). We considered the remaining 105 carefully and chose not to apply them. That was a judgment call, not a backlog.
Roughly:
The annotated list of all 160 (English source, Polish source, current Japanese, proposed Japanese, issue summary) is preserved as a development artifact. We're publishing this writeup partly so the choices are visible — if a Polish-speaking player ever reads this and pushes back on a specific call, we want them to know the call was made knowingly, not by oversight.