Kana / romaji — CJK / Unicode bugs, repros & fixes (8 cases)

Katakana ン loses the syllabic-n apostrophe in Hepburn romanization
hepburn JS open

Katakana ン before a vowel or Y is romanized without the apostrophe, unlike hiragana ん. シンヨウ becomes SHINYOU (should be SHIN'YOU), so its output collides with that of シニョウ, which legitimately romanizes to SHINYOU.
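
The rule described above can be sketched in a few lines. This is a minimal illustration, not the hepburn library's code: the `KANA` table and the `romanize` function are hypothetical and cover only the kana needed for the example.

```python
# Hypothetical mini-table: only the kana needed for this example.
KANA = {
    "シ": "SHI", "ニ": "NI", "ヨ": "YO", "ウ": "U", "ニョ": "NYO",
    "し": "SHI", "よ": "YO", "う": "U",
    "ン": "N", "ん": "N",
}
SYLLABIC_N = {"ン", "ん"}  # treat both scripts identically


def romanize(text: str) -> str:
    out = []
    pending_n = False  # True when the previous unit was syllabic n
    i = 0
    while i < len(text):
        # Prefer a two-character digraph (e.g. ニョ) over a single kana.
        chunk = text[i:i + 2] if text[i:i + 2] in KANA else text[i:i + 1]
        if chunk not in KANA:
            raise ValueError(f"no mapping for {chunk!r}")
        roma = KANA[chunk]
        # Syllabic n before a vowel or Y needs a separating apostrophe.
        if pending_n and roma[0] in "AIUEOY":
            out.append("'")
        out.append(roma)
        pending_n = chunk in SYLLABIC_N
        i += len(chunk)
    return "".join(out)


print(romanize("シンヨウ"))  # SHIN'YOU
print(romanize("シニョウ"))  # SHINYOU
```

Keeping ン and ん in one set is the point of the sketch: the apostrophe decision then cannot differ between the two scripts.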
pykakasi: missing っでぃ (ddi) sokuon in Hepburn/Kunrei romaji
pykakasi Python open

The geminated d + small i sequence っでぃ (ddi) has no Hepburn or Kunrei entry, so loanword spellings that use it romanize incorrectly.
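
One way to avoid a missing entry of this kind is to handle the sokuon with a rule instead of listing every geminated combination in the table. The sketch below assumes Hepburn output only and is not pykakasi's implementation; `SYLLABLES` and `romanize_hepburn` are hypothetical names with a deliberately tiny table.

```python
# Hypothetical mini-table (Hepburn); a real table would be far larger.
SYLLABLES = {"でぃ": "di", "て": "te", "ち": "chi", "き": "ki", "ま": "ma"}
SOKUON = {"っ", "ッ"}


def romanize_hepburn(text: str) -> str:
    out = []
    geminate = False  # True when the previous character was a sokuon
    i = 0
    while i < len(text):
        if text[i] in SOKUON:
            geminate = True
            i += 1
            continue
        # Prefer a two-character unit (e.g. でぃ) over a single kana.
        chunk = text[i:i + 2] if text[i:i + 2] in SYLLABLES else text[i:i + 1]
        if chunk not in SYLLABLES:
            raise ValueError(f"no mapping for {chunk!r}")
        roma = SYLLABLES[chunk]
        if geminate:
            # Double the leading consonant; Hepburn writes っち as "tchi".
            roma = ("t" if roma.startswith("ch") else roma[0]) + roma
            geminate = False
        out.append(roma)
        i += len(chunk)
    # A trailing sokuon with nothing after it is silently ignored here.
    return "".join(out)


print(romanize_hepburn("っでぃ"))  # ddi
print(romanize_hepburn("きって"))  # kitte
```

With the rule in place, only でぃ itself needs a table entry; the geminated form follows from it.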
Historical kana ゐ/ゑ (wi/we) missing from romaji conversion
romaji-conv JS open

The historical kana ゐ (wi) and ゑ (we) are absent from the mapping, so text containing them is dropped or left unconverted.
jaconv kana2alphabet does not romanize small katakana ヵ/ヶ
jaconv Python open

kana2alphabet does not handle the small katakana ヵ/ヶ (small ka/ke), so counters like 一ヶ月 are mis-romanized.
Reversed ヲ/ヺ dakuten mapping when adding voiced marks (jaco-js)
jaco-js JS open

The ヲ to ヺ voiced-mark mapping is reversed, so adding or stripping a dakuten on ヲ produces the wrong character.
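
A hand-written mapping of this kind can be cross-checked against Unicode normalization, since ヺ (U+30FA) canonically decomposes to ヲ (U+30F2) plus the combining dakuten (U+3099). The following is a sketch of such a check using Python's standard `unicodedata` module; the helper names `add_dakuten` and `strip_dakuten` are hypothetical and are not jaco-js functions.

```python
import unicodedata

COMBINING_DAKUTEN = "\u3099"


def add_dakuten(ch: str) -> str:
    """Return the precomposed voiced form, or ch unchanged if none exists."""
    composed = unicodedata.normalize("NFC", ch + COMBINING_DAKUTEN)
    return composed if len(composed) == 1 else ch


def strip_dakuten(ch: str) -> str:
    """Return the unvoiced base kana, or ch unchanged if it has no dakuten."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) == 2 and decomposed[1] == COMBINING_DAKUTEN:
        return decomposed[0]
    return ch


print(add_dakuten("\u30f2"))    # ヺ (U+30FA)
print(strip_dakuten("\u30fa"))  # ヲ (U+30F2)
```

Running every pair of a mapping table through both helpers would flag a reversed entry immediately, because the two directions must be inverses of each other.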
Romaji conversion drops the z in づ (outputs u instead of zu)
kana-romaji JS open

The kana-romaji library drops the 'z' consonant when romanizing づ, outputting 'u' instead of 'zu'.
pykakasi fails to romanize half-width katakana with voiced marks
pykakasi Python closed

pykakasi does not romanize half-width katakana correctly, particularly when a half-width voiced or semi-voiced mark (U+FF9E / U+FF9F) follows the base kana.
Python unidecode mangles half-width katakana with dakuten/handakuten
unidecode Python closed

Python unidecode transliterates half-width katakana carrying dakuten/handakuten incorrectly, producing artifacts, while hiragana and full-width katakana romanize correctly.
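
For both half-width cases above, a common workaround is to normalize the input to NFKC before handing it to the romanizer. NFKC maps half-width kana to full-width and composes a following U+FF9E / U+FF9F mark with its base character. The sketch below uses only the standard library; the function name `fold_halfwidth_kana` is hypothetical, and whether either library applies this step internally is not stated in the entries.

```python
import unicodedata


def fold_halfwidth_kana(text: str) -> str:
    """Fold half-width kana (and their voiced marks) to full-width forms.

    Note: NFKC also folds other compatibility characters, such as
    full-width Latin letters, which may or may not be desired.
    """
    return unicodedata.normalize("NFKC", text)


half = "\uff76\uff9e\uff8a\uff9f"  # half-width ka + dakuten, ha + handakuten
print(fold_halfwidth_kana(half))   # ガパ
```

After this step the text contains only precomposed full-width katakana, which the entries above describe as romanizing correctly.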
