Kana / romaji 4
-
pykakasi: missing っでぃ (ddi) sokuon in Hepburn/Kunrei romaji
The geminated d + small i sequence っでぃ (ddi) has no Hepburn or Kunrei entry, so loanword spellings that use it romanize incorrectly.
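One way to cover sequences like this is to derive the geminate by rule instead of listing every っ + kana combination in the table. The sketch below is a toy romanizer written for illustration only: the ROMAJI table and the romanize function are hypothetical, they are not pykakasi's internals, and the rule ignores Hepburn exceptions such as っち.

```python
# Tiny illustrative table; a real romanizer needs the full kana inventory.
ROMAJI = {"でぃ": "di", "べ": "be", "ど": "do", "ん": "n", "ぐ": "gu"}

def romanize(kana: str) -> str:
    """Romanize kana from the toy table, doubling the consonant after っ."""
    out = []
    geminate = False
    i = 0
    while i < len(kana):
        if kana[i] == "っ":
            # Remember the sokuon and apply it to the next syllable.
            geminate = True
            i += 1
            continue
        # Prefer two-character digraphs such as でぃ over single kana.
        for size in (2, 1):
            chunk = kana[i:i + size]
            if chunk in ROMAJI:
                syllable = ROMAJI[chunk]
                break
        else:
            raise ValueError(f"no romaji entry for {kana[i]!r}")
        if geminate:
            syllable = syllable[0] + syllable  # っ doubles the next consonant
            geminate = False
        out.append(syllable)
        i += len(chunk)
    return "".join(out)

print(romanize("っでぃ"))        # ddi
print(romanize("べっでぃんぐ"))  # beddingu
```

With a rule like this, only でぃ needs a table entry and ddi falls out of the gemination step.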
-
jaconv kana2alphabet does not romanize small katakana ヵ/ヶ
jaconv's kana2alphabet does not handle the small katakana ヵ/ヶ (small ka/ke), so counters like 一ヶ月 (ikkagetsu) are mis-romanized.
-
pykakasi fails to romanize half-width katakana with voiced marks
pykakasi does not romanize half-width katakana correctly, particularly when a half-width voiced or semi-voiced mark (U+FF9E / U+FF9F) follows the base kana.
-
Python unidecode mangles half-width katakana with dakuten/handakuten
Python unidecode transliterates half-width katakana carrying dakuten/handakuten incorrectly, producing artifacts, while hiragana and full-width katakana romanize correctly.
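For both half-width entries above, a possible workaround is to fold the text to full-width kana before romanizing. The sketch below uses only the standard library; normalize_halfwidth_kana is a hypothetical helper name, and whether the result then romanizes correctly in pykakasi or unidecode is an assumption that should be checked against the installed versions.

```python
import unicodedata

def normalize_halfwidth_kana(text: str) -> str:
    """Fold half-width katakana and their voiced marks into full-width forms."""
    # NFKC maps half-width katakana to full-width katakana and composes a
    # trailing U+FF9E / U+FF9F with its base kana where a precomposed
    # character exists.
    return unicodedata.normalize("NFKC", text)

print(normalize_halfwidth_kana("ｶﾞｷﾞｸﾞ"))  # ガギグ
print(normalize_halfwidth_kana("ﾊﾟﾝ"))      # パン
```

NFKC also folds other compatibility characters, for example full-width Latin letters become ASCII, so it may be too broad where those distinctions matter.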
Segmentation / word count 1
-
split() word count treats a spaceless CJK answer as one word (omi)
Onboarding decides whether a spoken answer has enough content with len(transcript.split()) >= 2. str.split() returns a single-element list for CJK text that has no spaces, so a full answer like 東京に住んでいます is counted as one word and fails the check. It never reaches the LLM check, and the question stays marked unanswered for Japanese, Chinese, and Korean speakers.
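A minimal sketch of one possible replacement for the split-based count: keep counting space-separated words, but count each CJK character as its own unit. The names content_units and _is_cjk are hypothetical, the character test is deliberately rough, and a per-character count is only a crude proxy for words, so the threshold of 2 may need retuning.

```python
import unicodedata

def _is_cjk(ch: str) -> bool:
    """Rough check for han, kana, and hangul characters via Unicode names."""
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED", "HIRAGANA", "KATAKANA", "HANGUL"))

def content_units(transcript: str) -> int:
    """Count space-separated words, but count each CJK character on its own."""
    units = 0
    for token in transcript.split():
        cjk_chars = sum(1 for ch in token if _is_cjk(ch))
        # A token with CJK characters contributes one unit per character;
        # any other token counts as a single word, as before.
        units += cjk_chars if cjk_chars else 1
    return units

answer = "東京に住んでいます"
print(len(answer.split()))               # 1
print(content_units(answer))             # 9
print(content_units("I live in Tokyo"))  # 4
```

A proper tokenizer would give a more faithful word count; this version only avoids treating a whole spaceless sentence as a single word.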
Numerals 1
-
kanji2number cannot parse 萬, the daiji form of 万 (Kanjize)
kanji2number cannot parse 萬, the daiji (大字) traditional form of 万 (10,000), so 大字 numerals in legal and financial documents fail to convert.
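Until the parser accepts daiji directly, one possible workaround is to map daiji to their everyday forms first. The sketch below is standard-library only; DAIJI_TO_STANDARD and normalize_daiji are hypothetical names, the table covers only a few common daiji, and passing the result on to kanji2number is assumed rather than shown.

```python
# Daiji (大字) forms mapped to the everyday numerals; extend as needed.
DAIJI_TO_STANDARD = str.maketrans({
    "壱": "一",
    "弐": "二",
    "参": "三",
    "拾": "十",
    "萬": "万",
})

def normalize_daiji(text: str) -> str:
    """Replace daiji numerals with their standard kanji equivalents."""
    return text.translate(DAIJI_TO_STANDARD)

print(normalize_daiji("壱萬弐千"))  # 一万二千
```

The mapping should be applied only to strings already known to be numerals, since characters such as 参 also appear in ordinary words.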