Kana / romaji Python cited closed

Python unidecode mangles half-width katakana with dakuten/handakuten

Faulty transliteration of half-width katakana with dakuten and handakuten

unidecode · avian2/unidecode

Symptom

Python unidecode transliterates half-width katakana carrying dakuten/handakuten incorrectly, producing artifacts, while hiragana and full-width katakana romanize correctly.

Minimal repro

Calling unidecode on the half-width sequence ﾊﾞﾋﾞﾌﾞﾍﾞﾎﾞ (ba bi bu be bo) returns a wrong string instead of 'babibubebo'.

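The repro can be sketched as follows. This is a minimal illustration, not the upstream test case: the constant `HALFWIDTH_BA_ROW` and the helper `describe` are hypothetical names introduced here. The sketch lists the code points in the input, which shows that each syllable is two characters (a base kana plus the half-width voiced sound mark U+FF9E), and then calls `unidecode` only if the package is installed. No expected output is given for the `unidecode` call, since the exact wrong string may vary by version.

```python
import unicodedata

# Half-width "ba bi bu be bo": each syllable is a base kana (U+FF8A..U+FF8E)
# followed by the half-width voiced sound mark U+FF9E.
# HALFWIDTH_BA_ROW is a hypothetical name used only for this sketch.
HALFWIDTH_BA_ROW = "\uFF8A\uFF9E\uFF8B\uFF9E\uFF8C\uFF9E\uFF8D\uFF9E\uFF8E\uFF9E"


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for code_point, name in describe(HALFWIDTH_BA_ROW):
        print(code_point, name)

    try:
        from unidecode import unidecode  # third-party package, optional here
    except ImportError:
        print("unidecode is not installed; skipping the transliteration call")
    else:
        # Per the symptom above, this is not expected to print 'babibubebo'.
        print(repr(unidecode(HALFWIDTH_BA_ROW)))
```
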

Fix

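Normalize the input to NFKC before the transliteration lookup. NFKC composes each half-width katakana and its trailing half-width dakuten (U+FF9E) or handakuten (U+FF9F) into a single precomposed full-width katakana, which already romanizes correctly.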
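A minimal sketch of that pre-composition step, using only the standard library. The names `precompose_kana` and `romanize`, and the `transliterate` parameter, are hypothetical and chosen for illustration; this is a caller-side workaround under those assumptions, not the patch applied upstream.

```python
import unicodedata


def precompose_kana(text):
    """Apply NFKC so a half-width kana plus its sound mark becomes one full-width kana."""
    return unicodedata.normalize("NFKC", text)


def romanize(text, transliterate):
    """Normalize first, then hand the result to any transliteration callable.

    `transliterate` is a hypothetical parameter; a caller could pass
    `unidecode.unidecode` here if that package is installed.
    """
    return transliterate(precompose_kana(text))


if __name__ == "__main__":
    halfwidth = "\uFF8A\uFF9E\uFF8B\uFF9E\uFF8C\uFF9E\uFF8D\uFF9E\uFF8E\uFF9E"
    composed = precompose_kana(halfwidth)
    # Ten code points in, five precomposed full-width katakana out.
    print(len(halfwidth), len(composed))
    print([f"U+{ord(ch):04X}" for ch in composed])
```

Note that NFKC also folds other compatibility characters (for example full-width Latin letters), so it changes more than kana. For input that is about to be reduced to ASCII anyway this is usually acceptable, but it is worth checking against the data at hand.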

Upstream issue → #unidecode-halfwidth-dakuten

