Kana / romaji
Python
cited
closed
Python unidecode mangles half-width katakana with dakuten/handakuten
Faulty transliteration of half-width katakana with dakuten and handakuten
unidecode · avian2/unidecode
Symptom
Python unidecode mis-transliterates half-width katakana that carry a dakuten or handakuten, leaving artifacts in the output. Hiragana and full-width katakana romanize correctly.
Minimal repro
Calling unidecode on the half-width sequence ba bi bu be bo (ﾊﾞﾋﾞﾌﾞﾍﾞﾎﾞ) returns a wrong string instead of 'babibubebo'.
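The repro input can be sketched as follows. The constant `HALFWIDTH_BA_ROW` and the helper `describe` are names invented for this illustration. The script prints the code points involved and, only if unidecode is installed, the raw transliteration. No expected output is given for that call, since the report does not quote the exact wrong string.

```python
import unicodedata

# Half-width "ba bi bu be bo": each syllable is a half-width base kana
# (HA, HI, HU, HE, HO) followed by U+FF9E, the half-width voiced sound mark.
HALFWIDTH_BA_ROW = "\uff8a\uff9e\uff8b\uff9e\uff8c\uff9e\uff8d\uff9e\uff8e\uff9e"


def describe(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    # Shows that the dakuten is its own code point, not part of the kana.
    for name in describe(HALFWIDTH_BA_ROW):
        print(name)

    try:
        from unidecode import unidecode
    except ImportError:
        print("unidecode is not installed; skipping the transliteration call")
    else:
        # The report expects 'babibubebo' here and observes something else.
        print(repr(unidecode(HALFWIDTH_BA_ROW)))
```

The listing makes the shape of the input visible: five syllables occupy ten code points, because every voiced mark is stored separately from its base kana.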
Fix
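Normalize the input to NFKC before the transliteration lookup. The half-width dakuten and handakuten (U+FF9E, U+FF9F) are separate code points that follow the base kana. NFKC maps each half-width kana to its full-width form, maps the marks to the combining voiced marks (U+3099, U+309A), and composes each base-plus-mark pair into a single precomposed katakana.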
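A minimal sketch of the pre-composition step, using only the standard library. The helper names `precompose_kana` and `romanize` are hypothetical and are not part of unidecode. The transliteration function is passed in as a parameter, so the sketch makes no assumption about how the fix was wired into the library itself.

```python
import unicodedata


def precompose_kana(text):
    """Fold half-width kana plus voiced marks into precomposed katakana.

    NFKC first applies the compatibility decomposition (half-width to
    full-width kana, half-width marks to combining marks) and then the
    canonical composition, which merges each base + mark pair.
    """
    return unicodedata.normalize("NFKC", text)


def romanize(text, transliterate):
    """Apply `transliterate` (for example unidecode.unidecode) after NFKC."""
    return transliterate(precompose_kana(text))


if __name__ == "__main__":
    halfwidth = "\uff8a\uff9e\uff8b\uff9e\uff8c\uff9e\uff8d\uff9e\uff8e\uff9e"
    composed = precompose_kana(halfwidth)
    # Ten code points in, five precomposed katakana out.
    print(len(halfwidth), len(composed))
    print([unicodedata.name(ch) for ch in composed])
```

With unidecode installed, `romanize(halfwidth, unidecode)` would run the lookup on the precomposed string. Note that NFKC is a broad normalization: it also folds other compatibility characters such as full-width Latin letters, which is usually acceptable ahead of an ASCII transliteration but is worth knowing when the input must otherwise be preserved.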