Width / normalization — CJK / Unicode bugs, repros & fixes (5 cases)

Long-vowel mark ー expands with the wrong vowel after katakana ヒ/ビ
normal-jp JS open

During normalization, the chōonpu (ー) is expanded with the wrong vowel after katakana ヒ and ビ. Both syllables end in the vowel i, so ヒー should expand to ヒイ and ビー to ビイ.
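A minimal sketch of the expected behaviour follows. It is not normal-jp's implementation. The function `expand_choonpu` is hypothetical, and taking the vowel from the last letter of the preceding character's Unicode name (for example "KATAKANA LETTER HI") is an illustrative shortcut chosen here:

```python
import unicodedata

# Hypothetical helper table: last letter of a kana's Unicode name -> katakana vowel.
VOWEL_KANA = {"A": "ア", "I": "イ", "U": "ウ", "E": "エ", "O": "オ"}

def expand_choonpu(text: str) -> str:
    """Replace each ー (U+30FC) with the vowel of the preceding katakana (sketch)."""
    out: list[str] = []
    for ch in text:
        if ch == "ー" and out:
            # ヒ is "KATAKANA LETTER HI" and ビ is "KATAKANA LETTER BI": both end in "I".
            name = unicodedata.name(out[-1], "")
            if name.startswith("KATAKANA LETTER ") and name[-1] in VOWEL_KANA:
                out.append(VOWEL_KANA[name[-1]])
                continue
        # No vowel (e.g. ン, "KATAKANA LETTER N") or not katakana: keep the mark as-is.
        out.append(ch)
    return "".join(out)

print(expand_choonpu("ヒーロー"))  # ヒイロオ
print(expand_choonpu("ビール"))    # ビイル
```

Because each expanded vowel is appended to `out`, a run such as ヒーー expands to ヒイイ.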
Full-width/kana conversion skips the first and last character of each range (moji)
moji JS open

The full-width/half-width and kana range conversions skip the first and last code point of each range (！..～, ぁ..ゖ, ...), so boundary characters such as ！ (U+FF01) are left unconverted.
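The conversions are fixed-offset shifts over code-point ranges, and the fix is to test both bounds inclusively. The sketch below is not moji's code. `shift_range`, `zenkaku_to_hankaku` and `hiragana_to_katakana` are hypothetical names, and only the two ranges named above are covered:

```python
def shift_range(text: str, lo: int, hi: int, delta: int) -> str:
    """Shift every code point in the inclusive range [lo, hi] by delta."""
    # Inclusive bounds: a strict `lo < cp < hi` test would skip lo and hi themselves.
    return "".join(
        chr(ord(ch) + delta) if lo <= ord(ch) <= hi else ch for ch in text
    )

def zenkaku_to_hankaku(text: str) -> str:
    # ！ (U+FF01) .. ～ (U+FF5E)  ->  ! (U+0021) .. ~ (U+007E)
    return shift_range(text, 0xFF01, 0xFF5E, -0xFEE0)

def hiragana_to_katakana(text: str) -> str:
    # ぁ (U+3041) .. ゖ (U+3096)  ->  ァ (U+30A1) .. ヶ (U+30F6)
    return shift_range(text, 0x3041, 0x3096, 0x60)

print(zenkaku_to_hankaku("！ＡＢ～"))  # !AB~
print(hiragana_to_katakana("ぁかゖ"))  # ァカヶ
```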
Meilisearch charabia mis-detects the script of half-width katakana (Japanese search)
charabia Rust open

Meilisearch's charabia tokenizer misclassifies the script of halfwidth katakana (U+FF65-U+FF9F) and some fullwidth forms, so these characters are routed to the wrong tokenizer and Japanese search fails.
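The sketch below shows per-character routing that treats the halfwidth range as Katakana, plus NFKC normalization as an alternative that folds halfwidth kana into fullwidth before detection. The range table and `detect_script` are hypothetical and are not charabia's detection logic. Treating fullwidth Latin letters as Latin is an assumption, since the report does not say which fullwidth forms are affected. Note that Unicode's Script property assigns a few code points in U+FF65-U+FF9F (such as the halfwidth voiced sound marks) to Common; this sketch folds the whole range into Katakana for routing purposes:

```python
import unicodedata

def detect_script(ch: str) -> str:
    """Rough per-character script routing for a tokenizer (simplified sketch)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "Hiragana"
    # Fullwidth katakana block plus the halfwidth katakana range U+FF65..U+FF9F.
    if 0x30A0 <= cp <= 0x30FF or 0xFF65 <= cp <= 0xFF9F:
        return "Katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "Han"
    # Fullwidth Ａ..Ｚ / ａ..ｚ, and plain ASCII letters.
    if 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A or (ch.isascii() and ch.isalpha()):
        return "Latin"
    return "Other"

print([detect_script(c) for c in "ｶﾞあＡ"])  # ['Katakana', 'Katakana', 'Hiragana', 'Latin']

# Alternative: NFKC maps halfwidth kana to fullwidth and composes ｶ + ﾞ into ガ.
print(unicodedata.normalize("NFKC", "ｶﾞﾀｶﾅ"))  # ガタカナ
```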
tabled splits combining marks from their base character when wrapping text to a width
tabled Rust open

tabled's text wrapping separates combining marks (diacritics, dakuten, etc.) from their base character when it computes widths for terminal table cells, so a base character and its mark can land on different wrapped lines.
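The fix is to wrap on clusters rather than code points: attach each combining mark to the preceding character, measure the cluster by its base, and only break between clusters. The sketch below is hypothetical and is not tabled's code. Its cluster rule (attach any Unicode category `M*` character) is a simplification that ignores ZWJ sequences, Hangul jamo and flags. A Rust implementation would more likely use a full grapheme segmenter such as the `unicode-segmentation` crate:

```python
import unicodedata

def clusters(text: str) -> list[str]:
    """Approximate grapheme clusters: attach combining marks (category M*)
    to the preceding character."""
    out: list[str] = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def cluster_width(cluster: str) -> int:
    # Width comes from the base character; attached marks add no columns.
    return 2 if unicodedata.east_asian_width(cluster[0]) in ("W", "F") else 1

def wrap(text: str, width: int) -> list[str]:
    """Greedy wrap that never breaks inside a cluster."""
    lines: list[str] = []
    line, used = "", 0
    for cl in clusters(text):
        w = cluster_width(cl)
        if line and used + w > width:
            lines.append(line)
            line, used = "", 0
        line += cl
        used += w
    if line:
        lines.append(line)
    return lines

# か + U+3099 (combining dakuten) stays together as one 2-column unit per line.
print(wrap("か\u3099か\u3099", 2))
```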
Zed block cursor is misaligned over ambiguous-width Unicode characters
zed Rust open

In the Zed editor, the block cursor is misaligned over characters in the East Asian Width 'A' (ambiguous) category, such as some symbols and box-drawing characters: the cursor glyph is not centered in its cell.
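One way to keep cursor and glyph aligned is to derive the cursor's width from the same ambiguous-width decision the renderer uses, then center the glyph's advance inside it. The sketch below assumes a simple fixed-column grid with an `ambiguous_is_wide` policy flag. That model and the names `column_width`, `block_cursor_rect` and `glyph_offset` are hypothetical and are not Zed's text layout code:

```python
import unicodedata

def column_width(ch: str, ambiguous_is_wide: bool = False) -> int:
    """Column width; East Asian Width 'A' depends on a policy flag."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A":
        return 2 if ambiguous_is_wide else 1
    return 1

def block_cursor_rect(line: str, index: int, cell_px: float,
                      ambiguous_is_wide: bool = False) -> tuple[float, float]:
    """(x, width) in pixels of a block cursor on line[index]."""
    x_cols = sum(column_width(c, ambiguous_is_wide) for c in line[:index])
    w_cols = column_width(line[index], ambiguous_is_wide)
    return (x_cols * cell_px, w_cols * cell_px)

def glyph_offset(cursor_width_px: float, glyph_advance_px: float) -> float:
    """Horizontal offset that centers a glyph of the given advance in the cursor."""
    return (cursor_width_px - glyph_advance_px) / 2

# '─' (U+2500) has East Asian Width 'A'.
print(block_cursor_rect("a─b", 1, 10))        # (10, 10)
print(block_cursor_rect("a─b", 1, 10, True))  # (10, 20)
```

If the cursor width and the glyph placement are computed under different policies, the block covers one column while the glyph is laid out for two (or the reverse), which shows up as an off-center glyph.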
