Category

Surrogate & grapheme

Code that walks text by UTF-16 code unit or by bare code point instead of by grapheme cluster. Surrogate pairs (and the non-BMP characters they encode) get split, ZWJ emoji sequences and variation selectors are mis-detected, and combining marks or the parts of a conjunct cluster are separated from their base.
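
To make the three units concrete, here is a minimal Python sketch that counts the same strings by UTF-16 code unit, by code point, and by an approximate grapheme cluster. The helper names (`utf16_units`, `is_extender`, `rough_clusters`) are hypothetical and used only for illustration. The clustering rule is a deliberate simplification, not the full Unicode segmentation algorithm (UAX #29): it only glues combining marks, variation selectors, and ZWJ-joined characters onto the preceding character, and it does not handle conjunct clusters, skin-tone modifiers, flag sequences, or Hangul jamo.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def utf16_units(s: str) -> int:
    """Number of UTF-16 code units (what a UTF-16 based length would report)."""
    return len(s.encode("utf-16-le")) // 2


def is_extender(ch: str) -> bool:
    """True for characters this sketch attaches to the previous cluster.

    Assumption: general categories Mn/Mc/Me (combining marks, which include
    the variation selectors U+FE00..U+FE0F) plus ZWJ are enough for the demo.
    """
    return ch == ZWJ or unicodedata.category(ch) in ("Mn", "Mc", "Me")


def rough_clusters(s: str) -> list[str]:
    """Approximate grapheme clusters; NOT a full UAX #29 implementation."""
    clusters: list[str] = []
    for ch in s:
        # Attach extenders, and whatever directly follows a ZWJ, to the
        # cluster that is currently being built.
        if clusters and (is_extender(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


samples = {
    "e + combining acute": "e\u0301",
    "grinning face (non-BMP)": "\U0001F600",
    "heart + variation selector": "\u2764\ufe0f",
    "family (ZWJ sequence)": "\U0001F468\u200d\U0001F469\u200d\U0001F467",
}

for label, text in samples.items():
    print(label, utf16_units(text), len(text), len(rough_clusters(text)))
```

Each of the four samples is a single user-perceived character, yet the code-unit and code-point counts disagree with that: the family emoji, for example, is 8 UTF-16 code units and 5 code points. Production code would normally rely on a real segmentation library rather than a hand-rolled rule like this one.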

11 cases in the corpus
