JavaScript — CJK, IME & Unicode bugs and fixes (26 cases)

Kana / romaji (4)

Katakana ン loses the syllabic-n apostrophe in Hepburn romanization
hepburn JS open

Katakana ン before a vowel or a y-syllable is romanized without the apostrophe that hiragana ん receives. シンヨウ becomes SHINYOU instead of SHIN'YOU, so it collides with シニョウ.
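A minimal sketch of the Hepburn apostrophe rule, assuming a romanizer that folds katakana to hiragana first so that ン and ん share one code path. `TABLE`, `toHiragana`, and `romanize` are hypothetical, and the table covers only the example words rather than hepburn's real data.

```javascript
// Hypothetical fragment of a kana -> Hepburn table; just enough for the examples.
const TABLE = new Map([
  ['し', 'shi'], ['に', 'ni'], ['よ', 'yo'], ['う', 'u'],
  ['にょ', 'nyo'], ['ん', 'n'],
]);

// Fold katakana ァ..ヶ (U+30A1..U+30F6) onto hiragana ぁ..ゖ (U+3041..U+3096).
function toHiragana(s) {
  return s.replace(/[\u30A1-\u30F6]/g, (c) => String.fromCharCode(c.charCodeAt(0) - 0x60));
}

function romanize(input) {
  const s = toHiragana(input);
  const syllables = [];
  let i = 0;
  while (i < s.length) {
    const pair = s.slice(i, i + 2);
    if (pair.length === 2 && TABLE.has(pair)) {
      syllables.push(TABLE.get(pair)); // digraphs such as にょ take priority
      i += 2;
    } else {
      syllables.push(TABLE.get(s[i]) ?? s[i]); // unknown characters pass through
      i += 1;
    }
  }
  // Syllabic n gets an apostrophe when the next syllable starts with a vowel or y.
  return syllables
    .map((syl, k) => {
      const next = syllables[k + 1];
      return syl === 'n' && next !== undefined && /^[aeiouy]/.test(next) ? "n'" : syl;
    })
    .join('');
}

console.log(romanize('シンヨウ')); // shin'you
console.log(romanize('しんよう')); // shin'you
console.log(romanize('シニョウ')); // shinyou
```

Because the apostrophe decision runs after folding, katakana and hiragana input cannot diverge on this rule.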
Historical kana ゐ/ゑ (wi/we) missing from romaji conversion
romaji-conv JS open

The historical kana ゐ (wi) and ゑ (we) are missing from the mapping table, so these characters are dropped from the output or left unconverted.
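A defensive pattern that would surface this kind of gap, sketched under the assumption of a Map-based table: list the historical kana explicitly (both hiragana and katakana forms) and record any kana that has no entry instead of dropping it silently. `TABLE` and `toRomaji` are hypothetical, not romaji-conv's API.

```javascript
// Hypothetical fragment of a kana -> romaji table, including the historical kana.
const TABLE = new Map([
  ['い', 'i'], ['る', 'ru'], ['え', 'e'],
  ['ゐ', 'wi'], ['ゑ', 'we'], ['ヰ', 'wi'], ['ヱ', 'we'],
]);

// Matches any character in the Hiragana or Katakana script.
const KANA = /[\p{Script=Hiragana}\p{Script=Katakana}]/u;

function toRomaji(text) {
  const unmapped = [];
  let out = '';
  for (const ch of text) {
    if (TABLE.has(ch)) {
      out += TABLE.get(ch);
    } else {
      if (KANA.test(ch)) unmapped.push(ch); // report the gap instead of hiding it
      out += ch; // keep the original character rather than dropping it
    }
  }
  return { out, unmapped };
}

console.log(toRomaji('ゐる').out); // wiru
```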
Reversed ヲ/ヺ dakuten mapping when adding voiced marks (jaco-js)
jaco-js JS open

The ヲ to ヺ voiced-mark mapping is reversed, so adding or stripping a dakuten on ヲ produces the wrong character.
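One way to avoid a hand-maintained pair table that can be written backwards, offered as an alternative rather than jaco-js's approach: let Unicode normalization do the pairing. ヷ, ヸ, ヹ, and ヺ have canonical decompositions into ワ, ヰ, ヱ, ヲ plus the combining dakuten U+3099, so NFC composes them and NFD takes them apart. `addDakuten` and `removeDakuten` are hypothetical helper names.

```javascript
const COMBINING_DAKUTEN = '\u3099';

// Add a dakuten; returns null when no precomposed voiced form exists.
function addDakuten(ch) {
  const composed = (ch + COMBINING_DAKUTEN).normalize('NFC');
  return [...composed].length === 1 ? composed : null;
}

// Strip a dakuten or handakuten by decomposing and dropping the combining marks.
function removeDakuten(ch) {
  return ch.normalize('NFD').replace(/[\u3099\u309A]/g, '');
}

console.log(addDakuten('ヲ'));    // ヺ
console.log(removeDakuten('ヺ')); // ヲ
console.log(addDakuten('ア'));    // null
```

Since both directions come from the same Unicode decomposition data, they cannot disagree with each other.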
Romaji conversion drops the z in づ (outputs u instead of zu)
kana-romaji JS open

The kana-romaji library drops the 'z' consonant when romanizing づ, outputting 'u' instead of 'zu' (Hepburn romanizes づ, like ず, as zu).

Width / normalization (2)

Long-vowel mark ー expands with the wrong vowel after katakana ヒ/ビ
normal-jp JS open

During normalization, the chōonpu (ー) is expanded with the wrong vowel after katakana ヒ and ビ, which belong to the i column.
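A minimal sketch of chōonpu expansion, assuming the normalizer replaces ー with the vowel kana of the preceding character. Deriving the vowel from the kana's column keeps ヒ, ビ, and ピ together on the i column. `COLUMNS` is a hypothetical subset and `expandChoonpu` is not normal-jp's API.

```javascript
// Hypothetical subset of katakana grouped by vowel column.
const COLUMNS = {
  'ア': 'アカサタナハマヤラワガザダバパァャ',
  'イ': 'イキシチニヒミリギジヂビピィ',
  'ウ': 'ウクスツヌフムユルグズヅブプゥュ',
  'エ': 'エケセテネヘメレゲゼデベペェ',
  'オ': 'オコソトノホモヨロヲゴゾドボポォョ',
};

// Invert the table: kana -> vowel kana.
const VOWEL_OF = new Map();
for (const [vowel, chars] of Object.entries(COLUMNS)) {
  for (const ch of chars) VOWEL_OF.set(ch, vowel);
}

// Replace each ー with the vowel of the preceding kana; unknown contexts keep the ー.
function expandChoonpu(text) {
  let out = '';
  let prevVowel = null;
  for (const ch of text) {
    if (ch === 'ー' && prevVowel) {
      out += prevVowel; // prevVowel is unchanged, so ーー expands twice
    } else {
      out += ch;
      prevVowel = VOWEL_OF.get(ch) ?? null;
    }
  }
  return out;
}

console.log(expandChoonpu('ビール'));   // ビイル
console.log(expandChoonpu('コーヒー')); // コオヒイ
```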
Full-width/kana conversion drops the first and last char of each range (moji)
moji JS open

Full-width/half-width and kana range conversions skip the boundary characters of each range (！–～, ぁ–ゖ, ...), so edge code points such as ！ (U+FF01) are not converted.
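JavaScript regex character-class ranges are inclusive at both ends, so a conversion driven by a class range gets the boundary characters for free. The offsets below are the standard ones: full-width ASCII U+FF01–U+FF5E sits 0xFEE0 above U+0021–U+007E, and katakana U+30A1–U+30F6 sits 0x60 above hiragana U+3041–U+3096. The function names are hypothetical, not moji's API.

```javascript
// Full-width ！..～ (U+FF01..U+FF5E) -> ASCII ! .. ~ (U+0021..U+007E); both ends included.
function toHalfWidthAscii(s) {
  return s.replace(/[\uFF01-\uFF5E]/g, (c) => String.fromCharCode(c.charCodeAt(0) - 0xFEE0));
}

// Hiragana ぁ..ゖ (U+3041..U+3096) -> katakana ァ..ヶ (U+30A1..U+30F6); both ends included.
function hiraganaToKatakana(s) {
  return s.replace(/[\u3041-\u3096]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x60));
}

console.log(toHalfWidthAscii('！ＡＢ～'));   // !AB~
console.log(hiraganaToKatakana('ぁかゖ')); // ァカヶ
```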

Surrogate & grapheme (7)

cli-table3 splits surrogate pairs (emoji / CJK) when truncating wide text
cli-table3 JS open

cli-table3 truncates text by byte/code-unit count rather than code-point count, splitting surrogate pairs in emoji or supplementary CJK characters, producing mojibake in terminal table cells.
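A minimal sketch of truncation that cannot split a surrogate pair (or any grapheme cluster), using `Intl.Segmenter`. It counts grapheme clusters, not terminal columns; real table layout would also need East Asian Width handling, which this sketch leaves out. `truncate` is a hypothetical helper, not cli-table3's API.

```javascript
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

// Keep at most `max` grapheme clusters (including the ellipsis) when text is cut.
function truncate(str, max, ellipsis = '…') {
  const graphemes = Array.from(segmenter.segment(str), (s) => s.segment);
  if (graphemes.length <= max) return str;
  return graphemes.slice(0, Math.max(0, max - 1)).join('') + ellipsis;
}

const s = '😀漢字😀';
console.log(s.slice(0, 1) === '\uD83D'); // true: naive slicing leaves a lone surrogate
console.log(truncate(s, 3));             // 😀漢…
```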
opentype.js does not clamp cmap format 12/13 codes to U+10FFFF
opentype.js JS open

opentype.js does not clamp cmap format 12/13 character codes to U+10FFFF; malformed fonts with out-of-range codes cause incorrect glyph lookups for supplementary characters.
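A sketch of the clamping a cmap parser could apply, assuming the groups have already been read into `{ startCharCode, endCharCode, glyphId }` records (a hypothetical shape, not opentype.js's internal one). Format 12 maps a range to consecutive glyph IDs; format 13 maps every code in the range to one glyph.

```javascript
const MAX_CODE_POINT = 0x10FFFF;

// Build a code point -> glyph ID map, ignoring codes above U+10FFFF.
function buildCmap(groups, format) {
  const map = new Map();
  for (const { startCharCode, endCharCode, glyphId } of groups) {
    // Skip groups that start beyond Unicode or are inverted.
    if (startCharCode > MAX_CODE_POINT || startCharCode > endCharCode) continue;
    const end = Math.min(endCharCode, MAX_CODE_POINT); // clamp out-of-range ends
    for (let c = startCharCode; c <= end; c++) {
      // Format 12: sequential glyphs; format 13: one glyph for the whole range.
      map.set(c, format === 12 ? glyphId + (c - startCharCode) : glyphId);
    }
  }
  return map;
}

// A malformed group whose end overshoots U+10FFFF yields only two valid entries.
const m = buildCmap([{ startCharCode: 0x10FFFE, endCharCode: 0xFFFFFFFF, glyphId: 5 }], 12);
console.log(m.size); // 2
```

Clamping also bounds the loop: without it, a group ending at 0xFFFFFFFF would iterate billions of times.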
markdown-it smart quotes break around non-BMP (U+10000+) characters
markdown-it JS closed

markdown-it's smart-quotes replacement does not handle non-BMP punctuation and symbols (U+10000 and above): when a quote sits next to a supplementary character, the quotes are paired wrongly or not converted at all.
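The usual pitfall behind this class of bug is reading a quote's neighbor with `charCodeAt`, which returns half of a surrogate pair for U+10000 and above. A sketch of a code-point-aware check, assuming punctuation and symbols are classified with Unicode property escapes (`\p{P}`, `\p{S}`); the helper names are hypothetical, not markdown-it's.

```javascript
// Code point that ends just before index i, combining a surrogate pair if present.
function codePointBefore(str, i) {
  if (i <= 0) return undefined;
  const low = str.charCodeAt(i - 1);
  if (low >= 0xDC00 && low <= 0xDFFF && i >= 2) {
    const high = str.charCodeAt(i - 2);
    if (high >= 0xD800 && high <= 0xDBFF) return str.codePointAt(i - 2);
  }
  return low;
}

const PUNCT_OR_SYMBOL = /^[\p{P}\p{S}]$/u;
const isPunctOrSymbol = (cp) =>
  cp !== undefined && PUNCT_OR_SYMBOL.test(String.fromCodePoint(cp));

const text = '😀"quote"';
const q = text.indexOf('"');                             // 2: the emoji is two code units
console.log(text.charCodeAt(q - 1).toString(16));        // de00, a lone low surrogate
console.log(isPunctOrSymbol(codePointBefore(text, q)));  // true: U+1F600 is \p{So}
```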
Slate splits Indic conjunct clusters (UAX #29 GB9c) across graphemes
slate JS open

The Slate rich-text editor does not implement rule GB9c of Unicode UAX #29, so Indic conjunct clusters (consonant + virama + consonant sequences) are split across grapheme boundaries, causing incorrect cursor positioning and deletion in Hindi, Bengali, etc.
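`Intl.Segmenter` follows whichever UAX #29 version the runtime's ICU implements, and GB9c only arrived in Unicode 15.1, so older runtimes still split क् | ष. A hedged sketch of a post-pass that glues such splits back together, restricted to Devanagari (virama U+094D, main consonant block U+0915–U+0939) and ignoring the rest of the InCB property tables. `graphemes` is a hypothetical helper, not Slate's API.

```javascript
const segmenter = new Intl.Segmenter('hi', { granularity: 'grapheme' });
const VIRAMA = '\u094D';
const ZWJ = '\u200D';
const isDevanagariConsonant = (ch) => ch >= '\u0915' && ch <= '\u0939';

// Approximates GB9c for Devanagari: consonant ... virama (+ optional ZWJ) x consonant.
function graphemes(text) {
  const out = [];
  for (const { segment } of segmenter.segment(text)) {
    const prev = out[out.length - 1];
    const endsWithLinker =
      prev !== undefined && (prev.endsWith(VIRAMA) || prev.endsWith(VIRAMA + ZWJ));
    if (endsWithLinker && isDevanagariConsonant(prev[0]) && isDevanagariConsonant(segment[0])) {
      out[out.length - 1] = prev + segment; // the runtime split a conjunct; merge it back
    } else {
      out.push(segment);
    }
  }
  return out;
}

console.log(graphemes('हिन्दी').length); // 2: हि | न्दी
```

On runtimes whose ICU already implements GB9c the merge branch never fires, so the result is the same either way.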
grapheme-splitter breaks ZWJ emoji (flags, skin tones) into pieces
grapheme-splitter JS closed

grapheme-splitter breaks ZWJ sequences and skin-tone sequences into parts instead of returning one grapheme cluster: the rainbow flag splits into its component glyphs, and skin-tone modifiers come apart from their base emoji.
lodash _.toArray splits a tag-sequence flag emoji into code points
lodash JS closed

_.toArray splits an emoji built from a tag sequence (a subdivision flag) into its component code points instead of returning it as one element.
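Both the grapheme-splitter and lodash cases are what `Intl.Segmenter` with grapheme granularity handles: ZWJ sequences stay joined (GB11), and emoji modifiers and tag characters have the Extend property, so they attach to their base (GB9). A sketch contrasting code-point splitting with grapheme splitting; `splitGraphemes` is a hypothetical helper.

```javascript
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
const splitGraphemes = (s) => Array.from(segmenter.segment(s), (x) => x.segment);

const samples = {
  rainbowFlag: '\u{1F3F3}\uFE0F\u200D\u{1F308}', // ZWJ sequence
  thumbsUpMedium: '\u{1F44D}\u{1F3FD}',          // base + skin-tone modifier
  englandFlag: '\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}', // tag sequence
};

for (const [name, s] of Object.entries(samples)) {
  // Array.from splits by code point; the segmenter keeps each emoji whole.
  console.log(name, Array.from(s).length, splitGraphemes(s).length);
}
// rainbowFlag 4 1
// thumbsUpMedium 2 1
// englandFlag 7 1
```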
emoji-regex matches a text-presentation char followed by U+FE0E (VS15)
emoji-regex JS open

emoji-regex matches a base character even when it is followed by U+FE0E (the text variation selector), so text-presentation characters are wrongly classified as emoji.
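A sketch of the presentation-aware rule this implies, using Unicode property escapes rather than emoji-regex's generated pattern: accept emoji-presentation characters and text-default characters explicitly followed by VS16 (U+FE0F), and reject either when followed by VS15 (U+FE0E). It deliberately ignores multi-character sequences such as ZWJ sequences, flags, and keycaps.

```javascript
// Emoji_Presentation chars, or Extended_Pictographic + VS16, not followed by VS15.
const EMOJI = /(?:\p{Extended_Pictographic}\uFE0F|\p{Emoji_Presentation})(?!\uFE0E)/gu;

const findEmoji = (s) => s.match(EMOJI) ?? [];

console.log(findEmoji('\u263A\uFE0E').length);    // 0: ☺ forced to text presentation
console.log(findEmoji('\u263A\uFE0F').length);    // 1: ☺ forced to emoji presentation
console.log(findEmoji('\u{1F600}\uFE0E').length); // 0
console.log(findEmoji('\u{1F600}').length);       // 1
```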

Numerals (1)

formatjs relativetimeformat ignores numberingSystem (always Latin digits)
formatjs JS open

formatjs intl-relativetimeformat ignores the numberingSystem locale option (e.g., 'jpan', 'arab'), always producing Latin numerals in relative time strings.
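For reference, the built-in `Intl.RelativeTimeFormat` accepts a `numberingSystem` option (or the equivalent `-u-nu-` locale extension), which is the behavior a polyfill would be expected to mirror. Output depends on the runtime's ICU data; this assumes a full-ICU build such as the official Node.js binaries, and uses `arab` because algorithmic systems like `jpan` are generally not supported for digit substitution.

```javascript
const rtf = new Intl.RelativeTimeFormat('en', { numberingSystem: 'arab' });
console.log(rtf.resolvedOptions().numberingSystem); // arab
console.log(rtf.format(3, 'day'));                   // uses the Arabic-Indic digit ٣

// Equivalent request through the Unicode locale extension.
const viaTag = new Intl.RelativeTimeFormat('en-u-nu-arab');
console.log(viaTag.format(-2, 'hour'));              // uses the Arabic-Indic digit ٢
```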

Locale data (9)

timeago.js Japanese future times say 以内 (within) instead of 後 (later)
timeago.js JS merged

Future timestamps in the ja locale used 以内 (within) instead of 後 (later), so '3 minutes from now' rendered as 3分以内 (within 3 minutes), which means the opposite.
cronstrue Japanese day-of-month step description is mistranslated
cronstrue JS merged

The Japanese description for a day-of-month step was not scoped to a month, producing a mistranslated cron description.
FilePond Japanese label: 読込中 should be アップロード中 (uploading)
filepond JS open

FilePond Japanese locale uses '読込中' (loading/reading) for the file processing label, which should be 'アップロード中' (uploading) to accurately describe the action.
jsoneditor Japanese locale is incomplete (strings fall back to English)
jsoneditor JS open

jsoneditor's Japanese locale is incomplete; many UI strings remain in English when ja locale is selected.
Uppy Japanese folderAdded smart_count plural placeholder is broken
uppy JS open

Uppy's Japanese (ja_JP) locale has a broken smart_count plural placeholder in the 'folderAdded' string, causing pluralization to fail and show a raw placeholder.
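Broken placeholders like this can be caught mechanically by comparing the `%{name}` tokens in each translation against the English source. A sketch assuming Polyglot-style `%{smart_count}` interpolation, with entries that are either strings or `{0: ..., 1: ...}` plural objects; `checkPlaceholders` and the sample strings are hypothetical, not Uppy's actual locale text.

```javascript
const PLACEHOLDER = /%\{(\w+)\}/g;

// Collect placeholder names from a string or a {0: ..., 1: ...} plural object.
function placeholders(value) {
  const strings = typeof value === 'string' ? [value] : Object.values(value);
  const names = new Set();
  for (const s of strings) for (const m of s.matchAll(PLACEHOLDER)) names.add(m[1]);
  return names;
}

// Report keys whose translation lacks a source placeholder or adds an unknown one.
function checkPlaceholders(source, translation) {
  const problems = [];
  for (const [key, srcValue] of Object.entries(source)) {
    if (!(key in translation)) continue; // missing keys are a separate check
    const expected = placeholders(srcValue);
    const actual = placeholders(translation[key]);
    const missing = [...expected].filter((n) => !actual.has(n));
    const extra = [...actual].filter((n) => !expected.has(n));
    if (missing.length || extra.length) problems.push({ key, missing, extra });
  }
  return problems;
}

// Hypothetical sample data with an unterminated placeholder in the translation.
const en = { folderAdded: { 0: 'Added %{smart_count} file from %{folder}', 1: 'Added %{smart_count} files from %{folder}' } };
const ja = { folderAdded: { 0: '%{folder} から %{smart_count 個のファイルを追加しました' } };
console.log(checkPlaceholders(en, ja)); // reports folderAdded as missing smart_count
```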
Video.js is missing the Japanese label for Picture-in-Picture
video.js JS open

Video.js Japanese (ja) locale is missing the translation for the 'Playing in Picture-in-Picture' accessibility string, falling back to English.
jp-prefectures.js: Aichi (愛知県) English name wrongly set to 'ehime'
jp-prefectures.js JS open

jp-prefectures.js has the English name of Aichi prefecture (愛知県) set to 'ehime' (which is Ehime prefecture / 愛媛県), causing incorrect prefecture mapping.
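A cheap data check that would flag this kind of copy-paste error: English names should be unique across the 47 prefectures. The record shape (`code`, `name`, `enName`) and `duplicateEnglishNames` are assumptions for illustration, not jp-prefectures.js's actual schema.

```javascript
// Return every English name that is assigned to more than one prefecture.
function duplicateEnglishNames(prefectures) {
  const byName = new Map();
  for (const { name, enName } of prefectures) {
    const key = enName.toLowerCase();
    if (!byName.has(key)) byName.set(key, []);
    byName.get(key).push(name);
  }
  return [...byName].filter(([, names]) => names.length > 1);
}

// Hypothetical excerpt reproducing the bug (JIS codes 23 and 38).
const data = [
  { code: 23, name: '愛知県', enName: 'ehime' },
  { code: 38, name: '愛媛県', enName: 'ehime' },
];
console.log(duplicateEnglishNames(data)); // reports 'ehime' for both 愛知県 and 愛媛県
```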
date-fns Galician formats June as xuño but cannot parse it back
date-fns JS open

In the gl (Galician) locale the June parse pattern is /^xun/i. It matches the abbreviation "xun" but not the wide form "xuño", because the third character is ñ, not n. So a format-then-parse round trip fails for June; the locale's own snapshot already records Invalid Date for June while the other eleven months parse.
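The fix the description implies is to accept either letter in the third position. A sketch of the pattern change in isolation (not date-fns's full parsing pipeline), checking that July (xullo) still does not match.

```javascript
const buggy = /^xun/i;
const fixed = /^xu[nñ]/i;

for (const input of ['xun', 'xuño', 'XUÑO', 'xullo']) {
  console.log(input, buggy.test(input), fixed.test(input));
}
// xun true true
// xuño false true
// XUÑO false true
// xullo false false
```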
Select2 Japanese locale is missing the removeItem / search ARIA labels
select2 JS open

The Japanese (ja) locale is missing the removeItem and search keys, so Japanese users get the English fallback for two ARIA labels: the per-item remove button (used in selection/multiple.js) and the search field (used in selection/search.js and dropdown/search.js). Both are part of the canonical set in en.js.
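Missing-key gaps like this one, and the incomplete jsoneditor and Video.js Japanese locales in this section, can be found by diffing each locale's keys against the English source. A sketch assuming flat key/value locale objects; `missingKeys` and the sample data are hypothetical, not Select2's actual locale files.

```javascript
// Keys present in the reference locale but absent from the translation.
function missingKeys(reference, translation) {
  return Object.keys(reference).filter((key) => !Object.hasOwn(translation, key));
}

// Hypothetical excerpts for illustration.
const en = { noResults: 'No results found', removeItem: 'Remove item', search: 'Search' };
const ja = { noResults: '対象が見つかりません' };
console.log(missingKeys(en, ja)); // [ 'removeItem', 'search' ]
```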

Unicode range (1)

validator.js isAlphanumeric el-GR rejects accented Greek that isAlpha accepts
validator.js JS open

isAlpha('el-GR') uses the Greek range [Α-ώ] (U+0391–U+03CE) but isAlphanumeric('el-GR') still ends at ω ([0-9Α-ω], U+03C9). So isAlphanumeric rejects ό, ύ, ώ and the uppercase Ό/Ύ/Ώ even though isAlpha accepts them, and common words like νερό or πρώτα pass isAlpha but fail isAlphanumeric.
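A sketch reproducing the mismatch with the two ranges quoted above, assuming (as the uppercase examples suggest) that both patterns are case-insensitive. Under `/i`, Ώ matches `[Α-ώ]` because its lowercase ώ (U+03CE) is in the class. Widening the alphanumeric range to ώ restores parity.

```javascript
const alpha = /^[Α-ώ]+$/i;         // U+0391..U+03CE
const alnumBuggy = /^[0-9Α-ω]+$/i; // stops at U+03C9
const alnumFixed = /^[0-9Α-ώ]+$/i;

for (const word of ['νερό', 'πρώτα', 'Ώρα']) {
  console.log(word, alpha.test(word), alnumBuggy.test(word), alnumFixed.test(word));
}
// νερό true false true
// πρώτα true false true
// Ώρα true false true
```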

Regex roundtrip (1)

regexp-tree emits a leading ^ unescaped, turning [a^] into [^a]
regexp-tree JS open

Optimizing or regenerating a character class can move a literal ^ to the front and emit it unescaped, flipping the meaning: [a^] round-trips to [^a] (a negated class).
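When a generator serializes a class from its members, it must escape `^` whenever it lands first (and `]`, `\`, `-` anywhere). A minimal sketch of such a serializer; `serializeClass` is hypothetical, not regexp-tree's generator, and handles only single-character members.

```javascript
// Serialize single-character class members back to regex source.
function serializeClass(members, negated = false) {
  const body = members
    .map((ch, i) => {
      if (ch === ']' || ch === '\\' || ch === '-') return '\\' + ch;
      if (ch === '^' && i === 0) return '\\^'; // a leading ^ would negate the class
      return ch;
    })
    .join('');
  return '[' + (negated ? '^' : '') + body + ']';
}

// Reordering [a^] so that ^ comes first must not turn it into [^a].
const source = serializeClass(['^', 'a']);
console.log(source);                       // [\^a]
console.log(new RegExp(source).test('^')); // true
console.log(new RegExp(source).test('b')); // false
```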

Codegen escape (1)

Markdoc formatter over-escapes a mid-line # (C# becomes C\#)
markdoc JS open

The formatter over-escapes a # in the middle of a line because the heading branch of the escape regex is not anchored to line start: 'C# is a language' becomes 'C\# is a language'.
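The line-start anchoring can be sketched with a multiline regex that escapes `#` only where it could open an ATX heading under CommonMark: at most three leading spaces, one to six `#`, then a space, tab, or end of line. `escapeHeadingMarkers` is hypothetical, not Markdoc's formatter code.

```javascript
// Escape only a heading-like # run at line start; a mid-line # (as in C#) is left alone.
function escapeHeadingMarkers(text) {
  return text.replace(/^( {0,3})(#{1,6})(?=[ \t]|$)/gm, '$1\\$2');
}

console.log(escapeHeadingMarkers('C# is a language')); // C# is a language
console.log(escapeHeadingMarkers('# not a heading'));  // \# not a heading
console.log(escapeHeadingMarkers('#hashtag'));         // #hashtag
```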
