A field guide to the Japan-shaped holes

CJK / Unicode
Failure Corpus

Real CJK, IME, and Unicode/text-handling bugs in open-source libraries. Repro, affected libs, and the fix. Each entry links to its fix and to cjk-agent-fixtures, the CI fixtures that keep these regressions from coming back.

93 entries  /  87 libraries  /  15 already merged

Most entries are pull requests authored by greymoth, taken verbatim from the GitHub API. A few are cited upstream issues from the wider ecosystem that document the same failures; those are marked cited and link to the original report.

When a Japanese/Chinese/Korean user types, they press Enter to confirm an IME conversion (pick a kanji candidate). That same Enter often fires the component's own keydown / submit / select handler, so the form sends or an item is selected with half-finished text. The guard is one line: skip the handler while the composition is active (event.isComposing, or keyCode 229; in React read event.nativeEvent.isComposing).
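
The guard can be factored into one helper that every key handler calls first. This is a minimal sketch, not code from any library listed here: isImeKeyEvent and handleEnter are hypothetical names, and the helper assumes it receives either a DOM KeyboardEvent or a React synthetic event.

```javascript
// Hypothetical helper: true when a key event belongs to an active IME composition.
function isImeKeyEvent(event) {
  // React wraps the DOM event, so the flag lives on nativeEvent there.
  const native = event.nativeEvent ?? event;
  // keyCode 229 is the legacy "the IME is processing this key" value.
  return native.isComposing === true || native.keyCode === 229;
}

// Hypothetical Enter handler: submit only when the key is not part of a composition.
function handleEnter(event, submit) {
  if (event.key !== 'Enter') return false;
  if (isImeKeyEvent(event)) return false; // let the IME commit the candidate
  submit();
  return true;
}
```

A component would call handleEnter(e, send) from its keydown listener; the return value only reports whether submit ran.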

IME composition Vue merged

naive-ui: Enter during IME composition adds a tag (Vue dynamic tags)

naive-ui · tusen-ai/naive-ui

Symptom

Pressing Enter to confirm a kana-to-kanji conversion in an n-dynamic-tags input creates a tag from the in-progress text instead of just finishing the conversion.

Minimal repro
1. Render <n-dynamic-tags> and focus its input.
2. Switch to a Japanese IME, type "とうきょう", press Space to convert to 東京.
3. Press Enter to pick the candidate.
4. A tag is added from the unconfirmed text before the conversion commits.
Fix

Skip tag creation while e.isComposing is true; only act on the Enter that fires after compositionend.

Merged PR → #naive-ui-dynamic-tags-ime
IME composition React open

React chat input sends the message on the Enter that confirms an IME conversion

llm-x · mrdjohnson/llm-x

Symptom

Confirming a Japanese IME candidate with Enter in the chat box sends the message instead of committing the conversion.

Minimal repro
1. Open the chat input.
2. With a CJK IME, type a phrase and press Space to get conversion candidates.
3. Press Enter to choose a candidate.
4. The message sends with the unconfirmed text.
Fix

Return early when e.nativeEvent.isComposing is true. React's synthetic KeyboardEvent has no isComposing field, so read it off nativeEvent (the DOM event), not e.isComposing.

Fix PR → #llm-x-chat-enter-ime
IME composition Svelte open

Svelte command palette runs a command on the IME confirm Enter

surf · deta/surf

Symptom

Typing a CJK query in the command palette and pressing Enter to confirm the IME runs the highlighted command instead of finishing the conversion.

Minimal repro
1. Open the command palette.
2. Compose a Japanese search term with the IME.
3. Press Enter to commit the conversion.
4. The first/highlighted command fires prematurely.
Fix

Add `if (e.isComposing) return` at the top of the palette input keydown handler. Svelte exposes the native KeyboardEvent, so e.isComposing works directly.

Fix PR → #surf-command-palette-ime
IME composition React Safari open

Safari: Enter confirming a CJK @-mention picks the wrong option (rc-mentions)

rc-mentions · react-component/mentions

Symptom

On Safari/WebKit, composing an @-mention name in CJK and pressing Enter to confirm the IME replaces the text with whatever mention is highlighted.

Minimal repro
1. In Safari, type @ then compose a Japanese name with the IME.
2. Press Enter to confirm the kanji conversion.
3. The composing text is replaced by the highlighted mention option.
Fix

In the ENTER branch, return early on event.nativeEvent.isComposing before preventDefault. WebKit reports the commit keydown as which===Enter + isComposing:true; Chromium reports keyCode 229 and never enters this branch.

Fix PR → #rc-mentions-safari-ime-select
IME composition React open

rc-select: the Enter that confirms IME composition also selects an option

rc-select · react-component/select

Symptom

The Enter used to commit an IME composition also selects the currently highlighted option in a searchable Select.

Minimal repro
1. Open a searchable <Select>.
2. Type a CJK query with the IME and press Enter to confirm the conversion.
3. The active option is selected with the unfinished query.
Fix

Track composition state (compositionstart/compositionend) and skip option selection while composing.

Fix PR → #rc-select-enter-ime-option
IME composition Vue open

Element Plus time-picker reacts to keystrokes during IME composition

element-plus · element-plus/element-plus

Symptom

The time-picker's key handler fires while an IME composition is active, so keystrokes meant for the IME mutate the time value.

Minimal repro
1. Focus the time-picker input.
2. Begin an IME composition.
3. Composition keystrokes are intercepted by the picker's key handler instead of the IME.
Fix

Skip the picker's keydown handling while the composition is active (isComposing guard).

Fix PR → #element-plus-time-picker-ime
IME composition Vue Nuxt open

Nuxt UI defineShortcuts fire while typing with a Japanese IME

nuxt/ui · nuxt/ui

Symptom

Single-key shortcuts registered with defineShortcuts fire while composing text, so romaji keystrokes trigger app shortcuts mid-composition.

Minimal repro
1. Register a single-key shortcut (e.g. 'g').
2. In an input, compose Japanese text whose romaji includes that key.
3. The shortcut fires during composition.
Fix

Ignore key events when e.isComposing (or keyCode 229) before dispatching shortcuts.

Fix PR → #nuxt-ui-shortcuts-ime
IME composition React Electron open

Inline file rename submits during IME composition (Cherry Studio, Electron)

cherry-studio · CherryHQ/cherry-studio

Symptom

Renaming a file inline and pressing Enter to confirm a CJK name submits the rename mid-composition.

Minimal repro
1. Start an inline file rename.
2. Compose a Japanese filename with the IME.
3. Press Enter to confirm the conversion; the rename submits with the unfinished name.
Fix

Skip the rename submit while the IME is composing (isComposing guard).

Fix PR → #cherry-studio-rename-ime
IME composition Angular open

CopilotKit Angular chat submits on the Enter that confirms an IME candidate

CopilotKit · CopilotKit/CopilotKit

Symptom

The Angular chat input submits the message on the Enter that confirms an IME composition.

Minimal repro
1. Type a CJK message in the Angular chat input.
2. Press Enter to confirm the IME conversion; the message sends early.
Fix

Check event.isComposing (Angular passes the native KeyboardEvent) before submitting.

Fix PR → #copilotkit-angular-chat-ime
IME composition open

SiYuan session rename input submits during IME composition (Enter)

siyuan · siyuan-note/siyuan

Symptom

Pressing Enter to confirm CJK composition in the agent session rename input also triggers submit/rename, discarding the composed text or submitting prematurely.

Minimal repro
Use a Japanese/Chinese/Korean IME in SiYuan's session rename field; press Enter to confirm the composition; the rename fires before the composition is complete.
Fix

Check event.isComposing (and event.keyCode===229 for legacy browsers) before acting on keydown/keyup Enter. Ignore Enter events during composition.

Fix PR → #siyuan-session-rename-ime
IME composition React open

llmchat: ignore the Enter that confirms IME composition in the chat input

llmchat · trendy-design/llmchat

Symptom

When a CJK user presses Enter to accept an IME candidate, the chat message is submitted as well, so an empty or partial message is sent.

Minimal repro
Type Japanese in the llmchat chat input using an IME; press Enter to select a candidate; the message sends immediately.
Fix

Guard Enter keydown handler with if (event.isComposing || event.nativeEvent?.isComposing) return.

Fix PR → #llmchat-chat-enter-ime
IME composition React cited closed

React onChange fires during IME composition in controlled inputs

React · facebook/react

Symptom

In a controlled <input>/<textarea>, onChange fires for the intermediate keystrokes of an IME composition, so any onChange-driven search or filter runs on half-finished CJK text.

Minimal repro
Type Chinese/Japanese through an IME into a controlled input that filters on onChange; the handler runs on every uncommitted composition keystroke.
Fix

Track compositionstart/compositionend, skip the onChange-driven work while a composition is active, and run it once when compositionend fires.
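
One way to sketch that tracking is a small gate around the expensive callback. createCompositionGate is a hypothetical name, not a React API. The sketch assumes the state update that keeps a controlled input in sync still runs on every change, and that only the search or filter work goes through the gate. It also drops a repeated value, so it does not matter whether the browser fires its last input event before or after compositionend.

```javascript
// Hypothetical gate: forwards committed text only, never mid-composition text.
function createCompositionGate(onCommit) {
  let composing = false;
  let lastEmitted;

  const emit = (value) => {
    if (value === lastEmitted) return; // same value already delivered
    lastEmitted = value;
    onCommit(value);
  };

  return {
    onCompositionStart() {
      composing = true;
    },
    onCompositionEnd(event) {
      composing = false;
      emit(event.target.value); // deliver the committed text once
    },
    onChange(event) {
      if (!composing) emit(event.target.value);
    },
  };
}
```

The three returned functions are meant to be attached to the input's compositionstart, compositionend, and change (or input) events.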

Upstream issue → #react-controlled-ime-onchange
IME composition MAUI Windows cited open

.NET MAUI Entry.Completed fires on the Enter that confirms an IME candidate (Windows)

dotnet/maui · dotnet/maui

Symptom

On Windows, Entry.Completed is raised when the user presses Enter inside the IME conversion candidate window, so completion fires before the CJK conversion is committed.

Minimal repro
Focus a MAUI Entry on Windows, use a Chinese/Japanese IME, press Enter to pick a candidate; Completed fires mid-conversion.
Fix

Do not raise Completed while the IME conversion window is open; fire only once the composition is fully committed.

Upstream issue → #maui-entry-completed-ime
IME composition spec cited open

compositionend vs input event order differs across browsers (isComposing guard)

w3c/uievents · w3c/uievents

Symptom

The spec orders input before compositionend (Chrome and Safari follow it), but Firefox and Edge fire input after compositionend, so isComposing-based guards behave differently per browser.

Minimal repro
Log the input and compositionend events while committing a CJK composition in Chrome versus Firefox; their relative order differs.
Fix

Define a canonical order in the spec; frameworks should not assume input fires before compositionend.

Upstream issue → #uievents-compositionend-input-order
IME composition spec cited open

insertCompositionText / isComposing can't single out the IME commit input event

w3c/input-events · w3c/input-events

Symptom

Every input event during a composition, including the one that commits it on Enter, carries isComposing=true and inputType insertCompositionText, so code cannot detect the commit without also listening for compositionend.

Minimal repro
Type s, i, Space to convert to a kanji, change the candidate, then press Enter to commit; the input events for each step are indistinguishable.
Fix

Assign a distinct inputType (or property) to the input event that commits the composition.

Upstream issue → #input-events-insertcompositiontext
IME composition Zed cited open

Zed Vim jk escape depends on the CJK IME input mode

zed · zed-industries/zed

Symptom

With Vim mode and a CJK IME, the jk insert-mode escape only fires while the IME is composing in Chinese mode; in direct English input it inserts a literal j and k instead of escaping.

Minimal repro
Enable Vim mode, map jk to escape, use a CJK IME such as macOS Pinyin; jk escapes while composing in Chinese mode but not in English mode.
Fix

Detect the jk key sequence consistently regardless of the IME composition state.

Upstream issue → #zed-vim-jk-escape-ime
IME composition Zed Windows cited open

Zed: text shifts vertically while composing with a Chinese IME on Windows

zed · zed-industries/zed

Symptom

While composing Chinese text with an IME on Windows, the line of text jumps vertically as the composition updates, which disrupts reading and editing.

Minimal repro
Open Zed (Vim mode) on Windows 11 and type Chinese characters through an IME; the text shifts vertically during composition.
Fix

Keep the line baseline stable while updating the IME marked-text region during composition.

Upstream issue → #zed-chinese-ime-text-shift
IME composition Warp macOS cited open

Warp does not render IME preedit (marked) text — blind composition

Warp · warpdotdev/warp

Symptom

Warp does not render IME marked (preedit) text, so dead keys and CJK composition show nothing until committed and the user composes blind.

Minimal repro
In Warp on macOS, press Option+E (or start a CJK composition); no marked text is shown to indicate the in-progress input.
Fix

Render marked text inline via the platform IME API (NSTextInputClient on macOS) so the preedit is visible at the cursor.

Upstream issue → #warp-marked-text-ime

Transliteration tables that drop or reverse kana. Round-trip is the oracle: kana to romaji and back should be stable, and a sibling (hiragana vs katakana) usually already does it right.
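
The sibling check can be automated: convert each katakana sample to hiragana and require the romanizer to give the same output for both. The sketch below is illustrative only. findSiblingMismatches and toyRomanize are hypothetical, and the toy romanizer knows four syllables and carries a deliberately planted bug so the check has something to find.

```javascript
// Katakana U+30A1..U+30F6 sits exactly 0x60 above hiragana U+3041..U+3096.
function katakanaToHiragana(text) {
  return text.replace(/[\u30A1-\u30F6]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x60));
}

// Sibling oracle: a katakana sample and its hiragana twin should romanize identically.
function findSiblingMismatches(romanize, samples) {
  return samples.filter((s) => romanize(s) !== romanize(katakanaToHiragana(s)));
}

// Toy romanizer (hypothetical, four syllables only) with the bug planted on purpose:
// the syllabic-n apostrophe is applied to hiragana ん but not to katakana ン.
const SYLLABLES = { 'し': 'SHI', 'よ': 'YO', 'う': 'U', 'ぱ': 'PA' };

function toyRomanize(text) {
  const chars = [...text];
  let out = '';
  for (let i = 0; i < chars.length; i += 1) {
    const hira = katakanaToHiragana(chars[i]);
    if (hira === 'ん') {
      const next = SYLLABLES[katakanaToHiragana(chars[i + 1] ?? '')] ?? '';
      const apostrophe = chars[i] === 'ん' && /^[AIUEOY]/.test(next);
      out += apostrophe ? "N'" : 'N';
    } else {
      out += SYLLABLES[hira] ?? chars[i];
    }
  }
  return out;
}

// Reports シンヨウ only; パン romanizes the same way in both scripts.
console.log(findSiblingMismatches(toyRomanize, ['シンヨウ', 'パン']));
```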

Kana / romaji JS open

Katakana ン loses the syllabic-n apostrophe in Hepburn romanization

hepburn · lovell/hepburn

Symptom

Katakana ン before a vowel or Y is romanized without the apostrophe, unlike hiragana ん. シンヨウ becomes SHINYOU (should be SHIN'YOU), so it collides with シニョウ.

Minimal repro
const { fromKana } = require('hepburn')
fromKana('しんよう') // SHIN'YOU
fromKana('シンヨウ') // SHINYOU  <- apostrophe dropped
Fix

Map katakana ン to N' (matching hiragana) and add the ンー long-vowel digraph as N', so N is never a katakana map key and toKatakana still round-trips (PAN to パン).

Fix PR → #hepburn-katakana-n-apostrophe
Kana / romaji Python cited closed

pykakasi fails to romanize half-width katakana with voiced marks

pykakasi · miurahr/pykakasi

Symptom

pykakasi does not romanize half-width katakana correctly, particularly when a half-width voiced or semi-voiced mark (U+FF9E / U+FF9F) follows the base kana.

Minimal repro
Convert half-width katakana such as a half-width ka followed by a half-width dakuten; the output is wrong instead of 'ga'.
Fix

NFKC-normalize half-width katakana and combining voiced marks to their full-width equivalents before romanization.
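
The normalization step is a single call in most runtimes. The sketch below uses JavaScript's String.prototype.normalize; Python's unicodedata.normalize('NFKC', text) does the same job. foldHalfWidthKana is a hypothetical wrapper name. NFKC also folds other compatibility characters, such as full-width ASCII, so this assumes that is acceptable for the text being romanized.

```javascript
// Fold half-width katakana and its U+FF9E / U+FF9F sound marks into
// full-width, precomposed kana before the text reaches a romanizer.
function foldHalfWidthKana(text) {
  return text.normalize('NFKC');
}

// Half-width ka (U+FF76) + half-width dakuten (U+FF9E) becomes ガ (U+30AC).
console.log(foldHalfWidthKana('\uFF76\uFF9E') === '\u30AC'); // true
```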

Upstream issue → #pykakasi-halfwidth-katakana
Kana / romaji Python cited closed

Python unidecode mangles half-width katakana with dakuten/handakuten

unidecode · avian2/unidecode

Symptom

Python unidecode transliterates half-width katakana carrying dakuten/handakuten incorrectly, producing artifacts, while hiragana and full-width katakana romanize correctly.

Minimal repro
unidecode on half-width ba bi bu be bo returns a wrong string instead of 'babibubebo'.
Fix

Pre-compose half-width katakana plus combining voiced marks (NFKC) before the transliteration lookup.

Upstream issue → #unidecode-halfwidth-dakuten

Full-width to half-width, long-vowel marks, and kana range boundaries. Off-by-one range tables and missing digraphs silently corrupt text.

Width / normalization JS open

Full-width/kana conversion drops the first and last char of each range (moji)

moji · niwaringo/moji

Symptom

Full-width/half-width and kana range conversions skip the boundary characters of each range (！ ～ ぁ ゖ ...), so edge code points like ！ (U+FF01) are not converted.

Minimal repro
moji('！～').convert('ZE', 'HE').toString()  // boundary chars at range edges are skipped
Fix

Convert the first and last character of each range, not just the interior.
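
The boundary handling comes down to using inclusive comparisons on both ends of the range. The sketch below shows this for the full-width ASCII block only; it is not moji's implementation, and fullWidthToHalfWidthAscii is a hypothetical name.

```javascript
// Full-width ASCII variants occupy U+FF01 through U+FF5E, inclusive,
// and each sits 0xFEE0 above its ASCII counterpart U+0021..U+007E.
const FULLWIDTH_START = 0xff01;
const FULLWIDTH_END = 0xff5e;
const OFFSET = 0xfee0;

function fullWidthToHalfWidthAscii(text) {
  let out = '';
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    // Inclusive on both ends, so the first and last characters convert too.
    const inRange = cp >= FULLWIDTH_START && cp <= FULLWIDTH_END;
    out += inRange ? String.fromCodePoint(cp - OFFSET) : ch;
  }
  return out;
}
```

A regression test for this class of bug should always include the two edge code points of every range, since interior characters pass even when the comparison is exclusive.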

Fix PR → #moji-range-boundaries
Width / normalization Rust open

Meilisearch charabia mis-detects half-width katakana script (Japanese search)

charabia · meilisearch/charabia

Symptom

Meilisearch's charabia tokenizer assigns halfwidth katakana (U+FF65–U+FF9F) and some fullwidth forms to the wrong script, so they are routed to the wrong tokenizer and Japanese search fails to match them.

Minimal repro
Index halfwidth katakana text in Meilisearch, then search for the same terms; results are missing because script detection classifies halfwidth katakana as non-Japanese.
Fix

Extend script detection to recognize U+FF65–U+FF9F (halfwidth katakana) as Japanese script.

Fix PR → #charabia-halfwidth-katakana-script
Width / normalization Rust open

tabled splits combining marks from their base grapheme when wrapping width

tabled · zhiburt/tabled

Symptom

tabled's text-wrapping splits combining marks (diacritics, dakuten, etc.) away from their base grapheme when calculating width for terminal table cells.

Minimal repro
Create a tabled table with text containing combining marks (e.g., 'が' as 'か' + combining voiced mark U+3099); wrapping separates the base and combining mark onto different lines.
Fix

Use a grapheme cluster iterator (Unicode UAX #29) when wrapping, never splitting within a grapheme cluster.
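
tabled is a Rust crate, but the idea is language-neutral; here it is sketched in JavaScript with the built-in Intl.Segmenter. wrapByGrapheme is a hypothetical name, and the sketch makes one loud simplification: every cluster counts as one column, whereas a real terminal wrapper also needs an East Asian Width lookup.

```javascript
// Split text into grapheme clusters (UAX #29) with the built-in Intl.Segmenter.
function graphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Wrap into lines of at most maxClusters clusters; a line break can only
// fall between clusters, so a combining mark stays with its base.
function wrapByGrapheme(text, maxClusters) {
  if (!Number.isInteger(maxClusters) || maxClusters < 1) {
    throw new RangeError('maxClusters must be a positive integer');
  }
  const clusters = graphemes(text);
  const lines = [];
  for (let i = 0; i < clusters.length; i += maxClusters) {
    lines.push(clusters.slice(i, i + maxClusters).join(''));
  }
  return lines;
}
```

With a width of 1, the decomposed が (か followed by U+3099) lands on a single line instead of being split across two.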

Fix PR → #tabled-combining-mark-wrap
Width / normalization Rust open

Zed block cursor is misaligned over ambiguous-width Unicode characters

zed · zed-industries/zed

Symptom

In the Zed editor, the block cursor over ambiguous-width characters (East Asian Width category 'A', e.g. some symbols and box-drawing characters) is misaligned: the cursor glyph is not centered in the cell.

Minimal repro
Open Zed with a CJK font; position cursor on an ambiguous-width character; block cursor appears shifted or misaligned.
Fix

Center the block cursor glyph using the rendered cell width rather than the glyph's intrinsic width for ambiguous-width characters.

Fix PR → #zed-block-cursor-ambiguous-width

Code that walks text by UTF-16 code unit or bare code point instead of by grapheme cluster. Surrogate pairs and non-BMP characters get split, ZWJ emoji and variation selectors are mis-detected, and combining marks or conjunct clusters drift away from their base.

Surrogate & grapheme JS open

cli-table3 splits surrogate pairs (emoji / CJK) when truncating wide text

cli-table3 · cli-table/cli-table3

Symptom

cli-table3 truncates text by byte/code-unit count rather than code-point count, splitting surrogate pairs in emoji or supplementary CJK characters, producing mojibake in terminal table cells.

Minimal repro
Create a cli-table3 table with a column containing emoji (e.g., 🎉) or supplementary CJK characters; set a column width that truncates mid-emoji; output shows garbled characters.
Fix

Use a Unicode-aware splitter (spread operator or Array.from) to iterate code points rather than code units when truncating.
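
A minimal sketch of code-point truncation follows. truncateByCodePoint is a hypothetical name, not cli-table3's API, and it counts code points rather than terminal columns. It keeps surrogate pairs whole but can still split a ZWJ sequence or a base plus combining mark, which needs grapheme segmentation instead.

```javascript
// Truncate to at most maxCodePoints code points (ellipsis included),
// so a surrogate pair is never cut in half.
function truncateByCodePoint(text, maxCodePoints, ellipsis = '…') {
  const codePoints = Array.from(text); // iterates by code point, not UTF-16 code unit
  if (codePoints.length <= maxCodePoints) return text;
  return codePoints.slice(0, Math.max(0, maxCodePoints - 1)).join('') + ellipsis;
}

// 'ab🎉cd'.slice(0, 3) would end in a lone high surrogate;
// truncateByCodePoint('ab🎉cd', 4) keeps the emoji whole: 'ab🎉…'
```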

Fix PR → #cli-table3-surrogate-truncate
Surrogate & grapheme TS open

Clerk truncate splits surrogate pairs in emoji / non-BMP characters

clerk · clerk/javascript

Symptom

Clerk UI's truncateWithEndVisible function uses substring/slice on raw code units in its short-width fallback, splitting surrogate pairs in emoji or non-BMP characters.

Minimal repro
A Clerk UI component displaying an email/name containing emoji in the short-width fallback path; the truncated string ends mid-surrogate-pair, showing '?' or garbled chars.
Fix

Use Array.from() or spread to split by code points, or use Intl.Segmenter, before truncating.

Fix PR → #clerk-truncate-surrogate
Surrogate & grapheme JS open

opentype.js does not clamp cmap format 12/13 codes to U+10FFFF

opentype.js · opentypejs/opentype.js

Symptom

opentype.js does not clamp cmap format 12/13 character codes to U+10FFFF; malformed fonts with out-of-range codes cause incorrect glyph lookups for supplementary characters.

Minimal repro
Load a font with a cmap subtable containing entries beyond U+10FFFF; glyph lookup for supplementary characters (emoji, CJK Extension B+) returns wrong glyph.
Fix

Clamp all format 12/13 startCharCode/endCharCode values to 0x10FFFF during parsing.

Fix PR → #opentype-cmap-clamp
Surrogate & grapheme JS closed

markdown-it smart quotes break around non-BMP (U+10000+) characters

markdown-it · markdown-it/markdown-it

Symptom

markdown-it's smart quotes replacement does not handle non-BMP punctuation and symbols (U+10000+); quotes adjacent to supplementary characters are paired incorrectly or left unconverted.

Minimal repro
markdown-it smartquotes on text adjacent to emoji or supplementary Unicode symbols (e.g., '𝕳ello'); smart quote pairing is incorrect.
Fix

Use a regex that is Unicode-aware for the 'whitespace' and 'punctuation' character class checks, or use Array.from for code-point iteration.

Closed PR → #markdown-it-nonbmp-smartquotes
Surrogate & grapheme JS open

Slate splits Indic conjunct clusters (UAX #29 GB9c) across graphemes

slate · ianstormtaylor/slate

Symptom

Slate rich-text editor does not implement Unicode UAX #29 GB9c rule, splitting Indic conjunct clusters (consonant + virama + consonant sequences) across grapheme boundaries, causing incorrect cursor positioning and deletion in Hindi, Bengali, Tamil, etc.

Minimal repro
Type a conjunct consonant in Hindi (e.g., 'क्ष') in Slate; press Backspace; only one codepoint is deleted instead of the full cluster.
Fix

Apply Unicode GB9c rule: treat <Indic_Conjunct_Break=Linker> sequences as a single grapheme cluster.

Fix PR → #slate-indic-conjunct-grapheme
Surrogate & grapheme merged

Combining marks/ZWJ wrongly treated as punctuation in emphasis (wenmode)

wenmode · lepture/wenmode

Symptom

wenmode's is_punctuation function classifies Unicode combining marks (category Mn) and format characters (category Cf, including ZWJ) as punctuation when applying the CommonMark emphasis rules, which incorrectly suppresses valid emphasis around CJK text with diacritics or ZWJ sequences.

Minimal repro
Markdown text using emphasis (* or _) adjacent to a combining mark or ZWJ character; emphasis fails to render because combining marks are classified as punctuation.
Fix

Exclude Unicode General Categories Mn, Mc, Cf from the punctuation classification; add ASCII fast-path to avoid performance regression.

Merged PR → #wenmode-emphasis-flanking-marks
Surrogate & grapheme JS cited closed

grapheme-splitter breaks ZWJ emoji (flags, skin tones) into pieces

grapheme-splitter · orling/grapheme-splitter

Symptom

grapheme-splitter breaks ZWJ-joined emoji into parts instead of one grapheme cluster: the rainbow flag splits into its component glyphs, and skin-tone sequences come apart.

Minimal repro
new GraphemeSplitter().splitGraphemes('🏳️‍🌈') returns two elements instead of one.
Fix

Implement the Unicode emoji ZWJ sequence rules (UTS #51) so a ZWJ-joined emoji stays a single cluster.

Upstream issue → #grapheme-splitter-zwj-emoji
Surrogate & grapheme JS cited closed

lodash _.toArray splits a tag-sequence flag emoji into code points

lodash · lodash/lodash

Symptom

_.toArray splits an emoji built from a tag sequence (a subdivision flag) into its component code points instead of returning it as one element.

Minimal repro
_.toArray for the England flag emoji (a base flag plus tag characters) returns seven pieces instead of one.
Fix

Use a Unicode-aware iterator such as Intl.Segmenter that handles tag sequences when converting a string to an array.
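
The three ways of measuring a string can be compared side by side. The sketch below uses only built-ins; countUnits is a hypothetical helper name, and it assumes a runtime that ships Intl.Segmenter.

```javascript
// Count the same string as UTF-16 code units, code points, and grapheme clusters.
function countUnits(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return {
    codeUnits: text.length,
    codePoints: Array.from(text).length,
    graphemes: Array.from(segmenter.segment(text)).length,
  };
}

// Rainbow flag: white flag + VS16 + ZWJ + rainbow.
const rainbowFlag = '\u{1F3F3}\uFE0F\u200D\u{1F308}';
// England flag: black flag + tag letters g, b, e, n, g + cancel tag.
const englandFlag = '\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}';

console.log(countUnits(rainbowFlag)); // { codeUnits: 6, codePoints: 4, graphemes: 1 }
console.log(countUnits(englandFlag)); // { codeUnits: 14, codePoints: 7, graphemes: 1 }
```

Only the grapheme count matches what a user sees, which is why code-point iteration alone is not enough for these sequences.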

Upstream issue → #lodash-toarray-tag-sequence
Surrogate & grapheme Windows Terminal cited closed

Windows Terminal ignores VS15 (U+FE0E) and forces the emoji style

microsoft/terminal · microsoft/terminal

Symptom

Windows Terminal ignores the text-presentation variation selector U+FE0E, rendering the color-emoji form even when text style is explicitly requested.

Minimal repro
Print a text-style sequence such as U+23CF followed by U+FE0E; it renders as a color emoji instead of the text glyph.
Fix

Honor VS-15 (U+FE0E) for text presentation and VS-16 (U+FE0F) for emoji presentation, per the Unicode emoji variation sequences.

Upstream issue → #windows-terminal-vs15
Surrogate & grapheme JS cited open

emoji-regex matches a text-presentation char followed by U+FE0E (VS15)

emoji-regex · mathiasbynens/emoji-regex

Symptom

emoji-regex matches a base character even when it is followed by U+FE0E (the text variation selector), so text-presentation characters are wrongly classified as emoji.

Minimal repro
emojiRegex().test('\u2757\uFE0E') returns true even though U+FE0E requests text presentation.
Fix

Exclude a match followed by VS-15 (U+FE0E); treat only a trailing VS-16 (U+FE0F) or no selector as emoji.
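
The rule can be expressed with a negative lookahead and Unicode property escapes. This is a simplified single-character sketch, not emoji-regex's generated pattern: rendersAsEmoji is a hypothetical name, and it ignores ZWJ sequences, flags, and keycaps.

```javascript
// Emoji-presentation characters count unless VS15 (U+FE0E) follows;
// text-default emoji characters count only when VS16 (U+FE0F) follows.
const EMOJI_STYLE = /\p{Emoji_Presentation}(?!\uFE0E)|\p{Emoji}\uFE0F/u;

function rendersAsEmoji(text) {
  return EMOJI_STYLE.test(text);
}

console.log(rendersAsEmoji('\u2757'));       // true: emoji presentation by default
console.log(rendersAsEmoji('\u2757\uFE0E')); // false: VS15 requests text style
console.log(rendersAsEmoji('\u23CF\uFE0F')); // true: VS16 requests emoji style
```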

Upstream issue → #emoji-regex-text-vs15
Surrogate & grapheme TS open

kaplay styled-text styles desync after emoji / astral CJK (grapheme vs UTF-16)

kaplay · kaplayjs/kaplay

Symptom

compileStyledText builds charStyleMap keyed by UTF-16 code-unit length, but formatText later applies the styles by grapheme index (via runes()). The two indexings match for ASCII, but drift apart after any character longer than one code unit: an emoji, a ZWJ sequence, or an astral-plane CJK ideograph (CJK Extension B, e.g. names written with 𠮷). Every style after such a character lands on the wrong grapheme or is dropped.

Minimal repro
In styled text, "😀[c]x[/c]" keys the colour style at code unit 2, but runes("😀x") puts x at grapheme index 1, so the style is lost.
Fix

Make compileStyledText walk grapheme clusters with the same runes() helper formatText already uses, keying charStyleMap by grapheme index. Normalize the input to NFC up front and consume a whole grapheme per escape so the slice lengths stay consistent.

Fix PR → #kaplay-styled-text-grapheme

Word counts and reading-time estimates that split on spaces. CJK scripts put no spaces between words, so an entire Japanese or Chinese paragraph counts as one word: content gates reject valid answers and "min read" labels read as 1. The fix counts CJK characters separately from space-delimited words.

Segmentation / word count Python open

split() word count treats a spaceless CJK answer as one word (omi)

omi · BasedHardware/omi

Symptom

Onboarding decides whether a spoken answer has enough content with len(transcript.split()) >= 2. For CJK text that has no spaces, str.split() returns a single-element list, so a full answer like 東京に住んでいます is counted as one word, never reaches the LLM check, and the question stays marked unanswered for Japanese, Chinese, and Korean speakers.

Minimal repro
len('東京に住んでいます'.split())  # 1, so word_count >= 2 is False and the answer is rejected
Fix

Use the existing CJK-aware _word_count helper (already used by should_discard_conversation) instead of plain split(); it falls back to split() for non-CJK text, so English input is unchanged.

Fix PR → #omi-onboarding-cjk-wordcount
Segmentation / word count TipTap open

TipTap word count treats a whole CJK paragraph as 1 word / 1 min read

emdash · emdash-cms/emdash

Symptom

The editor footer's word count and reading time come from TipTap's CharacterCount, whose default wordCounter splits on spaces. CJK scripts have no spaces between words, so a long Japanese or Chinese draft shows "1 word" and "1 min read" even though the published page, which already counts CJK separately, reports the correct time.

Minimal repro
Type a CJK paragraph into the editor; CharacterCount's default wordCounter splits the text on spaces and counts 1, so the footer reports 1 word and 1 min read.
Fix

Configure CharacterCount with a wordCounter that counts CJK characters individually, and derive reading time from the text using the same word/CJK split and rates (200 words/min, 500 CJK chars/min) as the published reading-time util, so the editor and the rendered page agree.
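
A counter along these lines might look like the sketch below. It is not the emdash implementation: countText and readingMinutes are hypothetical names, the rates are the ones quoted above, and three choices are assumptions. Hangul is left to the whitespace split, punctuation-only tokens are ignored, and the estimate rounds up with a one-minute floor.

```javascript
// Han, hiragana, katakana, plus the prolonged sound mark (U+30FC, script Common).
// Assumption: Hangul is left to the whitespace split because Korean is space-delimited.
const CJK_CHAR = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\u30FC]/gu;

function countText(text) {
  const cjkChars = (text.match(CJK_CHAR) ?? []).length;
  const words = text
    .replace(CJK_CHAR, ' ')
    .split(/\s+/)
    .filter((token) => /[\p{L}\p{N}]/u.test(token)) // drop empty and punctuation-only tokens
    .length;
  return { words, cjkChars };
}

// Rates from the entry above; rounding up with a 1-minute floor is an assumption.
function readingMinutes(text, wordsPerMinute = 200, cjkCharsPerMinute = 500) {
  const { words, cjkChars } = countText(text);
  return Math.max(1, Math.ceil(words / wordsPerMinute + cjkChars / cjkCharsPerMinute));
}

console.log(countText('東京に住んでいます')); // { words: 0, cjkChars: 9 }
```

A function with the shape (text) => words + cjkChars could then serve as a custom word counter, so a spaceless paragraph no longer collapses to a count of 1.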

Fix PR → #emdash-readingtime-cjk-wordcount

Kanji numerals, including the daiji (traditional) forms used in legal and financial documents.

Numerals JS open

formatjs relativetimeformat ignores numberingSystem (always Latin digits)

formatjs · formatjs/formatjs

Symptom

formatjs intl-relativetimeformat ignores the numberingSystem locale option (e.g., 'jpan', 'arab'), always producing Latin numerals in relative time strings.

Minimal repro
new Intl.RelativeTimeFormat('ja-JP-u-nu-jpan', {}).format(-1, 'day') returns '1日前' with Latin '1' instead of Japanese numeral '一日前'.
Fix

Apply the numberingSystem extension from the locale tag when formatting relative time numbers.

Fix PR → #formatjs-numbering-system

Missing or wrong locale data: untranslated placeholders, mistranslations that flip meaning, labels left in English, and parse tables that drop a locale's diacritics so a formatted month will not parse back.

Locale data React merged

Wrong Japanese expand/collapse labels in Ant Design Typography

ant-design · ant-design/ant-design

Symptom

The ja-JP labels for Typography's expand/collapse control were incorrect, so Japanese users saw the wrong 展開/折りたたみ text.

Minimal repro
1. Set ConfigProvider locale to ja_JP.
2. Render <Typography.Paragraph ellipsis={{ expandable: true }}>.
3. The expand/collapse label is wrong.
Fix

Correct the ja-JP expand/collapse label strings.

Merged PR → #ant-design-typography-ja-labels
Locale data JS merged

timeago.js Japanese future times say 以内 (within) instead of 後 (later)

timeago.js · hustcc/timeago.js

Symptom

Future timestamps in the ja locale used 以内 (within) instead of 後 (later), so '3 minutes from now' rendered as 3分以内 (within 3 minutes), which means the opposite.

Minimal repro
timeago a future timestamp with the ja locale; it renders '3分以内' instead of '3分後'.
Fix

Use 後 for future time strings in the ja locale.

Merged PR → #timeago-ja-future-go
Locale data i18n merged

PrimeLocale Japanese: filterConstraint mistranslated 成約 → 制約

primelocale · primefaces/primelocale

Symptom

PrimeLocale's Japanese (ja) locale uses 成約 ('conclusion of a contract') for the filterConstraint aria label; it should be 制約 ('constraint/restriction').

Minimal repro
PrimeFaces component with aria-label for filter constraint in Japanese reads '成約' (contract) instead of '制約' (constraint).
Fix

Change filterConstraint value from '成約' to '制約' in the ja.json locale file.

Merged PR → #primelocale-ja-filterconstraint
Locale data Vue open

Vant Japanese (ja-JP) locale translation errors

vant · youzan/vant

Symptom

Vant component library Japanese (ja-JP) locale contains incorrect translations for multiple UI strings.

Minimal repro
Vant components with ja-JP locale display mistranslated strings for picker, datetime, and other components.
Fix

Correct multiple mistranslated strings in vant/src/locale/lang/ja-JP.ts.

Fix PR → #vant-ja-jp-locale
Locale data Angular open

NG-ZORRO Japanese (ja_JP) missing quarter placeholders break the date picker

ng-zorro-antd · NG-ZORRO/ng-zorro-antd

Symptom

NG-ZORRO Angular Japanese (ja_JP) locale is missing quarter placeholder strings, causing runtime errors or blank UI for date range pickers.

Minimal repro
NG-ZORRO DatePicker quarter mode with ja_JP locale throws error or shows blank quarter placeholders.
Fix

Add missing quarter i18n keys to ja_JP locale file.

Fix PR → #ng-zorro-ja-quarter-placeholders
Locale data JS open

FilePond Japanese label: 読込中 should be アップロード中 (uploading)

filepond · pqina/filepond

Symptom

FilePond Japanese locale uses '読込中' (loading/reading) for the file processing label, which should be 'アップロード中' (uploading) to accurately describe the action.

Minimal repro
FilePond with Japanese locale shows '読込中' during file upload, misleading users into thinking a read is occurring.
Fix

Change labelFileProcessing from '読込中' to 'アップロード中' in the Japanese locale.

Fix PR → #filepond-ja-file-processing-label
Locale data JS open

Uppy Japanese folderAdded smart_count plural placeholder is broken

uppy · transloadit/uppy

Symptom

Uppy's Japanese (ja_JP) locale has a broken smart_count plural placeholder in the 'folderAdded' string, causing pluralization to fail and show a raw placeholder.

Minimal repro
Uppy file picker in Japanese displays 'folderAdded' with a visible placeholder token instead of correct plural text.
Fix

Fix the smart_count placeholder syntax in ja_JP's folderAdded locale string.

Fix PR → #uppy-ja-smartcount-folderadded
Locale data JS open

Video.js is missing the Japanese label for Picture-in-Picture

video.js · videojs/video.js

Symptom

Video.js Japanese (ja) locale is missing the translation for the 'Playing in Picture-in-Picture' accessibility string, falling back to English.

Minimal repro
Video.js in Japanese locale; PiP mode accessible label shows 'Playing in Picture-in-Picture' in English.
Fix

Add 'ピクチャーインピクチャーで再生中' or equivalent to the ja.json locale.

Fix PR → #videojs-ja-pip-label
Locale data JS open

jp-prefectures.js: Aichi (愛知県) English name wrongly set to 'ehime'

jp-prefectures.js · hatsu38/jp-prefectures.js

Symptom

jp-prefectures.js has the English name of Aichi prefecture (愛知県) set to 'ehime' (which is Ehime prefecture / 愛媛県), causing incorrect prefecture mapping.

Minimal repro
jpPrefectures.findByCode(23).enName returns 'ehime' instead of 'aichi'.
Fix

Change enName for 愛知県 (code 23) from 'ehime' to 'aichi'.

Fix PR → #jp-prefectures-aichi-enname
Locale data JS open

date-fns Galician formats June as xuño but cannot parse it back

date-fns · date-fns/date-fns

Symptom

In the gl (Galician) locale the June parse pattern is /^xun/i. It matches the abbreviation "xun" but not the wide form "xuño", because the third character is ñ, not n. So format then parse round-trips fail for June; the locale's own snapshot already records Invalid Date for June while the other eleven months parse.

Minimal repro
import { format, parse } from 'date-fns';
import { gl } from 'date-fns/locale';
const s = format(new Date(2021, 5, 1), 'MMMM', { locale: gl }); // 'xuño'
parse(s, 'MMMM', new Date(), { locale: gl });               // Invalid Date
Fix

Widen the June pattern to /^xu[nñ]/i so it matches both "xun" and "xuño" while staying distinct from July ("xul"), mirroring locales such as Catalan that already fold diacritics into their patterns (e.g. /^març/i).
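
The two patterns can be compared in isolation, without date-fns. This is a minimal sketch using only the regular expressions quoted above; the variable names are illustrative, and ñ is written as an escape so the precomposed code point is unambiguous.

```javascript
const before = /^xun/i;             // June pattern described in the Symptom
const after = /^xu[n\u00F1]/i;      // widened pattern from the Fix, /^xu[nñ]/i
const wideJune = 'xu\u00F1o';       // "xuño" with precomposed ñ (U+00F1)

console.log(before.test(wideJune)); // false
console.log(after.test(wideJune));  // true
console.log(after.test('xun'));     // true
console.log(after.test('xullo'));   // false, July stays distinct
```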

Fix PR → #date-fns-gl-xuno-month-parse
Locale data React open

MUI Pagination aria-labels fall back to English in the zh-CN locale

material-ui · mui/material-ui

Symptom

The zh-CN locale is missing the MuiPagination block, so the pagination aria-label and getItemAriaLabel text fall back to English even though every other component in the file is localized. Among all locales only the three Chinese ones omit it; ja-JP and ko-KR already localize this block.

Minimal repro
Render a MUI <Pagination> under the zhCN locale and inspect it with a screen reader; the navigation aria-label and per-page item labels are announced in English.
Fix

Add the MuiPagination block to zhCN, reusing the noun phrases already in this file's MuiTablePagination.getItemAriaLabel (第一页 / 最后一页 / 下一页 / 上一页) plus 转到 ("Go to") to match the English source.

Fix PR → #material-ui-zhcn-pagination
Locale data JS open

Select2 Japanese locale is missing the removeItem / search ARIA labels

select2 · select2/select2

Symptom

The Japanese (ja) locale is missing the removeItem and search keys, so Japanese users get the English fallback for two ARIA labels: the per-item remove button (used in selection/multiple.js) and the search field (used in selection/search.js and dropdown/search.js). Both are part of the canonical set in en.js.

Minimal repro
Use a multi-select Select2 under the ja locale with a screen reader; the remove-item button and the search field announce their English fallback labels.
Fix

Add removeItem (アイテムを削除) and search (検索) to the ja locale, following the existing removeAllItems wording (すべてのアイテムを削除).
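
Missing keys of this kind can be found mechanically by diffing a locale against the canonical English set. A minimal sketch, assuming locales are plain objects keyed by message name; `missingKeys` is a hypothetical helper and the objects below are illustrative subsets with placeholder values, not the real locale contents.

```javascript
// Hypothetical helper: keys defined in the base locale but absent from a translation.
function missingKeys(base, locale) {
  return Object.keys(base).filter((key) => !(key in locale));
}

// Illustrative subsets; the values are placeholders.
const en = { removeAllItems: 1, removeItem: 1, search: 1 };
const ja = { removeAllItems: 1 };

console.log(missingKeys(en, ja)); // [ 'removeItem', 'search' ]
```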

Fix PR → #select2-ja-removeitem-search
Locale data Vue open

naive-ui weekPlaceholder untranslated in Korean and Traditional Chinese (Select Week)

naive-ui · tusen-ai/naive-ui

Symptom

weekPlaceholder was left as the English fallback "Select Week" in the Korean (koKR) and Traditional Chinese (zhTW) DatePicker locales, so the week-mode date picker showed an English placeholder. zhCN and jaJP were already translated; these were the remaining CJK locales still showing the fallback.

Minimal repro
Use the date picker in week mode under the koKR or zhTW locale; the placeholder renders "Select Week" in English.
Fix

Translate weekPlaceholder to 주 선택 (koKR, following the existing 월 선택 / 년 선택 pattern) and 選擇週 (zhTW, the Traditional form of zhCN's 选择周), propagating the jaJP fix from #8114 to the remaining CJK locales.

Fix PR → #naive-ui-week-placeholder-cjk

Character-class ranges that drift out of sync, so valid non-ASCII letters fall just outside the accepted set. An alpha rule accepts a letter that the matching alphanumeric rule rejects, and common accented characters fail validation even though their unaccented neighbours pass.

Unicode range JS open

validator.js isAlphanumeric el-GR rejects accented Greek that isAlpha accepts

validator.js · validatorjs/validator.js

Symptom

isAlpha('el-GR') uses the Greek range [Α-ώ] (U+0391–U+03CE) but isAlphanumeric('el-GR') still ends at ω ([0-9Α-ω], U+03C9). So isAlphanumeric rejects ό, ύ, ώ and the uppercase Ό/Ύ/Ώ even though isAlpha accepts them, and common words like νερό or πρώτα pass isAlpha but fail isAlphanumeric.

Minimal repro
const { isAlpha, isAlphanumeric } = require('validator')
isAlpha('νερό', 'el-GR')        // true
isAlphanumeric('νερό', 'el-GR') // false  <- ό (U+03CC) sits past the range end ω
Fix

Bump the alphanumeric Greek range to [0-9Α-ώ] so it matches the alpha class.
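
The mismatch can be reproduced with plain regular expressions. This sketch uses the ranges quoted in the Symptom; the anchors and the `i` flag are assumptions about how the library wraps them, and the ranges are written as escapes to avoid Latin look-alike letters.

```javascript
const alpha = /^[\u0391-\u03CE]+$/i;          // [Α-ώ]
const alnumBefore = /^[0-9\u0391-\u03C9]+$/i; // [0-9Α-ω], ends three code points early
const alnumAfter = /^[0-9\u0391-\u03CE]+$/i;  // [0-9Α-ώ], same upper bound as alpha

const word = '\u03BD\u03B5\u03C1\u03CC';      // "νερό"; the last letter is ό (U+03CC)

console.log(alpha.test(word));       // true
console.log(alnumBefore.test(word)); // false
console.log(alnumAfter.test(word));  // true
```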

Fix PR → #validatorjs-el-gr-alphanumeric-range

Parse to AST and back. A character that is special in one position (a leading caret, a hyphen) can be emitted unescaped and change what the pattern matches.

Regex roundtrip JS open

regexp-tree emits a leading ^ unescaped, turning [a^] into [^a]

regexp-tree · DmitrySoshnikov/regexp-tree

Symptom

Optimizing or regenerating a character class can move a literal ^ to the front and emit it unescaped, flipping the meaning: [a^] round-trips to [^a] (a negated class).

Minimal repro
const rt = require('regexp-tree')
rt.optimize('/[a^]/').toString() // '/[^a]/'  <- now matches the negation
Fix

In the generator, escape a leading literal ^ in a non-negated character class so generate(parse(x)) preserves meaning.
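
The idea can be sketched with a toy emitter. `emitCharClass` below is a hypothetical helper, not regexp-tree's actual generator; it takes a list of literal characters and leaves out hyphens and ranges for brevity.

```javascript
// Hypothetical generator helper for a class made of literal characters.
function emitCharClass(chars, negative = false) {
  const body = chars
    .map((ch, i) => {
      // Backslash and closing bracket are special anywhere inside a class.
      if (ch === '\\' || ch === ']') return '\\' + ch;
      // A caret only negates in first position, so escape it only there.
      if (ch === '^' && i === 0 && !negative) return '\\^';
      return ch;
    })
    .join('');
  return '[' + (negative ? '^' : '') + body + ']';
}

console.log(emitCharClass(['^', 'a']));  // [\^a]
console.log(emitCharClass(['a', '^']));  // [a^]
console.log(emitCharClass(['a'], true)); // [^a]
```

Escaping only in first position keeps the output for [a^] unchanged, so patterns that were already safe round-trip byte for byte.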

Fix PR → #regexp-tree-leading-caret

Generators that interpolate text without escaping it. Special characters (including CJK and non-ASCII identifiers) leak into the output and break it.

Codegen escape TS open

json-schema-to-typescript: enum names with special chars produce invalid TypeScript

json-schema-to-typescript · bcherny/json-schema-to-typescript

Symptom

Enum member names containing special characters (including non-ASCII / CJK) are emitted unescaped, so the generated enum does not compile.

Minimal repro
Generate types from a schema whose enum values contain quotes or special characters; the emitted enum has invalid member identifiers and tsc errors.
Fix

Escape enum member names so special characters produce valid output.
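
One way to do this is to quote every member name that is not a plain identifier. The sketch below is an assumption about the approach, not the library's code; `enumMemberName` and `emitEnum` are hypothetical helpers, and anything outside a conservative ASCII identifier pattern is emitted as a quoted name.

```javascript
// Hypothetical helper: return a member name that is safe to emit in an enum body.
function enumMemberName(name) {
  // Plain ASCII identifiers can be emitted as-is.
  if (/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name)) return name;
  // Everything else becomes a quoted name; JSON.stringify escapes quotes and backslashes.
  return JSON.stringify(name);
}

// Hypothetical emitter for a string enum.
function emitEnum(enumName, values) {
  const members = values.map((v) => `  ${enumMemberName(v)} = ${JSON.stringify(v)}`);
  return `enum ${enumName} {\n${members.join(',\n')}\n}`;
}

console.log(emitEnum('City', ['tokyo', '東京', 'say "hi"']));
```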

Fix PR → #json-schema-to-typescript-enum-escape
Codegen escape JS open

Markdoc formatter over-escapes a mid-line # (C# becomes C\#)

markdoc · markdoc/markdoc

Symptom

The formatter over-escapes a # in the middle of a line because the heading branch of the escape regex is not anchored to line start: 'C# is a language' becomes 'C\# is a language'.

Minimal repro
format(parse('C# is a language')) // 'C\# is a language' - the '#' is escaped mid-sentence
Fix

Anchor the heading alternative: change #+\s to ^#+\s, mirroring the already-anchored ^* and ^> cases.
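
The effect of the anchor can be seen with simplified stand-ins for the escape regex. Both patterns below cover only the heading case, and the `m` flag is an assumption; `escapeWith` is a hypothetical helper, not Markdoc's formatter.

```javascript
// Simplified, hypothetical stand-ins for the heading branch of the escape regex.
const unanchored = /#+\s/gm;
const anchored = /^#+\s/gm;

// Prefix each match with a backslash.
const escapeWith = (re, text) => text.replace(re, (m) => '\\' + m);

console.log(escapeWith(unanchored, 'C# is a language')); // C\# is a language
console.log(escapeWith(anchored, 'C# is a language'));   // C# is a language
console.log(escapeWith(anchored, '# not a heading'));    // \# not a heading
```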

Fix PR → #markdoc-mid-line-hash-escape

Byte-order marks and charset edges. One code path strips a leading U+FEFF and a sibling does not, so the first field name or CSV header key arrives with an invisible BOM glued to it and lookups by that key silently miss.

Encoding & BOM DenoTS cited open

Deno @std/csv CsvParseStream leaves a BOM glued to the first header key

@std/csv · denoland/std

Symptom

The synchronous parse() strips a leading UTF-8 byte-order mark (U+FEFF) but CsvParseStream does not. When a CSV begins with a BOM, as output from Excel and other Windows tools commonly does, the first field name arrives as "\uFEFFname" instead of "name", silently corrupting header-based lookups.

Minimal repro
import { CsvParseStream } from '@std/csv';
const src = ReadableStream.from(['\uFEFFname,age\n', 'Alice,34\n']);
await Array.fromAsync(src.pipeThrough(new CsvParseStream({ skipFirstRow: true })));
// [{ '\uFEFFname': 'Alice', age: '34' }]  <- BOM leaked into the key
Fix

Strip the BOM from the first line read by StreamLineReader, matching what parse() already does via its BYTE_ORDER_MARK constant.
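
The first-line-only behaviour can be sketched as a small stateful function. `makeBomStripper` is a hypothetical helper, not the code in StreamLineReader; it assumes lines arrive as already-decoded strings.

```javascript
const BYTE_ORDER_MARK = '\uFEFF';

// Hypothetical helper: returns a function that strips a BOM from the first line it sees only.
function makeBomStripper() {
  let first = true;
  return (line) => {
    if (!first) return line; // later lines pass through untouched
    first = false;
    return line.startsWith(BYTE_ORDER_MARK) ? line.slice(1) : line;
  };
}

const strip = makeBomStripper();
console.log(strip('\uFEFFname,age') === 'name,age'); // true
console.log(strip('\uFEFFAlice,34').length);         // 9, a later U+FEFF is data and is kept
```

Limiting the strip to the first line matters because U+FEFF elsewhere in the stream is a zero width no-break space and belongs to the field it appears in.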

Upstream issue → #deno-std-csv-bom-stream
