Segmentation / word count
Python
open
split() word count treats a spaceless CJK answer as one word (omi)
Onboarding answer gate counts a spaceless CJK answer as one word
omi · BasedHardware/omi
Symptom
Onboarding decides whether a spoken answer has enough content with len(transcript.split()) >= 2. str.split() splits only on whitespace, so CJK text with no spaces comes back as a one-element list and its length is 1. A full answer like 東京に住んでいます is therefore counted as a single word, never reaches the LLM check, and the question stays marked unanswered for Japanese, Chinese, and Korean speakers.
Minimal repro
len('東京に住んでいます'.split()) # 1, so word_count >= 2 is False and the answer is rejected
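The same behaviour can be seen across a few inputs with a small stand-in for the gate. This is only a sketch: passes_gate and MIN_WORDS are hypothetical names, not identifiers from the omi codebase, and the function reproduces just the len(transcript.split()) >= 2 comparison described in the symptom.

```python
MIN_WORDS = 2  # threshold from the report: word count must be >= 2


def passes_gate(transcript: str) -> bool:
    """Hypothetical stand-in for the onboarding word-count check."""
    return len(transcript.split()) >= MIN_WORDS


# Compare a spaced English answer with spaceless Japanese and Chinese answers.
for answer in ["I live in Tokyo", "東京に住んでいます", "我住在东京"]:
    print(repr(answer), len(answer.split()), passes_gate(answer))
```

Only the English answer clears the threshold; both spaceless answers are treated as a single token.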
Fix
Replace the plain split() call with the existing CJK-aware _word_count helper (already used by should_discard_conversation). It falls back to split() for non-CJK text, so English input is unchanged.
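For illustration, here is one way a CJK-aware count with a split() fallback could look. It is a minimal sketch under stated assumptions, not the actual _word_count implementation, which this report does not show: cjk_aware_word_count is a hypothetical name, the Unicode ranges are an assumed subset (Hiragana, Katakana, CJK Unified Ideographs, Hangul syllables), and counting each CJK character as one unit is a rough proxy for real word segmentation.

```python
import re

# Assumed ranges: Hiragana + Katakana, CJK Unified Ideographs, Hangul syllables.
_CJK_CHAR = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


def cjk_aware_word_count(text: str) -> int:
    """Hypothetical helper: count each CJK character as one unit."""
    cjk_chars = _CJK_CHAR.findall(text)
    if not cjk_chars:
        # No CJK characters: identical to the plain whitespace split.
        return len(text.split())
    # Blank out CJK characters so any remaining non-CJK tokens still count.
    remainder = _CJK_CHAR.sub(" ", text)
    return len(cjk_chars) + len(remainder.split())


print(cjk_aware_word_count("東京に住んでいます") >= 2)  # gate now passes
print(cjk_aware_word_count("I live in Tokyo"))  # same as len(text.split())
```

Counting per character over-counts relative to true words, but for a ">= 2" threshold it only needs to distinguish a substantive answer from an empty or single-character one.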