The characters allowed in page titles are configurable in MediaWiki via the $wgLegalTitleChars global. We should expose this setting through the API (in the siteinfo general section, I presume) and use it in our tokenizer to recognize legal title characters.
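Below is a minimal Python sketch of the tokenizer side. It assumes the tokenizer receives the raw $wgLegalTitleChars string (for example from the API) and that the value is a PCRE character-class fragment, as MediaWiki's default is. `LEGAL_TITLE_CHARS` and `build_title_char_regex` are hypothetical names. One caveat: MediaWiki applies this class to UTF-8 bytes, so `\x80-\xFF` admits every non-ASCII character. A code-point-based tokenizer would read that range as U+0080..U+00FF only, so the sketch widens it to all non-ASCII code points. This is an assumption about the intended behavior.

```python
import re

# Example value in the format of $wgLegalTitleChars: a PCRE character-class
# fragment modeled on MediaWiki's default. A given wiki may configure it differently.
LEGAL_TITLE_CHARS = " %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\\x80-\\xFF+"


def build_title_char_regex(legal_chars: str) -> "re.Pattern[str]":
    """Compile a regex matching a maximal run of legal title characters."""
    # MediaWiki matches this class against UTF-8 bytes, where \x80-\xFF covers
    # every byte of a multibyte sequence. Here we match code points instead,
    # so we assume the range should be widened to all non-ASCII code points.
    # Only the default spelling "\x80-\xFF" is rewritten.
    fragment = legal_chars.replace("\\x80-\\xFF", "\\u0080-\\U0010FFFF")
    return re.compile("[" + fragment + "]+")


title_chars = build_title_char_regex(LEGAL_TITLE_CHARS)

# A run stops at characters outside the class, such as "|" and "#".
print(title_chars.match("Main Page|label").group(0))  # Main Page
print(title_chars.match("Page#Section").group(0))     # Page
print(title_chars.fullmatch("東京") is not None)       # True
```

Without the widening step, the unmodified class would reject titles such as "東京". The tokenizer would then split them even though MediaWiki accepts them.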
Version: unspecified
Severity: normal