At the Vietnamese Wikipedia, most pages (and their titles) include words with
precomposed, accented Unicode characters. (See [[Precomposed character]] and
[[Vietnamese alphabet]].) However, users who search for articles at the
Vietnamese Wikipedia often enter queries with the unaccented base characters,
with the expectation that MediaWiki will understand their query. MediaWiki
neither strips combining characters (Bug 1836) nor converts the precomposed
characters in existing pages to their base ASCII characters (e.g., ô→o and ậ→a)
when searching page titles or text, so an unaccented query fails to match the
accented page the user is looking for.
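
To illustrate the folding requested here, a minimal sketch in Python (not MediaWiki code). The names `fold_accents` and `title_matches` are hypothetical, and the sketch assumes that comparing folded, case-insensitive strings is an acceptable stand-in for what the search index would do. It decomposes each character (NFD), drops the combining marks, and maps đ/Đ by hand because those letters have no canonical decomposition.

```python
import unicodedata

# Assumption: letters without a canonical decomposition are mapped explicitly.
_NO_DECOMPOSITION = str.maketrans({"đ": "d", "Đ": "D"})


def fold_accents(text: str) -> str:
    """Return text with diacritics removed (hypothetical helper)."""
    # NFD splits a precomposed character such as "ệ" into "e" plus marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, which are dropped here.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.translate(_NO_DECOMPOSITION)


def title_matches(query: str, title: str) -> bool:
    """Compare a query and a title after folding accents and case."""
    return fold_accents(query).casefold() == fold_accents(title).casefold()


print(fold_accents("Việt Nam"))
print(title_matches("viet nam", "Việt Nam"))
```

Applying the same folding to both the indexed titles and the incoming query would also let an accented query keep matching, since both sides reduce to the same base form.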
Steps to reproduce:
- Search for "viet nam" or "Viet Nam" (without the quotes) at the Vietnamese
Wikipedia
Expected results:
[[vi:Việt Nam]] is the first result, or at least somewhere in the results.
Actual results:
"Việt Nam" is nowhere to be found.
Version: unspecified
Severity: normal