Handle Unicode characters outside the Basic Multilingual Plane correctly
Closed, Declined (Public)

Description

According to the observations in bug 47770, the search treats all characters outside the BMP as pairs of surrogates and removes them as not indexable. Instead, the indexer and the frontend should both treat them as normal characters, and index and search for them if they are letters according to their Unicode General Category.
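The difference between the two treatments can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not code from the old backend or from Cirrus; the helper names `utf16_code_units`, `is_letter` and `indexable_chars` are made up for the example, and the sample character is U+2A700, the one from the test URL further down.

```python
import unicodedata


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_letter(ch):
    """True if the code point's Unicode General Category is a letter (L*)."""
    return unicodedata.category(ch).startswith("L")


def indexable_chars(text):
    """Hypothetical filter: keep only the code points that are letters."""
    return [ch for ch in text if is_letter(ch)]


sample = "\U0002A700"  # a CJK ideograph outside the BMP

# Viewed as UTF-16, the character is a pair of surrogate code units.
units = utf16_code_units(sample)
print([hex(u) for u in units])

# Viewed as one code point, it is a letter and should be indexed.
print(unicodedata.category(sample))

# Each surrogate on its own has category Cs and would be dropped.
print([unicodedata.category(chr(u)) for u in units])
```

Classifying per code point keeps the character; classifying per UTF-16 code unit sees two surrogates, neither of which is a letter, which matches the behaviour described above.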


Version: unspecified
Severity: minor
Whiteboard: cirrus-fixed

Details

Reference
bz51661

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:48 AM
bzimport set Reference to bz51661.
bzimport added a subscriber: Unknown Object (MLST).
Bug 51790 has been marked as a duplicate of this bug.

Michael: If I wanted to test this, what would be the steps and a specific test case?

Feel free to close as WONTFIX, as I can confirm it is fixed with Cirrus:

https://de.wikipedia.org/w/index.php?search=%F0%AA%9C%80&title=Spezial%3ASuche

shows the error mentioned above with the old backend, but correctly finds Unicodeblock Vereinheitlichte CJK-Ideogramme, Erweiterung C with Cirrus.
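For anyone reproducing the test case, the search term in that URL can be decoded as follows. This is a small sketch using only the Python standard library; the variable names are illustrative, and the block range used for CJK Unified Ideographs Extension C (U+2A700 to U+2B73F) is taken from the Unicode block list rather than from this report.

```python
from urllib.parse import urlsplit, parse_qs

URL = "https://de.wikipedia.org/w/index.php?search=%F0%AA%9C%80&title=Spezial%3ASuche"

# parse_qs percent-decodes the query values as UTF-8 by default.
query = parse_qs(urlsplit(URL).query)
term = query["search"][0]

# The four UTF-8 bytes F0 AA 9C 80 decode to a single code point.
code_point = ord(term)

# CJK Unified Ideographs Extension C block (assumed range, see above).
CJK_EXT_C = range(0x2A700, 0x2B73F + 1)

print(f"U+{code_point:X}", code_point in CJK_EXT_C)
```

The search term is a single character outside the BMP, so a backend that handles it correctly should return the article on that Unicode block, as Cirrus does.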

Resolving WONTFIX per comment 3 :)