
Normalization at external wikis and non-normalized utf8
Closed, Resolved (Public)

Description

MediaWiki silently normalizes UTF-8 text to NFC, and this can (and will) make later identification of the title fail. We can skip normalizing to NFC before calling out to the external wiki, but when we unpack the structure we get back, we must make sure we compare _with_ the NFC form, because the returned title has been silently converted to NFC.
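A minimal sketch of the comparison side, assuming the external wiki's response has already been unpacked into a list of title strings. The helper names `nfc` and `find_returned_title` are hypothetical and are not the actual Wikibase code. It shows why a plain string comparison fails when the request was sent un-normalized and the returned title has been converted to NFC.

```python
import unicodedata
from typing import Iterable, Optional


def nfc(text: str) -> str:
    """Return the NFC form of text, the form MediaWiki silently converts titles to."""
    return unicodedata.normalize("NFC", text)


def find_returned_title(requested: str, returned_titles: Iterable[str]) -> Optional[str]:
    """Find the title in the external wiki's response that corresponds to `requested`.

    Hypothetical helper: the request may have been sent un-normalized, but the
    titles that come back are in NFC, so compare against the NFC form of the request.
    """
    target = nfc(requested)
    for title in returned_titles:
        if nfc(title) == target:
            return title
    return None


# "Café" spelled with a combining acute accent, as a client might send it.
requested = "Cafe\u0301"
# The same title as the external wiki returns it, silently converted to NFC.
returned = ["Caf\u00e9"]

print(requested == returned[0])                  # False: the raw strings differ
print(find_returned_title(requested, returned))  # Café
```

Normalizing both sides, rather than only the returned title, keeps the lookup correct even if a later change starts normalizing the request before it is sent.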


Version: unspecified
Severity: normal
Whiteboard: storypoints: 2
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=27849

Details

Reference
bz40017

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 1:05 AM)
bzimport set Reference to bz40017.
bzimport added a subscriber: Unknown Object (MLST).

Quick fix: normalize the title to NFC before sending it to the external wiki. This should work for all languages, except for the usual few failures in languages with illegal character sequences.
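A minimal sketch of the quick fix, assuming the title is sent through the MediaWiki API's `action=query` module with the `titles` parameter. The endpoint URL is a placeholder and `build_title_query` is a hypothetical helper, not the merged patch. Normalizing before the request means the title kept locally is already in the form the external wiki returns.

```python
import unicodedata
from urllib.parse import urlencode


def build_title_query(api_url: str, title: str) -> str:
    """Build a query URL for the external wiki, normalizing the title to NFC first.

    Hypothetical helper: after this step, the locally kept title and the title
    the external wiki returns are both NFC, so plain equality can be used later.
    """
    nfc_title = unicodedata.normalize("NFC", title)
    params = {"action": "query", "titles": nfc_title, "format": "json"}
    return api_url + "?" + urlencode(params)


# "Café" with a combining accent is sent as the precomposed NFC form.
url = build_title_query("https://example.org/w/api.php", "Cafe\u0301")
print(url)  # https://example.org/w/api.php?action=query&titles=Caf%C3%A9&format=json
```

NFC normalization does not repair illegal character sequences, so it does not address the few failures mentioned above.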

Make sure the effect of bug 27849 is understood for Wikidata, and that a workaround is in place and handles it.

The patch in Gerrit has the status "Merged"; can this report be marked RESOLVED FIXED?