Injecting non-normalized strings in html
Closed, Declined (Public)

Description

It seems that some non-normalized text (i.e. text that is not in NFC, Unicode Normalization Form C) is injected into the HTML, and some validators flag this as a failure. Whether this is a real failure or just the validators being picky is open to question, but we should make a deliberate decision on whether we want the output to stay like this.

Note that parsing the text or calling htmlspecialchars will not fix this issue.
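To illustrate why escaping does not help, here is a minimal Python sketch. It is not the code path used by the wiki: `html.escape` stands in for whatever HTML escaping the software applies, and the sample string is an assumed decomposed spelling of a language name, chosen only for illustration.

```python
import html
import unicodedata

# Assumed sample: "ñ" and "ẽ" written as base letter + combining tilde (U+0303),
# i.e. a decomposed string that is not in NFC.
decomposed = "Avan\u0303e'e\u0303"

def is_nfc(text: str) -> bool:
    """True if the text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

# HTML escaping only rewrites markup-significant characters (&, <, >, quotes);
# the combining marks pass through untouched.
escaped = html.escape(decomposed)

# NFC normalization composes base letter + combining mark into one code point
# where a precomposed character exists.
normalized = unicodedata.normalize("NFC", decomposed)

print(is_nfc(escaped))     # the escaped string is still not NFC
print(is_nfc(normalized))  # the normalized string is NFC
```

Under these assumptions, the fix has to be an explicit normalization step applied before the string is written to the page; escaping alone leaves the combining sequences in place.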


Version: unspecified
Severity: minor

Details

Reference
bz40056

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 1:08 AM
bzimport set Reference to bz40056.
bzimport added a subscriber: Unknown Object (MLST).

Go to Helium on test (http://wikidata-test-repo.wikimedia.de/wiki/Data:Q2) with Firefox. Right-click and choose "select all", then right-click and choose "view selection source". Make sure everything is selected, then right-click and copy. Go to the W3 Validator (http://validator.w3.org/), choose the "validate by direct input" tab, and paste the content copied from the selection source.

To get the doctype right, copy it from the normal page source.

Also enable verbose output.

There are several errors, some of which should be easy to fix, but note messages like:

Warning Line 87, Column 2666: Text run is not in Unicode Normalization Form C.
…idhlig","gl":"galego","glk":"گیلکی","gn":"Avañe\'ẽ","got":"
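The flagged line looks like a map from language codes to language names, so one way to find the offending entries would be to scan such a map for values that change under NFC normalization. The following Python sketch assumes, purely for illustration, that the "gn" name is stored in decomposed form; the report does not say which text run triggered the warning, and `non_nfc_entries` and `sample` are hypothetical names.

```python
import unicodedata

def non_nfc_entries(names: dict) -> dict:
    """Return {code: NFC form} for every value that is not already in NFC."""
    fixes = {}
    for code, name in names.items():
        composed = unicodedata.normalize("NFC", name)
        if composed != name:
            fixes[code] = composed
    return fixes

# Hypothetical excerpt of the language-name map; "gn" is deliberately written
# with combining tildes (U+0303) to mimic a non-NFC entry.
sample = {
    "gl": "galego",
    "gn": "Avan\u0303e'e\u0303",
}

for code, fixed in non_nfc_entries(sample).items():
    print(code, fixed)
```

A check like this reports which entries would need to be corrected at the source, rather than normalizing the whole page after the fact.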

We have bigger fish to fry for a long time to come still.