
Non-printable characters (e.g. Unicode control characters) in wikitext page source
Open, Low, Public

Description

Author: wolfram.schmied

Description:
See

http://en.wikisource.org/w/index.php?title=Index_talk:1965_FBI_monograph_on_Nation_of_Islam.djvu&oldid=1678252#Non-printables_in_OCR_output

I don't know how much of a security concern this is, but I think DEL characters and the like do not belong in page source and should be stripped automatically.
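A minimal sketch of what "stripped automatically" could mean, assuming the rule is "drop every Unicode control character (general category Cc) except tab, line feed and carriage return". DEL (U+007F) and form feed (U+000C), both typical OCR debris, fall into that category. The function name `strip_control_chars` is hypothetical; this is not how MediaWiki currently handles page text.

```python
import unicodedata

# Control characters that wikitext legitimately contains (assumption: only these).
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def strip_control_chars(text: str) -> str:
    """Remove Unicode control characters (category Cc), e.g. DEL (U+007F),
    while keeping tab, line feed and carriage return."""
    return "".join(
        ch for ch in text
        if ch in ALLOWED_CONTROLS or unicodedata.category(ch) != "Cc"
    )


# Illustrative OCR-style line containing a DEL and a form feed.
ocr_line = "Nation of\x7f Islam\x0c monograph\n"
print(repr(strip_control_chars(ocr_line)))
```

Restricting the filter to category Cc leaves format characters (category Cf) alone, since some of them, such as zero-width joiners, are needed in certain scripts.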


Version: unspecified
Severity: normal

Details

Reference
bz21767

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:48 PM
bzimport set Reference to bz21767.
bzimport added a subscriber: Unknown Object (MLST).

Yeah, it seems like invisible characters are allowed in wikitext for no good reason. For example, here[1] I'm cleaning up a leftover LTR character, because it made my script give unexpected results.

[1] https://sv.wiktionary.org/w/index.php?title=papperstidning&diff=1496883&oldid=1035418
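A script can at least detect such characters before they cause trouble. The sketch below reports every control (Cc) or format (Cf) character with its position and Unicode name, assuming the leftover "LTR character" was U+200E LEFT-TO-RIGHT MARK (the comment does not say which code point it was). The function `find_invisible_chars` and the sample string are hypothetical illustrations.

```python
import unicodedata


def find_invisible_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each control (Cc) or format (Cf)
    character in text, skipping tab, line feed and carriage return."""
    found = []
    for i, ch in enumerate(text):
        if ch in "\t\n\r":
            continue
        if unicodedata.category(ch) in ("Cc", "Cf"):
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found


# Hypothetical page text ending in a stray left-to-right mark.
sample = "papperstidning\u200e"
for index, codepoint, name in find_invisible_chars(sample):
    print(index, codepoint, name)
```

Reporting rather than silently deleting Cf characters leaves the decision to an editor, since characters like U+200D ZERO WIDTH JOINER can be intentional.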

See also Bug 3696.