
<span id="characters that should be encoded"> and [[#chara...]] breaks
Closed, Resolved, Public

Description

Author: avarab

Description:
The HTML spec mandates that id="" values be encoded. We follow this in TOC headers
but not in ids specified manually, e.g. with <span>. As a result, manually specified
links within a page break if they contain characters that should be encoded.

Testcase:
"""
__FORCETOC__

multibæt

"""
The inline TOC links will work, but the manually specified backlink will not.

TOC links are encoded as a special case in the parser (see $canonized_headline).
This needs to be moved into a general encoding routine, in Sanitizer or somewhere similar.
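
To illustrate what such a general routine might look like, here is a minimal sketch in Python (not MediaWiki's PHP). The function name escape_id is hypothetical, and the exact scheme (spaces to underscores, UTF-8 percent-encoding with "%" swapped for ".", colons left alone) is an assumption about the kind of encoding meant here, not a copy of the parser's code. The point is only that the id="" attribute and the [[#...]] link target go through the same function, so they cannot drift apart.

```python
from urllib.parse import quote


def escape_id(raw: str) -> str:
    """Encode an arbitrary string for use as an HTML id / URL fragment.

    Hypothetical helper; the encoding scheme is an assumption for
    illustration, not the actual Sanitizer implementation.
    """
    # Spaces become underscores, as in section anchors.
    underscored = raw.strip().replace(" ", "_")
    # Percent-encode the UTF-8 bytes of everything outside the
    # unreserved set; keep ":" readable.
    percent_encoded = quote(underscored, safe=":")
    # "%" is not valid in an HTML 4 id, so use "." as the escape character.
    return percent_encoded.replace("%", ".")


# Both sides of the link use the same routine.
span_id = escape_id("multibæt")       # value for <span id="...">
link_target = escape_id("multibæt")   # fragment for [[#...]]
print(span_id, link_target)
```

With a shared routine like this, the TOC header, the manually specified id, and the link fragment would all be encoded identically instead of the TOC being a special case.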


Version: 1.6.x
Severity: blocker
OS: Linux

Details

Reference
bz4461

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:00 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz4461.
bzimport added a subscriber: Unknown Object (MLST).

Well, if you specify something manually, it's... manual...

avarab wrote:

FIXED in CVS HEAD

This bug has a parsertest called 'Sanitizer: Escaping of spaces, multibyte
characters, colons & other stuff in id=""'