
Improve recognition of broken quoting in HTML attributes
Closed, Duplicate (Public)

Description

On this edit:
https://pt.wikipedia.org/w/index.php?diff=38540603
I just replaced the "o" with an "a", but Parsoid removed a </div> from another part of the page:
http://parsoid-lb.eqiad.wikimedia.org/_rt/ptwiki/?oldid=38538929


Version: unspecified
Severity: normal

Details

Reference
bz63273

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:57 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz63273.

The more relevant test is http://parsoid-lb.eqiad.wikimedia.org/_rtselser/ptwiki/?oldid=38538929, but that shows the same issue currently. Investigating.

<div style="background: "#ccccff"; color: #000000;" class="NavHead">Cronologia de Mandela (1918–2010) ...

That <div> has bad quoting (a nested pair of double quotes inside the style attribute), which causes the <div> to be parsed as plain text and somehow seems to throw off parsing and diffs. I haven't investigated why that causes diffs, but it should explain why the </div> is lost: its opening <div> is no longer recognized as a tag, so the closing </div> is now unmatched.
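
To make the quoting problem concrete, here is a minimal Python sketch of how such a start tag could be flagged. It is not Parsoid's tokenizer or its actual fix; the function name `suspicious_attribute_names` and the regex are hypothetical and only approximate a lenient HTML attribute scan. The heuristic assumed here is that a quote character showing up inside something scanned as an attribute *name* usually means an earlier attribute value was closed too soon.

```python
import re
from typing import List

# Hypothetical, simplified attribute scanner (an assumption for illustration,
# not Parsoid code): a name, optionally followed by '=' and a double-quoted,
# single-quoted, or unquoted value.
ATTR_RE = re.compile(r'''\s*([^\s=>/]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]*))?''')


def suspicious_attribute_names(start_tag: str) -> List[str]:
    """Return scanned attribute names that contain a quote character.

    A lenient scan of style="background: "#ccccff"; ..." ends the style value
    at the second double quote, so the leftover text is scanned as further
    attribute names, some of which contain stray quotes.
    """
    tag_open = re.match(r"<\s*[A-Za-z][A-Za-z0-9]*", start_tag)
    if tag_open is None:
        return []  # not a start tag at all

    body = start_tag[tag_open.end():].rstrip(">")
    suspicious = []
    pos = 0
    while pos < len(body):
        match = ATTR_RE.match(body, pos)
        if match is None:
            break  # nothing attribute-like remains (e.g. a trailing "/")
        name = match.group(1)
        if '"' in name or "'" in name:
            suspicious.append(name)
        pos = match.end()  # always advances: the name is at least one character
    return suspicious


broken = '<div style="background: "#ccccff"; color: #000000;" class="NavHead">'
fixed = '<div style="background: #ccccff; color: #000000;" class="NavHead">'
print(suspicious_attribute_names(broken))
print(suspicious_attribute_names(fixed))
```

On the broken tag from this page, the scan reports the two leftover fragments of the style value as suspicious names; on a correctly quoted version of the same tag it reports nothing. A real fix would still have to decide what to do with such a tag (repair it, or round-trip it untouched), which this sketch does not attempt.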

Search for Cronologia de Mandela (1918–2010) on the page in Comment 1 that gwicke pasted.

I wonder whether this should be handled similarly to T93769.