Page MenuHomePhabricator

empty attribute should be discarded
Closed, ResolvedPublic

Description

http://parsoid.wmflabs.org/_rt/de/Selbstbildnis_%28Leonardo_da_Vinci%29

gallery caption

becomes

gallery caption=""


Version: unspecified
Severity: normal

Details

Reference
bz51954

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 2:08 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz51954.

http://parsoid.wmflabs.org/_rt/de/Iraklis_Thessaloniki

bgcolor=

becomes

bgcolor

(which _might_ mean another edit causes the 'bgcolor'-> 'bgcolor=""' in the next parse)

bgcolor="" should either be left alone or should be deleted, as bgcolor

In a table, bgcolor="" produces no background color but bgcolor produces a dark red background because it produces the html <td bgcolor="bgcolor"> (is this a bug in the default parser?)

See https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&oldid=566470353#Table

This causes problems on the live wiki, see https://en.wikipedia.org/w/index.php?title=Kyle_Busch&diff=566076453&oldid=566076367 (the relevant change is the lines before the Line 709 diff block). Accordingly I've upgraded the severity from "trivial" to "normal".

I've reported that parsing error as bug 52330 in the mediawiki parsing component as the example table in my sandbox was generated in the source editor and so had no involvement from parsoid aiui.

I guess that bgcolor="" and bgcolor= should round-trip to bgcolor="" rather than just 'bgcolor'. Stripping the attribute completely does not seem to be a good solution in general.

Yes. As noted at bug 52330 "bgcolor" generates the html "bgcolor="bgcolor" " which renders (at least in Firefox) the same as "bgcolor="#b00000" " rather than the expected "#f9f9f9" that is the default for tables of class "wikitable"

This is what a modern browser (HTML5 parsing spec) does:

document.body.innerHTML = '<div bgcolor>foo</div>';
"<div bgcolor>foo</div>"
document.body.innerHTML
"<div bgcolor="">foo</div>"

We do the same in parsoid as we are also using the HTML5 parsing algorithm.

So I think bug 52330 is really the issue here.

We should already be round-tripping any kind of attribute perfectly in untouched content. The normalization to bgcolor="" should only happen when something nearby was edited. Can you verify using the visual editor?

PS: When trying http://parsoid.wmflabs.org/_rtselser/dewiki/Selbstbildnis_%28Leonardo_da_Vinci%29 I noticed that there is a diff in ref tags which should not be there. This is reported in bug 60120.

So, Parsoid treats HTML and extension attributes slightly differently. See snippets below:

[subbu@earth lib] echo "A<ref name=>a</ref> B<ref name="">b</ref> C<ref name>c</ref> D<ref name='d'>d</ref>" | node parse --wt2wt
A<ref>a</ref> B<ref>b</ref> C<ref>c</ref> D<ref name="d">d</ref>

[subbu@earth lib] echo "<span title=>a</span><span title="">b</span><span title>c</span><span title="d">d</span>" | node parse --wt2wt
<span title="">a</span><span title="">b</span><span title="">c</span><span title="d">d</span>

For extensions, it drops empty attributes (in whatever form they show up), and for HTML tags, it normalizes empty attributes.

Our HTML attribute behavior conforms with what browsers do. Anything to change / fix here for extensions?

Arlolra set Security to None.
Arlolra claimed this task.
Arlolra subscribed.

T54330 is now resolved, which is all I think we want to do here.

Anything to change / fix here for extensions?

In general, extension tags work the same.

λ (master) echo "<source test=>up here</source>" | node parse --wt2wt --prefix enwiki
<source test="">up here</source>

The native cite extension is different. In 626984757a4babfe0bc2c03ff2343e72c89eafeb, @ssastry seemed to think empty attributes should be dropped (the undefineds there). Also, the only attrs that rt are name and group.

λ (master) echo "Ok<ref test='123' group='hi' name=>ok</ref>" | node parse --wt2wt
Ok<ref group="hi">ok</ref>

<references group="hi" />