Roundtripping issue leads to infobox deletion
Closed, Resolved, Public

Description

Any edit to this rev
https://en.wikipedia.org/w/index.php?title=Raven-Symon%C3%A9&oldid=566906720&veaction=edit

leads to the infobox being removed.


Version: unspecified
Severity: critical

Details

Reference
bz52488

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 1:49 AM)
bzimport set Reference to bz52488.

This is because VE seems to have stripped data-mw from a category <link> element.

In this particular instance, because a category link is fostered out of the infobox table, the <link> happens to be the very first element in the transclusion HTML from the infobox and carries the entire roundtripping information of the infobox. However, if I understand correctly, these metadata tags are handled differently in VE, which might be contributing to the loss of data-mw on these tags (and subsequently Parsoid never sees the infobox when it comes to serialization).

So, this can happen on any page where the infobox generates a table and the specific HTML output emits a category link outside a table-tag. Where category links are placed relative to table tags does not matter for the PHP parser, since category link wikitext effectively disappears from the HTML at the place where it appeared (and hence there is nothing left to foster out of the table). But in the case of Parsoid, because of roundtripping requirements, the links generate an actual HTML element which (depending on where it occurs in the HTML flow) can get fostered out and (depending on how it interacts with the surrounding context) can get roundtripping data-mw information.

So, if VE can preserve this information for now, that will fix this problem.

If there is a problem dealing with link tags and preserving data-mw for whatever reason, we should figure out a strategy for dealing with these tags. Note that it does not matter what HTML category links translate to -- as long as an HTML element is generated, they are subject to fostering out of tables.

I will not outline other possible solutions for now -- we can discuss on IRC if there is anything needed on the Parsoid end to support these scenarios.

As for "outside a table-tag": I really meant to say, outside a table-content tag, i.e. inside a table but outside <td>, <th>, or <caption> tags. So, a <link> that ends up being a direct child of <table>, <tbody>, or <tr> tags will get moved out of the table.
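
To make the fostering rule above concrete, here is a minimal sketch that simulates it on a toy tree. It is not Parsoid's or any browser's implementation: the `Node` class, the `foster` function, and the two tag sets are hypothetical names introduced for illustration, and the rule is simplified (for example, it ignores nested tables and the special cases the HTML parsing algorithm makes for some elements).

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: parents whose direct children must be table structure.
FOSTERING_PARENTS = {"table", "tbody", "thead", "tfoot", "tr"}
# Assumption: tags that are allowed to stay as direct children of those parents.
TABLE_STRUCTURE = {"caption", "colgroup", "col", "tbody", "thead", "tfoot",
                   "tr", "td", "th"}


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)


def foster(table: Node) -> List[Node]:
    """Remove misplaced nodes from `table` in place and return them.

    A node is considered misplaced when it is a direct child of a
    table-structure parent but is not itself a table-structure tag.
    Content inside <td>, <th>, or <caption> is left alone.
    """
    fostered: List[Node] = []

    def walk(node: Node) -> None:
        kept = []
        for child in node.children:
            if node.tag in FOSTERING_PARENTS and child.tag not in TABLE_STRUCTURE:
                fostered.append(child)  # would be moved in front of the table
                continue
            kept.append(child)
            if child.tag in FOSTERING_PARENTS:
                walk(child)  # descend into tbody/tr, but not into cells
        node.children = kept

    walk(table)
    return fostered
```

With a tree such as `Node("table", [Node("link"), Node("tbody", [Node("tr", [Node("td", [Node("link")])])])])`, only the `link` directly under `table` is returned; the one inside the `td` stays where it is.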

(In reply to comment #1)

This is because VE seems to have stripped data-mw from a category <link>
element.

That sounds scary. I'll investigate.

In this particular instance, because of fostering of a category link from the
infobox table, the <link> happens to be the very first element in the
transclusion HTML from the infobox and gets the entire roundtripping
information of the infobox. However, if I understand correctly, these metadata
tags are handled differently in VE which might be contributing to the loss of
data-mw on these tags (and subsequently Parsoid never sees the infobox when it
comes to serialization).

They're only handled differently if they occur in isolation. But mw:Transclusion takes precedence over other things, as does about-grouping. So if there's a <link> tag that's either about-grouped with other things or has mw:Transclusion set, it won't (shouldn't) be treated as metadata. I'll have to see what's going on here.
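
The precedence described above can be sketched as a small predicate. This is only an illustration of the stated rule, not VE's actual code: the function name `treated_as_metadata`, the dict-based node representation, and the `METADATA_TAGS` set are assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence

# Assumption: the tags that would be handled as metadata when isolated.
METADATA_TAGS = {"link", "meta"}

NodeAttrs = Mapping[str, Optional[str]]


def treated_as_metadata(node: NodeAttrs, siblings: Sequence[NodeAttrs]) -> bool:
    """Return True only for a metadata tag that occurs in isolation.

    mw:Transclusion and about-grouping both take precedence, so a <link>
    carrying either of them is not handled as plain metadata.
    """
    if node.get("tag") not in METADATA_TAGS:
        return False
    typeof = node.get("typeof") or ""
    if "mw:Transclusion" in typeof.split():
        return False  # transclusion wins over metadata handling
    about = node.get("about")
    if about and any(s is not node and s.get("about") == about for s in siblings):
        return False  # grouped with other nodes via a shared about id
    return True
```

Under this rule, the fostered category `<link>` in the article above would not be treated as metadata, since it carries `typeof="mw:Transclusion"` for the infobox.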

So, this can happen on any page where the infobox generates a table and the
specific HTML output emits a category link outside a table-tag. The specific
place where category links are placed relative to table tags does not matter
for the PHP parser since category link wikitext effectively disappears from
the HTML from the place where it appeared (and hence there is nothing left to
foster out of the table). But, in the case of Parsoid, because of
roundtripping requirements, the links generate an actual HTML element which
(depending on where they occur in the HTML flow) can get fostered out and
(depending on how they interact with the surrounding context) can get
roundtripping data-mw information.

I don't quite understand what kind of fostering behavior you're talking about here exactly, but I'll investigate the linked article and see.

I checked, and VE isn't dirtying the DOM. It's also not stripping the data-mw attribute from the first <link> tag on the page (I'm curious to see where you saw that behavior).

I think this is a selser bug. If I don't make any edits, the returned DOM is the same as what we received and the wikitext diff is empty. If I add a character to a paragraph, the DOM diff is only that paragraph, but the diff has that plus the removal of the infobox.
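
For readers unfamiliar with selser (selective serialization), the expected behavior described above can be sketched roughly as follows. This is a toy model under strong assumptions, not Parsoid's implementation: `selective_serialize`, the `html`/`src` keys, and the caller-supplied `serialize` callback are all hypothetical, and the sketch assumes the edit neither adds nor removes top-level nodes.

```python
from typing import Callable, Dict, List


def selective_serialize(
    original: List[Dict[str, str]],
    edited: List[str],
    serialize: Callable[[str], str],
) -> str:
    """Reuse the original wikitext for untouched nodes, re-serialize the rest.

    `original` pairs each top-level node's HTML ("html") with the wikitext
    it came from ("src"); `edited` is the HTML of the same nodes after
    the edit.
    """
    if len(original) != len(edited):
        raise ValueError("this sketch assumes no nodes were added or removed")
    parts = []
    for old, new_html in zip(original, edited):
        if new_html == old["html"]:
            parts.append(old["src"])  # unmodified: keep source verbatim
        else:
            parts.append(serialize(new_html))  # modified: serialize afresh
    return "".join(parts)
```

In this model, adding a character to a paragraph changes only that paragraph's output, and the infobox source is copied through unchanged, which is the invariant the behavior reported in this comment violates.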

Confirmed this is a selser bug, see bug 52638.

(In reply to comment #4)

I checked, and VE isn't dirtying the DOM. It's also not stripping the data-mw
attribute from the first <link> tag on the page (I'm curious to see where you
saw that behavior).

That is strange. I did a minor edit, dumped the HTML from Chrome, ran a DOM diff, noticed a difference, and searched for Infobox to verify but didn't find it. I don't have the files with me anymore to confirm whether that was real or just the late-night debugging effect. I am going to put it down to the latter for now.

I think this is a selser bug. If I don't make any edits, the returned DOM is
the same as what we received and the wikitext diff is empty. If I add a
character to a paragraph, the DOM diff is only that paragraph, but the diff
has that plus the removal of the infobox.

Thanks. I will take a look at the other bug report you filed and investigate.

Change 78242 had a related patch set uploaded by Subramanya Sastry:
(Bug 52638) Fix selser regression introduced by fix for bug 51217

https://gerrit.wikimedia.org/r/78242

Change 78242 merged by jenkins-bot:
(Bug 52638) Fix selser regression introduced by fix for bug 51217

https://gerrit.wikimedia.org/r/78242

Will be fixed on the next Parsoid deploy (Monday-Wednesday next week?)

This is deployed, but the cache is not yet purged. That should happen tomorrow after the next deploy. Until then the fix only applies to cache misses and pages re-rendered by template or image updates.

Resolving as fixed. Please verify tomorrow after the cache purge.