
Parsoid: (Italian versions of) Template:infobox_person messing up with Template:Sister?
Closed, Resolved, Public

Description

Since the other span-inserting threads are considered fixed, I'm reporting this separately; see below.
All we did was change the name of a place:

http://it.wikipedia.org/w/index.php?title=Fritz_H%C3%B6ger&diff=60166642&oldid=59786853

http://it.wikipedia.org/w/index.php?title=Johann_Carl_Ludwig_Engel&diff=prev&oldid=60227407

but not in http://it.wikipedia.org/w/index.php?title=Abraham_Lincoln&diff=prev&oldid=60228000 .

Thanks.


Version: unspecified
Severity: normal

Details

Reference
bz51678

Event Timeline

bzimport raised the priority of this task from to High. (Nov 22 2014, 1:49 AM)
bzimport added a project: Parsoid.
bzimport set Reference to bz51678.

This looks like a nasty Parsoid bug that I think was fixed last week, where sometimes some templates would get duplicated when re-used. I'm going to tentatively mark this as fixed, but please re-open if it happens again.

I can reproduce this on it.wiki, but when I paste the posted HTML into my local Parsoid, it looks fine. How up-to-date is the deployment on it.wiki?

Either way, looks like a Parsoid bug.

Thanks Ed. FYI, we have found (sorry, no diff available; the text was a copyvio) that the same corruption also appears if you don't edit Template:Bio at all but just change a word elsewhere in the page (the page was Pierre de Fermat on it.wp, which contains both templates).

Thanks Elitre. Will take a look today.

This is a baffling bug. When I take HTML dumps (after edits) from Chrome and serialize them a couple of different ways, I don't see the diff at all. But when I click the 'Review changes' button, the diff shows up. Tried on http://it.wikipedia.org/wiki/Szil%C3%A1rd_Ign%C3%A1c_Bogd%C3%A1nffy

Will investigate more tomorrow.

My testing late last night was a little off. But here is what is going on:

  1. Since Parsoid doesn't yet provide a PHP-compatible API (bug 48483 tracks this), on template edits VE fetches new HTML from the MediaWiki API, whose output differs from Parsoid's, most notably for categories: Parsoid emits <link..> tags, while the PHP parser leaves no trace of them. So there is a big chunk of missing output when VE sends this DOM back to Parsoid. Normally this shouldn't be an issue, since Parsoid simply hops over template HTML and should never even run into this hole.
  2. But Parsoid's DOM-diff has a bug where it occasionally descends into transclusion HTML and trips itself up. This throws off the DOM-diff, which then inserts a deletion marker further downstream in a later template. That breaks template continuity, which in turn breaks the serializer.
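The second point can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Parsoid's DOM-diff: nodes are plain Python objects, and template output is modeled as a run of siblings sharing an `about` id (loosely mirroring the ids Parsoid attaches to transclusion HTML). `naive_diff`, `opaque_diff` and the sample data are all hypothetical. In the sample, the old DOM still has a category `<link>` inside the first template that the re-rendered HTML lacks; the naive diff walks into that template and ends up reporting a deletion in the next template, while the opaque diff hops over both.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Node:
    tag: str
    text: str = ""
    # Hypothetical marker: all nodes produced by one transclusion share this id.
    about: Optional[str] = None

def naive_diff(old: List[Node], new: List[Node]) -> List[str]:
    """Flat positional diff that also walks into template output."""
    ops = []
    for k in range(max(len(old), len(new))):
        if k >= len(new):
            ops.append(f"delete {old[k].tag}@{old[k].about}")
        elif k >= len(old):
            ops.append(f"insert {new[k].tag}@{new[k].about}")
        elif old[k] != new[k]:
            ops.append(f"modify {old[k].tag}@{old[k].about}")
    return ops

def units(nodes: List[Node]) -> List[Tuple[Optional[str], List[Node]]]:
    """Group each run of siblings sharing an `about` id into one opaque unit."""
    out: List[Tuple[Optional[str], List[Node]]] = []
    for n in nodes:
        if n.about is not None and out and out[-1][0] == n.about:
            out[-1][1].append(n)
        else:
            out.append((n.about, [n]))
    return out

def opaque_diff(old: List[Node], new: List[Node]) -> List[str]:
    """Positional diff over units; a transclusion with the same id is skipped
    wholesale, so differences inside its HTML are never examined."""
    ops = []
    ou, nu = units(old), units(new)
    for k in range(max(len(ou), len(nu))):
        if k >= len(nu):
            ops.append(f"delete unit {ou[k][0]}")
        elif k >= len(ou):
            ops.append(f"insert unit {nu[k][0]}")
        else:
            (oa, on), (na, nn) = ou[k], nu[k]
            if oa is not None and oa == na:
                continue  # same transclusion: hop over its HTML
            if oa != na or on != nn:
                ops.append(f"modify unit {oa}")
    return ops

# Old DOM (Parsoid output) vs. new DOM (template re-rendered by the PHP parser,
# so the category <link> inside the first template is missing).
old = [Node("p", "intro"), Node("table", "Bio", "#mwt1"),
       Node("link", "Categoria", "#mwt1"), Node("div", "Sister", "#mwt2")]
new = [Node("p", "intro"), Node("table", "Bio", "#mwt1"),
       Node("div", "Sister", "#mwt2")]

print(naive_diff(old, new))   # ['modify link@#mwt1', 'delete div@#mwt2']
print(opaque_diff(old, new))  # []
```

The trade-off is that the opaque diff cannot see changes inside a transclusion at all, which matches the behaviour described above, where Parsoid is supposed to simply hop over template HTML.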

I'll fix the bug in the DOM-diff algorithm (and possibly make the serializer more robust against future DOM-diff bugs by having it ignore deletion markers). But we should implement 48483 sooner rather than later, so we don't run into other issues caused by differing HTML.
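The serializer-side hardening can be sketched in the same toy style. This is a hypothetical model, not Parsoid's serializer: nodes are `(kind, about)` pairs, `"marker"` stands in for a DOM-diff deletion marker, and for brevity the `about` id doubles as the template name. A naive serializer lets the marker break the template's run of siblings and so emits the template twice (the kind of duplication mentioned earlier in this thread); ignoring markers when tracking continuity keeps the run intact.

```python
from typing import List, Optional, Tuple

# (kind, about): kind is "text", "tpl" (template output) or "marker"
# (a hypothetical DOM-diff deletion marker). For brevity, `about` is
# also used as the template name.
Node = Tuple[str, Optional[str]]

def serialize(nodes: List[Node], skip_markers: bool) -> List[str]:
    out: List[str] = []
    current: Optional[str] = None  # template whose output run is being consumed
    for kind, about in nodes:
        if kind == "marker":
            if not skip_markers:
                current = None  # naive: the marker breaks template continuity
            continue
        if kind == "tpl":
            if about != current:
                out.append("{{" + str(about) + "}}")  # start of a new template run
                current = about
        else:
            out.append(kind)
            current = None
    return out

nodes: List[Node] = [("text", None), ("tpl", "Sister"), ("marker", None), ("tpl", "Sister")]
print(serialize(nodes, skip_markers=False))  # ['text', '{{Sister}}', '{{Sister}}']
print(serialize(nodes, skip_markers=True))   # ['text', '{{Sister}}']
```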

Change 77357 had a related patch set uploaded by Subramanya Sastry:
(Bug 51678) Fixed bug in dom-diff algorithm

https://gerrit.wikimedia.org/r/77357

Change 77357 merged by jenkins-bot:
(Bug 51678) Fixed bug in dom-diff algorithm

https://gerrit.wikimedia.org/r/77357

Will this be added to tomorrow's deployment?

Parsoid is deployed independently, but yes, this fix will go out with the next Parsoid update (by tomorrow).

Now deployed. Please verify and close if fixed.

OK, continuing at bug 60897 then. Thanks.