Page MenuHomePhabricator

Implement way to round-trip tables with fosterable content that migrates out of the table
Closed, ResolvedPublic

Description

Simple example:

{|
foo

}

This doesn't RT correctly.

Real world test case: [[All-Ireland_Senior_Camogie_Championship_1992]]
Related report: https://en.wikipedia.org/wiki/Wikipedia:VisualEditor/Feedback#A_simple_table_gets_doubled_after_an_unrelated_edit_of_plain_text

Interestingly enough, we have more robust handling for templated content. The example below is handled properly and RTs correctly.

{|
{{echo|foo}}

}

Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=53402

Details

Reference
bz51217

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:47 AM
bzimport added a project: Parsoid-DOM.
bzimport set Reference to bz51217.

Jotting down an idea:

https://gerrit.wikimedia.org/r/#/c/73369/ is going to add tag-ids to dom elements. Since they are added linearly, it will always be the case that tag-id(node) < tag-id(node.nextSibling) and tag-id(node) < tag-id(node.child) except when a child is fostered out.

So, we could actually use these tag ids to robustly detect fostered content and add some encapsulation typeof and about ids on the content and the table so that VE as well as Parsoid's serializer can deal with it.

(In reply to comment #1)

Jotting down an idea:

https://gerrit.wikimedia.org/r/#/c/73369/ is going to add tag-ids to dom
elements. Since they are added linearly, it will always be the case that
tag-id(node) < tag-id(node.nextSibling) and tag-id(node) < tag-id(node.child)
except when a child is fostered out.

http://www.mediawiki.org/w/index.php?title=Parsoid/Todo&oldid=554715#DOM_tree_builder is making a cameo appearance ;)

After detection we'll have to figure out a way to make this work with selser though. Simply enlarging the start tag width to include the fostered content won't work. Fostered content can also come from several places in the table. The best is probably to prevent selser from serializing any of the fostered content, but serialize all of it when a part of it is edited.

Generally we should not try to do more than avoiding dirty diffs. On edit to the fostered content (or even the table in general) we should take the liberty to fix the wikitext to more closely mirror the way that content actually renders.

Looks like this corruption is because content gets fostered out of the table on
load in VE. One more instance of fosterable content that needs to be handled.

https://en.wikipedia.org/w/index.php?title=List_of_Jessie_episodes&diff=564684769&oldid=564554799

Why not just disable VE on pages like these for now. Edit page, parsoid render, "VisualEditor encountered a structure that it currently cannot handle. You will be redirected to the source editor."

Breaking pages == bad.

On the Parsoid end, detecting that the page cannot be handle is equivalent to fixing it. Same analysis work has to be done. On the VE end, perhaps they could use the 've-needscheck' verification to detect something like this -- that is something that VE developers can respond to best. Adding Roan to this ticket who might be better able to answer that part.

The problem on the VE end is that foster-parenting happens already when the browser parses the received HTML. Doing a simple string comparison with the received HTML won't normally work because of quoting differences.

(In reply to comment #7)

The problem on the VE end is that foster-parenting happens already when the
browser parses the received HTML. Doing a simple string comparison with the
received HTML won't normally work because of quoting differences.

This is right. We have no reasonable way to tell whether the browser's HTML parser introduced changes, because inferring anything about the HTML string requires using the browser's HTML parser which... well you get the idea.

On the Parsoid end, this is easier to detect, because you have the original DOM. You could serialize that DOM to a string, then parse it into a DOM again, and compare that against the original DOM using .isEqualNode() (if that exists in your DOM library; alternatively, use your DOM differ).

So, I investigated the bug with this page (https://en.wikipedia.org/w/index.php?title=List_of_Jessie_episodes&diff=564684769&oldid=564554799) (comment #3 above).

This is quite nasty and nothing to do with foster parenting as I mistakenly/hastily believed. The page has since been edited to remove the problematic wikitext (https://en.wikipedia.org/w/index.php?title=List_of_Jessie_episodes&diff=564803163&oldid=564788048) so the problem will no longer manifest there.

However, here is a snippet from

{{Episode list

EpisodeNumber = 1

...
...

ProdCode = 101<ref name="futon" />
Viewers = ...

...
...

}

Note the input to the 'ProdCode' param. Here is a snippet for the template source of Episode list (https://en.wikipedia.org/w/index.php?title=Template:Episode_list&action=edit)

... <td id="pc{{{ProdCode}}}">{{{ProdCode}}}</td> ...

See what that will do with the parameter passed in from the page. Here is the expanded source that Parsoid sees:

... <td id="pc101<ref name="futon" />">101<ref name="futon" /></td> ...

So, that td-tag is totally broken and Parsoid renders that as literal text (rather than a html td tag) which then promptly gets fostered out (by Parsoid itself) which then adds noise to the HTML output which is what you see when you opened that page in VE (see video above added by AzaToth).

The editor Geraldo Perez has already fixed the page as I indicated earlier, so this is not a problem anymore.

But, I dont think we should even bother trying to handle broken wikitext like this. We do go to great lengths to fixup/handler broken wikitext while still preserving it as far as possible to not introduce dirty diffs, but dirty diffs are sometimes inevitable in situations like this. So for now, I am going to classify this example as unfixable in Parsoid, unless someone has smart ideas.

From IRC:
[14:44] <gwicke> subbu, re https://bugzilla.wikimedia.org/show_bug.cgi?id=51217#c9: we should be able to detect when something was fostered out by our treebuilder, and should also be able to ensure that everything fosterable is dropped or fostered at the end of DOMPostProcessing
[14:45] <gwicke> yeah, I wonder what he is up to
[14:45] <subbu> in that example, what do we do with the information at all .. itis not helpful since the output still looks corrupt.
[14:45] <gwicke> subbu: IMO it is fine to drop such content on edit, but if we can avoid dirty diffs with selser then that would be great
[14:46] <gwicke> we could strip the fostered content from the DOM
[14:46] <subbu> is it the right thing to do in general?
[14:46] <gwicke> and rig selser so that it restores it unless the content is touched
[14:47] <gwicke> or we can keep it in the DOM, but mark it so that selser does not serialize it
[14:47] <gwicke> ie, a zero-length dsr
[14:48] <subbu> yes, the rig-selser bit is missing right now.
[14:48] <gwicke> subbu: detecting foster-parented content and then excluding it from dsr calculations might already be pretty effective
[14:49] <gwicke> I guess the gaps in dsr would be spanned by the dsr algorithm
[14:49] <subbu> yes, that should work.
[14:49] <gwicke> (the holes where content disappeared)
[14:50] <subbu> i was wondering if we should mark blocks of fostered content with a special div/marker so that VE can treat it specially if there is a use-case for it (not sure there is ..)
[14:50] <gwicke> subbu: I'd vote for data-parsoid
[14:50] <subbu> data-parsoid is fine for internal use.
[14:50] <gwicke> if the user edits the fostered content, we serialize it
[14:50] <gwicke> i.e., do the dirty diff then
[14:51] <subbu> but, i mean: if VE can do something special if it knew about it. not sure what that would be .. i doubt this is a high priority thing in any case.
[14:51] <gwicke> when something adjacent to the former location is edited, we dirty-diff it there
[14:51] <gwicke> subbu: IMO we should abstract such issues
[14:52] <gwicke> bot authors would not want to deal with it either
[14:52] <gwicke> and the DOM should reflect the rendered semantics
[14:53] <subbu> dom should reflect rendered semantics for sure.

This looks very similar to bug 50754.

  • Bug 50754 has been marked as a duplicate of this bug. ***

Change 75253 had a related patch set uploaded by Subramanya Sastry:
WIP: (Bug 51217) Detect fostered content and ignore in selser

https://gerrit.wikimedia.org/r/75253

  • Bug 51964 has been marked as a duplicate of this bug. ***
  • Bug 51993 has been marked as a duplicate of this bug. ***

Change 75253 merged by jenkins-bot:
(Bug 51217) Detect fostered content and ignore in selser

https://gerrit.wikimedia.org/r/75253

Currently undergoing rt-testing and will be deployed Monday/Tuesday.

Now deployed in production. I tested dummy test cases and it seems to work. Would be good to keep an eye out for this in production.

Change 78242 had a related patch set uploaded by Subramanya Sastry:
(Bug 52638) Fix selser regression introduced by fix for bug 51217

https://gerrit.wikimedia.org/r/78242

Change 78242 merged by jenkins-bot:
(Bug 52638) Fix selser regression introduced by fix for bug 51217

https://gerrit.wikimedia.org/r/78242

Fix deployed, but cache is not yet purged. The purge will happen tomorrow.