Page MenuHomePhabricator

Wikitext list after implicit <td> not properly recognized
Closed, ResolvedPublic

Description

I did not add this to closed bug 50012 because while it may be related it's not exactly the same.

This edit to the text of the article added extraneous pipes throughout (http://en.wikipedia.org/w/index.php?title=List_of_male_kickboxers&diff=562103247&oldid=561869162) and also dropped some formatting on the end (|}|}|}|}|}|})

See http://en.wikipedia.org/wiki/Wikipedia:VisualEditor/Feedback#Unwanted_format_changes_throughout_article


Version: unspecified
Severity: normal

Details

Reference
bz50420

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:50 AM
bzimport set Reference to bz50420.

I would consider this a blocker on Monday's deployment. Things not having support (browsers or features) I can deal with, but we cannot have this sort of corruption being introduced.

The problem is that this page had about 13 tables that were unclosed (missing |} wikitext). Parsoid does handle a lot of corner cases, but at this time, unclosed tables (especially when there are lots of them) are harder for Parsoid to recover from. But, we have some ideas for recovering from them in the future. But, till such time, it is easier to fix the wikitext of the page like I did (https://en.wikipedia.org/w/index.php?title=List_of_male_kickboxers&diff=562162447&oldid=562113696)

However, there is definitely a bug there which is about adding "|" after the first table row for which I'll repurpose this bug for fixing in Parsoid.

https://en.wikipedia.org/w/index.php?title=List_of_male_kickboxers&diff=562167553&oldid=562162447 shows an instance where converting an implicit <td> into an explicit <td> on serialization is a semantic error.

Reduced test case:

{|

-

*a

}

This cannot be fixed by selser since on editing the list item, the <td> will go through regular serialization.

We should be able to safely use the autoInserted* flags from data-parsoid in this instance since VE does not have table-editing capability yet.

Change 71314 had a related patch set uploaded by Subramanya Sastry:
(Bug 50420) Tweak selser to reuse td/th/tr wrappers from source

https://gerrit.wikimedia.org/r/71314

I was wrong. This bug was fixed by tweaking selser as in the patch above since VE does not have table-editing capability yet (only table contents, not table structure).

There is an unrelated bug in the parser where leading list wikitext in a td-cell is parsed as plain text rather than as a list item. So, the pipe problem is fixed by the above commit, but there will still be a dirty-diff (the first list item will be wrapped in nowiki on serialization). That will be addressed separately.

Change 71314 merged by jenkins-bot:
(Bug 50420) Tweaked selser to reuse td/th/tr wrappers from source

https://gerrit.wikimedia.org/r/71314

Ok, renamed the bug to reflect the repurposing.

Change 71744 had a related patch set uploaded by Subramanya Sastry:
WIP: (Bug 50420) Recognize sol-content in implicit <td> after a <tr>

https://gerrit.wikimedia.org/r/71744

Change 71744 merged by jenkins-bot:
(Bug 50420) Recognize sol-content in implicit <td> after a <tr>

https://gerrit.wikimedia.org/r/71744