
Bad wikitext lines starting with "| " (not in a table context) getting the pipes removed and replaced by <nowiki> </nowiki> on RT
Closed, Resolved, Public

Description

http://en.wikipedia.org/w/index.php?title=National_Security_Intelligence&diff=next&oldid=558880809 - Rybec later reproduced the corruption in a sandbox "by copy-pasting the previous revision to a sandbox and editing just the lead paragraph in the same way in VE. The unwanted changes are displayed in the review window". So aside from the nowikis, we need to find out why it deletes some bits of the text.

Also adding http://fr.wikipedia.org/w/index.php?title=Romain_Alessandrini&diff=95638901&oldid=95638848 which, like the previous one, happened after a template was left open. I assume this is meant to prevent | signs from being displayed, since they are usually markup and not really wanted in an article, but I can still see them in View mode, so I need to understand more about this behavior. Thanks.


Version: unspecified
Severity: normal

Details

Reference
bz52618

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:58 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz52618.

This looks to be a Parsoid "bug", though frankly the behaviour of the system when given such broken input is somewhat undefined.

Minimum test case:

Foo

Bar

The issue here is that we are tokenizing this to a td token, which is then dropped by the treebuilder when that does not end up inside a table. We should be able to detect this on the DOM based on shadow info. When detected, we can re-insert the original pipe so that it is not lost.
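The idea of rescuing a stripped cell marker can be illustrated with a small toy model. This is only a sketch and not Parsoid's code: the `Token` class, its `src` field (standing in for the "shadow info"), and the `tokenize` and `build` functions are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str     # "table_open", "table_close", "td", or "text"
    content: str  # text without the leading markup
    src: str      # original wikitext of the line (stand-in for shadow info)


def tokenize(wikitext):
    """Toy line-based tokenizer: any line starting with "|" becomes a td token."""
    tokens = []
    for line in wikitext.split("\n"):
        if line.startswith("{|"):
            tokens.append(Token("table_open", "", line))
        elif line.startswith("|}"):
            tokens.append(Token("table_close", "", line))
        elif line.startswith("|"):
            tokens.append(Token("td", line[1:].strip(), line))
        else:
            tokens.append(Token("text", line, line))
    return tokens


def build(tokens, rescue):
    """Toy tree builder that renders straight back to text.

    A td outside a table loses its pipe unless rescue is True, in which
    case the original source recorded on the token is re-inserted.
    """
    out = []
    in_table = False
    for tok in tokens:
        if tok.kind == "table_open":
            in_table = True
        elif tok.kind == "table_close":
            in_table = False
        if tok.kind == "td" and not in_table:
            out.append(tok.src if rescue else tok.content)
        else:
            out.append(tok.src)
    return "\n".join(out)
```

With an illustrative input such as `"| Foo\n\nBar"` (an assumed example, not taken from the task), `build(tokenize(text), rescue=False)` loses the pipe, while `rescue=True` returns the input unchanged.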

That might not yet avoid the <nowiki> insertion, but would at least preserve the content. It is also possible that selser avoids the nowiki.
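Why selser (selective serialization) might sidestep the nowiki can be sketched as follows. The general idea is that unmodified parts of the page reuse their original wikitext instead of being re-serialized and re-escaped. The `Node` class, its fields, and the naive `escape` rule below are assumptions made for illustration, not Parsoid's actual data model or escaping logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    text: str                  # current text content of the node
    src: Optional[str] = None  # original wikitext, if known (hypothetical field)
    modified: bool = False     # True if the editor changed this node


def escape(text):
    """Naive escaping: protect a line-leading pipe with <nowiki>."""
    if text.startswith("|"):
        return "<nowiki>|</nowiki>" + text[1:]
    return text


def serialize(nodes, selective):
    """Serialize paragraphs; in selective mode, unmodified nodes reuse src."""
    parts = []
    for node in nodes:
        if selective and not node.modified and node.src is not None:
            parts.append(node.src)  # untouched: emit original wikitext verbatim
        else:
            parts.append(escape(node.text))  # new or changed: serialize and escape
    return "\n\n".join(parts)
```

In this model, editing only the lead paragraph leaves a later `| Foo` paragraph unmodified, so selective serialization emits it as-is, whereas full serialization wraps the pipe in `<nowiki>`.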

Change 114897 had a related patch set uploaded by GWicke:
WIP Bug 52618: Rescue stripped tds outside of table context

https://gerrit.wikimedia.org/r/114897

Change 114897 merged by jenkins-bot:
Bug 52618: Rescue stripped tds outside of table context

https://gerrit.wikimedia.org/r/114897

TODO from the commit summary:

  • Avoid <nowiki>fication on round-trip (even with selser)
  • Avoid paragraph splitting by moving the paragraph wrapper to the DOM (major project)

Change 115436 had a related patch set uploaded by GWicke:
Bug 52618: Avoid <nowiki>fication of td/tr/th syntax outside of tables

https://gerrit.wikimedia.org/r/115436

Change 115436 merged by jenkins-bot:
Bug 52618: Avoid <nowiki>fication of td/tr/th syntax outside of tables

https://gerrit.wikimedia.org/r/115436

Change 140015 had a related patch set uploaded by Subramanya Sastry:
(Bug 52618) Suppress <nowiki>s for table WT strings outside tables

https://gerrit.wikimedia.org/r/140015

Change 140015 merged by jenkins-bot:
(Bug 52618) Suppress <nowiki>s for table WT strings outside tables

https://gerrit.wikimedia.org/r/140015
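Going only by the change titles above, the direction is to stop escaping table-like strings when they occur outside a table. A minimal sketch of such a context-aware check might look like the following. The function name `escape_line`, the `in_table` flag, and the set of characters treated as table markup are assumptions for illustration and are not taken from the merged patches.

```python
def escape_line(line, in_table):
    """Wrap a leading "|" or "!" in <nowiki> only when inside a table.

    Inside a table these characters would start a cell or header, so text
    that begins with them needs protection. Outside a table the sketch
    leaves the line untouched.
    """
    if in_table and line.startswith(("|", "!")):
        return "<nowiki>" + line[0] + "</nowiki>" + line[1:]
    return line
```

Under these assumptions, `escape_line("| Foo", in_table=False)` returns the line unchanged, so no `<nowiki>` is introduced on round-trip.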