Page MenuHomePhabricator

Parsoid creates too much table cells...
Closed, ResolvedPublic

Description

At this page, parsoid creates too much table cells:
http://parsoid.wmflabs.org/sr/%D0%93%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B0

See at "Да ли сте знали." or "Изабрана слика"...

I firstly thought this could be the consequence of #50603 or #50589... but now these bugs are fixed and we still have the issue.


Version: unspecified
Severity: normal

Details

Reference
bz53229

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 2:11 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz53229.

The newline in the auto-inserted <td> is preventing it from getting deleted in the DOM post-pass.

...
<tr data-parsoid='{"startTagSrc":"|-","autoInsertedEnd":true}' about="#mwt2">
<td data-parsoid='{"autoInsertedStart":true,"autoInsertedEnd":true,"dsr":[null,721,0,0]}'>
</td>
...

I suspect this is a regression from a fix of bug 52296. Will take a look if I can find a reduced test case for this behavior -- the output for this page is a bit ugly .. with a lot of empty HTML <b> tags, divs, dls, dts, in nested tables ... anyway, confirmed.

I have created a test page here:
http://sr.wikipedia.org/wiki/%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D0%BD%D0%B8%D0%BA:Kelson/test

The page includes the following test template:
http://sr.wikipedia.org/w/index.php?title=Корисник:Kelson/test/empty_row

Have a look to this diff:
http://sr.wikipedia.org/w/index.php?title=%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D0%BD%D0%B8%D0%BA%3AKelson%2Ftest%2Fempty_row&diff=7952861&oldid=7952852

With the first version (no empty line) you don't have the bug, with the second version (no empty line before <includeonly>) you have the problem.

Change 80485 had a related patch set uploaded by GWicke:
WIP bug 53229: Strip ws-only auto-inserted table cells too

https://gerrit.wikimedia.org/r/80485

Change 80485 merged by jenkins-bot:
Bug 53229: Strip ws-only auto-inserted table cells too

https://gerrit.wikimedia.org/r/80485

With the patch above that page now renders as expected:

http://parsoid.wmflabs.org/sr/%D0%93%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B0

We basically now accept whitespace in tokenizer-inserted table cells that turn out to be unnecessary after template expansion. In this case we accept the newline.

We'll probably deploy this fix to production tomorrow once it has gotten sufficient round-trip testing.

Change 80513 had a related patch set uploaded by Subramanya Sastry:
(Bug 53229): Dont strip sole child if it is an element

https://gerrit.wikimedia.org/r/80513

Change 80513 merged by jenkins-bot:
(Bug 53229): Dont strip sole child if it is an element

https://gerrit.wikimedia.org/r/80513