
Selser: More robust handling of text-nodes outside p-tags
Closed, Resolved (Public)

Description

See the selser bug below, which occurs when a text node is inserted outside a <p> tag.

[subbu@earth tests] echo "foo\n\nbar" > /tmp/wt
[subbu@earth tests] node parse < /tmp/wt > /tmp/old.html
[subbu@earth tests] cp /tmp/old.html /tmp/new.html
[subbu@earth tests] vi /tmp/new.html
[subbu@earth tests] cat /tmp/new.html
<body data-parsoid='{"dsr":[0,9,0,0]}'><p data-parsoid='{"dsr":[0,3,0,0]}'>foo</p>NEW<p data-parsoid='{"dsr":[5,8,0,0]}'>bar</p>
</body>
[subbu@earth tests] node parse --html2wt --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html
fooNEW
bar
[subbu@earth tests] node parse --html2wt < /tmp/new.html
fooNEW

bar
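
For readers unfamiliar with the data-parsoid attributes above, here is a small sketch of how the dsr offsets line up with the original wikitext. It assumes that the first two dsr values are start and end offsets into the wikitext, and that /tmp/wt holds "foo", a blank line, "bar", and a trailing newline (that is, that echo expanded the \n sequences). The names `wikitext`, `dsr`, and `source_of` are illustrative only and are not part of Parsoid.

```python
# Assumed contents of /tmp/wt: echo output with "\n" expanded, plus the trailing newline.
wikitext = "foo\n\nbar\n"

# (start, end) pairs copied from the dsr attributes in /tmp/new.html.
dsr = {"body": (0, 9), "p1": (0, 3), "p2": (5, 8)}

def source_of(name):
    """Return the slice of the original wikitext covered by a node's dsr range."""
    start, end = dsr[name]
    return wikitext[start:end]

print(repr(source_of("p1")))  # first paragraph
print(repr(source_of("p2")))  # second paragraph
print(repr(wikitext[3:5]))    # separator between the two paragraphs
```

Under these assumptions the inserted NEW text node carries no dsr of its own, and it sits where the two-newline separator between the paragraphs used to be.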

Details

Reference
bz62498

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:58 AM
bzimport set Reference to bz62498.

There are a number of image-related selser failures due to this bug.

(In reply to ssastry from comment #0)

See seler bug below when a text node is inserted outside a <p> tag.

*selser

[subbu@earth tests] node parse --html2wt --selser --oldtextfile /tmp/t --oldhtmlfile /tmp/old.html < /tmp/new.html

*/tmp/wt

I don't see how we can generally serialize this kind of HTML so that it round-trips html2html.

Yes, we cannot make this survive html2html.

For now, I am marking this as an enhancement, to be addressed in the context of the class of bugs we have for accepting arbitrary HTML. Meanwhile, I'll fix up parserTests.js to eliminate these kinds of (simulated) edits.

Arlolra added a project: Parsoid.
Arlolra set Security to None.
ssastry claimed this task.
ssastry updated the task description.

html2wt and selser now agree.

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --html2wt --selser --oldtextfile /tmp/wt --oldhtmlfile /tmp/old.html < /tmp/new.html
foo

NEW

bar

[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --html2wt < /tmp/new.html
foo

NEW

bar

html2html is not a concern here. Bare text nodes will get p-wrapped, which is the right thing to do, and the rendering will be preserved. Parsoid does not provide html2html string-level or DOM-tree-level guarantees for arbitrary input HTML at this time.
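
As a rough illustration of the p-wrapping described above, the following toy model wraps bare text nodes in paragraphs and then serializes paragraphs separated by a blank line. This is only a sketch under simplifying assumptions, not Parsoid's implementation: the node representation and the helper names `p_wrap` and `to_wikitext` are hypothetical.

```python
# Toy model of the <body> children in /tmp/new.html: (node_name, text).
# "#text" marks a bare text node; the final "\n" is the whitespace before </body>.
children = [("p", "foo"), ("#text", "NEW"), ("p", "bar"), ("#text", "\n")]

def p_wrap(nodes):
    """Wrap each non-blank bare text node in its own paragraph; drop blank ones."""
    wrapped = []
    for name, text in nodes:
        if name == "#text":
            if text.strip():
                wrapped.append(("p", text.strip()))
        else:
            wrapped.append((name, text))
    return wrapped

def to_wikitext(nodes):
    """Serialize paragraphs, separating consecutive ones with a blank line."""
    return "\n\n".join(text for name, text in nodes if name == "p")

print(to_wikitext(p_wrap(children)))
```

In this model NEW becomes a paragraph of its own, which matches the shape of the output that both html2wt and selser produce in the transcripts above.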