
Parsoid list with newlines roundtrip issue: HTML "<ul><li>asd\nsdf</li></ul>" → Wikitext "* asd\nsdf" → HTML "<ul><li>asd</li></ul><p>sdf</p>"
Closed, Duplicate (Public)

Description

Take this HTML:

<ul><li>asd
sdf</li></ul>

Convert (serialize) to wikitext:

* asd
sdf

Parse back to HTML:

<ul><li>asd</li></ul>
<p>sdf</p>
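A rough illustration of why the item splits: in wikitext a `*` list item ends at the end of its line, so the second line is parsed as ordinary paragraph text. The function below is a hypothetical toy parser (not Parsoid's code) that models only that one rule; it does not merge consecutive items or handle nesting.

```python
def toy_parse(wikitext: str) -> str:
    """Hypothetical toy model of wikitext list parsing.

    A line starting with '*' becomes a one-item list that ends at the
    newline; any other non-empty line becomes a paragraph.
    """
    html = []
    for line in wikitext.split("\n"):
        if line.startswith("*"):
            # The item's content is only the rest of this line.
            html.append("<ul><li>" + line[1:].strip() + "</li></ul>")
        elif line.strip():
            # A continuation line is no longer part of the list item.
            html.append("<p>" + line + "</p>")
    return "\n".join(html)


print(toy_parse("* asd\nsdf"))
```

Running it on the serialized wikitext reproduces the broken HTML shown above.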

I'm not sure what should happen here, but definitely not this.

It's rather easy to run into this in VisualEditor: take a paragraph with newlines and convert it to a list item. I ran into it while making this edit: https://en.wikipedia.org/w/index.php?title=Polish_nationality_law&diff=prev&oldid=618961884 (I manually replaced the newlines with spaces before saving).


Version: unspecified
Severity: normal
See Also: T55568: Automatically switch from wikitext to HTML syntax in serializer when attributes were added to elements

Details

Reference
bz68800

Event Timeline

bzimport raised the priority of this task to Medium (Nov 22 2014, 3:32 AM).
bzimport added a project: Parsoid.
bzimport set Reference to bz68800.

Yes, this is a known issue. Parsoid currently cannot convert arbitrary HTML to wikitext in a way that preserves rendering on the html -> wt -> html path. We have discussed this issue more generally in the past and will address it, including fallback mechanisms where some forms of HTML get serialized as HTML tags rather than native wikitext. I thought we had a tracking bug or a set of related bugs for this, but I cannot find it right now.
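A minimal sketch of the HTML-tag fallback idea, assuming a hypothetical serializer that handles a single `<li>` of plain text (real list serialization also deals with nesting, attributes and sibling items). Wikitext accepts literal `<ul>`/`<li>` tags, so when the item text contains a newline the sketch keeps the HTML structure instead of emitting `*` syntax.

```python
def serialize_item(text: str) -> str:
    """Hypothetical serializer for one list item's plain-text content.

    Uses native '*' syntax when it round-trips safely; otherwise falls
    back to literal HTML tags, which wikitext also accepts.
    """
    if "\n" in text:
        # A newline would end the '*' item early, so keep the HTML form.
        return "<ul>\n<li>" + text + "</li>\n</ul>"
    return "* " + text


print(serialize_item("asd\nsdf"))
print(serialize_item("asd sdf"))
```

The trade-off is that the fallback preserves the HTML exactly but leaves editors with tag soup instead of readable list syntax.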

We should identify any other related breakages that arise from VE (which doesn't necessarily generate arbitrary HTML) and fix them together in Parsoid. That fix would be simpler than supporting generic HTML -> wt conversion that is preserved across an html2html round trip.

In cases like this, it would probably be reasonable to just convert newlines to spaces at some point (either in VisualEditor or in Parsoid).
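A minimal sketch of the newline-to-space approach, assuming the list item's content is plain text (the helper names are hypothetical, not Parsoid or VisualEditor APIs). Outside `<pre>` and similar contexts, HTML renders a newline inside `<li>` text as ordinary whitespace, so replacing it with a single space should not change how the item looks.

```python
import re


def collapse_newlines(text: str) -> str:
    """Replace each newline, with any surrounding spaces, tabs or
    further newlines, by a single space."""
    return re.sub(r"[ \t]*\n[ \t\n]*", " ", text)


def serialize_list_item(text: str) -> str:
    """Hypothetical serializer: one '*' item per list item, on one line."""
    return "* " + collapse_newlines(text).strip()


print(serialize_list_item("asd\nsdf"))
```

This yields the two-word form `* asd sdf`, which stays a single list item when parsed back.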

Perhaps VisualEditor would be the better place to implement this from the user's perspective, but Parsoid's current behavior would still be weird :) Maybe we should just fix this in both places?

This will need a Parsoid fix since other Parsoid users might still give it HTML that won't be preserved in the html -> wt -> html transformation.

VE can independently decide whether to fix it as well.

I'm a bit confused: what was the list item content in this task's Description supposed to be, 2 words or 1 word?

* asd sdf

OR

* asdsdf

If the desired output was the first one (2 words), then I can get it without VE by inserting and then previewing the following:

<ul>
<li>asd 
sdf</li>
</ul>