
* <li> hack renders incorrectly in Parsoid
Closed, Resolved (Public)

Description

The following wikitext:

* Foo
* <li class="bar"> Baz

Should render to <li> Foo</li><li class="bar"> Baz</li>. Currently, Parsoid renders this as <li>Foo</li><li> </li><li class="bar"> Baz</li>.

To clarify: the li tag in the list item isn't an /additional/ tag, it /overrides/ what the * would generate. This is totally evil, but there are structures on enwiki that rely heavily on this. For instance, the tree structure on [[Line of succession to the British throne]] is a nested ul/li structure, with class="lastline" applied to the last <li> in each list to make the vertical line not continue downwards after the last item in that list. This is abstracted into [[Template:Tree list/final branch]]. The actual wikitext of the line of succession looks like this:

{{Tree list}}
*[[Image:Simple silver crown.svg|15px]] ''HM [[George V]] (1865–1936)''
**[[Image:Simple silver crown.svg|15px]] ''HM [[Edward VIII]](1894–1972)'' {{sup|XA}}
**[[Image:Simple silver crown.svg|15px]] ''HM [[George VI]] (1895–1952)''
***[[Image:Simple gold crown.svg|15px]] '''HM [[Elizabeth II]]''' (born 1926) ''Reigning monarch''
****'''(1)''' HRH [[Charles, Prince of Wales|The Prince of Wales]] (The Prince Charles; b. 1948) {{sup|B D W}}
*****'''(2)''' HRH [[Prince William, Duke of Cambridge|The Duke of Cambridge]] (Prince William; b. 1982) {{sup|B D W}}
*****{{Tree list/final branch}}'''(3)''' HRH [[Prince Harry of Wales|Prince Henry 'Harry' of Wales]] (b. 1984) {{sup|B D W}}
****'''(4)''' HRH [[Prince Andrew, Duke of York|The Duke of York]] (The Prince Andrew; b. 1960) {{sup|B D W}}
*****'''(5)''' HRH [[Princess Beatrice of York]] (b. 1988) {{sup|B D W}}
*****{{Tree list/final branch}}'''(6)''' HRH [[Princess Eugenie of York]] (b. 1990) {{sup|B D W}}

Version: unspecified
Severity: normal

Details

Reference
bz41289

Related Objects

Status | Subtype | Assigned | Task
Resolved | | GWicke |
Resolved | | None |

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:08 AM
bzimport set Reference to bz41289.

This round-trips fine and is protected from editing as template-affected content, so it is low priority before December.

This could possibly be handled in the ListHandler by inspecting the first token in the content between quote tokens. If this is a TagTk with .dataAttribs.stx === 'html', copy that token's attributes (and the stx flag) and drop the html token. For round-tripping, the quote token source would need to be remembered too.
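A minimal sketch of the token-based idea above, in JavaScript. The token shape (plain objects with `type`, `name`, `attribs`, `dataAttribs`), the function name `mergeHtmlListItem`, and the `bulletSrc` parameter are assumptions made for illustration; this is not Parsoid's actual ListHandler code. The `prefixSrc` key is borrowed from a later comment in this thread.

```javascript
// Hypothetical, simplified token shapes; not Parsoid's real token classes.
// itemTk:    the <li> token the list handler would emit for the '*' bullet.
// content:   the tokens (or plain strings) that follow the bullet on the line.
// bulletSrc: the original bullet source, kept for round-tripping.
function mergeHtmlListItem(itemTk, content, bulletSrc) {
  // Skip leading whitespace-only text so "* <li>" is recognised too.
  let i = 0;
  while (i < content.length &&
         typeof content[i] === 'string' && content[i].trim() === '') {
    i++;
  }
  const first = content[i];
  const isHtmlLi = first !== undefined && typeof first === 'object' &&
    first.type === 'TagTk' && first.name === 'li' &&
    first.dataAttribs !== undefined && first.dataAttribs.stx === 'html';
  if (!isHtmlLi) {
    // Ordinary list item: leave everything untouched.
    return { itemTk, content };
  }
  // Copy the HTML token's attributes and stx flag onto the list item,
  // remember the bullet source, and drop the HTML token from the content.
  const merged = {
    type: 'TagTk',
    name: 'li',
    attribs: first.attribs.slice(),
    dataAttribs: Object.assign({}, itemTk.dataAttribs, {
      stx: 'html',
      prefixSrc: bulletSrc,
    }),
  };
  return { itemTk: merged, content: content.slice(i + 1) };
}
```

Called with the tokens for `* <li class="bar"> Baz`, this would return a single list item token carrying `class="bar"`, with only the text ` Baz` left as content.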

An alternative implementation strategy could be DOM-based: look for empty list items that have the auto-closed end tag flag set and are followed by an HTML-syntax list item. The round-trip issues are the same, but fixing this up on the DOM might avoid unwanted interaction with the quote balancing algorithm.
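A rough sketch of the DOM-based variant, again in JavaScript. To stay self-contained it uses a hypothetical plain-object tree (`{ name, attrs, children, data }`, with strings as text nodes) instead of a real DOM, and the flag names `data.autoClosed` and `data.stx` are placeholders for whatever the real DOM pass would inspect.

```javascript
// Hypothetical minimal node shape: { name, attrs, children, data }.
// Text nodes are plain strings. Flag names are assumptions.
function isBlankItem(node) {
  // "Empty" here means: an <li> whose children are all whitespace text.
  return node.name === 'li' &&
    node.children.every((c) => typeof c === 'string' && c.trim() === '');
}

function fixupLiHack(list) {
  const kept = [];
  for (let i = 0; i < list.children.length; i++) {
    const node = list.children[i];
    const next = list.children[i + 1];
    const isHack =
      typeof node === 'object' &&
      isBlankItem(node) &&
      (node.data || {}).autoClosed === true &&
      next !== undefined && typeof next === 'object' &&
      next.name === 'li' && (next.data || {}).stx === 'html';
    // Drop the empty wikitext item; the HTML-syntax item replaces it.
    if (!isHack) {
      kept.push(node);
    }
  }
  list.children = kept;
  return list;
}
```

Applied to the tree for `<li>Foo</li><li> </li><li class="bar"> Baz</li>`, with the middle item flagged as auto-closed and the last one as HTML syntax, this would leave only the first and last items.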

Overall this bug is not easy. We use the keyword only to mark bugs that are pretty self-contained.

(In reply to comment #2)

> Overall this bug is not easy. We use the keyword only to mark bugs that are
> pretty self-contained.

According to https://bugzilla.wikimedia.org/show_activity.cgi?id=41289, you were the one to add the "easy" keyword. Are you saying it should be removed?

If you could add a 'self-contained' keyword to bugzilla I'd use that.

(In reply to comment #4)

> If you could add a 'self-contained' keyword to bugzilla I'd use that.

Feel free to file a separate ticket in Bugzilla against Bugzilla and elaborate how this would be useful (and distinct from easy). Right now I don't see enough critical mass for such a keyword, but I'm always happy to be proven wrong. :)

For round-tripping, the HTML-syntax li element can get a data.parsoid.prefixSrc member that contains the source of the preceding list item, based on that item's dsr (DOM source range) value. See the comment in DOMPostProcessor about the dsr fields. In the DOMPostProcessor handler, the list item's dsr should also be updated to include the wikitext '* ' prefix.
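As an illustration of the bookkeeping described above, here is a small JavaScript sketch. It assumes a simplified two-element `dsr` of `[start, end]` offsets into the source (the real dsr carries more fields), plain-object nodes, and a hypothetical helper name `recordPrefixSrc`.

```javascript
// Hypothetical helper; node shape and two-element dsr are simplifications.
// src:       the full wikitext source string.
// emptyItem: the auto-closed wikitext list item that precedes the HTML <li>.
// htmlItem:  the HTML-syntax <li> that should absorb it.
function recordPrefixSrc(src, emptyItem, htmlItem) {
  const start = emptyItem.dsr[0];
  const end = emptyItem.dsr[1];
  // Remember the wikitext that produced the dropped item, e.g. '* '.
  htmlItem.data = htmlItem.data || {};
  htmlItem.data.prefixSrc = src.substring(start, end);
  // Widen the HTML item's source range so it covers the prefix as well.
  htmlItem.dsr = [start, htmlItem.dsr[1]];
  return htmlItem;
}

// Usage with the example from the description.
const exampleSrc = '* Foo\n* <li class="bar"> Baz';
const exampleItem = recordPrefixSrc(
  exampleSrc,
  { dsr: [6, 8] },                               // covers '* '
  { dsr: [8, exampleSrc.length], data: {} }      // covers '<li class="bar"> Baz'
);
// exampleItem.data.prefixSrc now holds '* ' and exampleItem.dsr starts at 6.
```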

This then needs to be handled in the WikitextSerializer. For HTML-syntax tags there is a generic _serializeHTMLTag handler that also handles HTML-syntax list items. Somewhere in there the extra prefix source needs to be emitted based on data-parsoid. We should also make sure that the wikitext list item is always emitted in newline context, and strip it if that newline context is no longer there.
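One possible reading of the serializer requirement, sketched in JavaScript. The function `serializeHtmlLiStart` and its arguments are hypothetical and do not reflect the real `_serializeHTMLTag` signature; attribute escaping is omitted for brevity.

```javascript
// Hypothetical helper, not the real _serializeHTMLTag.
// node:        { attrs: {name: value}, data: { prefixSrc?: string } }
// outputSoFar: the wikitext already emitted before this tag.
function serializeHtmlLiStart(node, outputSoFar) {
  const attrs = Object.keys(node.attrs)
    .map((k) => ' ' + k + '="' + node.attrs[k] + '"')
    .join('');
  const tag = '<li' + attrs + '>';
  const prefix = (node.data && node.data.prefixSrc) || '';
  // The wikitext bullet only means "list item" at the start of a line,
  // so emit it in newline context and strip it everywhere else.
  const atLineStart = outputSoFar === '' || outputSoFar.endsWith('\n');
  return atLineStart ? prefix + tag : tag;
}
```

With `prefixSrc` set to `'* '`, the opening tag would be emitted as `* <li class="bar">` after a newline and as a bare `<li class="bar">` in the middle of a line.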

*** Bug 49296 has been marked as a duplicate of this bug. ***

https://gerrit.wikimedia.org/r/68719 (Gerrit Change Id1821038d3ad930f42630e60d793835a8316631e) | change APPROVED and MERGED [by jenkins-bot]

There is still an unrelated problem where Parsoid currently breaks lists when list items are separated by comments.

So "*a\n<!--foo-->*b" generates two lists in Parsoid but only one in the PHP parser. This is something to be fixed independently; I will create a new bug report for it.

[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]