
Parsoid: Content leaks out of {{nihongo}} template (link inside link)
Closed, Declined (Public)

Description

At http://parsoid-lb.eqiad.wikimedia.org/enwiki/Japan_at_the_2012_Summer_Olympics?oldid=600210359 I observed content leaking out of a template.

The wikitext for the reference in question is:

<ref name=JAAF20120611>[http://www.jaaf.or.jp/fan/news/2012/20120611.html {{Nihongo||ロンドンオリンピック トラック・フィールド種目・競歩種目の日本代表選手|The Japanese national team of Track & Field and Race Walk at the 2012 Olympic Games}}] {{ja icon}}. Japan Association of Athletics Federations (2012-06-11). Retrieved 19 June 2012.</ref>

The resulting Parsoid HTML (cleaned up and indented for readability) is:

<li about="#cite_note-JAAF20120611-7">

<span rel="mw:referencedBy" data-parsoid="{}">...</span>
<a rel="mw:ExtLink" href="...">
  <i about="#mwt199" typeof="mw:Transclusion" data-mw="...">
    The Japanese national team of Track &amp; Field and Race Walk at the 2012 Olympic Games
  </i>
  <span style="font-weight: normal" about="#mwt199">
    <span typeof="mw:Entity"> </span>
    (
    <span class="t_nihongo_kanji" lang="ja">
      ロンドンオリンピック トラック・フィールド種目・競歩種目の日本代表選手
    </span>
    <sup class="t_nihongo_help noprint"></sup>
  </span>
</a>
<a rel="mw:WikiLink" href="..." data-parsoid="misnested">
  <span class="t_nihongo_icon" style="..." data-parsoid="misnested">
    ?
  </span>
</a>
)
<link rel="mw:PageProp/Category" href="..." data-parsoid="misnested">
<span class="languageicon" style="..." about="#mwt137" typeof="mw:Transclusion" data-mw="...">
  (Japanese)
</span>
<link rel="mw:PageProp/Category" href="..." about="#mwt137">
. Japan Association of Athletics Federations (2012-06-11). Retrieved 19 June 2012.

</li>

Note that all of the elements marked data-parsoid="misnested" were generated by the template, but they are neither part of the template's about group nor marked in any other way as template-generated. As a result, if the user edits the reference (or a bug in VE causes a whitespace edit, see bug 69861), you'll get a dirty diff that expands the second half of the template, like https://en.wikipedia.org/w/index.php?curid=31216768&diff=621842944&oldid=600210359 .
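
To make the observation above concrete, here is a minimal Python sketch that lists elements flagged as misnested but carrying no about attribute. It assumes the simplified attribute form shown in the cleaned-up HTML above (real data-parsoid values are JSON), and the names LeakedNodeFinder and find_leaked_nodes are hypothetical helpers, not part of Parsoid.

```python
from html.parser import HTMLParser


class LeakedNodeFinder(HTMLParser):
    """Collects start tags marked misnested that have no about attribute."""

    def __init__(self):
        super().__init__()
        self.leaked = []  # (tag, rel-or-class) pairs, in document order

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Assumption: the marker appears as plain text inside data-parsoid,
        # as in the cleaned-up sample; real Parsoid output stores JSON there.
        misnested = "misnested" in (attr_map.get("data-parsoid") or "")
        if misnested and "about" not in attr_map:
            label = attr_map.get("rel") or attr_map.get("class")
            self.leaked.append((tag, label))


def find_leaked_nodes(html):
    """Return the misnested elements that sit outside any about group."""
    finder = LeakedNodeFinder()
    finder.feed(html)
    finder.close()
    return finder.leaked


sample = (
    '<a rel="mw:WikiLink" href="..." data-parsoid="misnested">'
    '<span class="t_nihongo_icon" style="..." data-parsoid="misnested">?</span>'
    '</a>'
    '<span class="languageicon" about="#mwt137" typeof="mw:Transclusion">(Japanese)</span>'
)
print(find_leaked_nodes(sample))
```

Any element this reports is one that an editor would treat as ordinary page content rather than template output.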

The core of the problem seems to be that {{nihongo}} was used inside of a link, and the template outputs a link itself (a linked, superscripted question mark linking to a help page), so Parsoid was asked to put a link inside of a link and tried to clean that up. Unfortunately it looks like this misnesting cleanup loses template information.
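
The link-inside-link condition described above can be sketched as a small check. This is only an illustration in Python using the standard html.parser module, which reports tags as written and does not perform the tree fix-up an HTML5 parser would. NestedLinkFinder and find_nested_links are hypothetical names, and this is not how Parsoid itself detects the case.

```python
from html.parser import HTMLParser


class NestedLinkFinder(HTMLParser):
    """Records every <a> that opens while another <a> is still open."""

    def __init__(self):
        super().__init__()
        self.open_links = 0  # number of currently open <a> elements
        self.nested = []     # href values of links found inside other links

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.open_links > 0:
                self.nested.append(dict(attrs).get("href"))
            self.open_links += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.open_links > 0:
            self.open_links -= 1


def find_nested_links(html):
    """Return the hrefs of all links that are nested inside another link."""
    finder = NestedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.nested


# Roughly what an external link wrapping a template-generated help link expands to.
markup = '<a href="http://example.com">legal stuff <a href="./Help:help">help</a></a>'
print(find_nested_links(markup))
```

Since HTML does not allow an a element inside another a element, a conforming parser has to split the two apart, which is the cleanup step where the template information is lost.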


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=69861

Details

Reference
bz69876

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:33 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz69876.

The following simplified example reproduces this issue:

[subbu@earth lib] echo '[http://example.com {{echo|legal stuff [[Help:help|help]]}}]' | node parse | node parse --html2wt
[http://example.com {{echo|legal stuff [[Help:help|help]]}}][[Help:help|help]]
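
The corruption in the transcript above is the text appended after the original wikitext. A minimal Python sketch of that comparison follows; leaked_suffix is a hypothetical helper, and the two strings are copied from the transcript rather than produced by running Parsoid.

```python
def leaked_suffix(original, roundtripped):
    """Return the text appended after `original` in the round-tripped output.

    Returns an empty string for a clean round trip, and None when the
    output does not start with the original wikitext at all.
    """
    if roundtripped.startswith(original):
        return roundtripped[len(original):]
    return None


original = "[http://example.com {{echo|legal stuff [[Help:help|help]]}}]"
roundtripped = (
    "[http://example.com {{echo|legal stuff [[Help:help|help]]}}]"
    "[[Help:help|help]]"
)
print(leaked_suffix(original, roundtripped))
```

Here the appended text is the wikilink that the template generated, now serialized a second time outside the external link.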

We should try to expand the scope of template-affected content to include the outer ext-link, which will prevent this kind of corruption on serialization.

We now have the links-inside-links linter category to track this kind of broken wikitext. There is no reason to add complexity to Parsoid.