
Support nested <ref> tags
Closed, Resolved | Public

Description

It is (sadly) possible and known for users to create nested <ref> tags, using {{#tag:ref|foo<ref>bar</ref>}}.
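To make the pattern concrete, here is a minimal sketch of how such nesting could be spotted in raw wikitext. It is not Parsoid's code: the function name `nested_refs` is hypothetical, and the brace counting and regex are deliberate simplifications (no handling of `<nowiki>`, comments, or self-closing `<ref />` tags).

```python
import re

START = "{{#tag:ref|"


def nested_refs(wikitext: str) -> list[str]:
    """Return the bodies of literal <ref> tags found inside {{#tag:ref|...}} calls.

    Hypothetical helper for illustration only; it scans for the matching
    closing braces by counting {{ and }} pairs.
    """
    found = []
    pos = wikitext.find(START)
    while pos != -1:
        body_start = pos + len(START)
        i = body_start
        depth = 1  # we are inside one {{ ... }} pair
        while i < len(wikitext) and depth > 0:
            if wikitext.startswith("{{", i):
                depth += 1
                i += 2
            elif wikitext.startswith("}}", i):
                depth -= 1
                i += 2
            else:
                i += 1
        # If the call was closed, drop the trailing "}}"; otherwise take the rest.
        body = wikitext[body_start:i - 2] if depth == 0 else wikitext[body_start:]
        found.extend(re.findall(r"<ref\b[^>/]*>(.*?)</ref>", body, flags=re.DOTALL))
        pos = wikitext.find(START, i)
    return found


print(nested_refs("{{#tag:ref|foo<ref>bar</ref>}}"))
```

Run on the one-line example above, this reports the single inner reference body, `bar`.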

Real-world example:

https://en.wikipedia.org/wiki/Fomitiporia_ellipsoidea

Wikitext fragment:

{{#tag:ref|These sizes are based on the original [[species description|description]] offered by Cui and Dai in 2008.<ref>Cui and Dai 2008</ref> Dai described the [[type specimen]] as "not huge", and a significantly larger specimen has since been found. Dai said that, before the discovery, he and Cui "did not know the fungus [could] grow so huge".<ref name="BBC">Walker 2011</ref>|group=note}}

Output HTML from PHP parser in <references group="note" /> block:

<li id="cite_note-12"><span class="mw-cite-backlink"><b><a href="#cite_ref-12">^</a></b></span> <span class="reference-text">These sizes are based on the original <a href="/wiki/Species_description" title="Species description">description</a> offered by Cui and Dai in 2008.<sup id="cite_ref-10" class="reference"><a href="#cite_note-10"><span>[</span>10<span>]</span></a></sup> Dai described the <a href="/wiki/Type_specimen" title="Type specimen" class="mw-redirect">type specimen</a> as "not huge", and a significantly larger specimen has since been found. Dai said that, before the discovery, he and Cui "did not know the fungus [could] grow so huge".<sup id="cite_ref-BBC_11-0" class="reference"><a href="#cite_note-BBC-11"><span>[</span>11<span>]</span></a></sup></span></li>

Output HTML from Parsoid for above:

<li about="#cite_note-1" id="cite_note-1"><span rel="mw:referencedBy"><a href="#cite_ref-1-0">↑</a></span> <span>These sizes are based on the original <a rel="mw:WikiLink" href="./Species_description" data-parsoid="{&quot;a&quot;:{&quot;href&quot;:&quot;./Species_description&quot;},&quot;sa&quot;:{&quot;href&quot;:&quot;species description&quot;},&quot;stx&quot;:&quot;piped&quot;,&quot;dsr&quot;:[56,91,22,2]}">description</a> offered by Cui and Dai in 2008.<span about="#mwt161" class="reference" data-mw="{&quot;name&quot;:&quot;ref&quot;,&quot;body&quot;:{&quot;html&quot;:&quot;Cui and Dai 2008&quot;},&quot;attrs&quot;:{}}" id="cite_ref-1-0" rel="dc:references" typeof="mw:Extension/ref" data-parsoid="{&quot;src&quot;:&quot;&lt;ref&gt;Cui and Dai 2008&quot;,&quot;dsr&quot;:[123,144,5,0]}"><a href="#cite_note-1">[1]</a></span></span></li>


Version: unspecified
Severity: normal

Details

Reference
bz49555

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:48 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz49555.

Related URL: https://gerrit.wikimedia.org/r/69354 (Gerrit Change I43bb8b710bd10a9ddbea27818ff8aaf97ddb3fdc)

https://gerrit.wikimedia.org/r/69354 (Gerrit Change I43bb8b710bd10a9ddbea27818ff8aaf97ddb3fdc) | change APPROVED and MERGED [by jenkins-bot]

A 90% solution was merged. This won't support nested {{#tag:ref|..<ref>..</ref>..}} structures within templates, but will support them in the content page itself. The latter should be the common case.

Supporting these nested refs inside of templates would be extremely complex. Fortunately that use seems to be very rare.

Nested ref tags via sfn/efn templates aren't handled correctly.

{{sfn|Vallance|1991|pp=30–31}}{{efn|Timetables of the time did not differentiate clearly between through and connecting trains.{{sfn|Vallance|1991|p=31}}}}

from page: https://en.wikipedia.org/wiki/User:Edgepedia/VE/GNoSR

[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]

*** Bug 50383 has been marked as a duplicate of this bug. ***

Referring to Gabriel's comment above, as it turns out, the "rare" case is not so rare after all. [[Template:efn]] has a nested {{#tag:ref..}} in the template's body which is why we don't correctly handle it.

Since the PHP parser will not tell us where a <ref> tag came from (a literal <ref> or a {{#tag:ref}}), we cannot use the existing heuristics to support this. Short of instrumenting the PHP parser or analyzing the template body and correlating parser output with it (both of which are complicated), one solution would be to "trust" templates: if a template produces a nested ref tag, trust the template author to know what they are doing and simply accept the nested <ref> tag, whereas if the same tag showed up at the top level, we would have rejected it as malformed.

Unless there are objections, I am going to go with this approach for now.
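As a rough illustration of the "trust templates" rule described above, the decision could be sketched as follows. This is not the merged Parsoid code: `RefToken`, its fields, and `accept_ref` are hypothetical names. The page-level branch (accept a nested ref only when the enclosing ref was written as {{#tag:ref|...}}) is an assumption about what the earlier 90% solution does.

```python
from dataclasses import dataclass


@dataclass
class RefToken:
    """One <ref> occurrence (hypothetical structure, for illustration only)."""
    body: str
    depth: int                      # 0 = not inside another ref body, 1+ = nested
    from_template: bool             # True if the tag came out of a template expansion
    outer_is_tag_call: bool = False  # True if the enclosing ref was a {{#tag:ref|...}} in page source


def accept_ref(tok: RefToken) -> bool:
    """Decide whether a <ref> is accepted or rejected as malformed."""
    if tok.depth == 0:
        return True   # an ordinary, non-nested ref is always fine
    if tok.from_template:
        return True   # "trust" the template author, whatever produced the nesting
    # Top-level page content: assumed to keep the stricter, existing heuristic.
    return tok.outer_is_tag_call


# A nested ref emitted by a template such as {{efn}} is accepted,
# while the same nesting written as literal <ref><ref>..</ref></ref> on the page is not.
print(accept_ref(RefToken("Vallance 1991", depth=1, from_template=True)))
print(accept_ref(RefToken("Vallance 1991", depth=1, from_template=False)))
```

The trade-off is that a genuinely malformed nested ref inside a template is no longer flagged, which is what "trusting" the template author amounts to.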

Change 73886 had a related patch set uploaded by Subramanya Sastry:
(Bug 49555): Step #2: Support more nested ref scenarios.

https://gerrit.wikimedia.org/r/73886

Change 73886 merged by jenkins-bot:
Take #2: (Bug 49555): Support all nested ref scenarios

https://gerrit.wikimedia.org/r/73886