Page MenuHomePhabricator

Nested refs fail inside references block
Open, MediumPublicBUG REPORT

Assigned To
Authored By
Dragons_flight
Sep 18 2009, 1:41 AM
Referenced Files
Restricted File
Jan 29 2024, 5:33 PM
Restricted File
Jan 29 2024, 5:32 PM
F41726670: cite-ext-add-validations.log
Jan 28 2024, 10:44 PM
F41726671: cite-ext-add-validations.patch
Jan 28 2024, 10:44 PM
F41726433: new-Cite-ext-design.patch
Jan 28 2024, 8:36 PM
F41726434: new-Cite-ext-design.log
Jan 28 2024, 8:36 PM
F41649906: log_of_commit_to_parser.txt
Jan 3 2024, 10:12 PM
F41649905: log_of_commit_to_cite_extension.txt
Jan 3 2024, 10:12 PM

Description

By design, Cite does not allow a <ref> to appear within another <ref>. (This is probably a result of parser design and concerns over maintaining the integrity of Cite's internal data stack.)

However, clever Wikipedians realized that one can use #tag to avoid Cite's design limitations and create functional nested references such as:

AAA {{#tag:ref| BBB <ref> CCC </ref> }}

(See also: T18330)

Recently, Cite was expanded to allow reference content to be defined within a references block, i.e.

Foo <ref name="bar"/>

<references>
<ref name="bar">blah blah</ref>
</references>

While most references work fine in this context, the #tag based nested ref constructions universally fail when moved inside <references>. Depending on the number and order of #tag calls used and the depth of the stack the result will either be an uninformative error message or a misnumbered reference list.

The solution to this is probably to create an implementation that generically supports nested refs. (Part of the problem is that Cite was not intended to allow for nested refs at all and though the #tag based hack worked previously because of quirks in parser behavior, the same call structure doesn't work when moved inside the references block.)

The "right" way to do this is probably to restructure Cite's internal data stack and change the way the evaluation of references are hooked into the parser. I will probably implement this myself eventually, though probably not for some months.


Version: unspecified
Severity: major
See Also: T24635: Multiple problems with refs inside <references>...</references> blocks

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:54 PM
bzimport added a project: Cite.
bzimport set Reference to bz20707.
bzimport added a subscriber: Unknown Object (MLST).

Ruslik00 wrote:

An obvious solution to this problem is to use {{#tag:references|...}} construct.

(In reply to comment #1)

An obvious solution to this problem is to use {{#tag:references|...}}
construct.

Apparently one can define a nested ref inside a #tag:references as long as both the portions of the nest are also defined by #tag.

Based on an example by Ruslik_Zero:

AAAA<ref name="note1" group="note" />

notes

{{#tag:references|
{{#tag:ref|foo{{#tag:ref|inside ref}}bar|group="note"|name="note1"}}

group="note"}}

references

<references />

Frankly, I'm surprised that even that works, but it will fail if any of those three #tag calls is replaced by a normal invocation. Hence I would say this is a rather fragile work around.

I suspect this is the same issue that causes problems with #tag:ref in list defined references. A single use works, but more than one results in Cite error: <ref> tag defined in <references> has no name attribute.

Example:

<ref group=Note name=a />
<ref group=Note name=b />
<ref group=Note name=c />

;Notes
{{reflist|group=Note|refs=
{{#tag:ref|Note1<ref>ref1</ref>|name=a|group=Note}}
{{#tag:ref|Note2<ref>ref2</ref>|name=b|group=Note}}
{{#tag:ref|Note3<ref>ref3</ref>|name=c|group=Note}}
}}

;References
{{reflist}}

Nesting #tag:ref does not work either:

{{#tag:ref|Note1{{#tag:ref|ref1}}|name=a|group=Note}}

*** Bug 22300 has been marked as a duplicate of this bug. ***

athenurlauber wrote:

A reflist containing both a plain ref tag and a "#tag:" ref tag with a nested ref tag in it works if the ref-in-a-ref is listed before the plain ref, and doesn't work if the ref-in-a-ref is listed after the plain ref (demo here http://en.wikipedia.org/wiki/User:Dan_Pelleg/Sandbox/Nested_refs):

This works:

text<ref name=n1 group=nested_first/>
text<ref name=n2 group=nested_first/>.

;Reflist, nested ref listed first:{{reflist|group=nested_first|refs=

{{#tag:ref|With nested reference<ref>content
of nested reference</ref>|name=n1|group=nested_first}}

<ref name=n2 group=nested_first>With no nested reference</ref>
}}

;Nested refs:
{{reflist}}

This not:

text<ref name=n3 group=nested_second/>
text<ref name=n4 group=nested_second/>.

;Reflist, nested ref listed second:{{reflist|group=nested_second|refs=

<ref name=n3 group=nested_second>With no nested reference</ref>

{{#tag:ref|With nested reference<ref>content
of nested reference</ref>|name=n4|group=nested_second}}
}}

;Nested refs:
{{reflist}}

Here is the working sample from above without templates.

Placing <ref name=n2> first throws a "Cite error references no text" and a "Cite error references no key".

Using #tag:ref more than once in the list throws the errors.


text<ref name=n1 group=nested_first/>
text<ref name=n2 group=nested_first/>.

'''Notes'''

{{#tag:references|

{{#tag:ref|With nested reference<ref>content
of nested reference</ref>|name=n1}}

<ref name=n2>With no nested reference</ref>

group=nested_first}}

'''References'''

<references />


I consider this the most egregious issue with cite.php. The inability to easily add a reference to an explanatory note using cite.php has caused the creation and retention of multiple templates for this purpose. In use, most of these templates create duplicate ids, thus rendering invalid HTML and bad in-page linking.

The simplest fix from the user standpoint would be to allow <ref> tags to nest, while allowing the #tag:ref hack to still work. Note that many uses of #tag:ref have multiple embedded <refs>s. If we have to add new syntax, then we can deal with it.

There is no expectation of multiple nesting, i.e. stuffing a #tag:ref inside a #tag:ref.

This has been fixed by 944b24542827f827cdfb8646f1c2e87aeb929b65 / https://gerrit.wikimedia.org/r/#/c/181112/. The change will be deployed to all wikis on May 19/20, with MW 1.26wmf6, according to https://www.mediawiki.org/wiki/MediaWiki_1.26/Roadmap. You can test the new behavior on http://en.wikipedia.beta.wmflabs.org/ already.

(I verified the examples given in T22707#258934 and T22707#258959.)

T100477 might be related to this fix (started happening around the same time)

Anomie subscribed.

Reopening because the patch was reverted.

Aklapper edited subscribers, added: Jackmcbarn, Aklapper; removed: wikibugs-l-list.

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Aklapper changed the subtype of this task from "Task" to "Bug Report".Feb 6 2022, 7:18 PM

Is this the same bug referred to at Help:Cite errors/Cite error references missing key in section #Issues and resolution, where it says:

Including more than one nested reference in List-defined references will cause a Cite error.

I have a use case and it isn't clear to me if it's governed by T22707 or not. My use case is an explanatory note ( {{efn}} ) defined as a list-defined reference, which has three short footnotes ( {{sfn}} ) embedded within it. The first sfn resolved to a citation listed in the "Works cited" appendix. The other two sfn's did not. The quote above appears to describe the reason why not. The failing revision is rev. 1174945853 of en:Tribunal correctionnel. My workaround was to comment out the embedded sfn's, in my next edit.

If this use case is *not* part of T22707, can someone point me to the phab ticket which governs that case? And if it is part of this ticket, could someone add some verbiage about it to help searchers find it, as I wasn't able to find this ticket in multiple phab search attempts, and finally ended up here via the {{tracked}} links at H:CERMK on en-wiki.

(It might be worth adding some verbiage here in either case, to help out folks like me who maybe ended up here by mistake—sort of a "distinguish"- or "about"-style "hatnote" to get users to the correct ticket; does Phab have anything like that?)

I would like to assign myself to this task. I spent the last two weeks working on this. I found several bugs in the code and a patch for each of them. The result is a code that works with nested <ref> tags without the need of templates or anything else. Some are not really bugs, but more choices in the design that were not compatible with the recursive power of the parser. Yet, I did not need to change a lot the code. For example, the idea of recursive parsing is that, when the extension receives a tag, it does not do much: it just do the part that it needs to do and let the parser (with the help of other calls to extensions) do the rest. The extension just focuses on a small thing and the recursive power of the parser makes the whole thing work. So, making the code truly recursive makes it simpler. Currently, the code does not use the recursive power of the parser and it's more complicated and it does not really work. In particular, in the new code, there is a call to recursiveTagParse() inside guardedRef(). Without that call, it cannot be said to use the recursive power of the parser. The current code does instead two calls to recursiveTagParse() at the end in ReferencesFormatter.php, after the recursion is terminated. It is important to respect the logic of the parser. The parser recursively transforms wikitext into half-parsed html, but this haf-parsed htm actually contains a lot of fully parsed html stored in a stripState. The stripState is conceptually an associative array that matches the full html created by an extension with a marker which is also found in the half-parsed html. I might seem complicated the way I explain it now, but it is simple.: the half-parsed html contains markers and the stripState associates these markers with full html. So, despite the fact that half-parsed html is created by the parser, the job of the extension is to return fully parsed html to be stored in the stripState. When this is understood, it is easy to see that the job of formatListItem is to directly create its own html to wrap the text, which will already be html, because of recursion. There should be no need wrap the text with wikitex and then transform the whole tag in html using calls to recursiveTagParse(), which is what the current code does or rather tries to do. It does not really work anyway, because the href urls toward the cite_ref ids must be computed dynamically during the recursion. After, at the end, the information is lost. The new approach makes use of recursion and considers that the text is html (because of recursion) and only wraps it with its own html. This means that there is no need to call recursiveTagParse() again in FormatReferences() like the current code does. The new code gets rid of these two not really recursive calls at the end and replaces it by a single truly recursive call in guardedRef(). This does not require a lot of change in the code, but it is big change in the perspective: it is much cleaner. There are other aspects of a similar kind to consider. This is just an example.

@dominic.mayers: Welcome and thanks for looking at the code! A bit hard to follow, so feel free to use developer access to submit the proposed code changes (the more atomic, the better) as a Git branch directly into Gerrit which makes it easier to review and provide feedback. If you don't want to set up Git/Gerrit, you can also use the Gerrit Patch Uploader. Thanks again.

I have divided the whole thing in 8 commits. Each one has its separated purpose and the code is running after each commit. However, they are not at all separated logically. The effect is not seen until all of them is applied. In fact, though I checked that the code is still running and seem to be ok in the case of non nested tags, it may be that each of them separately creates problems (seen after careful testing) without providing alone any advantages. They are designed to work together. Therefore, I am not sure what is the best way to present these 8 patches. The big picture might be important to understand each patch. What I described above is for a single patch, but it is kind of central. Yet, looking at the diff of this patch alone would still not do much better than what I tried to explain above. I am afraid that an understanding of the problems must come first and the patches individually will not in themselves provide that understanding. All of them together start to be complicated and require some explanation for the big picture. I would feel much more comfortable, if the expectation of people in terms of what they can understand by looking at the patches themselves is not too high. Concretely, I decided to present the 8 commits into a single series of patch sets. I have a question. Locally, I only tested the code with the top of the master branch and 5 extensions, all running properly: Cite, ParserFunctions, Scribunto, TemplateStyles and VisualEditor. I also added many templates to test {{harv}} and {{refn}}. But, I have not tested the code in a realistic environment. I tried, but did not succeed. In particular, my IDE (Netbeans) is not stable when I have all the extensions. How can I test my code, not necessarily locally, in a more realistic environment?

I have done all my commits locally with short subject, body, Bug # and Change Id (generated by git review). There are seven of them. I would like to push them to a topic branch without squashing as illustrated here , because squashing does not seem appropriate in this particular use case. In particular, I would like to add a cover letter to the topic. There is not enough details in these linked explanations. I would like to be sure I will do that correctly. For information, I have uploaded the log of six commits for the CIte extension. There is another commit for the parser in the core.

I have uploaded the log of six commits for the CIte extension.

Looks good to me. You may be interested in https://www.mediawiki.org/wiki/Gerrit/Advanced_usage#Create_a_dependency if you plan to put the patches for review into Gerrit

Looks good to me. You may be interested in https://www.mediawiki.org/wiki/Gerrit/Advanced_usage#Create_a_dependency if you plan to put the patches for review into Gerrit

Yes there are dependencies and did not realize it. However, the commits in the Cite extension will do a great job even without patching the bug in the parser. The only things is that if an editor creates nested ref tags that the parser cannot support because of this bug, the editor will receive an error message. Currently, the situation is just worst: even simpler nesting of tags is not supported.

Change 987507 had a related patch set uploaded (by Dominic mayers; author: Dominic mayers):

[mediawiki/core@master] parser: Find the closing tag with a balanced inner text

https://gerrit.wikimedia.org/r/987507

Change 987508 had a related patch set uploaded (by Dominic mayers; author: Dominic mayers):

[mediawiki/core@master] Cite ext: Allow arbitrary nested ref tags: a cover letter

https://gerrit.wikimedia.org/r/987508

Change 987509 had a related patch set uploaded (by Dominic mayers; author: Dominic mayers):

[mediawiki/extensions/Cite@master] Cite ext: Use recursiveTagParse in guardedRef, not after the recursion is done

https://gerrit.wikimedia.org/r/987509

Change 987510 had a related patch set uploaded (by Dominic mayers; author: Dominic mayers):

[mediawiki/extensions/Cite@master] Cite ext: Process nested <ref>s in <references> as ordinary <ref>s

https://gerrit.wikimedia.org/r/987510

Change 987511 had a related patch set uploaded (by Dominic mayers; author: Dominic mayers):

[mediawiki/extensions/Cite@master] Cite ext: Only increase $ref['count'] in ReferenceStack when the ref is a link

https://gerrit.wikimedia.org/r/987511

Change 987512 had a related patch set uploaded (by Dominic mayers; author: Dominic mayers):

[mediawiki/extensions/Cite@master] Cite ext: Strip empty tags also when searching missing closing tags

https://gerrit.wikimedia.org/r/987512

Change 987513 had a related patch set uploaded (by Dominic mayers; author: Dominic mayers):

[mediawiki/extensions/Cite@master] Cite ext: Order the notes as they appear in guardedRef()

https://gerrit.wikimedia.org/r/987513

Change 987514 had a related patch set uploaded (by Dominic mayers; author: Dominic mayers):

[mediawiki/extensions/Cite@master] Cite ext: Add a safeguard to avoid processing a second time the ref tags

https://gerrit.wikimedia.org/r/987514

In retrospective, I think it would have been better that I squash every thing in a single commit with a long explanation that covers every step of the change. My hope was that by doing a single git review for all the commits, gerrit or git review would have understood that it is a single proposal that must be presented as such into a single page with links to the patches. For that reason, I did not understand why Aklapper was concerned with dependencies. I even wrote that there was no dependencies, which is false (and erased it quickly). The point is that, if it is a single proposal, the notion of dependencies is not important, because they are ordered and if we understand the big picture, the dependencies are obvious. So obvious that we don't think of that as being important. I was very disappointed when I saw that it created many separate requests for review. Then I realized that, considering the dependencies, this was a mess. This has to be reviewed as a single proposal and the separation in many ordered patches is just for convenience.

Change 987512 abandoned by Dominic mayers:

[mediawiki/extensions/Cite@master] Cite ext: Strip empty tags also when searching missing closing tags

Reason:

This created a mess with many commits reviewed separately without the big picture.

https://gerrit.wikimedia.org/r/987512

Change 987507 abandoned by Dominic mayers:

[mediawiki/core@master] parser: Find the closing tag with a balanced inner text

Reason:

This created a mess with many commits reviewed individually without the big pucture

https://gerrit.wikimedia.org/r/987507

Change 987510 abandoned by Dominic mayers:

[mediawiki/extensions/Cite@master] Cite ext: Process nested <ref>s in <references> as ordinary <ref>s

Reason:

This created a mess with many commits reviewed individually without the big pucture

https://gerrit.wikimedia.org/r/987510

Change 987511 abandoned by Dominic mayers:

[mediawiki/extensions/Cite@master] Cite ext: Only increase $ref['count'] in ReferenceStack when the ref is a link

Reason:

This created a mess with many commits reviewed individually without the big pucture

https://gerrit.wikimedia.org/r/987511

Change 987513 abandoned by Dominic mayers:

[mediawiki/extensions/Cite@master] Cite ext: Order the notes as they appear in guardedRef()

Reason:

This created a mess with many commits reviewed individually without the big pucture

https://gerrit.wikimedia.org/r/987513

Change 987514 abandoned by Dominic mayers:

[mediawiki/extensions/Cite@master] Cite ext: Add a safeguard to avoid processing a second time the ref tags

Reason:

This created a mess with many commits reviewed individually without the big picture

https://gerrit.wikimedia.org/r/987514

Change 987509 abandoned by Dominic mayers:

[mediawiki/extensions/Cite@master] Cite ext: Use recursiveTagParse in guardedRef, not after the recursion

Reason:

This created a mess with many commits reviewed individually without the big picture

https://gerrit.wikimedia.org/r/987509

I reassigned myself to this task. I agree with those that say that it is not a bug, but a design decision. So, I have no intention to provide a patch that will correct that "bug". What is required is an entire new branch with an improved design. Currently, many wikipedians use the in list feature, that is, they prefer to put ref tags inside a references tag : <refererences> <ref> ... </ref> ... <ref> ... </ref> </references> (or the equivalent), because they feel that putting the ref tags directly in the body of the page makes the page hard to edit. It is also very common to see pages, even featured pages, that have notes with ref tags. Internally, notes are forms of ref tags. So, nested ref tags are very common. It is sad that it is not possible to nicely use these two common features together. Therefore, I have assigned myself the task of providing a new design that will allow these two features to coexist.

To show that it is possible to have nested refs, even in references blocks, without modifying the core parser, I have installed on this test site a recent master branch version of the core (not patched) with a new version of the Cite extension. Again, many wikipedians use nested refs and many prefer to put ref content in references blocks and, therefore, it is sad that these two features cannot be combined. I made sure that any reference error listed in the code (under a cite_error_<type> message), if there was any, is well displayed. There are none in that page, but I created this other page on the same site to illustrate the display of all these errors. I believe a good display of editor errors is important. I do not ignore the intension to move toward parsoid/PHP, but I believe that it is important to have a good point of reference when we do such a transition.

Change 987508 abandoned by Dominic mayers:

[mediawiki/core@master] Cite ext: Allow arbitrary nested ref tags: a cover letter

Reason:

This was a cover letter for abandoned changes

https://gerrit.wikimedia.org/r/987508

I have uploaded a patch with the associated commit message for the new proposed design. It's only proposed to inform a discussion. {F41728941}

I have uploaded a patch with the associated commit message for the addition of validations to the previous design.
{F41728944}
In the proposed design, the processing of the ref tags is independent of validation: all ref tags, with errors or not, are processed and presented as ref links or/and ref items. The validation simply attaches an error message to an erroneous ref item. In this way, the error messages are always close to their erroneous ref item and the erroneous ref item contains its backlinks when they exist. This is just an extra aspect of the design. The key point is to allow nested refs in references block.

I recall that I installed this new design in this test site. This other page illustrates the display of error messages.