
provide a way to specify what text/statement is supported by a <ref> block.
Open, Medium, Public, Feature

Description

Author: virtueller_andy

Description:
Usually, any XML-based markup follows this pattern: <tag attribute="Attribute">Text</tag>

However, the ref element swaps the roles of attribute and text: <ref>Attribute</ref>. This makes it impossible to know which sentence or paragraph of the text is addressed by the ref element:

Example 1 (current ref element):
*This is the first sentence. This is the second sentence.<ref>Somewhere</ref>

In example 1, nobody can tell whether the source supports both the first and the last sentence or only the last one.

Example 2 (suggestion):
*This is the first sentence. <ref source="Somewhere">This is the second sentence.</ref>

Here, it is quite obvious that the source references the second sentence, because this example follows the usual XML markup pattern. Everyone now knows where to find the source for any remark in a text.

This problem has already been discussed in the German Wikipedia, and fixing it has been deemed necessary: http://de.wikipedia.org/wiki/Wikipedia:Verbesserungsvorschl%C3%A4ge/Feature-Requests#Verbesserung_des_ref-Elements_f.C3.BCr_Quellenangaben


Version: unspecified
Severity: enhancement

Details

Reference
bz18231

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:35 PM
bzimport added a project: Cite.
bzimport set Reference to bz18231.
bzimport added a subscriber: Unknown Object (MLST).

The <ref> element can't violate XML standards, because it's wikitext, and wikitext doesn't pretend to be valid XML. Also, I don't see how there would be any difference in terms of rendering between example 1 and example 2: both would render a [1] after the second sentence.

Recommend WONTFIX.

I think the problem is a real one, even though the argument about XML is invalid.

The point is that <ref>...</ref> should enclose the statement that a reference is *for*, i.e. the statement it supports, rather than the reference information itself. That way, it would be clear which reference supports which statements. In practice, however, that would make things tricky, because attribute values are considered to be plain values, but the reference info is quite often complex wikitext. One solution would be to split it up: <statement id="foo">some text</statement> ... <ref for="foo">reference info</ref>. Note that the <ref> could occur anywhere in the text; it would be rendered at the end. The statement tag would not be visible at all.
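A minimal sketch of how a tool might resolve the split syntax proposed above. The <statement id="..."> tag and the for="..." attribute are hypothetical (they are the proposal, not something Cite supports), and the regular expressions assume simple, non-nested markup:

```python
import re

# Hypothetical tags from the proposal above; Cite does not implement them.
STATEMENT_RE = re.compile(r'<statement id="([^"]+)">(.*?)</statement>', re.DOTALL)
REF_RE = re.compile(r'<ref for="([^"]+)">(.*?)</ref>', re.DOTALL)


def link_refs_to_statements(wikitext):
    """Map each statement id to its text and the reference info declared for it."""
    links = {
        sid: {"text": text, "refs": []}
        for sid, text in STATEMENT_RE.findall(wikitext)
    }
    for target, info in REF_RE.findall(wikitext):
        if target in links:  # skip refs that point at an unknown statement
            links[target]["refs"].append(info)
    return links


sample = (
    'Intro. <statement id="foo">some text</statement> More prose. '
    '<ref for="foo">reference info</ref>'
)
result = link_refs_to_statements(sample)
print(result)
```

Because the reference info stays in element content rather than in an attribute value, it could still hold complex wikitext; a real implementation would need a proper parser rather than regular expressions.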

This would be more flexible and semantically more expressive than the current way of doing things.

(In reply to comment #2)

This would be more flexible and semantically more expressive than the current way of doing things.

I understand that, but unless it shows in actual rendering, I don't see how it's useful.

Especially when talking about wiki-code which is meant to be easy to use, a simple <ref>Text</ref> is _a lot_ easier to understand and to use than a complicated structure like <statement id="foo">some text</statement> ... <ref for="foo">reference info</ref>; not to mention that both would result in the same output...

<ref>s can already have a name. I suggest simply allowing <statement> tags that do nothing and can be bound to refs by that name. That way, everything works as before, but when and where people want to make very clear what piece of text a reference supports, they can.

This information could be quite useful for some tools, e.g. I could imagine something that shows the references for a piece of text in a small tooltip when the mouse pointer hovers over it.

changed summary to reflect the actual issue better

virtueller_andy wrote:

The output isn't the same if, e.g., hovering the mouse pointer over a specific sentence makes that sentence appear in another colour (bright red) and shows the source in a small box next to the mouse pointer. Then that really would make a difference and help to immediately know the source without any scrolling.

@Andy: yes, something like that would become possible, and I see potential in the idea. But I think requesting this functionality is beyond the scope of this ticket. Being able to at least mark the relationship between the bits of text in question would already help: then the info is there, to be used later in whatever way.

virtueller_andy wrote:

What about
<ref authors="Peter Miller" title="This Is A Title" source="Journal of Health, Vol. 3, No. 7" year="2007" href="http:...">This is a sentence.</ref>

This would improve usability since, for example, the English Wikipedia has found no better way to include sources than this:
<ref>{{cite journal|doi=10.1016/S1534-5807(03)00325-3 |title=Role of Pax Genes in Eye Evolution A Cnidarian PaxB Gene Uniting Pax2 and Pax6 Functions|pages=773&ndash;785|year=2003|author=Kozmik, Z|journal=Developmental Cell|volume=5}}</ref>

Usability isn't a reason AGAINST, but a reason FOR an XML-like ref element.

Those using the ref tag are usually advanced users.

virtueller_andy wrote:

PS: Btw, I agree that it is extremely important to know which part of a paragraph or even a sentence is addressed by a ref tag. This isn't possible now, but would open new opportunities (e.g. mouse-over effects) and would improve the exactness of references.

rainald.koch wrote:

Modified example 1 (current ref element):
*This may be true and this too.<ref>Somewhere</ref>

If the reference is carelessly placed at the end of the sentence, it may be meant for supposition #1 only.

@Roan:

unless it shows in actual rendering, I don't see how it's useful.

It's useful for users who know how to edit and for subsequent authors in (at least) two situations:

As a reviewer, if I know that supposition #2 is true but doubt #1, I may try to find a corroborating sentence in the cited reference, which might be a rather long text or even a book not at hand. If the info isn't there, it might have been an interesting digression or a waste of time. You may argue that as a reviewer I should be an expert in the field and should easily provide better references for both facts.

Well, I may be more of an average author of Wikipedia, knowing #1 and #2 as facts and also closely related facts #3 and #4. It may be adequate to split the sentence. Where should I then place the <ref> to a reference I don't know by heart?

It may be enough to advertise the pinpoint placement of <ref>s and the use of comments in the source. A backward-compatible extension of the syntax may be a satisfactory advertisement. It would support the necessary edits for a future mouse-over solution.

virtueller_andy wrote:

Additionally, there could be a GUI-like way to add ref elements in the browser. If you have your mouse over a specific paragraph, then that paragraph could be shown in another color. The reference could then be edited in a separate text field. Then no tag would disturb an inexperienced user.

virtueller_andy wrote:

Is anyone already trying to realize this request?

One problem I see with that is that facts are often backed up with more than one source. You end up with a complicated structure rather than a tree of references. Take this for an example:

Statement with A as source. Statement with both A and B as source. Statement with B as source.

There are two possible ways to interpret the following syntax:

<ref source="somewhere">Statement with A as source. <ref source="somwhere else">Statement with both A and B as source.</ref> Statement with B as source.</ref>

The </ref> part has to include a name tag as well, or else you won't be able to distinguish which references you are closing.

That being said, there is a nice GUI: highlight the referenced text when the user hovers over its reference number.

Regarding the following two paragraphs in the post of 2011-06-26 15:32:06:

<ref source="somewhere">Statement with A as source. <ref source="somwhere else">Statement with both A and B as source.</ref> Statement with B as source.</ref>

The </ref> part has to include a name tag as well, or else you won't be able to distinguish which references you are closing.

SGML-type markup languages - whether XML, HTML, XHTML or whatever - do not permit overlapping elements. Each element has an opening tag - <ref source="somewhere"> or <ref source="somwhere else"> - some content, and a closing tag - </ref>.

Closing tags never have identifiers, and always close the most recently opened unclosed element. So, in this example, the first </ref> closes the second <ref> (that with source="somwhere else"), and the second </ref> closes the first <ref> (source="somewhere").
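To illustrate the rule just described, here is a minimal stack-based sketch. It assumes the hypothetical <ref source="..."> syntax from this thread, and it uses the placeholder source names "A" and "B" for illustration only:

```python
import re

# Matches either an opening <ref source="..."> tag or a closing </ref> tag.
TAG_RE = re.compile(r'<ref source="([^"]*)">|</ref>')


def match_closing_tags(markup):
    """Return (source, enclosed_text) pairs in the order the elements are closed."""
    stack = []   # entries: (source, index where the element's content starts)
    closed = []
    for m in TAG_RE.finditer(markup):
        if m.group(0) == "</ref>":
            if not stack:
                raise ValueError("closing tag without an open element")
            # A closing tag always closes the most recently opened element.
            source, start = stack.pop()
            inner = markup[start:m.start()]
            closed.append((source, TAG_RE.sub("", inner)))
        else:
            stack.append((m.group(1), m.end()))
    if stack:
        raise ValueError("unclosed element")
    return closed


markup = (
    '<ref source="A">Statement with A as source. '
    '<ref source="B">Statement with both A and B as source.</ref>'
    ' Statement with B as source.</ref>'
)
closed = match_closing_tags(markup)
for source, text in closed:
    print(source, "->", text)
```

With strict nesting, the outer element necessarily covers all three statements, so this syntax alone cannot express "A for the first two, B for the last two"; that would need overlapping ranges, which is what the separate statement/ref binding discussed earlier would allow.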

denis wrote:

This is important as the current situation causes conflicts again and again. This would really help. At least as an option to switch on and off.

rainald.koch wrote:

When in doubt about the statements supported by an inaccessible reference in an existing, obviously heavily edited text, I use wikiblame http://wikipedia.ramselehof.de/wikiblame.php?user_lang=en with both the reference and critical statements to "estimate" the attribution.

Will the highlighting tool do something else or will it just use the current, possibly corrupted source code?

Maybe the most important benefit of the attribution feature discussed here is an educational effect, discouraging authors from messing up existing text (just started dreaming of a better world: the ultimate tool would search the literature for references supporting whatever I choose to write).

[Resetting Priority and Target Milestone to reflect reality. To speed up fixings, patches are the most effective way.]

Backreferences from the refs to the "statements" can be simplified by noting that the previous sentence is covered by the following reference, and by noting that reference marks for sentences with identical references following the first reference can be dropped. This follows from existing off-wiki style guides on the matter.

First statement.<ref name="first" />
More about the first statement.<ref name="first" />
Third statement.<ref name="third" />
Even more about the first statement.<ref name="first" />

This will render as
First statement.[1] More about the first statement. Third statement.[3] Even more about the first statement.[1]

Note that the second ref is removed.
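A minimal sketch of the dropping rule as it reads from the example above: a mark is suppressed when it names the same reference as the mark on the immediately preceding sentence. The function name and the list-of-names input are assumptions made for illustration, not part of Cite:

```python
def visible_marks(ref_names):
    """Return one entry per sentence: the ref name to show, or None if the mark is dropped."""
    shown = []
    previous = None
    for name in ref_names:
        # Drop the mark only when it repeats the reference of the previous sentence.
        shown.append(None if name == previous else name)
        previous = name
    return shown


marks = visible_marks(["first", "first", "third", "first"])
print(marks)
```

The last mark is kept even though "first" was already shown, because a different reference intervenes.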

The statements/sentences are sometimes referred to as graphical periods, and are often defined as the text between strong separators or sentence separators, that is, period (.), colon (:), question mark (?), and exclamation mark (!).

Backreferences from the refs to the "statements" can be simplified by noting that the previous sentence is covered by the following reference, and by noting that reference marks for sentences with identical references following the first reference can be dropped. This follows from existing off-wiki style guides on the matter.

Assuming that a reference applies to the sentence preceding it makes sense. But it's not machine readable: finding the boundaries of a sentence in natural text is a surprisingly hard problem even for a single language. I see no good way to support this for a wide range of languages.

In any case, even with this interpretation as the default, there should be a way to explicitly state what statement is covered by a reference.

There are a lot of weird cases, and it will not apply to all languages. But we don't have to support all languages and all cases; we only have to support a sufficiently large base so we can figure out if this makes sense.

A whole bunch of programs that segment sentences act only on the four sentence terminators, implicitly terminate a sentence at a newline, and reject sentences consisting of only a single word. That is the 90% case.
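A rough sketch of such a naive segmenter, assuming only the three rules named above (four terminators, implicit break at newline, single-word sentences rejected). The function name is made up for illustration:

```python
import re

# A run of non-terminator, non-newline characters, optionally followed by
# one of the four terminators: period, colon, question mark, exclamation mark.
SENTENCE_RE = re.compile(r'[^.:?!\n]+[.:?!]?')


def split_sentences(text):
    """Split text into sentences using the naive rules described above."""
    sentences = []
    for candidate in SENTENCE_RE.findall(text):
        candidate = candidate.strip()
        # Reject empty fragments and "sentences" of only a single word.
        if len(candidate.split()) > 1:
            sentences.append(candidate)
    return sentences


text = (
    "This is the first sentence. This is the second sentence!\n"
    "No terminator here\n"
    "Ok. Is it true?"
)
sentences = split_sentences(text)
print(sentences)
```

This deliberately ignores abbreviations, so a string such as "Vol. 3, No. 7" would be split incorrectly, which is one of the hard cases mentioned earlier in the thread.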

There should be an explicit method to specify which text is covered by a reference, but that is the 10% case, and probably far less than that.

A more interesting discussion is whether a clause should be marked as a referenced unit, and whether that should be done if a new clause is started by special words. In Norwegian, "så" can be used as a separator of clauses.

Given that there's suddenly discussion on this, I should note that my very vague plan is for this feature to be part of the DOM-annotated replacement for the Cite extension (for which I don't think there's a single place to point); it'd probably not be worthwhile putting much effort into this in the legacy code.

The parsing team has a proposal for "heredoc arguments" (T114432) which would allow the use of the {{cite}} template to easily enclose a region of text.

This comment was removed by jeblad.
Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:24 PM