Page MenuHomePhabricator

Cite anchors should be numbered starting at 1 to correspond with on-screen labels
Closed, ResolvedPublic

Description

Author: mysekurity

Description:
I know this is an issue with Comp Sci rather than with Wikipedia, but I think it would be helpful if when clicking a footnote, the fragment identifier (what comes after # in the URL) would match up with the number footnote I clicked.

For example, If I were to read the text: "DST is said to save electricity by reducing the need for artificial evening lighting,[4]" and were to click the raised 4, I would be taken to "http://en.wikipedia.org/wiki/Article#_ref-3", rather than the more logically "http...Article#_ref-4", as I clicked the 4th reference.

I know that in Binary and Hexidecimal, the first number in the series is 0 (hence when I click "ref 1", I'm taken to #_ref-0), but I was wondering if it were possible to start the numbering at 1 to better match a layperson's expected result.

Forgive me if this has been addressed in another bug report, but I didn't see anything.


Version: unspecified
Severity: enhancement

Details

Reference
bz10537

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 9:48 PM
bzimport added a project: Cite.
bzimport set Reference to bz10537.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

This would be great except for the bit about breaking all external links to Wikipedia footnotes. I don't expect there to be terribly many of those, however, so that might be an okay price to fix even a little annoyance like this.

mysekurity wrote:

I can't imagine there would be many people linking to _footnotes_. Why not just link to the actual page (or copy it over)? I highly doubt it would break many links, but I'm willing to hear evidence to the contrary.

  • Bug 14986 has been marked as a duplicate of this bug. ***

While technically this change would break links, it wouldn't break them very much -- you'd just have to scroll down a line to the next note -- and it would make things much less confusing.

Wouldn't the links break also if someone inserted another footnote before the referenced? So I'd say go ahead with the change.

webboy wrote:

Patch

This patch changes the id of the <ref> and <references> tags to always have the same number as the on-screen labels (only anonymous references).

This patch also makes the group name part of the id (except default group). This is needed to avoid conflicts between groups.

attachment bug10537.patch ignored as obsolete

webboy wrote:

Patch including ParserTests and a bugfix

This patch includes updated citeParserTests.txt

This patch also fixes a bug with non-ASCII group names in the previous patch.

attachment bug10537.patch ignored as obsolete

ssanbeg wrote:

Sorry I had a bunch of stuff mostly done, so your patch won't apply to what's there now, but I did read through most of it.

The advantage of the current system is that all the references are numbered sequentially across groups, so all of the IDs can be unique without doing any kind of weird encoding. It looks like your patch changes it so that the IDs are no longer unique; i.e. <ref group=a name=b> would get the same ID as <ref name=a_b>.

But most of your work seems sound, so this may be fixable as simply as a more clever encoding, i.e. concat the group & name, and specify the length of the group in the ID, so if the former is something like cite_note-1-a_b, the latter could be cite_note-0-a_b

webboy wrote:

Updated patch

(In reply to comment #8)

The advantage of the current system is that all the references are numbered
sequentially across groups, so all of the IDs can be unique without doing any
kind of weird encoding. It looks like your patch changes it so that the IDs
are no longer unique; i.e. <ref group=a name=b> would get the same ID as <ref
name=a_b>.

But most of your work seems sound, so this may be fixable as simply as a more
clever encoding, i.e. concat the group & name, and specify the length of the
group in the ID, so if the former is something like cite_note-1-a_b, the latter
could be cite_note-0-a_b

I have changed the code to always include the group - name separator. This way id conflicts are avoided.

attachment bug10537-new.patch ignored as obsolete

ssanbeg wrote:

(In reply to comment #9)

Created an attachment (id=5162) [details]
Updated patch

(In reply to comment #8)

The advantage of the current system is that all the references are numbered
sequentially across groups, so all of the IDs can be unique without doing any
kind of weird encoding. It looks like your patch changes it so that the IDs
are no longer unique; i.e. <ref group=a name=b> would get the same ID as <ref
name=a_b>.

But most of your work seems sound, so this may be fixable as simply as a more
clever encoding, i.e. concat the group & name, and specify the length of the
group in the ID, so if the former is something like cite_note-1-a_b, the latter
could be cite_note-0-a_b

I have changed the code to always include the group - name separator. This way
id conflicts are avoided.

That would still create a conflict if both conflicting references have a group, i.e.
<ref group=a_b name=c/> and <ref group=a name=b_c/>. Once you concat arbitrary strings, there's
no way to tell how to split them without knowing the length, so I'd recommend storing the length
of one of the strings, either before or after the combined string.

webboy wrote:

New patch

Ids now include group name length.

attachment bug10537-new2.patch ignored as obsolete

This looks pretty icky to me...

<ref>

#cite_note-0-_1

<ref group="blah">

#cite_note-4-blah_1

Those lengths are pretty awful looking and pretty unpredictable.

Given that named refs are still assigned unique sequence numbers within their groups, what's the point of including the name in the fragment ID? It seems to me that it's an unnecessary exposure of internal data -- the purpose is to allow two links to the same footnote without having to hardcode the number, which may change, in the source text.

Since the number was also in the link, putting the name in the link provides no static benefit for external linkers; its only purpose seems to be to complicate things by being ambiguous with the groups here. :)

My recommendation is to simply drop the ref name from the fragment ID -- use group and number only. Then there's nothing that needs to be disambiguated.

webboy wrote:

Yet Another Patch

(In reply to comment #12)

My recommendation is to simply drop the ref name from the fragment ID -- use
group and number only. Then there's nothing that needs to be disambiguated.

Ok, no problem. Patch updated.

Attached:

ssanbeg wrote:

(In reply to comment #13)

Created an attachment (id=5164) [details]
Yet Another Patch

(In reply to comment #12)

My recommendation is to simply drop the ref name from the fragment ID -- use
group and number only. Then there's nothing that needs to be disambiguated.

Ok, no problem. Patch updated.

That seems to have resolved the conflicts.

You may have made more changes to the messages than are necessary; i.e. removed cite_error_ref_invalid_parameters which is fully translated, and added two replacements that are not -- the one with no group probably won't get used, so this would cause the error message to change from the local language to English for no good reason. The other message you added duplicates an existing message, so that may be unnecessary.

webboy wrote:

(In reply to comment #14)

You may have made more changes to the messages than are necessary; i.e. removed
cite_error_ref_invalid_parameters which is fully translated, and added two
replacements that are not -- the one with no group probably won't get used, so
this would cause the error message to change from the local language to English
for no good reason.

I removed *cite_error_ref_too_many_keys* because the name, the English text and at least the Dutch and German translation are wrong. This error isn't triggered when you specify two or more names, only the last name is passed to the extension by the parser. This error is only triggered when you specify invalid parameters.

Therefore I created the messages 'cite_error_ref_invalid_parameters' and 'cite_error_ref_invalid_parameters_group', just like the existing 'cite_error_references_invalid_parameters' and 'cite_error_references_invalid_parameters_group'.

'cite_error_ref_invalid_parameters' may not be used, but $wgAllowCiteGroups is still existing and can be used.

The other message you added duplicates an existing
message, so that may be unnecessary.

The contents of 'cite_reference_link_key_with_num' and the new 'cite_reference_link_key_with_group' are the same, but there functions are different.

ssanbeg wrote:

(In reply to comment #15)

(In reply to comment #14)

You may have made more changes to the messages than are necessary; i.e. removed
cite_error_ref_invalid_parameters which is fully translated, and added two
replacements that are not -- the one with no group probably won't get used, so
this would cause the error message to change from the local language to English
for no good reason.

I removed *cite_error_ref_too_many_keys* because the name, the English text and
at least the Dutch and German translation are wrong. This error isn't triggered
when you specify two or more names, only the last name is passed to the
extension by the parser. This error is only triggered when you specify invalid
parameters.

Therefore I created the messages 'cite_error_ref_invalid_parameters' and
'cite_error_ref_invalid_parameters_group', just like the existing
'cite_error_references_invalid_parameters' and
'cite_error_references_invalid_parameters_group'.

'cite_error_ref_invalid_parameters' may not be used, but $wgAllowCiteGroups is
still existing and can be used.

The name may not be accurate anymore, but it would take some coordination to change. It looks like there are still a little over 60 references to the old message after your patch is applied, and (I think) it would need to be coordinated with the localization efforts. I like the old text better; since it's not so verbose, and won't need to be changed whenever a new option is added.

The other message you added duplicates an existing
message, so that may be unnecessary.

The contents of 'cite_reference_link_key_with_num' and the new
'cite_reference_link_key_with_group' are the same, but there functions are
different.

It seems unnecessary to me, but that wouldn't get localized anyway, so no big deal.

sumanah wrote:

Marking with need-review because this patch still awaits review by Brion or another active MediaWiki or Cite developer.

sumanah wrote:

Jelte, now that we have moved to Git as a source control system, you're welcome to get developer access https://www.mediawiki.org/wiki/Developer_access to put this & future patches directly into Git. That will get them reviewed faster, too!

sumanah wrote:

Eran, would you do me the favor of checking on this patch and seeing whether it would work with Cite in trunk, and if so, moving it into Git? Thanks.

Gerrit change 28460

Merged.