special characters in anchor part of interwiki links not escaped 'correctly'
Closed, DeclinedPublic

Description

Author: gangleri

Description:
Dear friends,

Please take a look at
http://meta.wikimedia.org/wiki/User:Gangleri/remarks#InterWiki_link_translation and
http://en.wikipedia.org/wiki/User:Gangleri/remarks#proper_link_translation .

Please see the differences. Thanks!

Regards Reinhardt


Version: unspecified
Severity: normal
OS: Windows XP
Platform: PC
URL: http://meta.wikimedia.org/wiki/User:Gangleri/remarks#InterWiki_link_translation

Details

Reference
bz670

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 6:58 PM
bzimport set Reference to bz670.
bzimport added a subscriber: Unknown Object (MLST).

gangleri wrote:

More about anchors:

Part I: Just for your information. It would be too much to fix.

a) Section and subsection titles can look very odd.

See [[:en:Wikipedia talk:Categorization#Poll_about_Category:Whatever.7C.2A_.2C_.7C.28space.29.2C_.7C.21|en:Wikipedia talk:Categorization#Poll about Category:Whatever|* , | (space), |!]]
and look at how the link to the anchor was made.

b) Try to create a section / subsection and use an internal link [[foo]]
and / or an interwiki link [[:zn:foo]] or an external link
[http://www.foo.int/ foo] in the title. Save the section / subsection and
look at the URL in your browser.

c) Try to use '''font''' ''styles'' and / or "quotes" and / or 'marks' as
a title. Try all the special characters !%&?*# and so on.

Part II:

At [[:is:Notandi:Gangleri/test/Snið:Flokkatré]] there are
two "'''anchors'''": <nowiki><font id="Landafraedi" /> and <font id="Island" /></nowiki>.
They work only because in both cases special characters such as "æð" in
"Landafræði" and "Í" in "Ísland" have been replaced.

I would be happy if the issue with the special characters and with ( and )
could be solved.

Best regards ~~~~

gangleri wrote:

See also [[bug 845]] and [[test:User:Gangleri/tests/anchors]].
Regards Reinhardt

gangleri wrote:

Additional remark(s):

Spaces (including multiple consecutive spaces) are translated correctly in the
anchor of an interwiki link.

See some examples (and a link to a real "[[en:Pathology|pathological]]"
anchor) at http://test.wikipedia.org/w/index.php?title=User%3AGangleri%2Ftests%2Fanchors&diff=0&oldid=4348

Regards Reinhardt

rowan.collins wrote:

If I understand this report correctly, the precise problem can be summarised as
follows (for reference, this paragraph would have been an ideal initial comment
for this bug report):

When typing an internal link to a section, you can use "special" characters like
"(", ")", etc and they will be converted to the "escaped" forms like ".28" and
".29" (e.g. "#foo (bar)" is equivalent to "#foo_.28bar.29"). However,
InterWiki links don't undergo this transformation, resulting in invalid links
(e.g. "[[fr:Foo#foo (bar)]]" will create a link to .../wiki/foo#foo_%28bar%29
not .../wiki/foo#foo_.28bar.29)
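
For illustration only, the two escaping styles contrasted above can be sketched in Python. This is not MediaWiki's code. The function names are hypothetical, and the rules (spaces become underscores, other special characters are percent-encoded, ":" is left alone, and "%" is swapped for "." in the MediaWiki-style form) are assumptions inferred from the examples quoted in this report.

```python
from urllib.parse import quote


def dot_escape_anchor(heading: str) -> str:
    """Hypothetical helper: build the dot-escaped anchor form (".28", ".29")."""
    # Assumption: spaces in the heading become underscores.
    underscored = heading.replace(" ", "_")
    # Assumption: ':' stays unescaped, as in the "Category:Whatever" anchor
    # quoted earlier in this report; other special characters are encoded.
    percent_encoded = quote(underscored, safe=":")
    # The MediaWiki-style form uses '.' where URL encoding uses '%'.
    return percent_encoded.replace("%", ".")


def percent_escape_anchor(heading: str) -> str:
    """Hypothetical helper: the plain percent-encoded form seen on interwiki links."""
    return quote(heading.replace(" ", "_"), safe=":")


print(dot_escape_anchor("foo (bar)"))      # foo_.28bar.29
print(percent_escape_anchor("foo (bar)"))  # foo_%28bar%29
```

The two outputs differ only in the escape character, which is enough for the browser to miss the anchor on the target page.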

I think this is basically a matter of how much we assume to know about the
target of an interwiki link - many interwiki links are to non-MediaWiki sites,
which won't follow these special escaping rules. One possibility would be to
escape the link differently depending whether the interwiki prefix is marked in
the database as "local" or not (i.e. assume that everything 'local' is a
MediaWiki, and everything else isn't).
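
A minimal sketch of that possibility, in Python rather than MediaWiki's own code: the table INTERWIKI_IS_LOCAL, its entries, and the function name are all hypothetical stand-ins for the "local" flag stored in the database.

```python
from urllib.parse import quote

# Hypothetical stand-in for the interwiki table: prefix -> "local" flag.
INTERWIKI_IS_LOCAL = {"fr": True, "examplewiki": False}


def interwiki_fragment(prefix: str, anchor: str) -> str:
    """Pick an escaping style for the anchor part of an interwiki link."""
    underscored = anchor.replace(" ", "_")
    percent_encoded = quote(underscored, safe=":")
    if INTERWIKI_IS_LOCAL.get(prefix, False):
        # Assume the target is a MediaWiki: use the dot-escaped form.
        return percent_encoded.replace("%", ".")
    # Unknown target software: leave the plain percent-encoding in place.
    return percent_encoded


print(interwiki_fragment("fr", "foo (bar)"))           # foo_.28bar.29
print(interwiki_fragment("examplewiki", "foo (bar)"))  # foo_%28bar%29
```

Unknown prefixes fall through to percent-encoding here, on the assumption that guessing "not a MediaWiki" is the safer default.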

At first sight, it would seem logical to make incoming URLs that contain "wrong"
characters behave as though those characters had been escaped, but this is awkward
because the wiki can't manipulate the address requested other than telling the
browser to reload as though it was a different page (it's entirely up to the
browser to scroll the page based on text after the "#").
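
To make that point concrete, here is a small Python sketch that splits a URL into the part a server receives and the part that stays in the browser. The example.org address and the function names are made up for illustration.

```python
from urllib.parse import urlsplit


def server_visible_part(url: str) -> str:
    """Return the path (plus query, if any) that is sent in the request."""
    parts = urlsplit(url)
    target = parts.path
    if parts.query:
        target += "?" + parts.query
    return target


def browser_only_part(url: str) -> str:
    """Return the fragment, which the browser keeps and uses for scrolling."""
    return urlsplit(url).fragment


url = "http://example.org/wiki/foo#foo_%28bar%29"  # hypothetical address
print(server_visible_part(url))  # /wiki/foo
print(browser_only_part(url))    # foo_%28bar%29
```

Because the fragment never reaches the wiki, the wiki has nothing to rewrite unless it sends the browser to a different address.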

To reply to some other comments:

(In reply to comment #1)

> a) Section and subsection titles can look very odd.

There's really not a lot that can be done about that; escaping of some kind is
inevitable unless we simply restrict what characters headings can contain.
However, when working correctly, you should never have to type these escaped
forms, so it shouldn't matter.

> b) Try to create a section / subsection and use an internal link [[foo]]
> and / or an interwiki link [[:zn:foo]] or an external link
> [http://www.foo.int/ foo] in the title. Save the section / subsection and
> look at the URL in your browser.

That's an unrelated bug: the "add section" code is guessing wrong what the
anchor will be on the final page. It might be worth filing another bug for.

> At [[:is:Notandi:Gangleri/test/Snið:Flokkatré]] there are
> two "'''anchors'''": <nowiki><font id="Landafraedi" /> and <font id="Island" /></nowiki>.
> They work only because in both cases special characters such as "æð" in
> "Landafræði" and "Í" in "Ísland" have been replaced.

I'm not sure what you mean here; in what sense did they "not work" before this
change? Can you come up with a simplified test-case to demonstrate? If you just
mean that interwiki links weren't generated correctly, as with the main issue in
this report, then that is to be expected, for exactly the same reasons.

richholton wrote:

At least _part_ of the issue here is the difference between the way that the
parser creates anchors from headings (for the purpose of linking from the TOC),
and the way that in-page targets are handled in links (e.g. [[PAGE NAME#heading]]).

The code in parser.php (function formatHeading) seems to do more work to ensure
that the anchor can be linked to. In linker.php, function makeKnownLinkObj, the
code is a bit simpler (less robust?), and a very similar bit of code exists in
editPage.php, function sectionAnchor.

Given the relative opacity of the code in parser.php, I'm not sure how practical
it would be to pull the code there into a function visible from linker and
editPage. Something along these lines is probably needed, though.

bugzillas+padREMOVETHISdu wrote:

(In reply to comment #5)

> > b) Try to create a section / subsection and use an internal link [[foo]]
> > and / or an interwiki link [[:zn:foo]] or an external link
> > [http://www.foo.int/ foo] in the title. Save the section / subsection and
> > look at the URL in your browser.
>
> That's an unrelated bug: the "add section" code is guessing wrong what the
> anchor will be on the final page. It might be worth filing another bug for.

Filed as bug 1860.

gangleri wrote:

bug 2381: "enhancement request: traying to achive copy and past anchors"
suggests a workaround to handle some / many of the described anchors

conrad.irwin wrote:

At some point in the last 5 years and 10 days, the parser started anchor-escaping interwiki links in the same way as local links. There are other bugs with anchor link generation (see bug 18431, bug 5019, et al.), but I think the issue identified in comment 5 has been solved, so I'm closing this bug.