
External URL syntax cannot handle square brackets
Open, LowPublic

Description

Author: Rehards Reinahrdt [[user:gangleri]]

From bug T4095, comment 8. Example case (from en.wikipedia.org/wiki/Solar_eclipse, external links section):

<nowiki>
[http://eclipse.span.ch/eclipse8april.htm Pictures of the most recent eclipse of
[[8 April]] [[2005]]]
</nowiki>

should give

<nowiki>
Pictures of the most recent eclipse of [[8 April]] [[2005]]
</nowiki>

as link text, but gives instead

<nowiki>
[http://eclipse.span.ch/eclipse8april.htm Pictures of the most recent eclipse of
2005 ]
</nowiki>

with the http: part as an external link, and the closing bracket as a link to "8 April".

Note: The example describes "nested links": "wiki links" inside "external links". This is something new. To my understanding, the most useful approach would be to change

[http://eclipse.span.ch/eclipse8april.htm Pictures of the most recent eclipse of [[8 April]] [[2005]]]

to

[http://eclipse.span.ch/eclipse8april.htm Pictures of the most recent eclipse of] [[8 April]] [[2005]]

This should be done by users / contributors. It would be great if the MediaWiki software could generate a warning.
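As a rough illustration of such a warning, the check could look for an external link whose link text contains the start of a wiki link. This is only a sketch in Python, not MediaWiki code: the pattern `NESTED_WIKILINK` and the function `find_nested_wikilinks` are hypothetical names, and the URL part is simplified to "anything without whitespace or square brackets".

```python
import re

# Hypothetical pattern: "[", a URL without whitespace or brackets, some
# bracket-free link text on the same line, then the start of a wiki link "[[".
NESTED_WIKILINK = re.compile(
    r'\[(?:https?|ftp)://[^\s\[\]]+[ \t]+[^\[\]\n]*\[\['
)

def find_nested_wikilinks(wikitext):
    """Return the start offsets of external links whose text contains a wiki link."""
    return [m.start() for m in NESTED_WIKILINK.finditer(wikitext)]

nested = ("[http://eclipse.span.ch/eclipse8april.htm Pictures of the most "
          "recent eclipse of [[8 April]] [[2005]]]")
separated = ("[http://eclipse.span.ch/eclipse8april.htm Pictures of the most "
             "recent eclipse of] [[8 April]] [[2005]]")

for text in (nested, separated):
    if find_nested_wikilinks(text):
        print("warning: wiki link nested inside external link text")
```

Only the first string triggers the warning; the rewritten form suggested above passes the check.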

Details

Reference
bz3695

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:52 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz3695.
bzimport added a subscriber: Unknown Object (MLST).

I've added a hack that fixes this for one embedded internal link if tidy is
disabled.

There are many nesting-related bugs in the current parser. It's impossible to
fix them all without rewriting the parser to a real state machine.

(In reply to comment #1)

There are many nesting-related bugs in the current parser. It's impossible to
fix them all without rewriting the parser to a real state machine.

Has this been fixed with the new parser?

*** Bug 19411 has been marked as a duplicate of this bug. ***

Renaming and bumping up severity: the bug also occurs when the square brackets are part of the URL, which is more problematic, because some frameworks use brackets in GET parameters. For example,

[http://www.danishliterature.info/index.php?id=2092&no_cache=1&tx_lfforfatter_pi2[stage]=1&tx_lfforfatter_pi2[uid]=109&tx_lfforfatter_pi2[lang]=_eng Jørgen-Frantz Jacobsen]

should give

<a href="http://www.danishliterature.info/index.php?id=2092&no_cache=1&tx_lfforfatter_pi2[stage]=1&tx_lfforfatter_pi2[uid]=109&tx_lfforfatter_pi2[lang]=_eng">Jørgen-Frantz Jacobsen</a>

but the actual result is

<a href="http://www.danishliterature.info/index.php?id=2092&no_cache=1&tx_lfforfatter_pi2">[stage</a>=1&tx_lfforfatter_pi2[uid]=109&tx_lfforfatter_pi2[lang]=_eng Jørgen-Frantz Jacobsen]

Automatic URLs (i.e., just pasting the URL in wikitext without any formatting) have the same problem.

Replacing [ and ] in the URL with %5B and %5D usually helps, but is technically incorrect because these are reserved characters and shouldn't be URL-encoded (nor do most browsers URL-encode them when you copy the URL from the address bar), and it depends on the whim of the URL processor whether the URL-encoded version will still point to the same resource.
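For reference, the workaround amounts to something like the following Python sketch. The helper name `encode_brackets` is made up for illustration, and the caveat above still applies: whether the server treats the encoded URL as the same resource is not guaranteed.

```python
from urllib.parse import unquote

def encode_brackets(url):
    """Percent-encode only the square brackets, leaving everything else untouched."""
    return url.replace("[", "%5B").replace("]", "%5D")

original = ("http://www.danishliterature.info/index.php?id=2092&no_cache=1"
            "&tx_lfforfatter_pi2[stage]=1&tx_lfforfatter_pi2[uid]=109"
            "&tx_lfforfatter_pi2[lang]=_eng")
encoded = encode_brackets(original)

# Decoding gives back the original string, but that only shows the two forms
# are equivalent as text, not that every server handles them identically.
print(encoded)
print(unquote(encoded) == original)
```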

A possible hack would be to change the regexp identifying the URL part: one could take advantage of the fact that square brackets are in practice always balanced in a URL, aren't usually nested, and are never at the end. It might be possible to add something like (\[[-\w_+%]*\][-\w_+%?=]+)? to the end of the URL regex to handle such links and still not mess up [1]-style automatically numbered links.
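A minimal Python sketch of this idea follows. It is not MediaWiki's actual URL regex, and all names (`PLAIN`, `BRACKET_GROUP`, `URL`, `EXT_LINK`, `parse_external_link`) are hypothetical. It also adapts the fragment above in two ways, both assumptions on my part: the bracket group may repeat, and the characters after a bracket group may include & so that several bracketed parameters in a row are covered.

```python
import re

# Plain URL characters: anything except whitespace, square brackets,
# angle brackets and double quotes (a simplification for this sketch).
PLAIN = r'[^\s\[\]<>"]+'

# One balanced, non-nested bracket group that must be followed by at least
# one more URL character, so a URL can never end in "]".
BRACKET_GROUP = r'\[[-\w+%]*\]' + PLAIN

URL = r'(?:https?|ftp)://' + PLAIN + r'(?:' + BRACKET_GROUP + r')*'

# "[URL optional link text]"; the link text may not contain "]" or a newline.
EXT_LINK = re.compile(r'\[(' + URL + r')(?:[ \t]+([^\]\n]*))?\]')

def parse_external_link(wikitext):
    """Return (url, text) for the first bracketed external link, or None."""
    m = EXT_LINK.search(wikitext)
    if m is None:
        return None
    return m.group(1), m.group(2)

sample = ("[http://www.danishliterature.info/index.php?id=2092&no_cache=1"
          "&tx_lfforfatter_pi2[stage]=1&tx_lfforfatter_pi2[uid]=109"
          "&tx_lfforfatter_pi2[lang]=_eng Jørgen-Frantz Jacobsen]")
print(parse_external_link(sample))
print(parse_external_link("[http://example.com]"))
```

Because a bracket group only matches when more URL characters follow it, the closing bracket of an automatically numbered link such as [http://example.com] is still treated as the end of the link rather than as part of the URL.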

Not going to fix this with the current MW core. Punting to the new editor that Brion has planned. The new editor should automatically encode pasted URLs.