
Redirect parsing is over-permissive; syntactically broken redirects are parsed as redirects to the wrong place.
Closed, Resolved (Public)

Description

Author: snottygobble

Description:
[http://en.wikipedia.org/w/index.php?title=Pinus_edulis&oldid=212833839 This] redirect should be parsed as horribly broken, not as a redirect to a category:

The text of that page is:

#REDIRECT [[Colorado Pinyon]

{{R from scientific name}}

[[Category:Symbols of New Mexico]]

That is, the link to the intended target is never closed. The text is syntactically incorrect and should be parsed as such. Unfortunately, MediaWiki parses this as a redirect to [[:Category:Symbols of New Mexico]], which is completely unintuitive and certainly not what was intended.
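To make the failure mode concrete, here is a minimal Python sketch of an over-permissive matcher. It is not MediaWiki's actual code: the function name `permissive_target` and both regular expressions are assumptions chosen only to reproduce the behavior described above, where the first well-formed link anywhere after the magic word is taken as the target.

```python
import re

# Hypothetical stand-ins for illustration; not MediaWiki's real patterns.
REDIRECT_WORD = re.compile(r"\s*#REDIRECT", re.IGNORECASE)
FIRST_LINK = re.compile(r"\[\[(.*?)(?:\|.*?)?\]\]")


def permissive_target(text):
    """Return the first well-formed [[link]] found anywhere after #REDIRECT."""
    word = REDIRECT_WORD.match(text)
    if word is None:
        return None
    # search() scans forward, so a broken first link is silently skipped
    # and a later link (here, the category) is picked up instead.
    link = FIRST_LINK.search(text, word.end())
    return link.group(1).strip() if link else None


page = (
    "#REDIRECT [[Colorado Pinyon]\n\n"
    "{{R from scientific name}}\n\n"
    "[[Category:Symbols of New Mexico]]"
)
print(permissive_target(page))
```

Because `.` does not match a newline by default, the unclosed link on the first line cannot match, and the scan continues until it reaches the category link.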


Version: unspecified
Severity: minor
URL: http://en.wikipedia.org/w/index.php?title=Pinus_edulis&oldid=212833839

Details

Reference
bz15053

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:19 PM
bzimport set Reference to bz15053.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

Patch to fix the problem

This patch would fix the problem, which is quite a simple one. However, it's not clear how much we want to tighten the parsing here. This patch would rule out the following page text as well:


#REDIRECT

[[Colorado Pinyon]]

I don't know whether we want to. People may be depending on this behavior. Also, if we're tightening up redirect parsing, my patch doesn't go as far as it could. It just requires that the link be on the same line, so this would still fail:


#REDIRECT [[Colorado Pinyon] {{R from scientific name}} [[Category:Symbols of New Mexico]]
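The same-line rule can be sketched roughly as follows. This is an illustration in Python, not the attached patch: `same_line_target` and its regular expressions are hypothetical, and the only assumption is that the link search is restricted to the line containing the magic word.

```python
import re

# Hypothetical stand-ins for illustration; not the patch's real patterns.
REDIRECT_WORD = re.compile(r"\s*#REDIRECT", re.IGNORECASE)
FIRST_LINK = re.compile(r"\[\[(.*?)(?:\|.*?)?\]\]")


def same_line_target(text):
    """Look for a [[link]] only on the line that carries #REDIRECT."""
    word = REDIRECT_WORD.match(text)
    if word is None:
        return None
    # Keep only the remainder of the first line after the magic word.
    first_line = text[word.end():].split("\n", 1)[0]
    link = FIRST_LINK.search(first_line)
    return link.group(1).strip() if link else None


# The link is on a later line, so this page is no longer a redirect.
print(same_line_target("#REDIRECT\n\n[[Colorado Pinyon]]"))

# Everything on one line: the broken link still produces a match.
one_line = (
    "#REDIRECT [[Colorado Pinyon] {{R from scientific name}} "
    "[[Category:Symbols of New Mexico]]"
)
print(same_line_target(one_line))
```

Under this sketch the multi-line form is rejected, while the single-line broken form is still matched rather than rejected, which is the gap described above.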

A more complete patch would remove the magic word and then require the target link to be at the beginning of the line, after whitespace is stripped. Is this desirable? It could potentially break a lot of redirects.


(In reply to comment #1)

A more complete patch would remove the magic word and then require the target
link to be at the beginning of the line, after whitespace is stripped. Is this
desirable? It could potentially break a lot of redirects.

I'm working on a patch that does just that.

Created attachment 5127
Proposed patch

This patch requires that the link immediately follow #REDIRECT, but does allow it to be on a different line. That is, everything that matches:

#REDIRECT <whitespace> [[<valid title>]]

is a redirect. <whitespace> may contain newlines and may also be empty (so #REDIRECT[[Foo]] also works).

To illustrate, I've tested the following examples:

#REDIRECT [[Foo]]\n[[Category:Bar]]

returns Foo

#REDIRECT[[Foo]]

returns Foo

#REDIRECT\n[[Foo]]\n[[Category:Bar]]

returns Foo

#REDIRECT [[Foo]\n[[Category:Bar]]

returns null
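For readers who want to try the rule, the four cases above can be reproduced with a small Python sketch. It is not the attached patch: the name `strict_redirect_target` is hypothetical, and the character class is only a simplified stand-in for real title validation (it rejects brackets, braces, pipes, angle brackets and newlines, so piped links are not handled).

```python
import re

# Assumed pattern: magic word, optional whitespace (newlines included),
# then a link that must close properly before anything else appears.
STRICT_REDIRECT = re.compile(
    r"#REDIRECT\s*\[\[([^\[\]{}|<>\n]+)\]\]",
    re.IGNORECASE,
)


def strict_redirect_target(text):
    """Return the target only if the link immediately follows #REDIRECT."""
    # match() anchors at the start of the text, so nothing may precede
    # the magic word and only whitespace may sit between it and the link.
    m = STRICT_REDIRECT.match(text)
    return m.group(1).strip() if m else None


# The \n sequences in the examples above are real newlines here.
examples = [
    "#REDIRECT [[Foo]]\n[[Category:Bar]]",
    "#REDIRECT[[Foo]]",
    "#REDIRECT\n[[Foo]]\n[[Category:Bar]]",
    "#REDIRECT [[Foo]\n[[Category:Bar]]",
]
for text in examples:
    print(repr(text), "->", strict_redirect_target(text))
```

The last example returns `None` because the pattern never scans ahead to a later link: once the first link fails to close, the whole match fails.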


ayg wrote:

Well, want to commit it? It *might* break stuff, but it seems fairly unlikely. *I've* never seen a redirect with non-whitespace between the magic word and the link, at least.

(In reply to comment #4)

Well, want to commit it? It *might* break stuff, but it seems fairly unlikely.
*I've* never seen a redirect with non-whitespace between the magic word and
the link, at least.

Committed in r38737

snottygobble wrote:

Gosh you guys are fast. Thanks heaps for that.