Slashes in titles should be escaped when generating links
Open, Low, Public

Description

Author: vigna

Description:
Some Wikipedia pages contain slashes in their title, e.g., "/b/" or "/dev/null". Such slashes should be escaped to %2F when they are not meant to separate path components, as per RFC 1738/3986. Instead, for instance,

http://en.wikipedia.org/w/index.php?title=B/&redirect=no

contains an A element

<a href="/wiki//b/" title="/b/">/b/</a>

that should be written

<a href="/wiki/%2Fb%2F" title="/b/">/b/</a>

This causes 2 problems:

  1. The URL above does not conform to the RFCs. This is a minor issue, but it might give rise to other problems in the future.
  2. If the slash is initial, the URI is not normalized as per the RFC, and normalizing it collapses the double slash. E.g., in Java:

bsh % print(URI.create("wiki//b/").normalize());
wiki/b/

This might be a problem with crawlers, which usually normalize URLs to reduce the number of duplicates.
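
To make the crawler concern concrete, here is a minimal sketch comparing the two forms under `java.net.URI` normalization. The class name `NormalizeDemo` and its helper are made up for illustration; the only assumption is that a crawler normalizes links the way `URI.normalize()` does.

```java
import java.net.URI;

// Hypothetical demo class, not part of any crawler or of MediaWiki.
class NormalizeDemo {
    // Returns the string form of the URI reference after java.net.URI normalization.
    static String normalize(String ref) {
        return URI.create(ref).normalize().toString();
    }

    public static void main(String[] args) {
        // Unescaped initial slash: the empty path segment is dropped.
        System.out.println(normalize("wiki//b/"));      // prints wiki/b/
        // Escaped slashes: the path is left untouched.
        System.out.println(normalize("wiki/%2Fb%2F"));  // prints wiki/%2Fb%2F
    }
}
```

With the unescaped link, the normalized URL no longer names the page "/b/"; with the escaped link, normalization leaves it unchanged.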

Note that %2F-escaped slashes work perfectly, and

http://en.wikipedia.org/wiki/%2Fb%2F

brings you exactly to /b/'s page. Bug 8254 seems to say the opposite, but it's a few years old.


Version: 1.23.0
Severity: normal

Details

Reference
bz61944

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:57 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz61944.
bzimport added a subscriber: Unknown Object (MLST).

Thanks for taking the time to report this!

It's not entirely clear to me what "should be escaped when generating links" means. The user (when using classic text wikipage editing) or VisualEditor (when using VE for wikipage editing) cannot know which kind of slash is meant.

Or did you mean "when generating wiki page titles" instead?

vigna wrote:

What I mean is that the Wikipedia text contains links like /b/. That is, the user will input the title of a page. But then some code will turn this title into a link: it shouldn't be http://en.wikipedia.org/wiki//b/, but rather http://en.wikipedia.org/wiki/%2Fb%2F .

Maybe the page above wasn't a good example. Consider

http://en.wikipedia.org/wiki/Cory_Doctorow

The reference entry for /usr/bin/god in the "Other" section is

The link generated by the Wikimedia software (or whoever is in charge :) is

<a href="/wiki//usr/bin/god" title="/usr/bin/god">/usr/bin/god</a>

This should be

<a href="/wiki/%2Fusr%2Fbin%2Fgod" title="/usr/bin/god">/usr/bin/god</a>
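
For illustration only, the title-to-link step could look roughly like the sketch below. This is not MediaWiki's code: `TitleLinkSketch`, `encodeTitleAsSegment` and `hrefFor` are hypothetical names, and the sketch assumes that the whole title is a single path segment under /wiki/, so every character outside the RFC 3986 "unreserved" set gets percent-encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not MediaWiki's implementation.
class TitleLinkSketch {
    // Percent-encodes a page title so it can be used as one path segment.
    // Only RFC 3986 "unreserved" characters (ALPHA, DIGIT, '-', '.', '_', '~') are kept.
    static String encodeTitleAsSegment(String title) {
        StringBuilder out = new StringBuilder();
        for (byte b : title.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                // Everything else, including '/', becomes %XX.
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // Builds the href, assuming articles are served under /wiki/.
    static String hrefFor(String title) {
        return "/wiki/" + encodeTitleAsSegment(title);
    }

    public static void main(String[] args) {
        System.out.println(hrefFor("/b/"));          // prints /wiki/%2Fb%2F
        System.out.println(hrefFor("/usr/bin/god")); // prints /wiki/%2Fusr%2Fbin%2Fgod
    }
}
```

The sketch escapes every slash in the title. If some slashes are meant as real path separators, the code generating the link would have to know which ones, and the sketch does not try to decide that.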