Bogus entries in externallinks table due to unescaping of &%=+
Closed, Resolved (Public)

Description

Consider this URL:

http://example.com/index.php?foo=bar%26baz%3Dquux%2Bquux

It has one parameter, foo, with the value "bar&baz=quux+quux". Place this in an article and the externallinks table will contain this URL instead:

http://example.com/index.php?foo=bar&baz=quux+quux

This has *two* parameters, foo with the value "bar" and baz with the value "quux quux".
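The difference between the two URLs can be checked with any standard query-string parser. The following is a minimal sketch using Python's urllib.parse (chosen here only for illustration; it is not part of MediaWiki), and the helper name query_params is hypothetical:

```python
from urllib.parse import urlsplit, parse_qs

# URL as written in the article, and URL as stored in externallinks.
original = "http://example.com/index.php?foo=bar%26baz%3Dquux%2Bquux"
stored = "http://example.com/index.php?foo=bar&baz=quux+quux"


def query_params(url):
    """Return the decoded query parameters of a URL as a dict of lists."""
    return parse_qs(urlsplit(url).query)


original_params = query_params(original)
stored_params = query_params(stored)

print(original_params)  # one parameter: foo
print(stored_params)    # two parameters: foo and baz
```

The two results differ, so the stored URL no longer identifies the same resource as the URL in the article.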

Then try this URL:

http://example.com/index.php?foo=%25xx

The value of foo is "%xx". But put it into an article, and externallinks will contain this URL instead:

http://example.com/index.php?foo=%xx

That's not even valid.
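To make "not valid" concrete: a percent sign in a URL must be followed by two hexadecimal digits. A small sketch of that check, assuming a hypothetical helper has_valid_escapes (Python's own unquote is lenient and would not flag this, so the check is written out by hand):

```python
import re

# A "%" that is NOT followed by two hex digits is a malformed escape.
INVALID_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_valid_escapes(url):
    """Return True if every "%" in the URL starts a well-formed %XX escape."""
    return INVALID_ESCAPE.search(url) is None


as_written = "http://example.com/index.php?foo=%25xx"
as_stored = "http://example.com/index.php?foo=%xx"

print(has_valid_escapes(as_written))
print(has_valid_escapes(as_stored))
```

The URL as written passes the check; the URL as stored does not, because "%xx" is not a well-formed escape sequence.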

The problem lies in Parser::replaceUnusualEscapesCallback, which unescapes %25, %26, %2B, and %3D even though the resulting characters (%, &, +, and =) all have special meaning in a URL. A similar-sounding problem was reported in bug 4781, which was closed as "fixed" with no reference to the revision that fixed it. Bug 40267 also touched on this issue, but the real problems described here appear to have been overlooked, since the reporter there focused on the unescaping of various safe characters rather than on these unsafe ones.
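One conservative way to avoid this class of bug is to unescape only characters that never carry special meaning in a URL. The sketch below is written in Python for illustration and is not MediaWiki's implementation (nor necessarily what any later fix does); it assumes the "unreserved" set from RFC 3986 (letters, digits, "-", ".", "_", "~") as the safe list, and the name normalize_escapes is hypothetical:

```python
import re

# Assumption: only RFC 3986 "unreserved" characters are safe to unescape.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(url):
    """Unescape %XX only for unreserved characters; keep all other escapes."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Keep the escape, normalizing the hex digits to upper case.
        return "%" + match.group(1).upper()

    return ESCAPE.sub(replace, url)


print(normalize_escapes("http://example.com/index.php?foo=bar%26baz%3Dquux%2Bquux"))
print(normalize_escapes("http://example.com/index.php?foo=%25xx"))
print(normalize_escapes("http://example.com/%7Euser/%41?x=%2f"))
```

With this rule, both URLs from the description pass through unchanged, while harmless escapes such as %7E and %41 are still simplified. Because the substitution never rescans its own output, a sequence like %2541 stays as %2541 rather than being decoded twice.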


Version: 1.23.0
Severity: normal

Details

Reference
bz57909

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:41 AM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz57909.
bzimport added a subscriber: Unknown Object (MLST).

So the question I have is: Can we just change replaceUnusualEscapesCallback (leaving externallinks inconsistent until all these pages happen to be reparsed), or should we try to figure out which pages are affected and run a maintenance script of some sort over them, or is externallinks supposed to contain such broken entries?

(In reply to comment #1)
> So the question I have is: Can we just change replaceUnusualEscapesCallback
> (leaving externallinks inconsistent until all these pages happen to be
> reparsed), or should we try to figure out which pages are affected and run a
> maintenance script of some sort over them, or is externallinks supposed to
> contain such broken entries?

You could null edit all the pages. :-)

Change 152889 had a related patch set uploaded by Anomie:
Improve Parser::replaceUnusualEscapes

https://gerrit.wikimedia.org/r/152889

Change 152889 merged by jenkins-bot:
Improve/rename Parser::replaceUnusualEscapes

https://gerrit.wikimedia.org/r/152889