Page MenuHomePhabricator

{{#ifexist}} does not recognise URL encoded titles
Closed, InvalidPublic

Description

{{#ifexist}} does not recognise URL encoded filenames, see URL for testcase.

I fixed it with r37387 but Brion didn't like the fix at this point.

With r37420 he reverted:

Revert r37387 "Let the {{#ifexist}} works with encoded URLs too. See testcase"
Causes regression for titles containing "+".
Proper fix here is to move the existing % decoding into Title::newFromText or
Title::secureAndSplit so it's done automatically, rather than attempting to
replicate it, with the potential of getting it wrong like this, every place we
decide to accept titles

Sticking it to Bugzilla for the moment until I have time (or another dev ;-) )


Version: unspecified
Severity: normal
URL: http://test.wikipedia.org/wiki/User:Raymond/ifexist

Details

Reference
bz14779

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 10:11 PM
bzimport added a project: ParserFunctions.
bzimport set Reference to bz14779.
bzimport added a subscriber: Unknown Object (MLST).

rene.kijewski wrote:

Maybe the underlaying Sanitizer::decodeCharReferences could be changed?
{{titleparts}} is even worse affected by the bug: It only returns the input.
I tried to provide a patch, but I got confused with the mathematics of converting URL encode to UTF-8. ;-)

I don't know if someone needs might it, but this it the regex for proper URL encoded characters:
/(%[0-7][0-9A-Za-z])

(%[CDcd][0-9A-Za-z]%[89ABab][0-9A-Za-z])
(%[Ee][0-9A-Za-z](?:%[89ABab][0-9A-Za-z]){2})
(%[Ff][0-7](?:%[89ABab][0-9A-Za-z]){3})/x

rene.kijewski wrote:

Wouldn't rawurldecode do it well? But I think it should go directly to Title::newFromText.

http://de.php.net/manual/en/function.rawurldecode.php: "Note: rawurldecode() does not decode plus symbols ('+') into spaces. urldecode() does"

Sanitizer::decodeCharReferences *must not* attempt to deal with URL percent-encoding, as that would cause corruption of totally unrelated HTML output.

Probably the Sanitizer::decodeCharReferences() and the %-check & urldecode() both belong in either Title:newFromText or directly into Title::secureAndSplit() to ensure that titles are being consistently handled at the low-level; this means the various checks at higher levels should be checked and mostly pulled out.

There are probably a number of related bugs still open on this issue; be good to make sure they're all tied together.

fullurl has no problem with urlencoded pages:

{{fullurl:{{FULLPAGENAME}}}} and {{fullurl:{{FULLPAGENAMEE}}}} gives the same result.

CoreParserFunctions.php has the description - since r15276
$title = Title::newFromText( $s );

  1. Due to order of execution of a lot of bits, the values might be encoded
  2. before arriving here; if that's true, then the title can't be created
  3. and the variable will fail. If we can't get a decent title from the first
  4. attempt, url-decode and try for a second.

if( is_null( $title ) )

$title = Title::newFromUrl( urldecode( $s ) );

#ifexist has not this and only try to get the title from text and not from URL second.

I do not know this is a possible solution. I am not so familiar with MediaWiki or PHP to say, this is good or not. I can only give you a place, who this works. Maybe you can adapt from there.

I don't see why this is a bug. It doesn't recognise URL-encoded file names because the parameter is the file name, not the URL. I don't see the need to take MediaWiki apart and put it back together again when you could just fix your broken #ifexist calls.

This is not only a problem of file names. I have add a example (see url).

Rereading this, I have to concur with comment 6. This must have blocked some use case of mine in the past so I CC'd but I don't remember it and neither do the other comments explain why it is a bug from their points of view. I propose closing as INVALID.