
urlencode on variables get double-encoded
Open, Low, Public

Description

Author: sergey.chernyshev

Description:
I use {{urlencode}} to encode the {{PAGENAME}} value, and it looks like the value gets double-encoded.

I created a test page for it on Wikipedia and it has the same issue:
http://en.wikipedia.org/wiki/User:Sergey_Chernyshev/Variable_Urlencode_%27_bug


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/wiki/User:Sergey_Chernyshev/Variable_Urlencode_%27_bug

Details

Reference
bz13288

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:03 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz13288.
bzimport added a subscriber: Unknown Object (MLST).

nicdumz wrote:

Variables are escaped through wfEscapeWikiText, so ' is converted to &#39;
Then the &, #, and ; from "&#39;" are escaped by urlencode()

While wfEscapeWikiText sucks, a simple fix for now would be to html_entity_decode the text before any {{urlencode:}} processing: HTML entities in URLs are invalid anyway (&quot; is a bad title, and &#nn; is interpreted by browsers as &)

Index: CoreParserFunctions.php
===================================================================
--- CoreParserFunctions.php	(revision 32034)
+++ CoreParserFunctions.php	(working copy)
@@ -82,7 +82,7 @@
 	}
 
 	static function urlencode( $parser, $s = '' ) {
-		return urlencode( $s );
+		return urlencode( html_entity_decode($s, ENT_QUOTES) );
 	}
 
 	static function lcfirst( $parser, $s = '' ) {
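
The double encoding and the proposed fix can be reproduced outside MediaWiki with plain PHP. This is only a sketch: `escapeApostrophe()` is a hypothetical stand-in for the one effect of wfEscapeWikiText that matters here (turning ' into &#39;), not the real function, and the sample page name is made up.

```php
<?php
// Hypothetical stand-in for the relevant part of wfEscapeWikiText:
// it only replaces the apostrophe with its numeric entity.
function escapeApostrophe( string $text ): string {
	return str_replace( "'", '&#39;', $text );
}

// What a variable such as {{PAGENAME}} would hand to the parser function.
$pageName = escapeApostrophe( "Sant'Antonio" ); // "Sant&#39;Antonio"

// Current behaviour: the entity's &, # and ; are percent-encoded.
$double = urlencode( $pageName );

// Proposed fix: decode entities first, then encode.
$fixed = urlencode( html_entity_decode( $pageName, ENT_QUOTES ) );

echo $double, "\n"; // Sant%26%2339%3BAntonio
echo $fixed, "\n";  // Sant%27Antonio
```

ENT_QUOTES is passed explicitly so that the single-quote entity is decoded regardless of the PHP version's default flags.
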

*** Bug 22508 has been marked as a duplicate of this bug. ***

Isn't {{PAGENAMEE}} just for this purpose?

Yes, {{PAGENAMEE}} is a valid workaround. But it should work, nonetheless.

I see no way this could work without breaking backward compatibility (BC).

conrad.irwin wrote:

{{PAGENAMEE:{{PAGENAME:&}}}} -> %26 (RIGHT)
{{URLENCODE:{{PAGENAME:&}}}} -> %26amp%3B (WRONG)
{{PAGENAMEE:&amp;}} -> %26 (WRONG - ?)
{{URLENCODE:&amp;}} -> %26amp%3B (RIGHT)

I put the ? there because [[&amp;]] creates a link to [[&]] (perhaps also wrong) and http://en.wikipedia.org/wiki/%26amp; is a server error.

I think the solution would be to have {{PAGENAME}} et al. return a "text-needs-escape" object of some kind. Parser functions could then set a flag to request unescaped input, and the parser would escape the text only when escaping is needed.

The Django template engine deals with this issue very nicely; maybe we can copy some of their ideas.
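
A minimal sketch of what such a "text-needs-escape" object might look like, loosely in the spirit of Django's escape-on-output approach. Everything here is an assumption made for illustration: the class `NeedsEscapeText`, its `raw()` method, and the function `urlencodeFunction()` are hypothetical names, and htmlspecialchars() merely stands in for whatever escaping the parser would really apply.

```php
<?php
// Hypothetical value object: carries raw text and escapes only on output.
final class NeedsEscapeText {
	private $raw;

	public function __construct( string $raw ) {
		$this->raw = $raw;
	}

	// For parser functions that ask for unescaped input.
	public function raw(): string {
		return $this->raw;
	}

	// Escaping happens once, when the value is finally emitted.
	public function __toString(): string {
		return htmlspecialchars( $this->raw, ENT_QUOTES );
	}
}

// Hypothetical parser function that requests the raw value.
function urlencodeFunction( NeedsEscapeText $s ): string {
	return urlencode( $s->raw() );
}

$pageName = new NeedsEscapeText( "Sant'Antonio & Co" );

echo urlencodeFunction( $pageName ), "\n"; // Sant%27Antonio+%26+Co
echo $pageName, "\n";                      // Sant&#039;Antonio &amp; Co
```

Because the raw text is never stored in escaped form, a function that needs the original characters cannot encode them twice, while plain output is still escaped by default.
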

(In reply to comment #6)
> {{PAGENAMEE:&amp;}} -> %26 (WRONG - ?)
>
> I put the ? there because [[&amp;]] creates a link to [[&]] (perhaps also
> wrong) and http://en.wikipedia.org/wiki/%26amp; is a server error.

& is disabled on wmf due to broken clients. Also, entities in titles are normalised away unless I am mistaken.

I don't know enough about the parser to say whether that is possible.

Amalthea.wikimedia wrote:

The core of the issue seems to be that {{PAGENAME}} and others internally escape some characters to entities, which breaks other magic words and parser functions when they use the result directly.

{{#ifeq:{{PAGENAME:File:Aci Sant'Antonio.svg}}|Aci Sant'Antonio.svg|y|n}}
→ "n"

{{#ifeq:{{PAGENAME:File:Aci Sant'Antonio.svg}}|Aci Sant&#39;Antonio.svg|y|n}}
→ "y"

{{FILEPATH:Aci_Sant'Antonio.svg}}
→ "http://upload.wikimedia.org/wikipedia/commons/0/00/Aci_Sant%27Antonio.svg"

{{FILEPATH:Aci Sant'Antonio.svg}}
→ ""

{{str left|{{PAGENAME:File:Aci Sant'Antonio.svg}}|12}}
→ "Aci Sant&#39"
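
The comparison and truncation failures above follow directly from ordinary string handling of the escaped value. A small PHP sketch, assuming the variable's output is the title with ' replaced by &#39; (the variable names are made up for illustration):

```php
<?php
$escaped = 'Aci Sant&#39;Antonio.svg'; // assumed output of the variable
$literal = "Aci Sant'Antonio.svg";     // what the editor typed

// The strings differ, so an equality test fails.
var_dump( $escaped === $literal ); // bool(false)

// Decoding the entity first makes them equal again.
var_dump( html_entity_decode( $escaped, ENT_QUOTES ) === $literal ); // bool(true)

// Truncation counts the entity's characters and can cut it in half.
echo substr( $escaped, 0, 12 ), "\n"; // Aci Sant&#39
echo substr( $literal, 0, 12 ), "\n"; // Aci Sant'Ant
```
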

Amalthea.wikimedia wrote:

More or less duplicated by bug 16474 and bug 14779, as far as I can tell.