
Parameters to {{fullurl}} et al. should be URL-encoded
Closed, Declined · Public

Description

Author: locke.cole.wiki

Description:
Parameters to {{fullurl:}} are not URL-encoded, which causes problems when a
parameter value contains a space. For example:

[{{fullurl:Special:Log/block|page=User:Locke Cole}} link]

Generates:

"http://server/wiki/Special:Log/block?page=User:Locke Cole"

The space in "Locke Cole" is treated as the separator between URL and link text
in the external link syntax ([<url> <link title>]). So, after {{fullurl:}}
performs its magic, we end up with this:

[http://server/wiki/Special:Log/block?page=User:Locke Cole link]

Which renders as:

"Cole link"

With the anchor tag pointing to:

"http://server/wiki/Special:Log/block?page=User:Locke"
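The failure above comes from the bracketed link syntax ending the URL at the first space. A minimal Python model of that split (an assumption for illustration, not the MediaWiki parser; split_external_link is a hypothetical name) shows why encoding the space as %20 keeps the whole URL intact:

```python
def split_external_link(wikitext: str) -> tuple[str, str]:
    """Split bracketed external-link wikitext "[url title]" into (url, title).

    Simplified model (an assumption, not the MediaWiki parser): the URL ends
    at the first space, and everything after that space is the link title.
    """
    text = wikitext.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected wikitext of the form [url title]")
    url, _, title = text[1:-1].partition(" ")
    return url, title


broken = "[http://server/wiki/Special:Log/block?page=User:Locke Cole link]"
fixed = "[http://server/wiki/Special:Log/block?page=User:Locke%20Cole link]"
print(split_external_link(broken))  # URL stops at "Locke"; title is "Cole link"
print(split_external_link(fixed))   # whole URL survives; title is "link"
```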

Note: {{fullurle}}, {{localurl}} and {{localurle}} would also need to be changed
if solution #1 were chosen.


Version: 1.7.x
Severity: enhancement

Details

Reference
bz5720

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 9:12 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz5720.
bzimport added a subscriber: Unknown Object (MLST).

locke.cole.wiki wrote:

As I see it, there are two possible solutions to this:

  1. URL-encode parameters.
  2. Provide a set of new magic words ({{urlencode:<string>}} and
     {{urldecode:<string>}}) to allow editors to URL-encode/decode as necessary.

I'm leery of solution #1 because of possible unintended side effects, but at the
same time, solution #2 would be unnecessary bloat if solution #1 could work.

gangleri wrote:

see
Bug 839: method to URL-escape arbitrary text (e.g. the name of some other page)

omniplex wrote:

839 appears to be more general. Just in case:

The "normal" way to deal with spaces is %20,
that also works for [[google:query strings]].

Google also offers "+", where Wikipedia wants
"_" (underscore). We probably only want "_" (?)

gangleri wrote:

(In reply to comment #3)

> The "normal" way to deal with spaces is %20,
> that also works for [[google:query strings]].
>
> Google also offers "+", where Wikipedia wants
> "_" (underscore). We probably only want "_" (?)

see Bug 707: Google search with [[Google:search term]]

omniplex wrote:

[[mediazilla:707]] is more general, and its Latin-1 part is
probably obsolete; RFC 3987 hadn't been published at that time (?)

But fullurl: / fullurle: / localurl: / localurle: are straightforward:
CTL, SP, '"', "<", ">", "\", "^", "`", "{", "|", "}", and "#" can't
occur in the query part of a URL; they have to be percent-encoded.
Replacing all spaces with %20 should do it.
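A minimal Python sketch of encoding exactly the characters listed above (encode_query_value is a hypothetical helper, not MediaWiki code). Converting non-ASCII text to percent-encoded UTF-8 bytes is an added assumption the comment does not address:

```python
# The characters omniplex lists as not allowed in the query part of a URL.
# CTL (0x00-0x1F, 0x7F) and SP (0x20) are handled by range checks below.
FORBIDDEN = set('"<>\\^`{|}#')


def encode_query_value(value: str) -> str:
    """Percent-encode a query value, touching only the listed characters.

    Non-ASCII text is first converted to UTF-8 and each byte >= 0x80 is
    percent-encoded as well (an assumption; the comment does not cover it).
    Note that '%', '&' and '=' are not in the list, so they pass through.
    """
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if byte <= 0x20 or byte >= 0x7F or ch in FORBIDDEN:
            out.append(f"%{byte:02X}")  # CTL, SP, DEL, non-ASCII, listed chars
        else:
            out.append(ch)
    return "".join(out)


print(encode_query_value("User:Locke Cole"))  # User:Locke%20Cole
```

Because "%" is not in the list, a literal percent sign passes through unchanged; a real implementation would also have to decide how to treat it.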

robchur wrote:

Note: We now have {{urlencode}}.

omniplex wrote:

Nice, but it uses "+" for spaces, not the "universal" (in my parallel universe)
"%20". On [[m:Help:Variable]] (end of first section) I've replaced a convoluted
workaround by urlencode:, and for that example "+" worked. It should also work
for google queries. Is "+" really okay for Wikipedia everywhere?

Nope. That's because "+" is a valid character in MediaWiki page titles now
([[C++]] for example). The escape sequence "%20" would work for MediaWiki pages,
and I think it'd work for Google queries too.

Using "+" to escape spaces is part of the URL and HTTP RFCs and should thus work
universally. I just checked; it also works in MediaWiki's external link syntax
(both "boxed" and "free"). It should also work in normal wiki links, but
URL-encoding is not appropriate there anyway.

omniplex wrote:

re #9: Please state the precise section of RFC 3986 (STD 66)
where "+" is mentioned as a placeholder for spaces; I'm far
from knowing this monster (or its predecessors 2396 + 1738)
by heart. It's tricky stuff, especially the query part, e.g.
"=", "&", and ";" are only conventions, not a standard.

From what I can see, the "+" replacement is actually specified in
the HTML standard for application/x-www-form-urlencoded encoding
of submitted form data:

http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

It's not directly related to either URLs or HTTP.
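Python's urllib.parse, used here only as a stand-in for illustration, exposes both conventions: quote_plus follows the form encoding described above, while quote does plain percent-encoding. The sketch shows why an unescaped "+" in a title such as C++ is ambiguous under form decoding, and why %20 sidesteps the question:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Form encoding (application/x-www-form-urlencoded): space -> "+", "+" -> "%2B".
print(quote_plus("C++ code"))  # C%2B%2B+code
# Plain percent-encoding: space -> "%20", "+" -> "%2B".
print(quote("C++ code"))       # C%2B%2B%20code

# An unescaped "+" is ambiguous: form decoding turns it into a space ...
print(repr(unquote_plus("C++")))  # 'C  '
# ... while plain percent-decoding leaves it alone.
print(unquote("C++"))             # C++
```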

robchur wrote:

If {{urlencode}} was changed so it switched spaces over to %20, would that be
sufficient?

omniplex wrote:

re #12: Most probably. I'm not sure why it strips trailing spaces,
but preserves (= encodes) leading spaces.

eric.frederich wrote:

I think I came across this same problem trying to use the following template.
The idea was that on the page for a rasterized image, the source could be provided
as a Visio / PowerPoint / GIMP / etc. file.
If I don't use urlencode, it doesn't work at all on file names containing a space.
If I do use urlencode, it uses +'s instead of _'s, and when I go to the
upload page the destination file name has spaces in it.
Then when I upload the file I get a warning that the file name will be changed.
It works out in the end, but it would be nice if there was a function that
actually put underscores instead of plus signs.

-----Template-----

{|class="wikitable"
|-
|[[Media:{{{1}}}]]||Download the {{{2}}} file
|-
|[{{fullurl:Special:Upload|wpDestFile={{urlencode:{{{1}}}}}}} Upload]||Upload a new version of this file
|-
|[[:Image:{{{1}}}]]||{{{2}}} file page
|}
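The function asked for above, one that uses underscores instead of plus signs, could behave roughly like this Python sketch. title_to_url_part is a hypothetical name; it only models the desired output and is not an existing MediaWiki magic word. Leaving ":" and "/" unescaped is an assumption made for readability:

```python
from urllib.parse import quote


def title_to_url_part(title: str) -> str:
    """Hypothetical helper: encode a page/file title the way MediaWiki
    page URLs look, with spaces as underscores instead of "+".

    Keeping ':' and '/' unescaped is an assumption made for readability.
    """
    return quote(title.replace(" ", "_"), safe=":/")


print(title_to_url_part("Network diagram v2.vsd"))  # Network_diagram_v2.vsd
print(title_to_url_part("C++ notes.txt"))           # C%2B%2B_notes.txt
```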

ayg wrote:

Problem: if we call urlencode on the parameter, multiple parameters can't be
attached: "action=edit&uselang=zh" would be escaped into gibberish. This could
of course be solved by using {{fullurl: foo|bar=x|baz=y}} instead of
{{fullurl: foo|bar=x&baz=y}}, but that wouldn't be backward-compatible. Seeing
as this isn't really needed now that we have {{urlencode:}}, should we break
backward compatibility for it?
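The "gibberish" problem can be illustrated with Python's urllib.parse (a stand-in, not MediaWiki's code). Encoding the whole parameter string escapes the separators; encoding each value separately, which is roughly what the {{fullurl: foo|bar=x|baz=y}} form would allow, keeps them:

```python
from urllib.parse import quote, urlencode

# Encoding the whole parameter string turns the separators into data.
whole = quote("action=edit&uselang=zh", safe="")
print(whole)  # action%3Dedit%26uselang%3Dzh

# Encoding each value separately (analogous to passing bar=x and baz=y as
# separate arguments) keeps '&' and '=' as separators.
per_value = urlencode({"action": "edit", "uselang": "zh"}, quote_via=quote)
print(per_value)  # action=edit&uselang=zh

# Spaces inside a value still become %20 because quote_via=quote.
with_space = urlencode({"page": "User:Locke Cole"}, safe=":", quote_via=quote)
print(with_space)  # page=User:Locke%20Cole
```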

dto wrote:

For one, Brion doesn't seem to like {{fullurl:foo|bar=x|baz=y}} (see Bug
Activity for Bug 7348).

{{fullurl:Apple juice|action=edit}} now outputs "http://...Apple_juice&action=edit"; I've marked this bug as resolved.

ayg wrote:

What? This is about something different. Possibly WONTFIX, but definitely not FIXED.

stanley wrote:

(In reply to comment #0)
Instead of
[http://server/wiki/Special:Log/block?page=User:Locke Cole link]
I use
[http://server/wiki/Special:Log/block?page={{anchorencode:User:Locke Cole}} link]
which replaces the space with an underscore.

ayg wrote:

Don't use anchorencode for that. It will fail on unusual characters. Use {{urlencode:}} for URLs, {{anchorencode:}} for anchors (stuff that goes after the # in a URL).
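Why anchor encoding breaks in a URL can be sketched in Python, assuming the dot-style escaping that section anchors used at the time (spaces to "_", then percent escapes with "%" replaced by "."). legacy_anchor_encode and query_encode are hypothetical helpers, and the anchor rules are a simplified model, not MediaWiki's exact implementation. Plain titles look fine either way, but non-ASCII characters come out as dot escapes that a server would read literally:

```python
from urllib.parse import quote


def legacy_anchor_encode(text: str) -> str:
    """Simplified model (assumption) of old-style dot anchor encoding:
    spaces -> '_', percent-encode, then every '%' -> '.'."""
    return quote(text.replace(" ", "_"), safe=":").replace("%", ".")


def query_encode(text: str) -> str:
    """Plain percent-encoding for a query value ('%20' for spaces)."""
    return quote(text, safe=":")


for name in ["User:Locke Cole", "User:Zoë"]:
    print(legacy_anchor_encode(name), query_encode(name))
# User:Locke_Cole User:Locke%20Cole
# User:Zo.C3.AB User:Zo%C3%AB
```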

This is really not worth it at this point.

What if fullurle only encoded spaces (as "_") and possibly other special characters, but left &s, =s and #s alone? Wouldn't that be backward-compatible?
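A Python sketch of this proposal, under stated assumptions: encode_keep_separators is hypothetical and does not describe what fullurle actually does, and leaving ":" and "/" unescaped is an extra choice made for readability:

```python
from urllib.parse import quote


def encode_keep_separators(query: str) -> str:
    """Hypothetical version of the proposal above (not MediaWiki behaviour):
    spaces -> '_', other unsafe characters percent-encoded, but '&', '='
    and '#' (plus ':' and '/') left alone so existing query strings work."""
    return quote(query.replace(" ", "_"), safe="&=#:/")


print(encode_keep_separators("page=User:Locke Cole&action=edit"))
# page=User:Locke_Cole&action=edit
print(encode_keep_separators("page=C++ code#top"))
# page=C%2B%2B_code#top
```

The trade-off is that a literal "&" or "=" inside a value can no longer be expressed, and "_" only stands in for a space in parameters that are page titles, where MediaWiki treats the two alike; in free-text parameters the underscore would be taken literally.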