
colon sometimes URL encoded "%3A" right there in the browser URL entry area
Closed, Invalid (Public)

Description

Details

Reference
bz17680

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:32 PM)
bzimport set Reference to bz17680.
bzimport added a subscriber: Unknown Object (MLST).

Related to Bug #17681, Bug #17712.

ayg wrote:

This is because people are using urlencode() instead of wfUrlencode(), probably. It should be easy enough to fix on a case-by-case basis.
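To illustrate the distinction being drawn here, a minimal Python sketch (not MediaWiki code) follows. It assumes that a plain urlencode() escapes every reserved character, including ":", and that a wfUrlencode()-style encoder leaves ":" and "/" literal for readability. The helper names `strict_encode` and `pretty_encode` are hypothetical stand-ins, not real MediaWiki functions.

```python
from urllib.parse import quote


def strict_encode(title: str) -> str:
    # Hypothetical stand-in for a plain urlencode(): no character is exempt,
    # so ":" is escaped as %3A.
    return quote(title, safe="")


def pretty_encode(title: str) -> str:
    # Hypothetical stand-in for a wfUrlencode()-style encoder: ":" and "/"
    # are kept literal (an assumption made for this sketch).
    return quote(title, safe=":/")


print(strict_encode("Special:Allpages"))  # Special%3AAllpages
print(pretty_encode("Special:Allpages"))  # Special:Allpages
```

If a call site used the first style where the second was intended, the colon in a title would show up as %3A in the generated link.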

I'm sorry, but I cannot find even one spot in the code that does this.
The urlencode() calls I saw act on what comes after the colon.
I have replaced the URL with zh.wikipedia's to make it easier for you to test.

Please confirm this is a Firefox bug.
Visit http://transgender-taiwan.org/index.php?title=Special:Allpages
Now hit the Go [提交] button.

Why is there the ugly %3A instead of the colon in Firefox:
http://transgender-taiwan.org/index.php?title=特殊%3A所有頁面&from=English&to=首頁&namespace=0
http://transgender-taiwan.org/index.php?title=特殊:__所有頁面&from=English&to=首頁&namespace=0
vs. the latter (lined up by me) seen in emacs-w3m. What does IE show?

If it is a Firefox bug, somebody please report it, because I can't get a word in edgewise there.

(In reply to comment #6)

> Please confirm this is a Firefox bug.
> Visit http://transgender-taiwan.org/index.php?title=Special:Allpages
> Now hit the Go [提交] button.
>
> Why is there the ugly %3A instead of the colon in Firefox:
> http://transgender-taiwan.org/index.php?title=特殊%3A所有頁面&from=English&to=首頁&namespace=0
> http://transgender-taiwan.org/index.php?title=特殊:__所有頁面&from=English&to=首頁&namespace=0
> vs. the latter (lined up by me) seen in emacs-w3m. What does IE show?
>
> If it is a Firefox bug, somebody please report it, because I can't get a word
> in edgewise there.

This happens in IE for me as well; even worse, IE converts all the Chinese (?) characters to %xx pairs while Firefox just encodes the : to %3A. Seems to be a MW bug, I'll see if I can track it down.
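The two display styles described above can be compared with a small Python sketch. It assumes UTF-8 is the encoding behind the %xx pairs (the thread does not say which encoding IE used), and the variable names are purely illustrative.

```python
from urllib.parse import quote, unquote

title = "特殊:所有頁面"

# Firefox-style display as described above: only the colon is escaped.
firefox_style = title.replace(":", "%3A")

# IE-style display as described above: every byte outside the unreserved
# set is escaped (UTF-8 assumed for this sketch).
ie_style = quote(title, safe="")

print(firefox_style)
print(ie_style)

# Both forms decode back to the same title.
print(unquote(firefox_style) == unquote(ie_style) == title)  # True
```

Under that assumption the two address-bar renderings differ only in how much is escaped; both decode to the same title.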

Doesn't seem to be a MW bug after all. The HTML for the quick search form is:
<form action="/t/index.php" id="searchform"><div>
    <input type='hidden' name="title" value="Special:Search"/>
    ...
So MW's not at fault here.

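I suspect this is related to the fact that the URL comes from a form submission (a GET here, since the form has no method attribute and the fields end up in the query string), so the browser encodes the field values itself. Either way, it's a browser bug (a widespread one, it seems), not a MediaWiki bug, so any discussion about it shouldn't happen here.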
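When a form is submitted, the user agent builds the query string from the field values itself rather than copying a URL out of the HTML. A rough Python analogue of that serialization step (using `urllib.parse.urlencode`, which is not what any browser actually runs) shows where the %3A can come from even though the HTML contains a literal colon:

```python
from urllib.parse import urlencode

# Field taken from the hidden input in the search form quoted above.
fields = {"title": "Special:Search"}

# Default form-style serialization escapes the colon.
print(urlencode(fields))            # title=Special%3ASearch

# Declaring ":" safe keeps it literal; an encoder is free to do either.
print(urlencode(fields, safe=":"))  # title=Special:Search
```

The server sees the same `title` value in both cases, so the difference is purely in how the URL looks.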

ayg wrote:

It's not a *bug*, it just looks a little ugly. The two URLs in question are equivalent, there's no standard that requires the browser to prefer one to the other for display.
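As a quick check of the claim that the two forms are equivalent for the receiving application, the sketch below parses both variants of a URL quoted in this thread with Python's standard library and compares the decoded parameters. The helper name `query_params` is illustrative only.

```python
from urllib.parse import parse_qs, urlsplit

encoded = "http://en.wikipedia.org/w/index.php?title=User_talk%3AJidanni&action=history"
literal = "http://en.wikipedia.org/w/index.php?title=User_talk:Jidanni&action=history"


def query_params(url: str) -> dict:
    # Split off the query string and decode it into {name: [values]}.
    return parse_qs(urlsplit(url).query)


print(query_params(encoded))
print(query_params(encoded) == query_params(literal))  # True
```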

":" is a reserved character and will be percent-encoded by any browser that
follows the specification. See http://tools.ietf.org/html/rfc3986#section-2.2
and [[Percent-encoding#Types_of_URI_characters]]

OK, but what about e.g.,
http://en.wikipedia.org/w/index.php?title=User_talk:Jidanni&action=history
Here the colon is also in the query string, but do we ever see it end
up as %3A in any browser's URL bar, unless we type it in there
ourselves?

ayg wrote:

(In reply to comment #12)

> ":" is a reserved character and will be percent-encoded by any browser that
> follows the specification. See http://tools.ietf.org/html/rfc3986#section-2.2
> and [[Percent-encoding#Types_of_URI_characters]]

If anything, to the contrary, agents are not supposed to encode or decode colons, specifically because they're reserved:

URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent.  Percent-
encoding a reserved character, or decoding a percent-encoded octet
that corresponds to a reserved character, will change how the URI is
interpreted by most applications.

If : is indeed considered reserved here, then encoding it would change the meaning of the URL. Compare:

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component.

Firefox is not a "URI producing application" here -- it's just reproducing the URI provided by MediaWiki, and so this doesn't apply to it. It might apply to MediaWiki (which is why I'm addressing this argument at all), but a) it says "should", so we can always ignore it if it seems to be safe and makes the URLs prettier ;), and b) there's the exception "unless these characters are specifically allowed by the URI scheme to represent data in that component."

If we consult RFC 2616, which defines the http: scheme, we find in sections 3.2.1 and 3.2.2[1] that its production for the path part of the URI is that of abs_path from RFC 2396. If we look there, we find[2] that an abs_path can contain any pchar, with pchar being defined as

pchar         = unreserved | escaped |
                ":" | "@" | "&" | "=" | "+" | "$" | ","

Therefore I conclude that in "http:" URIs specifically, colons are not reserved in the path part, and it's perfectly legitimate for us to emit them unencoded, and for clients to encode and decode them freely (which is what Firefox seems to do).

[1] http://tools.ietf.org/html/rfc2616#section-3.2.1
[2] http://tools.ietf.org/html/rfc2396 (search for abs_path and follow the productions)
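The pchar production quoted above can be turned into a small checker. This is only a sketch: it transcribes the RFC 2396 `unreserved` set (alphanumerics plus the marks `- _ . ! ~ * ' ( )`) together with the extra pchar characters, ignores the `;param` part of a segment, and uses the hypothetical helper name `is_valid_segment`.

```python
import re

# RFC 2396: unreserved = alphanum | mark
UNRESERVED = r"A-Za-z0-9\-_.!~*'()"
# Extra characters that the pchar production allows literally.
PCHAR_EXTRA = r":@&=+$,"

# A path segment as a run of pchars: literal characters or %XX escapes.
SEGMENT = re.compile(rf"(?:[{UNRESERVED}{PCHAR_EXTRA}]|%[0-9A-Fa-f]{{2}})*")


def is_valid_segment(segment: str) -> bool:
    # True when every character is a pchar or part of a %XX escape.
    return SEGMENT.fullmatch(segment) is not None


print(is_valid_segment("Special:Allpages"))    # True: ":" is a pchar
print(is_valid_segment("Special%3AAllpages"))  # True: the escaped form is also legal
print(is_valid_segment("Special Allpages"))    # False: a raw space is not a pchar
```

Both the literal and the escaped colon pass, which matches the conclusion that either spelling is legitimate in the path of an http: URI under that grammar.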

(In reply to comment #13)

> OK, but what about e.g.,
> http://en.wikipedia.org/w/index.php?title=User_talk:Jidanni&action=history
> Here the colon is also in the query string, but do we ever see it end
> up as %3A in any browser's URL bar, unless we type it in there
> ourselves?

This is not a MediaWiki problem. Complain to Mozilla.

> This is not a MediaWiki problem. Complain to Mozilla.

"Dear Mozilla, you made my colon pretty. I want it ugly." ????

I don't think you understand me.

I'm trying to say that I don't like %3A's when they could just be :'s.

I am still curious:
Could MediaWiki be adjusted to stop the phenomenon?

(In reply to comment #15)

> > This is not a MediaWiki problem. Complain to Mozilla.
>
> "Dear Mozilla, you made my colon pretty. I want it ugly." ????
>
> I don't think you understand me.
>
> I'm trying to say that I don't like %3A's when they could just be :'s.
>
> I am still curious:
> Could MediaWiki be adjusted to stop the phenomenon?

As people have repeatedly said on this bug: the problem is not with MediaWiki, and no, there is nothing MediaWiki could do to stop browsers from mangling colons. All it could really do is serve colons rather than %3A's in links and forms, and it's already doing that.