Optionally enable urldecode for external links
Open, Medium, Public, Feature

Description

Author: cnit

Description:
Optionally enable urldecode for non-ASCII external links

On one wiki site I add external links to other wikis (primarily Russian ones). Such wikis have Cyrillic titles, so their links contain UTF-8 bytes that are percent-encoded (%xx). One can add each such wiki to the interwiki table so that title names are decoded, but adding every wiki is a bit tiresome. Browsers such as Firefox already decode these URLs properly in their address bar. I suggest performing urldecode on such links, so that Cyrillic external links become readable in the generated HTML page. A new setting, $wgExternalLinksDecode, could be introduced if such behavior is undesired by default.


Version: 1.17.x
Severity: enhancement

attachment externallinksdecode.patch ignored as obsolete

Details

Reference
bz25934

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:21 PM
bzimport set Reference to bz25934.
bzimport added a subscriber: Unknown Object (MLST).

Just to clarify, this is for decoding the text part of the link, not the URL in the href?

The idea itself sounds sane if the user just writes a URL in the wiki (at least IMHO). However, I don't think we'd want to URL-decode something like:

[http://example.com some text for the link with %25 in it]

If the user specified the text for the link, we should assume they know what they are doing and not decode it. (Your patch would decode both.)
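
The distinction can be sketched as follows. This is a minimal Python illustration, not MediaWiki code: `render_external_link` is a hypothetical helper, and the plain `unquote` call is deliberately naive (it does no encoding validation and decodes every escape).

```python
from html import escape
from urllib.parse import unquote


def render_external_link(url, text=None):
    """Build an <a> element; decode the label only for bare URLs.

    Hypothetical helper for illustration. The href always keeps the
    original percent-encoded URL, so the link target never changes.
    """
    if text is None:
        # Bare link: the visible label is derived from the URL, so decode it.
        label = unquote(url)
    else:
        # The user supplied the label: leave it exactly as written.
        label = text
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))
```

With this split, a label such as "some text for the link with %25 in it" is emitted untouched, while a bare URL gets a readable label.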

cnit wrote:

Firefox decodes percent-encoded sequences in the URL shown in its address bar. For example, place the following link on a wiki page (even without the patch), then open it and look at the address bar:
http://ru.wikipedia.org/wiki/%D0%94%D1%80%D0%BE%D1%84%D0%B0
The series of %xx hex codes is replaced with Cyrillic characters, which are readable to anyone who knows the Cyrillic alphabet.

However, when you copy and paste such a URL from the address bar back into a text editor, the %xx sequences reappear. Internally it is the same binary representation; the decoding is performed only for display.

Opera and Safari probably do this, too. IE8 does not; I haven't checked IE9 yet.

[http://ru.wikipedia.org/wiki/%D0%94%D1%80%D0%BE%D1%84%D0%B0 %D0%94%D1%80%D0%BE%D1%84%D0%B0]: with my patch the description is not decoded, only the URL. Quite the opposite, though it probably matches Firefox's logic.

I'm in favor of this idea, and the implementation is only for links in wikicode without text. I think that with all the international versions we have, this would be a welcome change for many.

Whoops. You're right, this doesn't affect [http://example.com %E0%B4%B3%E0%B5%8D%E2%80%8D] style links, since the parser sets the $escape argument to false for those kinds of links. However, it still seems a bit weird: if I do something like $sk->makeExternalLink( "http://example.com/some_url", "some text that is not a url, entered by the user, containing a %HH code" ); in an extension, the result would probably not be what is expected.

cnit wrote:

urldecode of text for local UTF-8 characters, just as in major browsers' address bars

improves the readability of such links a lot.

attachment externallinksdecode2.patch ignored as obsolete

cnit wrote:

New patch, which should have better compatibility with the existing Linker / Skin usage.

Stumbled on this in bugzilla... I like the basic idea of the patch, but there's a couple of issues which'll need to be worked out.

First, not all URLs with encoded characters are encoded in UTF-8... while we like to hope that most of them are in this day and age, there's no guarantee. Russian, Japanese, Chinese, etc. sites may still use other national encodings, especially on older links...

Reasonable behavior would at least need to check for UTF-8 validity to avoid outputting garbage characters.
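
A minimal sketch of such a validity check, in Python for illustration only (the function name `decode_if_valid_utf8` is hypothetical, and this is not the code from the attached patch). It addresses only the encoding question and still decodes every escape, including reserved characters:

```python
from urllib.parse import unquote_to_bytes


def decode_if_valid_utf8(url):
    """Return the percent-decoded URL, or the input unchanged if the
    decoded bytes are not valid UTF-8 (e.g. a legacy national encoding)."""
    raw = unquote_to_bytes(url)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: keep the escapes rather than print garbage characters.
        return url
```

A URL whose escapes are Windows-1251 bytes, such as `%C4%F0%EE%F4%E0`, fails the strict decode and is therefore shown in its original encoded form.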

Second, there are lots of meaningful characters in URLs where the difference between being encoded and not actually changes the URL; for instance Firefox will show

http://en.wikipedia.org/wiki/What%27s_Eating_Gilbert_Grape%3F

as:

http://en.wikipedia.org/wiki/What's_Eating_Gilbert_Grape%3F

and not as:

http://en.wikipedia.org/wiki/What%27s_Eating_Gilbert_Grape?
which would actually point to "[[What's Eating Gilbert Grape]]" with an empty query string on the end if you copy-pasted the text.
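
One way to get this behavior is to decode each run of escapes and then re-encode any character that has a special meaning in a URL. The sketch below is Python for illustration only; `display_form` is a hypothetical name, and the `KEEP_ENCODED` set is an assumption, not Firefox's actual list.

```python
import re
from urllib.parse import unquote_to_bytes

# Assumption: characters that stay percent-encoded because decoding them
# could change how the URL is parsed. Not taken from any browser's source.
KEEP_ENCODED = set(";/?:@&=+$,#% ")

# One or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    run = match.group(0)
    try:
        text = unquote_to_bytes(run).decode("utf-8")
    except UnicodeDecodeError:
        return run  # not valid UTF-8: leave the whole run untouched
    out = []
    for ch in text:
        if ch in KEEP_ENCODED or ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Reserved or control character: emit it as an escape again
            # (hex digits are normalized to upper case).
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def display_form(url):
    """Return a human-readable form of url for display, not for the href."""
    return _ESCAPE_RUN.sub(_decode_run, url)
```

Applied to the example above, `%27` becomes an apostrophe while `%3F` stays encoded, so copying the displayed text still yields the same page.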

Third, if the above are resolved, I'd probably be pretty happy with *not* adding the parameter on makeExternalLink() -- it's probably sane behavior for the automatic formatting of the bare link display in pretty much all cases.

cnit wrote:

Try to urldecode external links similar to the way Firefox does

Useful to improve readability of local UTF-8 encoded external links. Made with 1.17 branch.

sumanah wrote:

Dmitriy, thank you for the patch. I'm sorry it has taken so long for us to respond to you!

Because it's been so long, when I tried to apply your patch, it didn't apply cleanly to trunk. But before you try to revise it so it applies, will you come into the MediaWiki-General channel on FreeNode IRC and ask for a more thorough review of your patch? That way you won't waste time redoing work. Thanks!

Removing bug 27292 as blocker, this has nothing to do with skins. (I don't think there's a "Parser" tracking bug, at least I didn't find one.)

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:24 PM
Aklapper removed a subscriber: wikibugs-l-list.