
Wikipedia Mobile cannot encode Chinese characters correctly
Closed, Resolved (Public)

Description

See the "View this page on regular Wikipedia" link at the bottom of the given page:

http://zh.wikipedia.org/w/mobileRedirect.php?to=http://zh.wikipedia.org/wiki/%26%23x4EA4%3B%26%23x6D41%3B%26%23x96FB%3B

This link is bad.


Version: .5
Severity: major
URL: http://zh.m.wikipedia.org/wiki/%E4%BA%A4%E6%B5%81%E9%9B%BB

Details

Reference
bz21976

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:49 PM
bzimport set Reference to bz21976.

Works for me (Safari). What browser are you using?

At least it doesn't work for me with Opera Mini (on a smartphone). I copied this URL on a PC, so it may differ from the one I got with Opera Mini.

Tested with Opera Mini and reproducible with Safari.

Page:
http://zh.m.wikipedia.org/wiki/字

Which is the bottom link from http://en.wikipedia.org/wiki/Zi

"View this page" links to: http://zh.wikipedia.org/w/mobileRedirect.php?to=http://zh.wikipedia.org/wiki/%25E5%25AD%2597

This malformed URL lands me on page: http://zh.wikipedia.org/wiki/字

The URL should have been http://zh.wikipedia.org/wiki/%E5%AD%97

%25E5%25AD%2597 decoded == %E5%AD%97, so the title is URL-encoded twice.
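
The double encoding observed above can be reproduced with a few lines of Ruby. This is only a sketch using the standard library's CGI.escape and CGI.unescape; the variable names are illustrative, and it is an assumption that the gateway ends up escaping an already-escaped title in this way.

```ruby
require 'cgi'

title = "\u5B57"             # the character from the example page (U+5B57)

once  = CGI.escape(title)    # the form the link should contain
twice = CGI.escape(once)     # escaping again turns every "%" into "%25"

puts once                    # %E5%AD%97
puts twice                   # %25E5%25AD%2597
puts CGI.unescape(twice) == once
```

Decoding the doubly escaped value once gives back the singly escaped title, which is why the malformed link still leads to a page, just not the intended one.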

The behavior seems to differ between browsers.

ninniuz wrote:

The problem is the following:

In _footmenu_simple.html.haml, the href for that anchor is built by calling temp_url(@article.title), where

def temp_url(path)
  %|#{redirect_url}?to=#{path_site}/wiki/#{path_encoded(path)}|
end

and

def path_encoded(path)
  CGI::escape(path)
end

But path = @article.title contains HTML entities of the form "&#x<hex value>;", so CGI::escape(path) URL-encodes the entity markup itself (the "&", "#" and ";" become %<code> sequences) instead of the characters the entities stand for.

The @article.title should be HTML-unescaped before CGI::escape is called (note that CGI::unescapeHTML is not working at all here).
Maybe you want to check http://po-ru.com/projects/html-entities/
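
As a rough illustration of the suggestion above, the sketch below decodes numeric HTML entities before URL-escaping. The helper decode_numeric_entities is hypothetical and is not part of the gateway code; it handles only numeric references (&#x...; and &#...;), and it is an assumption that the title contains no other kind of entity.

```ruby
require 'cgi'

# Hypothetical helper: turn numeric character references such as
# "&#x4EA4;" or "&#20132;" into the UTF-8 characters they stand for.
def decode_numeric_entities(text)
  text.gsub(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/) do
    code = $1 ? $1.hex : $2.to_i
    [code].pack('U')          # code point -> UTF-8 string
  end
end

# Variant of path_encoded that decodes entities first (assumed fix).
def path_encoded(path)
  CGI.escape(decode_numeric_entities(path))
end

title = '&#x4EA4;&#x6D41;&#x96FB;'

puts CGI.escape(title)        # %26%23x4EA4%3B%26%23x6D41%3B%26%23x96FB%3B
puts path_encoded(title)      # %E4%BA%A4%E6%B5%81%E9%9B%BB
```

The first output matches the broken link from the description, and the second matches the path of the mobile URL given there.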

*** Bug 21774 has been marked as a duplicate of this bug. ***
*** Bug 21473 has been marked as a duplicate of this bug. ***

I'll be fixing this. I'll find a way to retrieve the canonical page name from the JavaScript options. At least that will be correct and safe.

I hope to have fixed this with:

http://bit.ly/6Pb4v6 (not yet deployed)

http://bit.ly/6SAXUG (not yet deployed)

*** Bug 22045 has been marked as a duplicate of this bug. ***