Links with non-ASCII characters do not work
Closed, Resolved (Public)

Description

To reproduce, go to https://de.wikipedia.org/wiki/Bundeswettbewerb_Mathematik and download it as a PDF file (using the new rdf2latex writer). Open the resulting PDF (I only have Adobe Reader 10.1.2 to test with) and hover over the links. Links that contain only ASCII characters work as expected (e.g. "Mathematikwettbewerb" links to "https://de.wikipedia.org/wiki/Mathematikwettbewerb"), but those with non-ASCII characters do not: e.g. "Stifterverband für die Deutsche Wissenschaft" links to "file:///E|/þÿ" (this link seems to be resolved relative to the PDF file). I also tested a random article from el.wikipedia, and all of its links were broken.


Version: unspecified
Severity: major

Details

Reference
bz71547

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 3:56 AM)
bzimport set Reference to bz71547.

Hm, this seems to be an issue with Adobe Reader: the first online PDF-to-HTML converter I could find handled the links correctly. Anyway, Adobe Reader is important enough that the PDF files should be compatible with it.

I can confirm the problem with evince/poppler on Linux. When clicking such a link I get:

Error when getting information for file '/var/tmp/��': No such file or directory

*** Bug 71589 has been marked as a duplicate of this bug. ***

That þÿ at the start of the link seems to be a byte order mark (U+FEFF) encoded as UTF-16 BE, i.e. the bytes 0xFE 0xFF, but interpreted as ISO/IEC 8859-1.
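
The byte order mark observation can be checked with a short Python sketch. It assumes the link target was written into the PDF as UTF-16 BE with a leading BOM. That is an assumption about what the writer did, not something confirmed in this report, and the variable names are illustrative only.

```python
import codecs

# Link text taken from the report; used here only as sample input.
title = "Stifterverband für die Deutsche Wissenschaft"

# Assumption: the string is stored as UTF-16 BE preceded by a byte order mark.
utf16_with_bom = codecs.BOM_UTF16_BE + title.encode("utf-16-be")

# A reader that treats those bytes as ISO/IEC 8859-1 (Latin-1) sees one
# character per byte, so the two BOM bytes 0xFE 0xFF become "þ" and "ÿ".
misread = utf16_with_bom.decode("latin-1")

print(misread[:2])  # þÿ
```

The first two characters of the misread string match the "þÿ" seen in Adobe Reader, which is consistent with the explanation above.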

Change 165983 had a related patch set uploaded by Cscott:
PDF can't handle UTF-8 URLs.

https://gerrit.wikimedia.org/r/165983

Change 165983 merged by jenkins-bot:
PDF can't handle UTF-8 URLs.

https://gerrit.wikimedia.org/r/165983
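
For readers hitting a similar problem: one common way to avoid raw UTF-8 in a URL is to percent-encode it so that only ASCII characters end up in the PDF. The Python sketch below illustrates that general technique. It is not taken from the merged patch, the helper name `to_ascii_url` is hypothetical, the sample URL is assumed from the link text in the description, and the host is assumed to be ASCII already (internationalized host names would need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Percent-encode non-ASCII characters in path, query and fragment.

    Hypothetical helper for illustration. "%" is kept as a safe character
    so that URLs which are already percent-encoded are not encoded twice.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    # Scheme and host are passed through unchanged (assumed to be ASCII).
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# Assumed URL for the link mentioned in the description.
encoded = to_ascii_url(
    "https://de.wikipedia.org/wiki/Stifterverband_für_die_Deutsche_Wissenschaft"
)
print(encoded)
```

With this approach "ü" is written as its UTF-8 bytes in percent-encoded form (%C3%BC), and URLs that are already pure ASCII pass through unchanged.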