
download tool issue with Cyrillic encoding in filenames (wget)
Closed, Invalid
Public

Description

Author: a1

Description:
The tool at https://toolserver.org/~platonides/catdown/catdown.php does not recognize Cyrillic in file names. For example, it writes "Р%9FамС%8FС%82РЅРёРє_Р·Р°С%82опленнС%8BРј_РєРѕС%80аблС%8FРј_РІ_СеваС%81С%82ополе"
instead of "Памятник затопленным кораблям в Севастополе.JPG". Please fix it.


Version: unspecified
Severity: normal

Details

Reference
bz40844

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 1:05 AM
bzimport set Reference to bz40844.
bzimport added a subscriber: Unknown Object (MLST).

As answered in the mailing list, that's a wget problem.

The list generated by my tool correctly uses:
http://upload.wikimedia.org/wikipedia/commons/a/ad/%D0%9F%D0%B0%D0%BC%D1%8F%D1%82%D0%BD%D0%B8%D0%BA_%D0%B7%D0%B0%D1%82%D0%BE%D0%BF%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%BC_%D0%BA%D0%BE%D1%80%D0%B0%D0%B1%D0%BB%D1%8F%D0%BC_%D0%B2_%D0%A1%D0%B5%D0%B2%D0%B0%D1%81%D1%82%D0%BE%D0%BF%D0%BE%D0%BB%D0%B5.JPG

The problem seems to lie in wget when extracting to a local filename.
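
A quick way to check that the URL itself is fine is to percent-decode its last path segment as UTF-8. The following is only a sketch for verification; the helper name `filename_from_url` is made up for this example and is not part of the tool.

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, percent-decoded as UTF-8."""
    path = urlsplit(url).path
    return unquote(basename(path), encoding="utf-8")


# The URL from the generated list, copied unchanged.
URL = (
    "http://upload.wikimedia.org/wikipedia/commons/a/ad/"
    "%D0%9F%D0%B0%D0%BC%D1%8F%D1%82%D0%BD%D0%B8%D0%BA_"
    "%D0%B7%D0%B0%D1%82%D0%BE%D0%BF%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%BC_"
    "%D0%BA%D0%BE%D1%80%D0%B0%D0%B1%D0%BB%D1%8F%D0%BC_%D0%B2_"
    "%D0%A1%D0%B5%D0%B2%D0%B0%D1%81%D1%82%D0%BE%D0%BF%D0%BE%D0%BB%D0%B5.JPG"
)

if __name__ == "__main__":
    print(filename_from_url(URL))
```

If the decoded name matches the file name on Commons (with underscores in place of spaces), the list is correct and the damage happens later, when the local file name is written.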

If you are using *nix with a utf-8 filesystem, pass the
--restrict-file-names=nocontrol parameter to wget.
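
To illustrate why that option matters: by default wget treats bytes in the range 0x80-0x9F as control characters and escapes them as %XX, and many UTF-8 sequences for Cyrillic letters contain such bytes. The sketch below only imitates that behaviour in Python; it is not wget's code, and the choice of Windows-1251 for displaying the remaining bytes is an assumption about the reporter's system.

```python
def simulate_wget_escaping(name: str, display_codepage: str = "cp1251") -> str:
    """Imitate default escaping of control bytes in a UTF-8 file name.

    Bytes below 0x20 and in 0x80-0x9F are written as %XX; every other byte
    is shown the way a viewer using `display_codepage` would render it
    (assumption: Windows-1251, the usual Cyrillic code page on Windows).
    """
    pieces = []
    for byte in name.encode("utf-8"):
        if byte < 0x20 or 0x80 <= byte <= 0x9F:
            pieces.append(f"%{byte:02X}")
        else:
            pieces.append(bytes([byte]).decode(display_codepage))
    return "".join(pieces)


if __name__ == "__main__":
    # First word of the file name from the report.
    print(simulate_wget_escaping("Памятник"))
```

With nocontrol those bytes are left alone, so on a UTF-8 filesystem the name is stored as plain UTF-8 and displays correctly.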

If you're using Windows you will end up with utf-8 encoded filenames, so
you'd need another script to decode them to the format used by Windows.
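
A decoding script of that kind could look roughly like the following. This is a minimal sketch, assuming the garbled name is UTF-8 that was displayed as Windows-1251 with some bytes escaped as %XX; `recover_name` is a hypothetical helper, and a name that really contains a literal "%" followed by two hex digits would be decoded wrongly.

```python
import re

# A %XX escape as it appears in the garbled names.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def recover_name(garbled: str, codepage: str = "cp1251") -> str:
    """Rebuild the original UTF-8 bytes of a garbled name and decode them.

    Text between escapes is turned back into bytes via `codepage`
    (assumption: Windows-1251); each %XX escape contributes one raw byte.
    Raises UnicodeError if the result is not valid UTF-8.
    """
    raw = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(garbled):
        raw += garbled[pos:match.start()].encode(codepage)
        raw.append(int(match.group(1), 16))
        pos = match.end()
    raw += garbled[pos:].encode(codepage)
    return raw.decode("utf-8")


if __name__ == "__main__":
    # Fragment copied from the report.
    print(recover_name("С%8FС%82РЅРёРє"))
```

The recovered string could then be passed to `os.rename` to rename the downloaded file. Names that are only partly garbled will raise an error rather than be silently mangled.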

a1 wrote:

Unfortunately, no. I could not work out how to "pass the
--restrict-file-names=nocontrol parameter to wget".

Andrij, you would add that inside download.bat.

I could try downloading the category for you if that helps.

I reported the problem upstream: https://savannah.gnu.org/bugs/index.php?37564 This should be fixed at the wget level.

Does this bug belong in this Bugzilla?

Andrij: Toolserver issues should be filed at https://jira.toolserver.org/secure/Dashboard.jspa

Closing as "INVALID" simply because this bug database is not the place where this report should be, but not because the report itself is invalid.