
Uploads don't allow non-ASCII characters in filename
Closed, Resolved (Public)

Description

Depending on the version used, either the original (local) file name or the target page name on the wiki may not contain non-ASCII characters. This was changed in Ib751ee3f4074a60f3b53b0afe3cc2dfc3e17b2f7 in pwb 2.0, so versions prior to that change won't work with non-ASCII local filenames, and versions including it won't work with non-ASCII wiki page names.

The problem is simply how the 'filename' value is encoded in the header of the file/chunk entry (not to be confused with the 'filename' entry in the MIME request). For example:

Content-Type: image/jpeg
MIME-Version: 1.0
Content-disposition: form-data; name="file"; filename*=utf-8''%C3%9C.jpg
Content-Transfer-Encoding: binary

[… binary data …]

This would be the RFC 2231-compliant encoding of a non-ASCII character, which Python 3 uses by default. Python 2 instead applies a strange encoding to the complete line, in the style of RFC 2047 encoded-words (this may not represent exactly the same text as above, but something similar):

Content-disposition: =?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?=   
 =?utf-8?b?IsOcMi5qcGci?=
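To make the two forms easier to compare, here is a minimal Python 3 sketch. It reproduces the RFC 2231 value with the standard library's `email.utils.encode_rfc2231` and decodes the base64 payloads of the Python 2 line by hand. The variable names are only illustrative, and this is not necessarily how pywikibot builds the header.

```python
import base64
import re
from email.utils import encode_rfc2231

# RFC 2231 form: percent-encoded UTF-8 bytes, prefixed with the charset
# and an empty language tag.
rfc2231_value = encode_rfc2231('\u00dc.jpg', 'utf-8')
rfc2231_header = 'form-data; name="file"; filename*=' + rfc2231_value

# The Python 2 line from above: the whole header value is split into
# two base64 encoded-words.
python2_header = (
    '=?utf-8?b?Zm9ybS1kYXRhOyBuYW1lPSJmaWxlIjsgZmlsZW5hbWU9?=\n'
    ' =?utf-8?b?IsOcMi5qcGci?='
)

# Extract each base64 payload and join the decoded bytes.
payloads = re.findall(r'=\?utf-8\?b\?([^?]+)\?=', python2_header)
decoded = b''.join(base64.b64decode(p) for p in payloads).decode('utf-8')

print(rfc2231_header)
print(decoded)
```

Decoding shows that the Python 2 line wraps the parameter names as well as the filename, so a server reading the raw header sees no plain `name="file"` parameter at all.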

Neither form is accepted by the MediaWiki server. The first is answered with:

badupload_file: File upload param file is not a file upload; be sure to use multipart/form-data for your POST and include a filename in the Content-Disposition header.

The Python 2 form is answered with:

missingparam: One of the parameters filekey, file, url, statuskey is required

It is possible to leave the value UTF-8 encoded, although that is (afaics) not compliant with the MIME-related RFCs, which say that the header may only contain US-ASCII characters.

Unfortunately I'm not sure what MediaWiki does with this, so I don't know if there is a better way, especially as Python 3 doesn't support 'bytes' in the header and otherwise it's not possible to keep the value from being re-encoded there.
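As a rough illustration of the "leave it UTF-8 encoded" option, the following sketch assembles a single multipart/form-data part by hand as bytes, bypassing the `email` header machinery so the filename is not re-encoded. The helper `build_file_part` and the boundary string are hypothetical, it does not escape quotes in the filename, and it is not taken from the actual patch.

```python
def build_file_part(boundary, filename, data,
                    content_type='application/octet-stream'):
    """Return one multipart/form-data part with a raw UTF-8 filename.

    Hypothetical helper: the header is built as text and encoded to
    UTF-8 in one step, so the filename bytes appear unescaped.
    """
    header_lines = [
        '--' + boundary,
        'Content-Disposition: form-data; name="file"; '
        'filename="{}"'.format(filename),
        'Content-Type: ' + content_type,
        '',  # blank line separating headers from the body
        '',
    ]
    return '\r\n'.join(header_lines).encode('utf-8') + data + b'\r\n'


# Assumed boundary and dummy payload, for illustration only.
part = build_file_part('hypothetical-boundary', '\u00dc.jpg',
                       b'\xff\xd8\xff', 'image/jpeg')
print(part)
```

The header then contains the bytes `\xc3\x9c` directly, which is the non-ASCII content the MIME RFCs disallow, so whether a given server accepts it has to be checked against that server.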


Version: core-(2.0)
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=73662

Details

Reference
bz73661

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 3:57 AM)
bzimport set Reference to bz73661.
bzimport added a subscriber: Unknown Object (????).

Change 174677 had a related patch set uploaded by XZise:
[FIX] Upload: Support Unicode filenames

https://gerrit.wikimedia.org/r/174677

jayvdb set Security to None.
jayvdb removed a subscriber: Unknown Object (????).