
File extension letter case treatment should be unified on Commons
Closed, Resolved, Public

Description

Currently, the UploadWizard accepts an upper-case file name extension (e.g. JPG) and keeps it unchanged, but does not let the user change it either (see bug 34703). It is therefore possible to upload two files whose names differ only in extension case, one ending in .jpg and the other in .JPG.

However, elsewhere on Commons, such as in the RenameLink gadget or the tool used by filemovers, .jpg and .JPG are considered the same, and the tools won't allow renaming between these two variants. This makes it impossible to restore consistency when a series of pictures has been uploaded using one case and one or a few of them accidentally use the other. All these tools seem to share a common "file name sanitization" module, as reported at http://commons.wikimedia.org/wiki/User_talk:Rillke#RenameLink_forces_case_in_file_extension

As evidenced at https://commons.wikimedia.org/wiki/User_talk:Blahma#File:Brno.2C_T.C3.A1bor_15.jpg, file movers need to rename files manually if they are asked to change the letter case of a file extension, such as renaming "foo.JPG" to "foo.jpg".

File names should be treated equally everywhere on Commons, so if file name extensions get "normalized" by file moving tools, the same "sanitization" should be performed in the UploadWizard.


Version: unspecified
Severity: normal

Details

Reference
bz40326

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 12:58 AM)
bzimport added a project: UploadWizard.
bzimport set Reference to bz40326.

Bryan.TongMinh wrote:

There are some checks in place, but they only work if the existing file has the normalized extension, i.e., a warning will be given if you try to upload X.JPG and X.jpg exists, but not the other way around.

It is probably a good idea to just give a warning if the same filename exists with any extension. That would need support in FileRepo to search files without extension. I'll have a look at it.
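The difference between the current one-way check and the proposed "any extension" check can be sketched roughly as follows. This is only an illustration under stated assumptions: `splitName`, `oneWayConflict`, and `sameBaseConflicts` are hypothetical names, the list of existing files is a plain array rather than a FileRepo lookup, and base names are compared exactly as typed.

```javascript
// Hypothetical helper: split "X.JPG" into { base: "X", ext: "jpg" }.
function splitName(name) {
  const dot = name.lastIndexOf('.');
  if (dot <= 0) {
    return { base: name, ext: '' };
  }
  return { base: name.slice(0, dot), ext: name.slice(dot + 1).toLowerCase() };
}

// One-way check, as described above: only an existing file stored under the
// normalized (lower-case) extension is found. Exact duplicates are assumed to
// be handled elsewhere, so they are ignored here.
function oneWayConflict(upload, existing) {
  const { base, ext } = splitName(upload);
  const normalized = ext ? base + '.' + ext : base;
  if (normalized !== upload && existing.includes(normalized)) {
    return normalized;
  }
  return null;
}

// Proposed check: report every existing file that shares the base name,
// whatever its extension or extension case.
function sameBaseConflicts(upload, existing) {
  const { base } = splitName(upload);
  return existing.filter(
    (name) => name !== upload && splitName(name).base === base
  );
}
```

With this sketch, uploading "X.jpg" while "X.JPG" exists is missed by `oneWayConflict` but reported by `sameBaseConflicts`.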

Thank you, Bryan, for your quick response and action. It would indeed be nice to see a warning when a similarly named file already exists.

However, this does not solve the problem completely, because it means that the Upload Wizard will go on stating that "file names differing only in extension case are possible", while the tools that might be used subsequently (file renaming) state "file name extensions must be normalized as if they were case-insensitive". This means that confusion will persist, unless a consensus on this is found across Commons. Perhaps we should invite more people into this discussion?

The easiest solution is, obviously, to equip the Upload Wizard with the same "sanitization" mechanism that is already used by the file mover, but I understand that this is not something that you are willing to do at the moment, am I right?

Bryan.TongMinh wrote:

I'm not aware of any software restrictions on file moving that perform the normalizations you describe. As far as I am aware it is possible to move a file to A.JPG if the file A.jpg already exists.

I have found the "cleanFileName" function from http://commons.wikimedia.org/wiki/MediaWiki:Gadget-AjaxQuickDelete.js being called as part of https://commons.wikimedia.org/wiki/MediaWiki:RenameRequest.js, which holds the code for the RenameLink gadget. Indeed, when you are not a file mover and you visit a file page and click the "Move" tab, the gadget's dialog appears. If you change the value in the "Enter the new name" field to "A.JPG" and leave that field by focusing another one, cleanFileName gets called and the field's value is automatically normalized to "File:A.jpg".
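The normalization observed in the dialog could look roughly like the sketch below. It is not the actual cleanFileName code from Gadget-AjaxQuickDelete.js; `normalizeFileName` is a hypothetical name, and the sketch only reproduces the behaviour described above (adding the "File:" prefix and lower-casing the extension).

```javascript
// Hypothetical sketch of the observed behaviour; NOT the gadget's real code.
function normalizeFileName(name) {
  // Drop an optional "File:" prefix (any letter case) and stray whitespace.
  const title = name.trim().replace(/^file:/i, '').trim();
  const dot = title.lastIndexOf('.');
  // No usable extension (no dot, leading dot, or trailing dot): keep as is.
  if (dot <= 0 || dot === title.length - 1) {
    return 'File:' + title;
  }
  const base = title.slice(0, dot);
  const ext = title.slice(dot + 1).toLowerCase();
  return 'File:' + base + '.' + ext;
}

console.log(normalizeFileName('A.JPG'));
```

Because "foo.JPG" and "foo.jpg" both normalize to "File:foo.jpg" under such a rule, a rename request between the two would look like a no-op to the gadget.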

Yes, I could insert the Rename template manually, but the gadget seems to be more efficient.

And, I am not a filemover so I cannot check myself, but User:Taketa has suggested at http://commons.wikimedia.org/w/index.php?title=User_talk%3ABlahma&diff=77628385&oldid=77600880 that the same normalization occurs in the actual filemoving interface (and is the cause of .JPG and .jpg considered "identical"). Could someone please recheck this?

(In reply to comment #2)

I21eddc5d

Assigning bug to author (of Gerrit change 24124), +patch-in-gerrit

Is this fixed? The patch got merged.

(In reply to comment #1)

There are some checks in place, but they only work if the existing file has the normalized extension, i.e., a warning will be given if you try to upload X.JPG and X.jpg exists, but not the other way around.

It is probably a good idea to just give a warning if the same filename exists with any extension. That would need support in FileRepo to search files without extension. I'll have a look at it.

People seem to want to be able to do that though. See bug 46741

Looks like this is just about done - I don't think it's necessary to change the filenames to have the same extensions in UW. I got the warning reliably by uploading a .jpg and .JPEG with the same name.

Gilles raised the priority of this task from Medium to Unbreak Now!. (Dec 4 2014, 10:11 AM)
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to Medium. (Dec 4 2014, 11:20 AM)

It seems this task was only closed because of the UploadWizard task T36703?! Uploading files with an upper-case extension is nevertheless still possible with the classic upload form and (I guess) any other common upload tool. I think the UploadWizard was only an example here, and the duplicate task that is really still open is T34660.