Page MenuHomePhabricator

action=upload overwrites files despite ignorewarnings is not set
Closed, DeclinedPublic

Description

Original bug title:
action=upload overwrites files despite ignorewarnings is not set

Follow-up from Bug 64883

Description:
UploadWizard does never set ignorewarnings when publishing files through API. And the API upload module is supposed to give a warning if a file exists. Thus, Upload Wizard isn't able [in theory] to overwrite files. But exactly this happened because the API returned "success" instead of a warning.

There are plenty of examples in Bug 64883


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=64883

Details

Reference
bz65338

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:12 AM
bzimport set Reference to bz65338.
bzimport added a subscriber: Unknown Object (MLST).

Just tried it locally. Attempting to upload File:Bug65338.png via API to my local wiki without specifying ignorewarnings correctly gives a warning that the file already exists and did not process the upload with a "Upload warnings: exists" error. So I'm closing this as WORKSFORME.

It may be that the situation you're complaining about is only triggered by certain filenames or the like, but in that case the bug is likely in the file upload code rather than in the API. You might try to see if the same bug occurs via Special:Upload to confirm that. Or it may be that UploadWizard actually is specifying ignorewarnings (there is certainly code to set that parameter it in the extension).

Also, bug 64883 is long and confused with many different issues being discussed. I'm not able to work out which things there might be relevant to this bug. In the future it would be helpful to copy the relevant information into the new bug, and also if possible to include a log of the exact API queries that can be used to reproduce the problem.

It may be that the situation you're complaining about is only triggered by
certain filenames or the like

Yes, I guess it has something to do with WMF setup *but* I cannot say for sure, hence this bug. What I can say for sure is that igorewarnings wasn't set and the file at https://commons.wikimedia.org/wiki/File:Treppe_2222_test_upload.jpg got through anyway.

UploadWizard actually is specifying ignorewarnings

In the following cases only:

  1. Uploading to stash: Upload through iframe
  2. Uploading to stash: FormData upload
  3. Publishing: ['was-deleted'] (the user is uploading a file under a title that previously existed but was deleted)
  4. Publishing: ['duplicate-archive'] (the a file with the same check sum was existing before on that wiki but is now deleted)

None of these 3 possibilities was the case at [[File:Treppe_2222_test_upload.jpg]]. I was also unable to reproduce this locally but it is an issue at Commons.

It would be helpful if you'd post the exact data that UploadWizard is posting (although you can redact the content of the file if it's large, and of course redact any tokens). I just tried uploading via API on Commons with https://commons.wikimedia.org/wiki/File:Anomie_test_for_bug_65338.png and got the correct error response when I tried to reupload a different version.

For good measure, I tried both direct upload and upload first to stash. Both worked correctly.

FYI, my upload data was:

{

'maxlag' => '5',
'file' => [ '/tmp/red.png', 'Anomie test for bug 65338.png' ],
'filename' => 'Anomie test for bug 65338.png',
'token' => [REDACTED],
'comment' => 'This file is a test for [[bugzilla:65338]]. Anomie will flag it for deletion when done. Sorry for the inconvenience.',
'watchlist' => 'nochange',
'format' => 'json',
'action' => 'upload'

}

UploadWizard does not set the maxlag param.

Requests:

action upload
comment User created page with UploadWizard
filekey 12anztrx74rw.voupfh.1178694.jpg
filename Koellner_dom_test.jpg
format json
text [...]

token [REDACTED]

action upload
comment User created page with UploadWizard
filekey 12anztspxrzw.nz2a22.1178694.jpg
filename Koellner_dom_test.jpg
format json
text [...]

token [REDACTED]

Results: http://pastebin.de/125362

What about the API calls that do the stash and return the filekey? Not that I'm expecting to find anything there, but for just in case.

(In reply to Brad Jorsch from comment #6)

What about the API calls that do the stash and return the filekey?

Request:
http://pastebin.de/125379
(with ignorewarnings because warnings should not prevent upload here)

Response:
http://pastebin.de/125380

Hmm. When I try to use UploadWizard to reproduce this issue, it refuses to even let me attempt the upload due to the duplicate filename. Are you doing something strange like submitting multiple uploads to the same name at the same time?

(In reply to Brad Jorsch from comment #8)

something strange like submitting multiple uploads to the same name at the same
time

I am using Upload Wizard. The files I choose to upload are named like that:

  • Treppe_2222 test upload.jpg
  • Treppe 2222 test_upload.jpg
  • Treppe 2222_test_upload.jpg

I upload these three files at once and Upload Wizard is "publishing" [moving from stash to public image storage] them "at the same time" [sending 3 requests at once]. I don't have the time to check but you possibly have to set "Maximum number of concurrent uploads" at https://commons.wikimedia.org/wiki/Special:Preferences#mw-prefsection-uploads to three.

Ok, I think we've gotten to the bottom of this now.

There's a race condition in the uploading code: it first checks for warnings and then processes the upload, and what's happening here is that one of your parallel requests is sneaking in an upload between the "check for warnings" and "process the upload" steps.

If the "process the upload" backend code were to have flags to say "do not overwrite" and "overwrite only", we could use that from the API code to fix this race condition.

(In reply to Brad Jorsch from comment #11)
Thanks for tracking this down, Brad.

If the "process the upload" backend code were to have flags to say "do not
overwrite" and "overwrite only", we could use that from the API code to fix
this race condition.

A new bug demanding that? The new bug blocking this bug?
Or can the API just block other uploads with the same title while one is being processed? Is there shared memory between the servers that run the API code except the database?

Just as a side note: It appears to happen more or less frequently that people upload two files under the same title by accident. This is, due to the flow of UploadWizard, which encourages uploading directly from your camera's folder. Then the user is renaming the files and can easily miss that there's a file with the same name already in the list. I’ve submitted a patch for UplaodWizard but I would welcome if we could make the API code also safe for this condition.

(In reply to Rainer Rillke @commons.wikimedia from comment #12)

A new bug demanding that? The new bug blocking this bug?

Go ahead.

Or can the API just block other uploads with the same title while one is
being processed? Is there shared memory between the servers that run the API
code except the database?

There is shared memory, but using it from the API in this way would probably not be the greatest idea. It would be better for such a thing to live in the backend upload-handling code.

Then the user is renaming the files and can easily miss
that there's a file with the same name already in the list.

Yeah, UploadWizard should be checking its list for duplicates (post-normalization, of course).