Page MenuHomePhabricator

Moving files results in broken descriptions and deleted images
Closed, ResolvedPublic

Details

Reference
bz40927

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:11 AM
bzimport set Reference to bz40927.
bzimport added a subscriber: Unknown Object (MLST).

sumanah wrote:

Marco, thanks for reporting this. Can you find any pattern in which files are affected?

sumanah wrote:

For ease in Bugzilla searching, the error message:

API request failed (internal_api_error_DBQueryError): Database query error <i>at Sun, 07 Oct 2012 15:27:11 GMT</i>

Task: movePage
NextTask: removeTemplate
LastTask: reloadPage
Page: File:Hl._Lukas_-_Reliquienschrein_und_Prozessionskerzenhalter.jpg
Skin: vector
[{{fullurl:Special:Contributions|target=Aa1bb2cc3dd4ee5&offset=20121007152843}} Contribs before error]

So, unless this is happening a lot, I think this might have just been a one-time API service outage/hiccup.

Bryan.TongMinh wrote:

(In reply to comment #2)

So, unless this is happening a lot, I think this might have just been a
one-time API service outage/hiccup.

Still, the file moving process should really be resilient against these situations.

Marco: Is this problem reproducible?

(In reply to comment #0)

Moving of some files results in erroneous behavior:
Only partial contents of the files get moved to the new name. The rest of the
information is either lost or kept on the old file name.

What do you mean by "partial contents of the files"?

Sun Oct 7 15:26:38 UTC 2012 srv214 commonswiki LocalFile::lock 10.0.6.41 1205 Lock wait timeout exceeded; try restarting transaction (10.0.6.41) SELECT 1 FROM image WHERE img_name = 'Hl._Lukas_-_Reliquienschrein_und_Prozessionskerzenhalter.jpg' LIMIT 1 FOR UPDATE

Sun Oct 7 15:28:02 UTC 2012 srv252 commonswiki LocalFile::lock 10.0.6.41 1205 Lock wait timeout exceeded; try restarting transaction (10.0.6.41) SELECT 1 FROM image WHERE img_name = 'Hl._Lukas_-_Reliquienschrein_und_Prozessionskerzenhalter.jpg' LIMIT 1 FOR UPDATE

(In reply to comment #1)

Marco, thanks for reporting this. Can you find any pattern in which files are
affected?

(In reply to comment #4)

Marco: Is this problem reproducible?

Sorry, I did not find any pattern and I could also not reproduce the behavior.

(In reply to comment #5)

What do you mean by "partial contents of the files"?

Description is one part and the image file is the other part. So only one part gets moved and the other stays as far as I could observe.

Title.php has to deal with some transactional things:
a) The file moves
b) The file DB metadata moves
c) The actual description page move

The first two are wrapped in a transaction as best as possible, it's possibly for partitions/errors to cause:
a) no move but duplicate files to be at the destination (failure to delete target files on error)
b) files at the destination and description/metadata still at the target (failure to commit()).

The first two are in a transaction by themselves for contention reasons. One could wrap some code in Title::moveTo() in $file->lock()/$file->unlock() calls to get all three in on DB transaction, but that would hold locks while thumbnails are purged. For this to work, the purges would need to be deferred till the description page is moved. Even still, it would be vulnerable to network partitions (failure to revert files on error or commit failure).

People have made various tweaks to this area to "improve" it (one of which caused a race condition that resulted in numerous lost files). To really fix this, we'd want to redo the file layout so we only create files (under hash names or uuid4s) but don't mutate them. A MW layor would do the "pretty name" and authentication (public vs deleted files) mapping to the actual files. File delete/restore/move would just be shuffling DB rows around (plus some purges outside of the transaction), which could be in one ACID transaction.

I have my doubts that anyone is willing to invest in redoing are stupid file layout anytime soon.

Not reproducible for the reporter, and no indicators that this happens a lot. => Lowering priority.

Two more failed moves have been added (linked from post above). All three failed moves have one commons issue, one move entry in the history for the source page but two move entries on the target page. The move order did not move the original description page but the new description page after file deletion.

  • Bug 44831 has been marked as a duplicate of this bug. ***

Lost file description pages can have legal implications.

(In reply to comment #15)

Has this occurred recently?

Yes, at https://commons.wikimedia.org/wiki/File:Cambridge_Public_Libary_-_Cambridge,_MA_-_DSC00103.JPG , the file disappeared after moving.

(In reply to Bawolff (Brian Wolff) from comment #17)
+++ support, support, support

this has been something i was always in favour of; it allows using MW with IIS properly, file renaming operations will be much faster etc. etc. -- push this!

We're trying to get tests built into it first - I took a stab but was struggling to get it working right.

(In reply to Bawolff (Brian Wolff) from comment #17)

Aaron's https://gerrit.wikimedia.org/r/#/c/127460/ will probably help this
situation.

Patch still awaiting review / merge...

Change 153990 had a related patch set uploaded by Aaron Schulz:
Made LocalFile move/delete/restore handle network partitions better

https://gerrit.wikimedia.org/r/153990

Change 153990 merged by jenkins-bot:
Made LocalFile move/delete/restore handle network partitions better

https://gerrit.wikimedia.org/r/153990

Change 155575 had a related patch set uploaded by Aaron Schulz:
Made LocalFile move/delete/restore handle network partitions better

https://gerrit.wikimedia.org/r/155575

Change 155575 merged by jenkins-bot:
Made LocalFile move/delete/restore handle network partitions better

https://gerrit.wikimedia.org/r/155575

All patches merged - what's left to do here?

(In reply to Andre Klapper from comment #25)

All patches merged - what's left to do here?

No reply to comment 25. Assuming this bug is FIXED.
If that is not the case: Please reopen and elaborate what is left to do here to get this report fixed.

Gilles raised the priority of this task from Medium to Unbreak Now!.Dec 4 2014, 10:11 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to Medium.Dec 4 2014, 11:21 AM