
File storage uses (title,timestamp) as a unique key, but this is not unique
Open, High, Public

Description

FileRepo::findFile appears to use the (title,timestamp) pair as a unique key to identify a particular version of a file. However, this is not necessarily unique. See for example https://commons.wikimedia.org/wiki/File:Treppe_22_22_test_upload.jpg where two versions were both uploaded at 2014-05-05T13:00:59Z.
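
As a minimal sketch of why the pair can collide, assume timestamps are stored at one-second resolution in the 14-digit MediaWiki format. The title `Example.jpg`, the sub-second upload times, and the `sha1` values below are hypothetical and are not taken from the Commons file above.

```python
from datetime import datetime, timezone


def mw_timestamp(dt):
    """Format a datetime as a 14-digit YYYYMMDDHHMMSS string (one-second resolution)."""
    return dt.strftime("%Y%m%d%H%M%S")


# Two hypothetical uploads of the same title that land within the same second.
uploads = [
    {"title": "Example.jpg",
     "uploaded": datetime(2014, 5, 5, 13, 0, 59, 100000, tzinfo=timezone.utc),
     "sha1": "aaa"},
    {"title": "Example.jpg",
     "uploaded": datetime(2014, 5, 5, 13, 0, 59, 900000, tzinfo=timezone.utc),
     "sha1": "bbb"},
]

# Group versions by the (title, timestamp) pair that is being treated as a key.
versions_by_key = {}
for upload in uploads:
    key = (upload["title"], mw_timestamp(upload["uploaded"]))
    versions_by_key.setdefault(key, []).append(upload["sha1"])

# Any key that maps to more than one version cannot identify a single version.
ambiguous = {key: shas for key, shas in versions_by_key.items() if len(shas) > 1}
print(ambiguous)
```

Both versions map to the same key, so a lookup by (title, timestamp) alone cannot tell them apart.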

For proper operation, the API requires some ordered unique identifier for file revisions. I imagine there are other places in the code that also require unique identifiers.

This appears to depend on bug 15441 (for the oldimage table), although I'm not familiar enough with the FileBackend code to know for sure whether that's generally the case or if it only applies to certain backends.


Version: 1.23.0
Severity: normal

Details

Reference
bz65264

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:24 AM
bzimport set Reference to bz65264.
bzimport added a subscriber: Unknown Object (MLST).

Change 133178 had a related patch set uploaded by Aaron Schulz:
Made LocalFile avoid duplicate (name,timestamp) pairs

https://gerrit.wikimedia.org/r/133178

Change 133178 merged by jenkins-bot:
Made LocalFile avoid duplicate (name,timestamp) pairs

https://gerrit.wikimedia.org/r/133178

This is probably a duplicate of some older bug.

So change 133178 fixed it going forward, which is good, although I left a comment there.

We'll also need to do database cleanup for situations where this already exists, such as https://commons.wikimedia.org/wiki/File:Treppe_22_22_test_upload.jpg, to completely fix the bug. I also wonder if it's worth making the oi_name_timestamp index unique, although that would only detect the problem after the fact since the row for the current version isn't added to oldimage until a new version becomes current.
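
A rough sketch of how the existing duplicates might be located for cleanup, using an in-memory SQLite database. The two tables are heavily simplified stand-ins for `image` and `oldimage`, and the rows, the `find_duplicates` helper, and the query are illustrative assumptions, not the actual cleanup script. The query unions the current version with the old versions, because a unique index on `oldimage` alone would not see a collision involving the current row.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the image and oldimage tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (img_name TEXT PRIMARY KEY, img_timestamp TEXT);
    CREATE TABLE oldimage (oi_name TEXT, oi_archive_name TEXT, oi_timestamp TEXT);
    CREATE INDEX oi_name_timestamp ON oldimage (oi_name, oi_timestamp);
""")

# Made-up sample rows: A.jpg has a current and an old version with the same timestamp.
conn.executemany("INSERT INTO image VALUES (?, ?)", [
    ("A.jpg", "20140505130059"),
    ("B.jpg", "20140101000000"),
])
conn.executemany("INSERT INTO oldimage VALUES (?, ?, ?)", [
    ("A.jpg", "20140505130059!A.jpg", "20140505130059"),
    ("A.jpg", "20140505125000!A.jpg", "20140505125000"),
    ("B.jpg", "20131231000000!B.jpg", "20131231000000"),
])

DUPLICATES_SQL = """
    SELECT name, ts, COUNT(*) AS n
    FROM (
        SELECT img_name AS name, img_timestamp AS ts FROM image
        UNION ALL
        SELECT oi_name AS name, oi_timestamp AS ts FROM oldimage
    ) AS versions
    GROUP BY name, ts
    HAVING COUNT(*) > 1
    ORDER BY name, ts
"""


def find_duplicates(connection):
    """Return (name, timestamp, count) for every pair shared by more than one version."""
    return connection.execute(DUPLICATES_SQL).fetchall()


print(find_duplicates(conn))
```

Restricting the inner query to `oldimage` only would model what a unique `oi_name_timestamp` index could detect, which in this sample data is nothing.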