Unable to move a corrupt file on commons: [[:commons:File:Нева.jpg]]
Closed, Resolved, Public

Description

The first revision is missing and I can't delete this revision. Moving the whole file also fails. If the file were not in use, I would try to temporarily delete the whole file and restore only the non-corrupt version. But Commons Delinker is very fast :P

http://commons.wikimedia.org/wiki/File:Нева.jpg

The person requesting the move wanted one of the following new names:

  • File:Neva River and Palace Embankment, 2009-07-24.jpg
  • File:Вид на Неву и Дворцовую наб. с Троицкого моста, 2009-07-24.jpg

Version: unspecified
Severity: normal

Details

Reference
bz34934

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 12:09 AM
bzimport set Reference to bz34934.

See also the full list of files with an empty oi_archive_name:
http://toolserver.org/~kalan/commons-broken.txt (tab-separated, 1864 rows)

(select was "select * from oldimage where oi_archive_name=''")
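To illustrate what that query selects, here is a minimal sketch in Python using an in-memory SQLite database. The three-column table is a hypothetical, heavily reduced stand-in for the real oldimage schema, and the two demo rows are modelled on the two revisions of File:Нева.jpg discussed in this report; the helper names are made up for illustration.

```python
import sqlite3


def find_empty_archive_names(conn):
    """Return (oi_name, oi_timestamp) for oldimage rows with an empty archive name."""
    cur = conn.execute(
        "SELECT oi_name, oi_timestamp FROM oldimage "
        "WHERE oi_archive_name = '' ORDER BY oi_name, oi_timestamp"
    )
    return cur.fetchall()


def demo_connection():
    """Build a tiny in-memory table (hypothetical, reduced schema) for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE oldimage (oi_name TEXT, oi_archive_name TEXT, oi_timestamp TEXT)"
    )
    conn.executemany(
        "INSERT INTO oldimage VALUES (?, ?, ?)",
        [
            # A healthy old revision: the archive name is "<timestamp>!<file name>".
            ("Нева.jpg", "20110923213256!Нева.jpg", "20110923213256"),
            # A stub row like the ones listed above: the archive name is empty.
            ("Нева.jpg", "", "20110923213256"),
        ],
    )
    return conn


if __name__ == "__main__":
    for name, timestamp in find_empty_archive_names(demo_connection()):
        print(name, timestamp)
```

Selecting only the name and timestamp keeps the output close to the tab-separated list linked above; the original query used "select *".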

Another file that exists but is not shown: http://commons.wikimedia.org/wiki/File:Khajuraho_India_-_Lakshman_Temple_-_Sculpture_15.JPG

http://upload.wikimedia.org/wikipedia/commons/archive/f/f4/20120324100019!Khajuraho_India_-_Lakshman_Temple_-_Sculpture_15.JPG

And here a real problem with the timestamp-based storage shows up:
https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo|info&iiprop=size|timestamp|url&iiurlwidth=365&titles=File:%D0%9D%D0%B5%D0%B2%D0%B0.jpg&iilimit=10&format=jsonfm

{

"timestamp": "2011-09-23T21:32:56Z",
"size": 3352071,
"width": 3072,
"height": 2304,
"thumburl": "https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/archive\/a\/a5\/20110923213256%21%D0%9D%D0%B5%D0%B2%D0%B0.jpg\/365px-%D0%9D%D0%B5%D0%B2%D0%B0.jpg",
"thumbwidth": 365,
"thumbheight": 274,
"url": "https:\/\/upload.wikimedia.org\/wikipedia\/commons\/archive\/a\/a5\/20110923213256%21%D0%9D%D0%B5%D0%B2%D0%B0.jpg",
"descriptionurl": "https:\/\/commons.wikimedia.org\/wiki\/File:%D0%9D%D0%B5%D0%B2%D0%B0.jpg"

},
{

"timestamp": "2011-09-23T21:32:56Z",
"size": 3310009,
"width": 3072,
"height": 2304,
"thumburl": "https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/archive\/a\/a5\/\/365px-%D0%9D%D0%B5%D0%B2%D0%B0.jpg",
"thumbwidth": 365,
"thumbheight": 274,
"url": "https:\/\/upload.wikimedia.org\/wikipedia\/commons\/archive\/a\/a5\/",
"descriptionurl": "https:\/\/commons.wikimedia.org\/wiki\/File:%D0%9D%D0%B5%D0%B2%D0%B0.jpg"

}

Yes, both revisions have the same timestamp, and the result is what is described in comment #1.
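The two symptoms visible in that API output (a shared timestamp and a URL that ends at the directory with no archive file name) can be checked mechanically. The following is a rough sketch, not an existing tool: the function name is hypothetical, and it assumes the imageinfo entries have already been parsed from JSON into a list of dicts with "timestamp" and "url" keys.

```python
from collections import Counter


def find_suspect_revisions(imageinfo):
    """Flag revisions whose URL lacks a file name or whose timestamp is shared.

    Returns a list of (index, reasons) tuples, one per suspicious revision.
    """
    counts = Counter(rev["timestamp"] for rev in imageinfo)
    suspects = []
    for index, rev in enumerate(imageinfo):
        reasons = []
        if rev["url"].endswith("/"):
            # The archive name is missing, so the URL stops at the hash directory.
            reasons.append("missing archive name")
        if counts[rev["timestamp"]] > 1:
            # Archive names are built from the timestamp, so duplicates collide.
            reasons.append("duplicate timestamp")
        if reasons:
            suspects.append((index, reasons))
    return suspects


# The two revisions from the API output above, reduced to the relevant keys.
SAMPLE = [
    {
        "timestamp": "2011-09-23T21:32:56Z",
        "url": "https://upload.wikimedia.org/wikipedia/commons/archive/a/a5/"
               "20110923213256%21%D0%9D%D0%B5%D0%B2%D0%B0.jpg",
    },
    {
        "timestamp": "2011-09-23T21:32:56Z",
        "url": "https://upload.wikimedia.org/wikipedia/commons/archive/a/a5/",
    },
]

if __name__ == "__main__":
    for index, reasons in find_suspect_revisions(SAMPLE):
        print(index, ", ".join(reasons))
```

For this file, both revisions are flagged for the duplicate timestamp, and the second one is also flagged for the missing archive name.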

I ran a cleanup script on all orphaned Commons files, and it didn't fix this particular file row. I'll need to see how many empty oi_archive_name rows are left and consider just deleting those rows.

Directly deleting the rows? You should at least keep a backup first...

(In reply to comment #5)

and consider just deleting those rows.

Won't this lead to an inconsistency with the upload log?

Note that there are two different cases:

Example #1 - http://commons.wikimedia.org/wiki/File:Нева.jpg - empty 'oi_archive_name' value

Example #2 (comment #1) - http://commons.wikimedia.org/wiki/File:Viestura_ordenis.jpg - valid 'oi_archive_name' value, but "File not found" (corrupt file storage)

Moving this down in priority only because Aaron is working on what he believes is the root cause (bug 36132), and I want to move this down below that on his list of stuff to do. With any luck, this will automatically get fixed as a result of that.

(In reply to comment #6)

Directly deleting the rows? You should at least keep a backup first...

Of course. Though I don't plan on doing that anymore after looking more closely at some of the examples. It's nice to at least see some of the history more clearly (without consulting action=history), so I'd prefer to leave the stub rows.

What can be fixed more easily is the doDBUpdates() function of LocalFileMoveBatch. The line "$status->failCount += $total - $affected;" results in negative values in this case, causing the 'imageinvalidfilename' error.
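The arithmetic behind that negative value can be sketched as follows. This is a Python illustration of the quoted expression only, not the MediaWiki code: the function names and the example numbers are hypothetical, and the explanation that the database update touches more rows than were counted (for instance because of stub rows with an empty oi_archive_name) is an assumption. Clamping at zero is one possible mitigation, not necessarily the fix that was applied.

```python
def fail_count_delta(total, affected):
    """Mirror the quoted expression `$total - $affected` without any guard."""
    return total - affected


def clamped_fail_count_delta(total, affected):
    """Same expression, but never negative: extra affected rows are not failures."""
    return max(0, total - affected)


if __name__ == "__main__":
    # Hypothetical numbers: one old revision was counted, but two rows were updated.
    total, affected = 1, 2
    print(fail_count_delta(total, affected))          # negative when affected > total
    print(clamped_fail_count_delta(total, affected))  # clamped to zero
```

With the guard in place, a genuine shortfall (fewer rows updated than expected) still counts as failures, while a surplus no longer drives the counter below zero.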