
Run a cleanup bot over all EXIF-rotated images on Commons, other wiki sites
Closed, Resolved (Public)

Description

A general refreshImageMetadata run on the server should help, but we also need to check for problems such as mistagged images. I'll see if I can whip up a bot to run through and build a report...


Version: unspecified
Severity: normal

Details

Reference
bz31509

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:56 PM
bzimport set Reference to bz31509.

What 'cleanup' is it supposed to do?
Undo all rotations automagically performed due to EXIF?

Would it be possible to query all images that have an EXIF orientation other than "Normal/1" and were uploaded before MW 1.18 was deployed? Their thumbnails are now autorotated even though the physical images are probably already straight. If so, would it be possible to reset or remove their orientation tags directly in the database (that kind of cleanup) so that they aren't autorotated unnecessarily? Rotatebot on Commons is slowly resetting these now (it has reset over 4000 images in the past few days, and the queue is growing). If the database can be cleaned up that way, it should be ensured that Rotatebot doesn't rotate images still in its queue once there is nothing left to reset.

Alternatively, the autorotation feature could simply be undone and redone per bug 32875, so that uploaders are asked to physically rotate images according to their EXIF data during upload, without the confusion of thumbnails later being straight while the physical image is not, or vice versa.

Those are my ideas after seeing quite a few people on the projects upset that their previously uploaded straight images are now on their side or upside down. Sorry if I missed something important.
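The selection proposed in this comment (files with an EXIF orientation other than 1 that were uploaded before the 1.18 deployment) could be sketched roughly as below. This is only an illustration, not an existing script: the record layout (`name`, `orientation`, `timestamp`), the function names, and the cutoff date are all hypothetical, and timestamps are assumed to be 14-digit `YYYYMMDDHHMMSS` strings.

```python
from datetime import datetime

# EXIF Orientation values that describe a pure rotation (no mirroring),
# mapped to the clockwise angle needed to display the image upright.
ROTATION_DEGREES = {1: 0, 3: 180, 6: 90, 8: 270}


def needs_review(record, cutoff):
    """Return True if the file was uploaded before `cutoff` and carries an
    EXIF orientation other than 1 ("Normal").

    `record` is a hypothetical dict with "name", "timestamp" and an
    optional "orientation" key; a missing orientation is treated as 1.
    """
    uploaded = datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")
    return uploaded < cutoff and record.get("orientation", 1) != 1


def candidates(records, cutoff):
    """List the names of all files that would be picked up for cleanup."""
    return [r["name"] for r in records if needs_review(r, cutoff)]
```

The cutoff would have to be the actual deployment time of 1.18 on the wiki in question, which is deliberately left as a parameter here. Orientation values 2, 4, 5 and 7 involve mirroring and are not covered by `ROTATION_DEGREES`.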

saibotrash wrote:

(In reply to comment #2)

Would it be possible to query all images that have EXIF orientation other than
"Normal/1" and were uploaded before deploying MW 1.18?

Did you know this?
http://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Maintenance_category_for_files_with_EXIF_rotation_other_than_0_degrees

Assigning to Sam to work with Luxo and others and investigate whether anything can/should be done to speed up the process. See http://commons.wikimedia.org/wiki/User:Rotatebot for more info about the existing Rotatebot solution, and http://commons.wikimedia.org/wiki/Commons_talk:Rotation as the hub for on-wiki conversation about this topic.

saibotrash wrote:

(In reply to comment #4)

Assigning to Sam to work with Luxo and others and investigate whether anything
can/should be done to speed up the process. See

For the next deployment of such a feature: the roughly 50k affected files should be fixed before it is deployed. ;-)

In addition to your link: if there are questions about Rotatebot, just post on its talk page. For questions or ideas about a systematic approach, as mentioned, the existing topic on the bot work requests page is the best place.

I tried to run refreshImageMetadata.php on all wikis after the 1.18 deployment, see bug 30961. I stopped because of a memory leak. I don't think it's a good idea to run it now, since it would only exacerbate the autorotation issue.

This has gone back to being run on Toolserver after Mark fixed an LVS regression affecting upload speed.

I do have a clone of the bot on hume, but since uploads back from the Toolserver are now much faster, it was stopped.

Seems there's a backlog building up, so it's worth keeping an eye on. Rotatebot is still running at a decent speed according to [1].

Lowering priority to normal for the time being.

[1] http://commons.wikimedia.org/w/index.php?limit=50&tagFilter=&title=Special%3AContributions&contribs=user&target=Rotatebot

Perhaps I'm missing something, but I don't see how refreshImageMetadata.php would affect anything (in relation to this bug). We've been extracting rotation metadata in (almost) the same way since MediaWiki 1.5.

(In reply to comment #9)

Perhaps I'm missing something, but I don't see how refreshImageMetadata.php
would affect anything (in relation to this bug). We've been extracting rotation
metadata in (almost) the same way since MediaWiki 1.5.

Yes, that's true. It was the thumbnail cleanup script that Ariel ran that made this problem especially obvious.

Rotatebot seems to be pretty much caught up now, and will probably be able to keep up with whatever remaining rotation requests come along. Marking as fixed for lack of a more appropriate state. :)