
thumb.php on Wikimedia Commons is serving broken images (thumbnail generation fails)
Closed, DeclinedPublic

Description

Author: laurent.jauquier

Description:
Hello and sorry for my limited English.

I've written a little script that I use to automatically retrieve the Wikimedia Commons "picture of the day" and save a local copy on my server in different sizes (width=200px, width=400px and width=1024px).

I call the file with this URL: $distantfile = "http://commons.wikimedia.org/w/thumb.php?f=".$filename."&w=400" (replacing "400" with whatever width I want: 200, 400 or 1024). thumb.php is the script used to resize images.

My little script has been working fine for two years. However, I've noticed several issues since around the 20th of October. It looks like thumb.php on Commons is now repeatedly generating errors for some sizes.
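For anyone scripting the same kind of retrieval, here is a minimal Python sketch of building these URLs with the query parameters percent-encoded. The helper name `thumb_url` and the constant `THUMB_ENDPOINT` are illustrative and not part of MediaWiki; only the `f` and `w` parameters come from the report.

```python
from urllib.parse import quote, urlencode

# Endpoint as quoted in the report; treat it as an assumption for any other wiki.
THUMB_ENDPOINT = "http://commons.wikimedia.org/w/thumb.php"


def thumb_url(filename: str, width: int) -> str:
    """Return a thumb.php URL with f= and w= safely percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"f": filename, "w": width}, quote_via=quote)
    return THUMB_ENDPOINT + "?" + query


if __name__ == "__main__":
    for width in (200, 400, 1024):
        print(thumb_url("Aqueduc_Luynes.jpg", width))
```

Encoding the file name matters for titles that contain spaces or other reserved characters, which would otherwise produce a malformed query string.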

Example with the “picture of the day” of today (try it yourself in your web browser):

http://commons.wikimedia.org/w/thumb.php?f=Albrecht_of_Brandeburg_Duerer_VandA_E.653-1940.jpg&w=400 ==> ERROR

http://commons.wikimedia.org/w/thumb.php?f=Albrecht_of_Brandeburg_Duerer_VandA_E.653-1940.jpg&w=401 ==> OK

Please note that I am not a developer.


Version: wmf-deployment
Severity: major

Details

Reference
bz42047

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:00 AM
bzimport set Reference to bz42047.

https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Albrecht_of_Brandeburg_Duerer_VandA_E.653-1940.jpg/400px-Albrecht_of_Brandeburg_Duerer_VandA_E.653-1940.jpg this seems to be there and working. I don't know why getting it via thumb.php gives an error message.

(What I did: action=purge on the image, then added a thumb link for the given image at 400px and previewed it on a random Commons page, and it showed up. I did not check whether the thumb was already there.)

Can you provide a couple more examples?

(In reply to comment #0)

http://commons.wikimedia.org/w/thumb.php?f=Albrecht_of_Brandeburg_Duerer_VandA_E.653-1940.jpg&w=400

> ERROR

Would have been interesting to know the *exact* error message here. :)
Does this still not work for you?

Likely a duplicate of one of the "dependencies" of bug 41371 but hard to tell.

the error in the logs (fluorine, thumbnail.log) is

convert: missing an image filename `/tmp/transform_29c41e1576d1-1.jpg' @ error/convert.c/ConvertImageCommand/3011." from "'/usr/bin/convert' -quality 80 -background white -define jpeg:size=57x119 '' -thumbnail '57x119!' -set comment 'File source: http://commons.wikimedia.org/wiki/File:Landvoogden_Albrecht_en_Isabella_van_Oostenrijk.jpg' -depth 8 -sharpen '0x0.8' -rotate -0 '/tmp/transform_29c41e1576d1-1.jpg' 2>&1"

which usually means some other thing went wrong, like a ulimit or space in /tmp or whatever.

bah, wrong albrecht. someday I'll learn how to read, sorry for the spam.

The URL currently works, so it's hard to tell what went wrong. Is it a text error or a corrupted image? If it's text, what does it say?

Next time this happens, please copy the output of the error (preferably the source, not just what your browser shows) and paste it here.

laurent.jauquier wrote:

Hello, here is another example. The following file couldn't be generated when my script called it on 8 November 2012.

http://commons.wikimedia.org/w/thumb.php?f=Aqueduc_Luynes.jpg&w=400

And as you can see it is still returning an error.

More examples:

http://commons.wikimedia.org/w/thumb.php?f=Albrecht_of_Brandeburg_Duerer_VandA_E.653-1940.jpg&w=200 (is still failing)

http://commons.wikimedia.org/w/thumb.php?f=Ensenada_fish_market_2.jpg&w=200 (is still failing)

http://commons.wikimedia.org/w/thumb.php?f=Lake_Kinney_mit_Mount_Whitehorn.jpg&w=200

...

...

However there are a few other files that were returning an error when my script called them and that seem to work now.

Examples:

http://commons.wikimedia.org/w/thumb.php?f=Lake_Agnes_im_Banff_National_Park.jpg&w=400 (was returning a broken image on 5 November 2012 but works now)

http://commons.wikimedia.org/w/thumb.php?f=2012-07-22%2015-37-00-fort-giromagny.jpg&w=400 (was returning a broken image on 28 October 2012 but works now)

http://commons.wikimedia.org/w/thumb.php?f=Amphiprion_sandaracinos_on_Heteractis_crispa.jpg&w=400

laurent.jauquier wrote:

As I've already said, this has been working very well for 2 years. I never had such issues before about 18-25 October 2012 (hard to say when it really began).

The first one has:
00000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 01 2c |......JFIF.....,|
00000010 01 2c 00 00 ff fe 00 48 46 69 6c 65 20 73 6f 75 |.,.....HFile sou|
00000020 72 63 65 3a 20 68 74 74 70 3a 2f 2f 63 6f 6d 6d |rce: http://comm|
00000030 6f 6e 73 2e 77 69 6b 69 6d 65 64 69 61 2e 6f 72 |ons.wikimedia.or|
00000040 67 2f 77 69 6b 69 2f 46 69 6c 65 3a 41 71 75 65 |g/wiki/File:Aque|
00000050 64 75 63 5f 4c 75 79 6e 65 73 2e 6a 70 67 ff e2 |duc_Luynes.jpg..|
00000060 0c 58 49 43 43 5f 50 52 4f 46 49 4c |.XICC_PROFIL|
0000006c

The second one:
00000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 01 2c |......JFIF.....,|
00000010 01 2c 00 00 ff fe 00 68 46 69 6c 65 20 73 6f 75 |.,.....hFile sou|
00000020 72 63 65 3a 20 68 74 74 70 3a 2f 2f 63 6f 6d 6d |rce: http://comm|
00000030 6f 6e 73 2e 77 69 6b 69 6d 65 64 69 61 2e 6f 72 |ons.wikimedia.or|
00000040 67 2f 77 69 6b 69 2f 46 69 6c 65 3a 41 6c 62 72 |g/wiki/File:Albr|
00000050 65 63 68 74 5f 6f 66 5f 42 72 61 6e 64 65 62 75 |echt_of_Brandebu|
00000060 72 67 5f 44 75 65 72 65 72 5f 56 61 |rg_Duerer_Va|
0000006c

The Content-Length header on both is 108, which if I recall correctly is the size of the HTML error page that Swift sends.

This looks familiar, I think we were debugging a similar issue with Aaron. I think what happens is that MediaWiki takes the headers from a (failed in this case) HEAD and transplants a content from a successful GET, which results in a wrong content-length even with the correct body. But I may be wrong, Aaron is the authoritative source here :-)
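A client-side check along these lines might catch such 108-byte responses before they are saved. This is only a heuristic sketch: `looks_truncated` is a hypothetical helper, and the sample bytes are copied from the first hex dump in this thread. It assumes that a complete JPEG ends with the end-of-image marker `ff d9`, which holds for typical files but not for ones that carry trailing data.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_truncated(body: bytes) -> bool:
    """Heuristic: the body starts like a JPEG but does not end with EOI."""
    if not body.startswith(JPEG_SOI):
        # Not a JPEG at all (for example a text or HTML error page).
        return False
    return not body.endswith(JPEG_EOI)


# The 108 bytes shown in the first hex dump (Aqueduc_Luynes.jpg).
SAMPLE = bytes.fromhex(
    "ffd8ffe000104a46494600010101012c"
    "012c0000fffe004846696c6520736f75"
    "7263653a20687474703a2f2f636f6d6d"
    "6f6e732e77696b696d656469612e6f72"
    "672f77696b692f46696c653a41717565"
    "6475635f4c75796e65732e6a7067ffe2"
    "0c584943435f50524f46494c"
)

if __name__ == "__main__":
    print(len(SAMPLE), looks_truncated(SAMPLE))
```

A retrieval script could retry or skip the file when the check returns true instead of storing the broken thumbnail.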

laurent.jauquier wrote:

I think you're on the right track. Every file that failed to generate has been saved on my server with a size of 108 B.

Laurent: It happens now because an upgrade of the servers (and the software that handles image processing) was done in late October.

Not sure if this is the same problem, but as we don't have an error message here I'm adding this comment:

From https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=82888840#File:Lamppost-singapore.jpg :

The 199px thumbnail worked, the 200px one at
http://upload.wikimedia.org/wikipedia/commons/thumb/2/29/Lamppost-singapore.jpg/200px-Lamppost-singapore.jpg gave this error:

Traceback (most recent call last):

File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 382, in handle_one_response
  result = self.application(self.environ, start_response)
File "/usr/local/lib/python2.7/dist-packages/wmf/rewrite.py", line 368, in __call__
  resp = self.handle404(reqorig, url, container, obj)
File "/usr/local/lib/python2.7/dist-packages/wmf/rewrite.py", line 197, in handle404
  upcopy = opener.open(encodedurl)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
  response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
  '_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
  result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
  return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
  raise URLError(err)

URLError: <urlopen error [Errno 111] ECONNREFUSED>

So viewing that thumbnail gives the following in the exception logs:

2012-11-15 23:54:25 srv220 commonswiki: [2d6cea8a] /w/thumb.php?f=Aqueduc_Luynes.jpg Exception from line 61 of /usr/local/apache/common-local/php-1.21wmf4/includes/media/ImageHandler.php: No width specified to ImageHandler::makeParamString
#0 /usr/local/apache/common-local/php-1.21wmf4/includes/filerepo/file/File.php(796): ImageHandler->makeParamString(Array)
#1 /usr/local/apache/common-local/php-1.21wmf4/includes/filerepo/file/File.php(778): File->generateThumbName('Aqueduc_Luynes....', Array)
#2 /usr/local/apache/common-local/php-1.21wmf4/thumb.php(200): File->thumbName(Array)
#3 /usr/local/apache/common-local/php-1.21wmf4/thumb.php(56): wfStreamThumb(Array)
#4 /usr/local/apache/common-local/php-1.21wmf4/thumb.php(39): wfThumbHandleRequest()
#5 /usr/local/apache/common-local/live-1.5/thumb.php(3): require('/usr/local/apac...')
#6 {main}

The error seems to only happen occasionally at random if the page is refreshed. It does not correspond to any particular image scalars.

Which one of the two, Aaron? The bug report seems to have two different problems (content-length & ECONNREFUSED) intermixed with each other and this is getting confusing.

(In reply to comment #13)

The error seems to only happen occasionally at random if the page is refreshed.
It does not correspond to any particular image scalars.

Note this was for http://commons.wikimedia.org/w/thumb.php?f=Aqueduc_Luynes.jpg&w=400 and some other widths.

The exception only happens randomly. Looking at the exception URL and error, it's like I request /w/thumb.php?f=Aqueduc_Luynes.jpg&w=405 and sometimes what reaches MW is just /w/thumb.php?f=Aqueduc_Luynes.jpg

I can't seem to be able to reproduce this. Can you pinpoint in which appserver(s) this happens (either via an HTML comment in the body or, I guess, via the exception log), along with the caching layers (X-Cache/X-Cache-Lookup/Via) that the request passes through?

curl -i http://commons.wikimedia.org/w/thumb.php?f=Aqueduc_Luynes.jpg&w=405
[1] 22525
aaron@aaron-HP-HDX18-Notebook-PC:/var/www/CephWiki/core$ HTTP/1.0 500 Internal Server Error
Date: Mon, 19 Nov 2012 23:46:04 GMT
Server: Apache
X-Content-Type-Options: nosniff
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Status: 500 MediaWiki exception
Content-Length: 301
Content-Type: text/html; charset=utf-8
X-Cache: MISS from cp1008.eqiad.wmnet
X-Cache-Lookup: MISS from cp1008.eqiad.wmnet:3128
X-Cache: MISS from cp1016.eqiad.wmnet
X-Cache-Lookup: MISS from cp1016.eqiad.wmnet:80
Connection: close

<!doctype html>
<html><head><title>Internal error</title></head><body>
<div class="errorbox">[4de91a08] 2012-11-19 23:46:04: Fatal exception of type MWException</div>
<!-- Set $wgShowExceptionDetails = true; at the bottom of LocalSettings.php to show detailed debugging information. --></body></html>

Seems to keep happening with that file for now.
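To report which caching layers a request passed through, the relevant headers can be pulled out of a saved `curl -i` response. The following Python sketch is one way to do that; `cache_path` is a hypothetical helper, and the sample is an excerpt of the headers pasted in this comment.

```python
from email import message_from_string


def cache_path(raw_response_head: str) -> dict:
    """Collect the X-Cache, X-Cache-Lookup and Via values from a response head."""
    # Drop the status line ("HTTP/1.0 500 ..."), which is not a header field.
    _status_line, _, header_block = raw_response_head.partition("\n")
    msg = message_from_string(header_block)
    names = ("X-Cache", "X-Cache-Lookup", "Via")
    # get_all() returns None when a header is absent, so fall back to [].
    return {name: msg.get_all(name) or [] for name in names}


SAMPLE_HEAD = """HTTP/1.0 500 Internal Server Error
Date: Mon, 19 Nov 2012 23:46:04 GMT
Server: Apache
X-Cache: MISS from cp1008.eqiad.wmnet
X-Cache-Lookup: MISS from cp1008.eqiad.wmnet:3128
X-Cache: MISS from cp1016.eqiad.wmnet
X-Cache-Lookup: MISS from cp1016.eqiad.wmnet:80
Connection: close
"""

if __name__ == "__main__":
    for name, values in cache_path(SAMPLE_HEAD).items():
        print(name, values)
```

Repeated headers are kept in the order they appear, so the list shows each cache hop in sequence.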

(In reply to comment #18)

curl -i http://commons.wikimedia.org/w/thumb.php?f=Aqueduc_Luynes.jpg&w=405
[1] 22525
Seems to keep happening with that file for now.

Actually ignore that, bad shell escaping :)
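For context on the escaping mistake: in a POSIX shell an unquoted `&` ends the command and runs it in the background (hence the `[1] 22525` job line), so curl only received the URL up to `f=Aqueduc_Luynes.jpg` and the width never reached the server. A small Python sketch of quoting the URL before handing it to a shell, using the standard `shlex.quote`; the variable names are illustrative.

```python
import shlex

url = "http://commons.wikimedia.org/w/thumb.php?f=Aqueduc_Luynes.jpg&w=405"

# What curl was actually given when the URL was left unquoted:
# everything before the first '&'.
sent_unquoted = url.split("&", 1)[0]

# A shell-safe command line with the whole URL as a single argument.
safe_command = "curl -i " + shlex.quote(url)

if __name__ == "__main__":
    print(sent_unquoted)
    print(safe_command)
```

Wrapping the URL in single quotes by hand on the command line has the same effect.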

So yeah, it only happens at random when refreshing in the browser a bunch of times.

One way to get this error is to load a thumbnail in FF and then press the "stop" button. I wonder if what causes this is some sort of early disconnect.

Are there any other problem links? None of the ones given above have this problem anymore?

laurent.jauquier wrote:

It looks like the problem has been solved. I haven't noticed any problem since 2012-11-21. I will let you know if the problem happens again.

This can be reopened if it occurs again.

Aaron, both you and I have independently verified that the error did exist, so worksforme isn't exactly right.

Do you have any indication that a recent code deploy or infrastructure change might have fixed it? If not, I'm pretty sure this won't be the last of it, and we should reopen the bug and investigate more.

What I thought was reproducing this may have just been the client disconnects (initiated on my part). It can always be reopened later if it happens again and I have something to work on. And you can investigate if you want though or even assign it to yourself ;)

I'm afraid that we've been bitten by having two different bugs in the same bug report again. I was referring to the truncated images/Content-Length: 108 bug, as I mentioned back in comment 8. I don't think the cause of this has been found or the bug fixed yet, has it?

Gilles raised the priority of this task from Medium to Unbreak Now!. Dec 4 2014, 10:25 AM
Gilles added a project: Multimedia.
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to Medium. Dec 4 2014, 11:20 AM