
punctuation in image name causes numerous problems
Closed, Declined (Public)

Description

Author: mchapman22

Description:
Punctuation in an image name on Wikipedia causes numerous problems. It is known to be a
problem with ampersands and plus signs, and possibly dashes too; presumably all punctuation
is affected. The thumbnails generated from the image show up briefly, then disappear and
cannot be regenerated. It also causes other problems, such as the "image" tab showing up
as red (doesn't exist). In addition, if the image is updated, new thumbnails are not
generated to replace the existing ones, even after a purge. This appears to be a recent
bug, as older images don't suffer from it.
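
To illustrate how the punctuation ends up in the URLs quoted in this report, here is a minimal Python sketch. The helper `thumb_url` and its `hash_dirs` parameter are hypothetical names. The `8/81` directory prefix is copied from the URLs in this report rather than computed, and using `urllib.parse.quote` is an assumption about how the name is escaped, not the actual MediaWiki code.

```python
from urllib.parse import quote


def thumb_url(name, width, hash_dirs,
              base="http://upload.wikimedia.org/wikipedia/commons"):
    """Build a thumbnail URL of the shape seen in this report.

    hash_dirs (for example "8/81") is passed in, not computed; the value
    used below is simply copied from the URLs quoted in this report.
    """
    # Percent-encode everything except letters, digits and "_.-~".
    encoded = quote(name, safe="")
    return f"{base}/thumb/{hash_dirs}/{encoded}/{width}px-{encoded}"


name = "USS_John_C._Stennis_(CVN-74)_&_HMS_Illustrious_(R_06).jpg"
print(quote(name, safe=""))
print(thumb_url(name, 740, "8/81"))
```

The parentheses and the ampersand become `%28`, `%29` and `%26`, which are the escapes that appear in the affected thumbnail URLs.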


Version: unspecified
Severity: major
URL: http://en.wikipedia.org/wiki/Image:USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg

Details

Reference
bz8367

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:31 PM
bzimport set Reference to bz8367.
bzimport added a subscriber: Unknown Object (MLST).

When testing (after seeing a comment on [[WP:VPT]]), I found some bizarre cache
behaviour on the broken thumbnails.

First, two working ones, for comparison purposes:

$ telnet localhost 3128
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HEAD http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/800px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
Host: upload.wikimedia.org

HTTP/1.0 200 OK
X-Powered-By: PHP/5.1.4
Content-Type: image/jpeg
Date: Sun, 24 Dec 2006 23:14:22 GMT
Server: lighttpd/1.4.13
X-Cache: MISS from sq7.wikimedia.org
X-Cache-Lookup: MISS from sq7.wikimedia.org:80
X-Cache: MISS from delta.home.cesarb.net
X-Cache-Lookup: MISS from delta.home.cesarb.net:3128
Proxy-Connection: close

Connection closed by foreign host.

$ telnet localhost 3128
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HEAD http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/741px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
Host: upload.wikimedia.org

HTTP/1.0 200 OK
X-Powered-By: PHP/5.1.4
Content-Type: image/jpeg
Date: Sun, 24 Dec 2006 23:16:29 GMT
Server: lighttpd/1.4.13
X-Cache: MISS from sq14.wikimedia.org
X-Cache-Lookup: MISS from sq14.wikimedia.org:80
X-Cache: MISS from delta.home.cesarb.net
X-Cache-Lookup: MISS from delta.home.cesarb.net:3128
Proxy-Connection: close

Connection closed by foreign host.

And then, two broken ones:

$ telnet localhost 3128
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HEAD http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/740px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
Host: upload.wikimedia.org

HTTP/1.0 200 OK
Content-Type: image/jpeg
ETag: "6463846857823763183"
Accept-Ranges: bytes
Last-Modified: Sat, 21 Oct 2006 23:58:58 GMT
Content-Length: 260
Date: Sun, 24 Dec 2006 22:45:37 GMT
Server: lighttpd/1.4.13
X-Cache: HIT from sq9.wikimedia.org
X-Cache-Lookup: HIT from sq9.wikimedia.org:80
X-Cache: HIT from sq5.wikimedia.org
X-Cache-Lookup: HIT from sq5.wikimedia.org:80
X-Cache: HIT from sq10.wikimedia.org
X-Cache-Lookup: HIT from sq10.wikimedia.org:80
Age: 1755
X-Cache: HIT from delta.home.cesarb.net
X-Cache-Lookup: HIT from delta.home.cesarb.net:3128
Proxy-Connection: close

Connection closed by foreign host.

$ telnet localhost 3128
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/250px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
Host: upload.wikimedia.org

HTTP/1.0 200 OK
Content-Type: image/jpeg
ETag: "-3342178414438865210"
Accept-Ranges: bytes
Last-Modified: Fri, 13 Oct 2006 16:54:09 GMT
Content-Length: 260
Date: Mon, 20 Nov 2006 18:44:45 GMT
Server: lighttpd/1.4.13
X-Cache: HIT from sq2.pmtpa.wmnet
X-Cache-Lookup: HIT from sq2.pmtpa.wmnet:80
X-Cache: HIT from sq8.pmtpa.wmnet
X-Cache-Lookup: HIT from sq8.pmtpa.wmnet:80
X-Cache: HIT from sq4.pmtpa.wmnet
X-Cache-Lookup: HIT from sq4.pmtpa.wmnet:80
X-Cache: HIT from sq14.wikimedia.org
X-Cache-Lookup: HIT from sq14.wikimedia.org:80
X-Cache: HIT from sq13.wikimedia.org
X-Cache-Lookup: HIT from sq13.wikimedia.org:80
X-Cache: HIT from sq4.wikimedia.org
X-Cache-Lookup: HIT from sq4.wikimedia.org:80
X-Cache: HIT from sq7.wikimedia.org
X-Cache-Lookup: HIT from sq7.wikimedia.org:80
X-Cache: HIT from sq1.wikimedia.org
X-Cache-Lookup: HIT from sq1.wikimedia.org:80
X-Cache: HIT from sq4.wikimedia.org
X-Cache-Lookup: HIT from sq4.wikimedia.org:80
X-Cache: HIT from sq2.wikimedia.org
X-Cache-Lookup: HIT from sq2.wikimedia.org:80
X-Cache: HIT from sq5.wikimedia.org
X-Cache-Lookup: HIT from sq5.wikimedia.org:80
X-Cache: HIT from sq12.wikimedia.org
X-Cache-Lookup: HIT from sq12.wikimedia.org:80
X-Cache: HIT from sq2.wikimedia.org
X-Cache-Lookup: HIT from sq2.wikimedia.org:80
X-Cache: HIT from sq8.wikimedia.org
X-Cache-Lookup: HIT from sq8.wikimedia.org:80
X-Cache: HIT from sq10.wikimedia.org
X-Cache-Lookup: HIT from sq10.wikimedia.org:80
X-Cache: HIT from sq14.wikimedia.org
X-Cache-Lookup: HIT from sq14.wikimedia.org:80
X-Cache: HIT from sq2.wikimedia.org
X-Cache-Lookup: HIT from sq2.wikimedia.org:80
X-Cache: HIT from sq4.wikimedia.org
X-Cache-Lookup: HIT from sq4.wikimedia.org:80
X-Cache: HIT from sq10.wikimedia.org
X-Cache-Lookup: HIT from sq10.wikimedia.org:80
X-Cache: HIT from delta.home.cesarb.net
X-Cache-Lookup: HIT from delta.home.cesarb.net:3128
Proxy-Connection: close

<html><head>

<title>Bad title</title>
<body>

<h1>Bad title</h1>
<p>The requested page title was invalid, empty, or an incorrectly linked
inter-language or inter-wiki title. It may contain one more characters which
cannot be used in titles.</p>
</body></html>Connection closed by foreign host.

The X-Cache header lines seem positively strange. Also, unless I'm reading the
code wrong, the Cache-Control and Content-Type headers which should be there
(from thumb.php) are missing.
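
For anyone who wants to repeat this comparison, here is a rough Python sketch that counts the stacked X-Cache lines and reports which expected headers are absent. The function names are hypothetical, and the sample is the 740px response pasted above. Including X-Powered-By in the expected list is only an assumption, based on it appearing in the two working responses.

```python
def parse_headers(raw):
    """Split a raw header block into (status_line, [(name, value), ...]).

    Repeated headers are kept, in order of appearance.
    """
    lines = [ln for ln in raw.strip().splitlines() if ln.strip()]
    status, fields = lines[0], []
    for ln in lines[1:]:
        # Split on the first colon only; dates contain further colons.
        name, _, value = ln.partition(":")
        fields.append((name.strip(), value.strip()))
    return status, fields


def cache_hops(fields):
    """Return every X-Cache value, in order of appearance."""
    return [v for n, v in fields if n.lower() == "x-cache"]


def missing(fields, expected):
    """Return the expected header names that do not appear at all."""
    present = {n.lower() for n, _ in fields}
    return [h for h in expected if h.lower() not in present]


SAMPLE = """\
HTTP/1.0 200 OK
Content-Type: image/jpeg
ETag: "6463846857823763183"
Accept-Ranges: bytes
Last-Modified: Sat, 21 Oct 2006 23:58:58 GMT
Content-Length: 260
Date: Sun, 24 Dec 2006 22:45:37 GMT
Server: lighttpd/1.4.13
X-Cache: HIT from sq9.wikimedia.org
X-Cache-Lookup: HIT from sq9.wikimedia.org:80
X-Cache: HIT from sq5.wikimedia.org
X-Cache-Lookup: HIT from sq5.wikimedia.org:80
X-Cache: HIT from sq10.wikimedia.org
X-Cache-Lookup: HIT from sq10.wikimedia.org:80
Age: 1755
X-Cache: HIT from delta.home.cesarb.net
X-Cache-Lookup: HIT from delta.home.cesarb.net:3128
Proxy-Connection: close
"""

status, fields = parse_headers(SAMPLE)
print(status)
print(len(cache_hops(fields)), "X-Cache lines")
print("missing:", missing(fields, ["Cache-Control", "Content-Type", "X-Powered-By"]))
```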

The image tab is red because there's no local page on en.wikipedia.org; the
image is hosted on Commons.

Am investigating the thumb issue; there seem to be some uggy things with the
thumbnail caching system.

I've adjusted the caching script to, hopefully, work correctly now for this case.

Can you confirm correct behavior now?

After purging a couple of times on commons, the 740px thumbnail is still in the
broken state:

$ telnet upload.wikimedia.org 80
Trying 66.230.200.228...
Connected to upload.pmtpa.wikimedia.org.
Escape character is '^]'.
GET /wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/740px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
Host: upload.wikimedia.org

HTTP/1.0 200 OK
Content-Type: image/jpeg
ETag: "6463846857823763183"
Accept-Ranges: bytes
Last-Modified: Sat, 21 Oct 2006 23:58:58 GMT
Content-Length: 260
Date: Sun, 24 Dec 2006 22:45:37 GMT
Server: lighttpd/1.4.13
X-Cache: HIT from sq9.wikimedia.org
X-Cache-Lookup: HIT from sq9.wikimedia.org:80
X-Cache: HIT from sq5.wikimedia.org
X-Cache-Lookup: HIT from sq5.wikimedia.org:80
Age: 61892
X-Cache: HIT from sq10.wikimedia.org
X-Cache-Lookup: HIT from sq10.wikimedia.org:80
X-Cache: MISS from sq9.wikimedia.org
X-Cache-Lookup: MISS from sq9.wikimedia.org:80
Via: 1.0 sq9.wikimedia.org:80 (squid/2.6.STABLE5), 1.0 sq5.wikimedia.org:80 (squid/2.6.STABLE5), 1.0 sq10.wikimedia.org:80 (squid/2.6.STABLE5), 1.0 sq9.wikimedia.org:80 (squid/2.6.STABLE5)
Connection: close

<html><head>

<title>Bad title</title>
<body>

<h1>Bad title</h1>
<p>The requested page title was invalid, empty, or an incorrectly linked
inter-language or inter-wiki title. It may contain one more characters which
cannot be used in titles.</p>
</body></html>Connection closed by foreign host.

The problem might be fixed, but purging doesn't seem to be enough to clear the
bogus entries. Unfortunately, I don't know how to check if the problem which
created these bogus entries in the first place has really been fixed. The crazy
"add 1 to the end" trick would probably fix this one too, but I'll leave it
broken to help debugging.
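
One possible way to spot these bogus entries, sketched in Python: flag any response that is labelled as an image but whose body is actually an HTML page, as in the "Bad title" responses above. The function name `looks_bogus` is hypothetical and the check is only a heuristic. It says nothing about whether the underlying cause has been fixed.

```python
def looks_bogus(headers, body):
    """Heuristic: labelled as an image, but the payload is an HTML page.

    headers is a dict of header name -> value; body is the raw bytes.
    """
    ctype = headers.get("Content-Type", "")
    # Look only at the start of the body, ignoring leading whitespace.
    start = body.lstrip()[:64].lower()
    return ctype.startswith("image/") and start.startswith(b"<html")


# Body shaped like the broken 740px response in this report.
broken = b"<html><head>\n\n<title>Bad title</title>\n<body>\n"
# A real JPEG starts with the bytes FF D8 FF.
working = b"\xff\xd8\xff\xe0" + b"\x00" * 16

print(looks_bogus({"Content-Type": "image/jpeg"}, broken))
print(looks_bogus({"Content-Type": "image/jpeg"}, working))
```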

I cleared the old bad files for this particular entry manually, and it now shows
properly for me.

Purge doesn't clear them, and new, correct copies aren't being saved on the caching
thumbnail server; it is presumably fetching from the main server instead.

Sigh.

dunc_harris wrote:

Wouldn't it just be easier to disallow new uploads which contain bad characters?

ayg wrote:

We don't want to prohibit punctuation in image names. This is a bug in the caching system that should be fixed there, as I understand it.

jeluf wrote:

Are there still any problems? Brion said in December that he had fixed it, and there hasn't been any update since.

No problems currently that I'm aware of, and various changes have been made to the infrastructure. If problems continue, please provide details.