
Page always delivered using gzip compression in HTTP
Closed, Duplicate, Public

Description

Author: research

Description:
When this page is requested, it seems to be delivered using gzip compression over HTTP, even if the client has not indicated that it accepts gzip compression. I found this problem when a user of my Copyscape service reported that the page was coming up as gibberish in Copyscape. I've also confirmed it using the Rex Swain HTTP viewer (http://www.rexswain.com/httpview.html). I thought you might want to know in case it reflects some broader (caching?) issue.
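
The check described above can also be scripted. The following is a minimal sketch, not the tool the reporter used: the function name `gzip_sent_unrequested` and the sample header dictionaries are hypothetical, and the check deliberately ignores q-values and the `*` wildcard in Accept-Encoding.

```python
def gzip_sent_unrequested(request_headers, response_headers):
    """Return True if the response is gzip-encoded although the request
    did not list gzip in Accept-Encoding."""
    # HTTP header names are case-insensitive, so normalise them first.
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}

    # Codings the client offered, ignoring parameters such as ";q=0.5".
    offered = {
        part.split(";")[0].strip().lower()
        for part in req.get("accept-encoding", "").split(",")
    }
    # Codings the server (or a cache in front of it) actually applied.
    applied = {
        part.strip().lower()
        for part in resp.get("content-encoding", "").split(",")
    }
    return "gzip" in applied and "gzip" not in offered


# Illustrative headers: a client that sends no Accept-Encoding at all.
request = {"Host": "de.wikipedia.org", "User-Agent": "testing"}
response = {"Content-Encoding": "gzip", "Content-Type": "text/html; charset=utf-8"}
print(gzip_sent_unrequested(request, response))
```

Feeding it the headers of a real request and response pair gives a quick yes/no answer without having to eyeball a header dump.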


Version: unspecified
Severity: normal
URL: http://de.wikipedia.org/wiki/Demonstrationen_in_Tibet_2008

Details

Reference
bz16230

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:21 PM
bzimport set Reference to bz16230.
bzimport added a subscriber: Unknown Object (MLST).

We've seen a few of these here and there, but haven't been able to reproduce the actual cause.

Pages seem to get stuck in the cache with wrong compression...

HEAD http://de.wikipedia.org/wiki/Demonstrationen_in_Tibet_2008 HTTP/1.0
Host: de.wikipedia.org
User-Agent: testing

HTTP/1.0 200 OK
Date: Sun, 02 Nov 2008 15:40:43 GMT
Server: Apache
X-Powered-By: PHP/5.2.4-2ubuntu5wm1
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: de
Vary: Accept-Encoding,Cookie
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=dewikiToken;string-contains=dewikiLoggedOut;string-contains=dewiki_session;string-contains=centralauth_Token;string-contains=centralauth_Session;string-contains=centralauth_LoggedOut
Last-Modified: Fri, 31 Oct 2008 22:46:09 GMT
Content-Encoding: gzip
Content-Length: 21077
Content-Type: text/html; charset=utf-8
Age: 92944
X-Cache: HIT from sq20.wikimedia.org
X-Cache-Lookup: HIT from sq20.wikimedia.org:3128
X-Cache: MISS from sq31.wikimedia.org
X-Cache-Lookup: MISS from sq31.wikimedia.org:80
Via: 1.0 sq20.wikimedia.org:3128 (squid/2.6.STABLE21), 1.0 sq31.wikimedia.org:80 (squid/2.6.STABLE21)
Connection: close

So either:

  • Squid is mistakenly caching a gzip-requested response for non-gzip-requestors

or

  • PHP/MediaWiki is mistakenly gzipping a response for non-gzip-requestors, which Squid dutifully caches.
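
To make the first possibility concrete, here is a toy model of a cache that varies on Accept-Encoding. It is only a sketch and not Squid's implementation: `variant_key`, `store` and `lookup` are hypothetical names, and it assumes that `list-contains=gzip` in the X-Vary-Options header above means "vary only on whether the Accept-Encoding list contains gzip".

```python
def variant_key(url, request_headers):
    """Build a cache key that separates gzip-capable clients from the rest."""
    accept = ""
    for name, value in request_headers.items():
        if name.lower() == "accept-encoding":
            accept = value
    # Drop parameters such as ";q=0.8" and compare coding names only.
    tokens = [t.split(";")[0].strip().lower() for t in accept.split(",")]
    return (url, "gzip" in tokens)


cache = {}  # maps (url, client_lists_gzip) -> stored body


def store(url, request_headers, body):
    cache[variant_key(url, request_headers)] = body


def lookup(url, request_headers):
    return cache.get(variant_key(url, request_headers))


url = "http://example.org/wiki/Page"  # placeholder URL
store(url, {"Accept-Encoding": "gzip, deflate"}, b"<gzip bytes>")

# A client that did not offer gzip must miss, not receive the gzip variant.
print(lookup(url, {"User-Agent": "testing"}))
```

In this model the first hypothesis corresponds to a cache that drops the second element of the key, while the second corresponds to the origin handing the cache a gzip body under the non-gzip key.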

It's intermittent and we haven't been able to reproduce the conditions causing it on demand. Might be worth double-checking the PHP gzhandler stuff to make sure its conditions are correct.
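
For reference, the kind of condition worth double-checking looks roughly like the sketch below. This is not MediaWiki's or PHP's actual code: `client_accepts_gzip` is a hypothetical helper, it applies a conservative policy of not compressing when the header is absent, and it ignores the `*` wildcard.

```python
def client_accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding value permits a gzip response.

    accept_encoding is the raw header value, or None if the header is absent.
    """
    if not accept_encoding:
        return False  # no header (or an empty one): send the page uncompressed
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0  # "gzip;q=0" explicitly refuses gzip
    return False


for value in (None, "gzip, deflate", "gzip;q=0", "identity"):
    print(repr(value), client_accepts_gzip(value))
```

A bare substring search for "gzip" would get the `gzip;q=0` case wrong, which is one reason to check the condition token by token.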

I have seen this again today. http://pl.wikipedia.org/wiki/Template:GeoTemplate got served gzipped.

This edit fixed this: http://pl.wikipedia.org/w/index.php?title=Szablon%3AGeoTemplate&diff=15439485&oldid=14665146

According to the Squid mailing list (see http://www.mail-archive.com/squid-users@squid-cache.org/msg59375.html), Squid does not support gzipping responses for non-gzip-requestors (there is an add-on project to do this: http://devel.squid-cache.org/projects.html#gzip).

So, unless we mess with the Squid source, I would rule Squid out.

A discussion in bug 7098 seems to be related to this issue.

*** This bug has been marked as a duplicate of bug 7098 ***