MediaWiki fails streaming files when mod_deflate and ob_gzhandler are also set ("Content-Encoding: , gzip")
Closed, ResolvedPublic

Description

Dan Nesset reports in mediawiki-l that when serving files from ConfirmAccount, the response contains "Content-Encoding: , gzip", which "befuddles
some browsers, such as FF, IE and Safari and they fail to decompress the
file."

This comes as a combination of mod_deflate, PHP's ob_gzhandler and MediaWiki.

When serving files, MediaWiki clears any gzipping layer, including its own. You seem to have output_handler=ob_gzhandler set in php.ini. When MediaWiki detects that ob_gzhandler is active, it calls ob_end_clean() and then header( 'Content-Encoding:' ); to clear the Content-Encoding field (otherwise you would get plain data with a header saying it is gzipped).
Then you also have mod_deflate in the mix. It detects an existing Content-Encoding header, and apr_table_mergen "merges" the values by appending ', gzip' even though the header is empty.
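
The merge step can be modelled in a few lines of PHP. This is only a sketch of the behaviour described above, not Apache's code: the function names and the array standing in for the header table are hypothetical, and it assumes PHP 7 or later. The second function shows the kind of empty-value check suggested for mod_deflate.

```php
<?php
// Hypothetical model of a header table merge; not Apache's actual implementation.
// If the header already exists, the new value is appended after ", ".
function mergeHeader(array $headers, string $name, string $value): array
{
    if (array_key_exists($name, $headers)) {
        $headers[$name] .= ', ' . $value;
    } else {
        $headers[$name] = $value;
    }
    return $headers;
}

// Same merge, but an existing empty (or whitespace-only) value is overwritten
// instead of being concatenated with.
function mergeHeaderSkippingEmpty(array $headers, string $name, string $value): array
{
    if (isset($headers[$name]) && trim($headers[$name]) !== '') {
        $headers[$name] .= ', ' . $value;
    } else {
        $headers[$name] = $value;
    }
    return $headers;
}

// State after header( 'Content-Encoding:' ): the field exists but is empty.
$afterPhp = ['Content-Encoding' => ''];

$naive = mergeHeader($afterPhp, 'Content-Encoding', 'gzip');
$guarded = mergeHeaderSkippingEmpty($afterPhp, 'Content-Encoding', 'gzip');

echo 'Content-Encoding: ', $naive['Content-Encoding'], "\n";   // Content-Encoding: , gzip
echo 'Content-Encoding: ', $guarded['Content-Encoding'], "\n"; // Content-Encoding: gzip
```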

Where is the bug?
mod_deflate shouldn't concatenate if the field is empty.
php could skip passing Content-Encoding to other modules if empty.
MediaWiki could use the header( 'Content-Encoding: identity' ); instead of header( 'Content-Encoding:' );

How can _you_ fix it right now?
You don't need three compression layers. I'd deactivate both mod_deflate and output_handler=ob_gzhandler, letting MediaWiki compress the pages for you.
Disabling just mod_deflate or just output_handler=ob_gzhandler would work too, but note that keeping mod_deflate with your current configuration will compress streamed files, which is likely to be inefficient.

RFC 2616 section 14.11 defines the Content-Encoding header as
"Content-Encoding" ":" 1#content-coding
The #rule (see section 2) requires at least one content-coding to be present, which MediaWiki currently violates (yes, the empty header does reach the user's browser).
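
As an illustration, the 1# requirement can be checked with a small PHP function. This is a sketch, not code from MediaWiki or Apache: the function name is hypothetical and it assumes PHP 7 or later. It follows the list rule in RFC 2616 section 2.1, under which empty list elements are allowed but do not count towards the minimum, so an empty value fails while ', gzip' passes the grammar as written, even though the browsers mentioned above mishandle it.

```php
<?php
// Hypothetical helper: checks a Content-Encoding field value against 1#content-coding.
function hasAtLeastOneContentCoding(string $value): bool
{
    $codings = [];
    foreach (explode(',', $value) as $element) {
        $element = trim($element, " \t");
        if ($element === '') {
            continue; // null element: allowed, but does not count
        }
        // A content-coding is a token (RFC 2616 section 2.2).
        if (!preg_match('/^[!#$%&\'*+\-.^_`|~0-9A-Za-z]+$/D', $element)) {
            return false;
        }
        $codings[] = $element;
    }
    return count($codings) >= 1;
}

var_dump(hasAtLeastOneContentCoding(''));         // bool(false)
var_dump(hasAtLeastOneContentCoding('identity')); // bool(true)
var_dump(hasAtLeastOneContentCoding(', gzip'));   // bool(true)
```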


Version: 1.18.x
Severity: normal
URL: http://thread.gmane.org/gmane.org.wikimedia.mediawiki/36969

Details

Reference
bz28069

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 11:34 PM)
bzimport set Reference to bz28069.

Fixed in r84060. Set the Content-Encoding as identity, using header_remove if available.
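
A rough sketch of the approach this comment describes, for anyone patching by hand. This is not the actual r84060 code: the function name and the returned strings are hypothetical, and type declarations are left out so it also parses on older PHP versions. header_remove() exists from PHP 5.3.0 onwards.

```php
<?php
// Hypothetical helper, not the r84060 implementation.
// Returns which approach was taken, e.g. for logging.
function resetContentEncoding()
{
    if (function_exists('header_remove')) {
        // PHP 5.3.0+: drop the header entirely, so nothing downstream
        // (such as mod_deflate) sees an empty field to merge into.
        header_remove('Content-Encoding');
        return 'removed';
    }
    // Older PHP: send an explicit, non-empty coding instead of an empty value.
    header('Content-Encoding: identity');
    return 'identity';
}
```

Call it before streaming the file, in place of header( 'Content-Encoding:' ).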

dnessett wrote:

I checked php.ini on our wiki server and output_handler=ob_gzhandler isn't set.

I reported this problem on the Apache Bugzilla (https://issues.apache.org/bugzilla/show_bug.cgi?id=50935) and the ticket was closed as invalid, with the comment: "There are known mod_deflate issues in 2.2.3 which match the described symptoms." The maintainer did not reference any tickets that describe these "known mod_deflate issues", so it is hard to say whether this is a legitimate comment or a smoke screen. I have asked for clarification.

I will deactivate mod_deflate and see if that fixes the problem. However, we have other applications running on our wiki server machine that use apache (e.g., WordPress and Subversion), so a global solution involving mod_deflate activation may not be desirable.

I also have created a coding workaround for ConfirmAccount that uses a global to turn off compression when sending CVs. I will open a ticket and supply the patch as an enhancement request for ConfirmAccount if deactivating mod_deflate doesn't solve the problem.

dnessett wrote:

However, we
have other applications running on our wiki server machine that use apache
(e.g., WordPress and Subversion), so a global solution involving mod_deflate
activation may not be desirable.

Actually, this is incorrect. We only run the wiki software on our wiki server machine. Not sure where my brain was when I wrote that. However, other sites may run other apache based applications on their wiki servers or may not have access to httpd.conf.

dnessett wrote:

(In reply to comment #2)

I will deactivate mod_deflate and see if that fixes the problem.

This works.

Ignore empty Content-Encoding in mod_deflate

I checked php.ini on our wiki server and output_handler=ob_gzhandler isn't set.

Maybe it is set in LocalSettings? Some old LocalSettings may enable it.

I reported this problem on apache bugzilla
(https://issues.apache.org/bugzilla/show_bug.cgi?id=50935) and the ticket was
closed as invalid, stating. "There are known mod_deflate issues in 2.2.3 which
match the described symptoms." The maintainer did not reference any tickets
that described these "known mod_deflate issues", so it is hard to say whether
this is a legitimate comment or a smoke screen. I have asked for a
clarification.

It is reproducible in 2.2.17, and the fix is as simple as checking at line 565 whether the field is empty. I'm attaching a patch for that. It's not clear whether it should be applied, though, given that the input is not RFC-conformant either.

I also have created a coding workaround for ConfirmAccount that uses a global
to turn off compression when sending CVs. I will open a ticket and supply the
patch as an enhancement request for ConfirmAccount if deactivating mod_deflate
doesn't solve the problem.

You can also backport r84060.

dnessett wrote:

(In reply to comment #5)

Created attachment 8309 [details]
Ignore empty Content-Encoding in mod_deflate

I checked php.ini on our wiki server and output_handler=ob_gzhandler isn't set.

Maybe it is set in LocalSettings? Some old LocalSettings may enable it.

Yes, it was being set in LocalSettings.

I reported this problem on apache bugzilla
(https://issues.apache.org/bugzilla/show_bug.cgi?id=50935) and the ticket was
closed as invalid, stating. "There are known mod_deflate issues in 2.2.3 which
match the described symptoms." The maintainer did not reference any tickets
that described these "known mod_deflate issues", so it is hard to say whether
this is a legitimate comment or a smoke screen. I have asked for a
clarification.

It is reproducible in 2.2.17, and the fix is as simple as checking at line 565
if the field is empty. I'm attaching a patch for that. It's not clear if it
should be done, given that the input is not rfc conformant, either.

I also have created a coding workaround for ConfirmAccount that uses a global
to turn off compression when sending CVs. I will open a ticket and supply the
patch as an enhancement request for ConfirmAccount if deactivating mod_deflate
doesn't solve the problem.

You can also backport r84060.

Since turning off mod_deflate is a workaround, I have discarded the patch. Backporting r84060 seems the best way forward.

dnessett wrote:

(In reply to comment #5)

Created attachment 8309 [details]
Ignore empty Content-Encoding in mod_deflate

I checked php.ini on our wiki server and output_handler=ob_gzhandler isn't set.

Maybe it is set in LocalSettings? Some old LocalSettings may enable it.

I reported this problem on apache bugzilla
(https://issues.apache.org/bugzilla/show_bug.cgi?id=50935) and the ticket was
closed as invalid, stating. "There are known mod_deflate issues in 2.2.3 which
match the described symptoms." The maintainer did not reference any tickets
that described these "known mod_deflate issues", so it is hard to say whether
this is a legitimate comment or a smoke screen. I have asked for a
clarification.

It is reproducible in 2.2.17, and the fix is as simple as checking at line 565
if the field is empty. I'm attaching a patch for that. It's not clear if it
should be done, given that the input is not rfc conformant, either.

I also have created a coding workaround for ConfirmAccount that uses a global
to turn off compression when sending CVs. I will open a ticket and supply the
patch as an enhancement request for ConfirmAccount if deactivating mod_deflate
doesn't solve the problem.

You can also backport r84060.

Wouldn't it be useful to provide the patch to Apache? They can decide whether to incorporate it or not. I am not an Apache developer, but I think testing for improperly set input is a reasonable course of action.

Once again, here is the bug ticket:

https://issues.apache.org/bugzilla/show_bug.cgi?id=50935

dnessett wrote:

I have backported r84060 to the code running on our test wiki (http://test.citizendium.org).

Here's the bug entry for Firefox being unable to load pages specifying 'Content-encoding: identity':

https://bugzilla.mozilla.org/show_bug.cgi?id=341944

It was apparently fixed somewhere between 2006 and 2010, but I don't know if it got fixed before the Firefox 2.0 or 3.0 releases, or if other browsers are affected.

dnessett wrote:

(In reply to comment #9)

Here's the bug entry for Firefox being unable to load pages specifying
'Content-encoding: identity':

https://bugzilla.mozilla.org/show_bug.cgi?id=341944

It was apparently fixed somewhere between 2006 and 2010, but I don't know if it
got fixed before the Firefox 2.0 or 3.0 releases, or if other browsers are
affected.

I installed FF 2.0 on Windows 7 over Parallels on Mac OS X. I then streamed a .doc file on our test wiki using the ConfirmAccount extension to see if Content-Encoding: identity was handled properly. However, we run PHP 5.3.5, so the patch simply removed the Content-Encoding header and, it appears, either Apache or some other logic in MW requested gzip encoding (now with a properly formed Content-Encoding header value). So, I can't test Content-Encoding: identity processing by FF 2.0.

Here is the request/response from the CV download (again using ellipses to obscure certain information):

(Request-Line) GET /wiki?title=Special:ConfirmAccounts/editors&file=87...b.doc HTTP/1.1
Host test.citizendium.org
User-Agent Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Accept text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip,deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive 300
Connection keep-alive
Referer http://test.citizendium.org/wiki?title=Special:ConfirmAccounts/editors&acrid=...
Cookie cz_test_session= ...

(Status-Line) HTTP/1.0 200 OK
Date Fri, 18 Mar 2011 22:48:23 GMT
Server Apache/2.2.3 (CentOS)
X-Powered-By PHP/5.3.5
Expires Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control no-cache, no-store, max-age=0, must-revalidate
Pragma no-cache
Last-Modified Fri, 25 Feb 2011 19:35:17 GMT
Vary Accept-Encoding
content-disposition inline;filename*=utf-8'en'87...b.doc
Content-Encoding gzip
Content-Length 16423
Content-Type application/x-wiki
X-Cache MISS from aristotle.citizendium.org
X-Cache-Lookup MISS from aristotle.citizendium.org:80
Via 1.0 aristotle.citizendium.org:80 (squid/2.6.STABLE21)
Connection keep-alive

If someone can supply me with a URL that will supply a Content-Encoding: identity header, I will test it on FF 2.0.

(In reply to comment #10)

and, it appears, either
Apache or some other logic in MW requested gzip encoding (now with a properly
formed Content-Encoding header value).

It was mod_deflate that gzipped it.

So, I can't test Content-Encoding:
identity processing by FF 2.0.

Just use the test case Brion provided to Mozilla. Create a PHP file containing the following (some_file.png must be a real file):
<?php
header('Content-Type: image/png');
header('Content-Encoding: identity');
readfile('some_file.png');
?>

dnessett wrote:

(In reply to comment #11)

Just use the test case Brion provided to Mozilla. Create a php file containing
(with some_file.png being a real file)
<?php
header('Content-Type: image/png');
header('Content-Encoding: identity');
readfile('some_file.png');
?>

It took a bit of fiddling, but I successfully tested FF 2.0 and it seems to handle 'Content-Encoding: identity' properly. The test php I used was:

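<?php
// Ask mod_deflate not to compress this response, so the header set below reaches the browser unchanged.
@apache_setenv('no-gzip', '1');
header('Content-Type: image/png');
// Explicitly declare that no content coding is applied.
header('Content-Encoding: identity');
readfile('images/cz_testwiki_logo.png');
?>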

Here is the HTTPFox output:

(Request-Line) GET /test.php HTTP/1.1
Host test.citizendium.org
User-Agent Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Accept text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip,deflate
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive 300
Connection keep-alive
Cookie ......

(Status-Line) HTTP/1.0 200 OK
Date Sat, 19 Mar 2011 00:28:17 GMT
Server Apache/2.2.3 (CentOS)
X-Powered-By PHP/5.3.5
Content-Encoding identity
Content-Type image/png
X-Cache MISS from aristotle.citizendium.org
X-Cache-Lookup MISS from aristotle.citizendium.org:80
Via 1.0 aristotle.citizendium.org:80 (squid/2.6.STABLE21)
Connection close

It was apparently fixed somewhere between 2006 and 2010, but I don't know if it
got fixed before the Firefox 2.0 or 3.0 releases, or if other browsers are
affected.

It's strange: nsHTTPCompressConv::OnDataAvailable (mozilla/netwerk/streamconv/converters/nsHTTPCompressConv.cpp) seems to have kept the same implementation (processing unknown encodings as plain in the default block) since 23-Mar-2000, when it was created (with several whitespace changes in between).

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*