Page MenuHomePhabricator

Detect "extra whitespace" / BOM conditions
Closed, DeclinedPublic

Description

It's not infrequent for people to complain of broken RSS feeds and other behavioral oddities which are caused by extra whitespace in a custimized or extension PHP source file.

Usually this is a BOM character at the start or extra line breaks at the end inserted by a "helpful" text editor.

It should in many cases be possible to detect this condition, as well as other possible problems, by checking the value of headers_sent() somewhere in OutputPage, or perhaps even in Setup.php, before actual output is ready to start.

It should even be possible to scan all files included so far for the common problem conditions in this case, telling the user where to look and what to fix.


Version: 1.11.x
Severity: enhancement

Details

Reference
bz9954

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 9:37 PM
bzimport set Reference to bz9954.

Upstream issue : http://bugs.php.net/bug.php?id=22108

Maybe add a script in t/maint to check for it.

The most common problems are for non-technical third-party users, so a maintenance script requiring a bunch of non-standard stuff is maybe not the best thing -- it would never be run by the target audience.

Test script in t/maint/bom.t (r22239).

  • Bug 10371 has been marked as a duplicate of this bug. ***

Created attachment 3832
Some work in this regard

Fiddled about a bit with this the other day, needs some polishing up before committing.

attachment bom-work.diff ignored as obsolete

ezyang wrote:

I was fiddling around with the patch, and noticed a few interesting things:

  • The automatic enabling of gzip compression will interfere with the ability to detect whether or not checkOutputBreakage() is necessary: if it is enabled then headers will not be sent and MediaWiki will think everything is hunky-dory. We should also check ob_get_contents() for a non-false/empty string value.
  • To use character escapes, you need to use double-quotes. Patch as it stands won't be able to detect BOMs.
  • Using MediaWiki exceptions is not the most friendly way of bubbling the error back to clueless end-users, because they need to enabled using $wgShowExceptionDetails (the instructions are clear enough, though, so it's not too big of a problem). Furthermore, a full stack trace is given even when it is effectively useless. Is there any way to mute this?
  • The current places checkOutputBreakage() is called don't get to the response fast enough to get things working. Because WebResponse sends all headers, it would probably be more appropriate to perform the check there, and then set a little variable that would prevent the check from occurring over and over again
  • headers_sent() has a neat little extra two parameters it accepts, $file and $line, which can get rid of most of the futzing around with checking included files. This only works, however, if output buffering is disabled, so the code is not in vain.

I will submit a patch after confirmation that my analysis is correct.

(In reply to comment #6)

I was fiddling around with the patch, and noticed a few interesting things:

  • The automatic enabling of gzip compression will interfere with the ability to

detect whether or not checkOutputBreakage() is necessary: if it is enabled then
headers will not be sent and MediaWiki will think everything is hunky-dory. We
should also check ob_get_contents() for a non-false/empty string value.

Not true in my testing so far, but perhaps that was only on LocalSettings.php.

  • To use character escapes, you need to use double-quotes. Patch as it stands

won't be able to detect BOMs.

Woops. :)

  • Using MediaWiki exceptions is not the most friendly way of bubbling the error

back to clueless end-users, because they need to enabled using
$wgShowExceptionDetails (the instructions are clear enough, though, so it's not
too big of a problem).

That's just a quick way to do an error,f eel free to experiment with other display options. Further, the exception details are not required for this.

  • The current places checkOutputBreakage() is called don't get to the response

fast enough to get things working. Because WebResponse sends all headers, it
would probably be more appropriate to perform the check there, and then set a
little variable that would prevent the check from occurring over and over again

Details...

  • headers_sent() has a neat little extra two parameters it accepts, $file and

$line, which can get rid of most of the futzing around with checking included
files. This only works, however, if output buffering is disabled, so the code
is not in vain.

This won't necessarily always get you what you want. Also it might be nice to report multiple problems. But yes, might help. :)

ezyang wrote:

Full patch, i18n too!

This took a lot longer than I thought it would, mainly because I ended up having to internationalize the whole thing (translators, get on it!)

A few notes:

  • I added MWException::$suppressStackTrace, which is a public variable sub-classes can override in order to change the behavior of the default reports
  • I added a margin to the default exception text so that it doesn't wrap around the floated image. It looks weird when that happens
  • Besides the few typos I fixed, the basic algorithm is the same. The headers_sent() call will handle the few cases where our specialized checks miss things. The unknown error message should never be called
  • Fallback messages are the same as the message keys. I considered duplicating the messages in the class itself, but decided against it since premature headers will almost never cause $wgLang not to be populated.
  • Detailed exception reporting needs to be on to see this in its full glory. I don't know if users are smart enough to do that, but... whatever. Darwinism or something.

attachment bom-2.patch ignored as obsolete

robchur wrote:

A sterling effort, but I would suggest using the exception for internal purposes and *forcing* the user to see the message. Most of the people this sort of thing is going to benefit won't know to switch $wgShowExceptionDetails on, and so will never see the advice being given.

ezyang wrote:

Full patch v3, with improved documentation and Rob Church's suggestion

Alright, following Rob Church's suggestion I've added another member variable, $alwaysShowExceptionDetails which MWException subclasses can use to override $wgShowExceptionDetails. Also, the lack of documentation on some of the functions was bothering me, so I added docblocks.

It strikes me that the FatalError exception doesn't play nice with command-line utilities: since it's HTML error message, the HTML will get output as raw text. Maybe we should add strip_tags()?

attachment bom-3.patch ignored as obsolete

Created attachment 3872
Slightly tweaked patch - relative path output and an RSS fix

Nice work! A couple notes:

First, it might be nice if we could get the message prettied up a little more before we put this in a release; we want it to be easy to read and easy to see how to fix the problem.

One quick change I made was to make the file path relative against the wiki installation path instead of absolute. This will first make it shorter -- which is nice :) -- and second it avoids exposing complete directory paths, which would worry some people as it leaks potentially private information about filesystem structure which can be used to guide attacks.

A couple paragraph breaks, some bold or teletype markup on the filename, and maybe even a link to a help page on www.mediawiki.org might do wonders here for making it ready for a panicked end-user.

Related to this, while it's fun to say "get a better editor" it's probably not very nice. ;)

Another issue I noticed is that there's a *lot* of code that calls header() directly, so wouldn't necessarily hit the check in WebResponse::header(). It may just be necessary to go fix them all, as with the one in Feed that I fixed so the RSS feed for recent changes would trigger the explanatory exception _before_ the PHP warning from header() splashed across output.

Attached:

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

At some point in time, we had a spike of people using notepad (which insert a BOM) to edit their PHP files. This is rare by now, I guess most people use a decent editor nowadays or newbies are smarter?

The worse thing in the patch, is that it adds yet another feature to maintain for minimal benefits.

I would personally close the bug and keep handling the BOM issue through support requests.

...and closing bug report. We can handle those rare issues through the usual support channels.