
IE6/IE7 user-agent strings involving "InfoPath" detect as non-Unicode-compliant
Closed, Resolved (Public)

Description

Author: cybercoolcougar

Description:
When editing a page using Internet Explorer 6, MediaWiki says "Your browser is not unicode-compliant,
[...]".

I did some debugging and found:
In the file includes\DefaultSettings.php,
several regexp patterns are defined for non-compliant browsers:

$wgBrowserBlackList = array(

/**
 * Netscape 2-4 detection
 * The minor version may contain strings such as "Gold" or "SGoldC-SGI"
 * Lots of non-netscape user agents have "compatible", so it's useful to check for that
 * with a negative assertion. The [UIN] identifier specifies the level of security 
 * in a Netscape/Mozilla browser, checking for it rules out a number of fakers. 
 * The language string is unreliable, it is missing on NS4 Mac.
 * 
 * Reference: http://www.psychedelix.com/agents/index.shtml
 */
'/^Mozilla\/2\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
'/^Mozilla\/3\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
'/^Mozilla\/4\.[^ ]+ .*?\((?!compatible).*; [UIN]/',

#NOTE THIS!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

/**
 * MSIE on Mac OS 9 is teh sux0r, converts þ to <thorn>, ð to <eth>, Þ to <THORN> and Ð to <ETH>
 *
 * Known useragents:
 * - Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)
 * - Mozilla/4.0 (compatible; MSIE 5.15; Mac_PowerPC)
 * - Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)
 * - [...]
 *
 * @link http://en.wikipedia.org/w/index. ... &oldid=12355864
 * @link http://en.wikipedia.org/wiki/Template%3AOS9
 */
'/^Mozilla\/4\.0 \(compatible; MSIE \d+\.\d+; Mac_PowerPC\)/'

);

And in the file includes\EditPage.php, the current browser's USER-AGENT string is checked against the
patterns:

function checkUnicodeCompliantBrowser() {
        global $wgBrowserBlackList;
        if( empty( $_SERVER["HTTP_USER_AGENT"] ) ) {
                // No User-Agent header sent? Trust it by default...
                return true;
        }
        $currentbrowser = $_SERVER["HTTP_USER_AGENT"];
        foreach ( $wgBrowserBlackList as $browser ) {
                if ( preg_match($browser, $currentbrowser) ) {
                        return false;
                }
        }
        return true;
}

Note the 3rd pattern,

'/^Mozilla\/4\.[^ ]+ .*?\((?!compatible).*; [UIN]/',

it will match the IE6 User-Agent string on my machine, which is shown below:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; iebar; .NET CLR 1.1.4322; InfoPath.1)

Please note the "InfoPath.1" part near the end of the string; it is there because I installed InfoPath,
a component of Microsoft Office 2003. The leading letter 'I' makes it match the 3rd pattern.


Version: 1.10.x
Severity: normal
OS: Windows XP
Platform: PC
URL: http://<found-in-intranet-site-so-no-useful-info-here>

Details

Reference
bz7629

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:23 PM
bzimport set Reference to bz7629.
bzimport added a subscriber: Unknown Object (MLST).

IE 6.0 hasn't ever triggered this in the wild that we know of. Have you done something
strange to customize your user-agent string? Can you confirm that it works properly
when restored to normal?

cybercoolcougar wrote:

After installing .NET Framework 1.1 and MS Office 2003 (including the InfoPath component), my IE's user-agent string
changed to this:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; iebar; .NET CLR 1.1.4322; InfoPath.1)

InfoPath is a standard component of MS Office 2003 Pro, not a WEIRD PLUGIN.

cvb-mediawiki wrote:

This regexp array doesn't recognize IE7 with this $USER_AGENT:

'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon; MRA 4.8 (build
01705); InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)'

and MediaWiki shows this message:

''WARNING: Your browser is not unicode compliant. A workaround is in place to
allow you to safely edit articles: non-ASCII characters will appear in the edit
box as hexadecimal codes.''

but this browser supports UTF-8!

ayg wrote:

It seems like the intent is to allow browsers that don't have "compatible"
there, but the regex doesn't work. Should probably be

- '/^Mozilla\/2\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
- '/^Mozilla\/3\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
- '/^Mozilla\/4\.[^ ]+ .*?\((?!compatible).*; [UIN]/',

+ '/^Mozilla\/2\.[^ ]+ [^(]*\((?!compatible).*; [UIN]/',
+ '/^Mozilla\/3\.[^ ]+ [^(]*\((?!compatible).*; [UIN]/',
+ '/^Mozilla\/4\.[^ ]+ [^(]*\((?!compatible).*; [UIN]/',

The .*? is screwed up by the nested parentheses (it eats the initial parenthesis
to avoid the prohibited "compatible" string). Perl regex is all very nice, but
POSIX-style is better here. Patch needs review.
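
The nested-parenthesis behaviour described above can be checked with a short script. This is only a sketch: it compares the original and proposed Mozilla/4 patterns against the IE7 string reported earlier in this thread (with its wrapped line joined by a single space, which is an assumption) and against a Netscape 4-style string that is hypothetical and not taken from the thread. The helper name isBlacklisted() is also made up for illustration.

```php
<?php
// Original Netscape 4 pattern quoted in the report.
$oldPattern = '/^Mozilla\/4\.[^ ]+ .*?\((?!compatible).*; [UIN]/';
// Proposed replacement: [^(]* cannot skip past the first "(".
$newPattern = '/^Mozilla\/4\.[^ ]+ [^(]*\((?!compatible).*; [UIN]/';

// IE7 string from this thread; the wrapped line is joined with one space (assumption).
$ie7 = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon; MRA 4.8 (build 01705); '
     . 'InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)';
// Hypothetical Netscape 4-style string, used only for illustration.
$ns4 = 'Mozilla/4.7 [en] (WinNT; U)';

// Hypothetical helper: true when the pattern flags the user agent.
function isBlacklisted( $pattern, $userAgent ) {
    return preg_match( $pattern, $userAgent ) === 1;
}

foreach ( array( 'IE7 + InfoPath' => $ie7, 'Netscape 4' => $ns4 ) as $label => $ua ) {
    printf(
        "%-15s old=%s new=%s\n",
        $label,
        var_export( isBlacklisted( $oldPattern, $ua ), true ),
        var_export( isBlacklisted( $newPattern, $ua ), true )
    );
}
```

With the old pattern, the lazy .*? skips the first "(" and lands on "(build", after which "; InfoPath" satisfies "; [UIN]". With [^(]* the only candidate is the first "(", which is followed by "compatible", so the IE7 string is no longer flagged while the Netscape-style string still is.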

Tested the above modifications against actual referrer strings in our logs to
confirm.

Of 43003 MSIE samples, 3709 listed the InfoPath extension. 93 MSIE hits were
false-positive matches for the regexes, of which 77 listed the InfoPath extension.

In total, less than 0.12% of sampled hits were false positive matches -- 0.22%
of MSIE hits, 2.08% of InfoPath hits. (I did not sample edits specifically, but
all hits.)
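
For reference, the MSIE and InfoPath percentages follow directly from the counts quoted above. A minimal sketch of the arithmetic (the variable names and the percent() helper are illustrative only):

```php
<?php
// Counts quoted in the comment above.
$msieSamples            = 43003;
$infoPathSamples        = 3709;
$msieFalsePositives     = 93;
$infoPathFalsePositives = 77;

// Hypothetical helper: share of $part in $whole, as a percentage with two decimals.
function percent( $part, $whole ) {
    return number_format( 100 * $part / $whole, 2 );
}

echo percent( $msieFalsePositives, $msieSamples ), "% of MSIE hits\n";             // 0.22
echo percent( $infoPathFalsePositives, $infoPathSamples ), "% of InfoPath hits\n"; // 2.08
```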

Fixed in r21726.