Author: info
Description:
In my install (MediaWiki: 1.11.0 (r26292), PHP: 5.2.1 (apache2handler) on Windows XAMPPlite), just the annotation
[[Test à stuff::Main Page]]
(note the à, an a with a grave accent) on a page leads to:
a) No wiki text appearing in page display and preview (only the surrounding skin appears).
b) In Special:Browse it displays as a garbled string
Test � stuff Main Page
I spent 6 hours debugging this 8-).
SMW_Factbox.php and SMW_SpecialBrowse.php both do
preg_replace('/[\s]/', ...
This is not multibyte safe, so the string can get garbled.
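In this annotation, as far as I can tell, the à (U+00E0) is encoded in UTF-8 as the two bytes C3 A0, and byte A0 in ISO 8859-1 is NBSP. Without /u, PCRE matches byte by byte, so if its character tables classify A0 as whitespace, \s matches the second byte of the à.
The result *on certain PHP configurations* (?) is a garbled string where the à is partly replaced: its A0 byte becomes a space and a lone C3 byte is left behind, which is not valid UTF-8.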
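To make the byte-level effect concrete, here is a minimal sketch. It assumes the affected configuration treats byte A0 as whitespace. Because that depends on the PCRE character tables in use, the sketch adds \xA0 to the character class by hand to imitate it; that addition is an assumption for illustration, not what SMW's code contains.

```php
<?php
// "Test à stuff" as UTF-8 bytes; à (U+00E0) is C3 A0.
$label = "Test \xC3\xA0 stuff";

// Assumption: \xA0 is listed explicitly to mimic character tables that
// classify byte A0 as whitespace. Without /u, PCRE works on single bytes.
$garbled = preg_replace('/[\s\xA0]/', ' ', $label, 2);

echo bin2hex($label), "\n";   // 5465737420c3a0207374756666
echo bin2hex($garbled), "\n"; // 5465737420c320207374756666
```

The second dump shows the A0 byte replaced by 20 (space), leaving C3 on its own.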
a) Special:Browse displays the garbled string.
b) Many of MediaWiki's own preg_* calls use the /u PCRE modifier, so that "Pattern strings are treated as UTF-8." But as someone commented at http://php.net/manual/en/reference.pcre.pattern.modifiers.php#54805,
Regarding the validity of a UTF-8 string when using the /u pattern modifier,
...
- When the subject string contains invalid UTF-8 sequences / codepoints, it basically result in a "quiet death" for the preg_* functions, where nothing is matched but without indication that the string is invalid UTF-8
In this example the Factbox code garbles this string in its generated HTML, and when MediaWiki's parser calls MagicWords which does some replacements with /u, BOOM, the first one (stripToc) returns nothing.
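The "quiet death" can be observed directly. The sketch below assumes a garbled label of the kind described in this report (a lone C3 byte followed by a space) and uses a made-up pattern; it is not MediaWiki's stripToc code. preg_last_error() is available from PHP 5.2.0.

```php
<?php
// Assumption: a label garbled as described in this report, with a lone
// C3 lead byte followed by a space, which is invalid UTF-8.
$garbled = "Test \xC3  stuff";

// With /u, PCRE rejects the invalid subject and preg_replace() returns
// null instead of a string.
$out = preg_replace('/stuff/u', 'x', $garbled);

var_dump($out === null);                             // bool(true)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Checking preg_last_error() after a /u call is one way to tell an invalid subject apart from a plain non-match.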
In these two cases I found that a fix is for SMW to use the /u PCRE modifier in its own preg_* calls.
a) SMW_SpecialBrowse.php around line 178 change
$html .= '<strong>' . $skin->makeKnownLinkObj($att, preg_replace('/[\s]/', ' ', smwfT($att), 2)) . "</strong> \n";
to
$html .= '<strong>' . $skin->makeKnownLinkObj($att, preg_replace('/[\s]/u', ' ', smwfT($att), 2)) . "</strong> \n";
b) SMW_Factbox.php around line 273 change
$text .= '<tr><td class="smwpropname">[[' . $property->getPrefixedText() . '|' . preg_replace('/[\s]/',' ',$property->getText(),2) . ']] </td><td class="smwprops">';
to
$text .= '<tr><td class="smwpropname">[[' . $property->getPrefixedText() . '|' . preg_replace('/[\s]/u',' ',$property->getText(),2) . ']] </td><td class="smwprops">';
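A quick way to check the patched call in isolation is sketched below. The function name normalizeLabelWhitespace is hypothetical and not part of SMW; it only wraps the same preg_replace call with /u so it can be run on its own.

```php
<?php
// Hypothetical helper (not part of SMW): the same call as in the patch.
function normalizeLabelWhitespace($label) {
    return preg_replace('/[\s]/u', ' ', $label, 2);
}

$label = "Test\t\xC3\xA0 stuff"; // a tab, then à as bytes C3 A0
$fixed = normalizeLabelWhitespace($label);

var_dump($fixed === "Test \xC3\xA0 stuff"); // tab replaced, à left intact
var_dump(preg_match('//u', $fixed) === 1);  // result is still valid UTF-8
```

With /u the pattern is applied to whole characters, so the A0 byte inside the à is never matched on its own.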
What's confounding is that this doesn't happen on ontoworld.org, or even on my ISP (see http://www.skierpage.com/tmp/mb_bug.php , works OK). Maybe it's only an issue on Windows or with XAMPPlite.
Version: unspecified
Severity: major
URL: http://www.skierpage.com/tmp/mb_bug.txt