
Charset error on Special:UnusedProperties (1.8 alpha)
Closed, Resolved · Public

Description

Author: dasch

Description:
Somehow, the encoding is wrong on this special page. All other pages work correctly:
http://www.wecowi.de/wiki/Spezial:Verwaiste_Attribute


Version: unspecified
Severity: normal

Details

Reference
bz30705

Event Timeline

bzimport raised the priority of this task to High. Nov 21 2014, 11:53 PM
bzimport set Reference to bz30705.
bzimport added a subscriber: Unknown Object (MLST).

dasch wrote:

Seems like this is not necessarily an encoding problem, but rather that the page does not correctly check the existence of the pages. I think the selection is not done correctly; it should check whether the page really exists.

dasch wrote:

$egMapsDefaultService = "openlayers";
$egMapsAvailableServices = array('googlemaps2', 'yahoomaps', 'openlayers','osm');
and the corresponding API keys

dasch wrote:

(In reply to comment #2)

$egMapsDefaultService = "openlayers";
$egMapsAvailableServices = array('googlemaps2', 'yahoomaps',
'openlayers','osm');
and the corresponding API keys

Sorry, wrong bug.

It looks like either the wiki is misconfigured for the database's character set settings, or the database itself has been corrupted with an incorrect Latin1-to-UTF-8 conversion applied on export or import.

Page contents usually survive this because they are stored in binary BLOB fields, but page titles, usernames, edit comments, etc. may have been misconverted.

Try switching the $wgDBmysql5 setting and double-check the encodings. Ideally, newly configured wikis are set to a binary charset/collation, which allows MediaWiki to store UTF-8 Unicode without limitations. If fields claim to be either latin1 or utf8 and the contents are clearly wrong when viewed directly in the database, it may be incorrectly set up.
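
For reference, a minimal sketch of how to inspect the actual charset and collation settings directly, assuming a MySQL database named 'wikidb' (substitute your own):

-- Table-level collation for every table in the wiki database
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'wikidb';

-- Column-level settings can differ from the table default, so check those too
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'wikidb' AND CHARACTER_SET_NAME IS NOT NULL;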

This sometimes results from mysqldump operating on wikis that were originally set up with a very old configuration in which fields were labeled as Latin1 (for instance, when upgraded from an old MySQL 4.0 instance, or from old versions of MediaWiki that aimed primarily for MySQL 4.0 compatibility). MySQL 4.0 and earlier have *no* configurable charset support, so whatever the default charset was got used, even though MediaWiki always actually sent and received UTF-8 data. As a result, mysqldump may "convert" the already-UTF-8 data to UTF-8 a second time, or cause some similar problem.

Sometimes the database itself is still fine, but a reconfiguration of the wiki caused the old settings to be lost, and it now defaults to the modern mode, which can end up doing a similar misconversion. Try turning off $wgDBmysql5 in this case?
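
To see which encoding the server and the current connection actually negotiate (the layer that $wgDBmysql5 influences), these standard MySQL commands should help:

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

If character_set_client/connection/results disagree with what the tables claim to contain, a double conversion on read or write is the likely result.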

dasch wrote:

Please consider the fact that this is the only special page where this charset error can be seen. All other pages are correct, this page was displayed correctly on this wiki before, and no changes were made to the database except those made by update.php and SMW_Admin.php when updating to the new version.

dasch wrote:

And another special page, also for Semantic MediaWiki properties, has no problem with special characters:
http://www.wecowi.de/wiki/Spezial:Gew%C3%BCnschte_Attribute

dasch wrote:

BTW in my LocalSettings
$wgDBmysql5 = false;

Ah, indeed -- so this is some Semantic MediaWiki-only thing? Possibly only some tables are incorrectly set up, or something else in SMW is causing bogus values to be stored.

dasch wrote:

Yes, I think so. But to be honest, I took a look at my database and it seems totally confused. Most tables are InnoDB, but not all. Some have utf8_general_ci, some have latin1_swedish_ci, and one is even utf8_bin.
That's all a bit strange.

dan.bolser wrote:

What happens if you set all to utf8?
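
If you try that, a sketch for a single table would be something along these lines (the table name is only illustrative; note that CONVERT TO re-encodes the stored data, so if the current charset labels are already wrong this can make things worse -- back up first):

-- Convert one table's charset and collation to utf8 (illustrative table name)
ALTER TABLE smw_ids CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;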

dasch wrote:

Does anybody from the SMW team care about this?

I believe that the cause of this problem is that a temporary table is created to compute the results of this particular special page. The problem then occurs due to incompatible character encoding settings between your existing tables and this new table.

The way in which SMW creates the temporary table on MySQL is:

CREATE TEMPORARY TABLE tablename( title VARCHAR(255) ) ENGINE=MEMORY

The encoding used in this table therefore defaults to the global settings. It is possible that these are not the same as for the other tables. It should be possible to fix the problem by changing all table encodings to be the same, and making sure that this is also the encoding used as a default.
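
A hypothetical variant of that statement with an explicit charset, so that the temporary table matches the existing tables rather than inheriting the server default (assuming here that the wiki's tables use utf8):

CREATE TEMPORARY TABLE tablename( title VARCHAR(255) ) ENGINE=MEMORY DEFAULT CHARSET=utf8;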

The deeper architectural problem is that MediaWiki uses the global variable $wgDBTableOptions for defining additional options, including specific charsets (I think). However, these global options usually include the ENGINE setting, so we cannot reuse them for temporary tables. Hence, if somebody changes $wgDBTableOptions to use a non-default charset, SMW will take this into account only for its normal tables. Do you have a custom setting for $wgDBTableOptions?
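
To check which default the temporary table would actually inherit, querying the schema default should work ('wikidb' is again an assumed database name):

-- The database-level default that a table without an explicit charset inherits
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'wikidb';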

In the long run, we should also find a more efficient way to compute the results on this special page. The current solution does not scale to bigger wikis.

  • Bug 38140 has been marked as a duplicate of this bug.

No updates for two and a half years - is this still an issue? Should this really be high priority?

Kghbln claimed this task.

I am closing this as "Resolved" since this was probably fixed along the way (four new releases since then). The wiki in question was updated to a newer version and currently does not show this behaviour. My wiki, which had a similar issue, is fine now, too. If this unexpectedly re-appears, a new report may very well be filed on GitHub, referencing this one.