Page MenuHomePhabricator

Database collation settings of SMW sometimes do not match MW's
Closed, ResolvedPublic

Description

Author: Michael.W.Dietrich

Description:
When running "php SMW_refreshData.php -v" whilst upgrading from 0.7.x (version from svn) after some pages have been processed properly I get the following error message and processing stops.

(116) Processing page with ID 1393 ...
Es gab einen Syntaxfehler in der Datenbankabfrage.
Die letzte Datenbankabfrage lautete: „ SELECT DISTINCT m3WDBpage.page_title as title, m3WDBpage.page_namespace as namespace, m3WDBpage.page_id as id FROM prop9, m3WDBpage INNER JOIN m3WDBsmw_relations AS rel8 ON rel8.subject_id=m3WDBpage.page_id LEFT JOIN m3WDBredirect AS rd10 ON rd10.rd_from=rel8.object_id WHERE ((m3WDBpage.page_namespace='0') OR (m3WDBpage.page_namespace='6')) AND (prop9.title=rel8.relation_title AND ((rel8.object_title='Msg' AND rel8.object_namespace=0) OR (rd10.rd_title='Msg' AND rd10.rd_namespace=0))) ORDER BY m3WDBpage.page_title ASC LIMIT 51 “ aus der Funktion „<tt>SMW::getQueryResult</tt>“.
MySQL meldete den Fehler: „<tt>1267: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (latin1_general_ci,IMPLICIT) for
operation '=' (localhost)</tt>“.

If this can't be fixed I've to step back to a bakup version of the database and to 0.7.

As I'm aware, that the german language version of SMW is Not used to often i doubt that this bug will be critical to users of other language versions but german.


Version: unspecified
Severity: critical
OS: Linux
Platform: PC
URL: http://www.self-qs.de

Details

Reference
bz12059

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 9:58 PM
bzimport set Reference to bz12059.
bzimport added a subscriber: Unknown Object (MLST).

This is the general "illegal collation" issue that some people get when importing DB-dumps. In short, MediaWiki does not set any collation to its tables, and neither does SMW. But sometimes a DB-import creates unwnated collation information for some of the tables (typically for the MediaWiki part), while the rest does not have the collations set. To fix this, you need to set the collation information for all tables/columns/dbs in the same way, see http://dev.mysql.com/doc/refman/5.0/en/charset-syntax.html for details.

There is currently no automatic fix for that problem, but it is not a specific SMW1.0 software issue (the same things can happen in SMW0.7).

To find out about the collations of one table, use the mysql command "SHOW CREATE TABLE tablename;" To set the collation of one specific field, use something like "ALTER TABLE tablename CHANGE subject_title subject_title varchar(255) character set latin1 NOT NULL default '' collate latin1_bin;" where the rest of the declaration (varchar...) should be the same as before.

Another, possibly simpler, workaround is the following:

  • Check out the collation MediaWiki has now in your DB by using e.g. "SHOW

CREATE TABLE categorylinks;"

  • Set this as a default collation for your database, e.g. "ALTER DATABASE yourdatabase CHARACTER SET latin1 COLLATE latin1_bin;"
  • Delete all SMW tables, currently this is "DROP TABLE smw_relations, smw_attributes, smw_longstrings, smw_nary, smw_subprops, smw_specialprops, smw_nary_relations, smw_nary_attributes, smw_nary_longstrings;" You can always findout the names of all current tables by running SMW_setup.php (or by pressing the Special:SMWAdmin button in the wiki).
  • Then setup the tables again, running "php SMW_setup.php" (or using SMWAdmin)
  • Then populate the tables again, running "php SMW_refreshdata.php -vp" and then "php SMW_refreshdata.php -v".

This should do the trick, but it obviously will break the wiki for some short time as long as the tables are gone. If all fails, switching off inline queries temporarily (see settings documented in SMW_GlobalSettings.php) should prevent major errors in wiki usage (i.e. normal pages will display, queries will remain empty).

Michael.W.Dietrich wrote:

Well everything works out well with your solution Markus - so thanks so far (probably you want to correct the typo in php SMW_refreshdata.php -vp and php SMW_refreshdata.php -v to php SMW_refreshData.php -vp and php SMW_refreshData.php -v as people under unixish environments might wonder why they can't just Drag/Drop/Execute your solutions commands directly from the web to a local console.

BUT

when Executing the "final" php SMW_refreshData.php -v Everythng works out well until suddenly.... (see console copy below).

...
(238) Processing page with ID 1525 ...
(239) Processing page with ID 1526 ...

Warning: DOMDocument::loadXML(): Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0x69 0x73 0x63 in Entity, line: 6 in /var/wiki2/includes/Parser.php on line 2967
Parser::preprocessToDom generated invalid XML
Backtrace:
#0 /var/wiki2/includes/Parser.php(3021): Parser->preprocessToDom('----?als [[Test...')
#1 /var/wiki2/includes/Parser.php(915): Parser->replaceVariables('----?als [[Test...')
#2 /var/wiki2/includes/Parser.php(295): Parser->internalParse('----?als [[Test...')
#3 /var/wiki2/extensions/SemanticMediaWiki/maintenance/SMW_refreshData.php(99): Parser->parse('----?als [[Test...', Object(Title),
Object(ParserOptions), true, true, 2265)
#4 {main}

Michael.W.Dietrich wrote:

Hi Markus,

heres another QS issue with this

When trying to resolve the above mentioned probs myself I tried out this:

s-ddf-wiki:/var/wiki2/maintenance# php SMW_refreshData.php --help

Refreshing all semantic data in the database!
Processing pages from ID 0 to ID 2969 ...

I thik it qould be better style if a script (even only a maintanance script) would write out some info instead of processing itself when called with the --help commandline parameter. Just a gusee

Michael

I have now implemented two further features that should close this bug completely:

(1) SMW uses the "binary" keyword for creating tables, just as MediaWiki does. Not doing this before may have caused some collation differences between the two under certain conditions.

(2) The refresh script now provides an automated way to recreate all SMW database tables (just as in the manual solution above). The parameter to achive such a "full" refresh is called "-f". An example run is as follows:

% php SMW_refreshData.php -vpf && php SMW_refreshData.php -v

Note that a second refresh must be triggered, just as in the earlier solution. You can of course split both runs into two commands, and combine them with start (-s) and end (-e) settings to process the pages in chunks or in parallel.

This should close *this* bug (which is apparently not related to the last two comments that reopened it). Final comments to the above remarks:

  • Re:#3: try using the -s and -e parameters to skip problematic pages in refreshing. Update them manually through the wiki. Check whether the reported problem relates to SMW at all (the error seems to originate in Parser.php, not in SMW).
  • Re:#4: I agree, but cannot make myself care too much about it. Note that most (all?) other MediaWiki maintenance scripts behave just the same.

Feel free to file new bugs if you find that either of these issues needs further discussion.

Michael.W.Dietrich wrote:

After *some more investigation* in this it firstly seems that the behaviour that led to reopening this bug was discovered in smw with some 27xxx Build of mediawiki software and was somewhat healed with a 29xxx build of mediawiki from late December 2007. After installing another new build of smw 29xxx from Jan 2008 I still discovered some pages not visible in IE but only in firefox.

*FINDING THAT* a Teamplate named "ask" used in those pages lead to very long output which was refering to the content that was put ou merely nonsens in this versions of smw and mediawiki (as I do not want to doubt that it had some sense in former versions)

"FIXING THE WIKI* was as simple as opening the Teamplate ask *in edit mode* (because it did also not open in some Browsers because of the output length) e.g. by calling http://www.self-qs.de/m3WDB?title=Vorlage:Ask&action=edit (where www.self-qs.de is the url of the wiki and m3WDB is a link to index.php) *AND* writing "noinclude" tags around the templates content. This made ALL Wikipages showable and editable again and did give the chance to correct those error symptoms described above.

Thanks for helping and I agree *Issue resolved/fixed*

Just let me comment on

<cit>* Re:#4: I agree, but cannot make myself care too much about it. Note that most
(all?) other MediaWiki maintenance scripts behave just the same.</cit>

It was becaus MediaWiki behaved so less self explantory that you wrote smw - am I right? So what do we need semantics for if we do not even document our source code better that others do. *To suggest a solution*: If you where so kind sending me a explanation of all parameters implemented in the maintanance scripts I will go to the smw svn, check out the scripts and write those few lines giving the /?-call some more documentative charme - What do you think about that?

regards and hapy new year
Michael

Good to hear that the other problems are gone now. We SMW developers strongly discourage the use of universal "ask" templates (i.e. templates that query for everything in order to not have to use custom queries on different pages), but they tend to reappear at least on ontoworld.org. In most cases, special-purpose tempaltes with only one or two relevant queries are much more useful.

<cit> If you where so kind sending me a explanation of all parameters implemented in the maintanance scripts I will go to the smw svn, check out the scripts and write those few
lines giving the /?-call some more documentative charme - What do you think
about that?</cit>

Now I begin to understand the problem. The parameters are already explained *within* the scripts as part of the initial comments in each of them. If this explanation is not sufficient, feel free to ask via the mailinglists! Of course, having a --help parameter would not hurt either, and possibly one can even format the scripts in such a way that the current initial comment text is also the returned help text (i.e. bcomes an initial string definition instead of a comment, reducing effort for maintaining the docu). So feel free to do that change if you consider it helpful.

Thanks, and a late Happy New Year as well!

Markus