
Wikimedia databases contain categorylinks with type "page" for media files. Run updateCollation.php to fix
Closed, Resolved, Public

Description

Commons files with type "page" in categorylinks table

The Commons database contains categorylinks for files with type "page". But files should have type "file".
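
One way to check for the mismatch described above is to join categorylinks against the page table and look for File-namespace pages whose row has cl_type "page". The following is a minimal sketch only: it uses an in-memory SQLite database with a cut-down schema and made-up rows as a stand-in for the real MySQL tables, and the helper names find_mistyped_files and demo_connection are hypothetical.

```python
import sqlite3

NS_FILE = 6  # MediaWiki's File namespace number


def find_mistyped_files(conn):
    """Return (page_title, cl_to) pairs for files whose category link has cl_type 'page'."""
    return conn.execute(
        """
        SELECT p.page_title, cl.cl_to
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE p.page_namespace = ? AND cl.cl_type = 'page'
        ORDER BY p.page_title, cl.cl_to
        """,
        (NS_FILE,),
    ).fetchall()


def demo_connection():
    """Build a tiny illustrative database (assumed, simplified schema; invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT, cl_collation TEXT);
        INSERT INTO page VALUES (1, 6, 'Example.jpg'), (2, 6, 'Good.png'), (3, 0, 'Article');
        INSERT INTO categorylinks VALUES
            (1, 'Photos', 'page', ''),           -- file wrongly typed as page
            (2, 'Photos', 'file', 'uppercase'),  -- file typed correctly
            (3, 'Topics', 'page', 'uppercase');  -- ordinary page, correct
        """
    )
    return conn


if __name__ == "__main__":
    for title, category in find_mistyped_files(demo_connection()):
        print(title, category)
```

Against a real wiki database the same SELECT would presumably work unchanged apart from the parameter placeholder style, but that has not been verified here.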

I think this error may exist in other databases too. Bug 29787 was about a category in English Wikipedia, but that was fixed by null edits. Now null edits do not seem to fix the errors at Commons.

Bug 29787 has an example of problems caused by this bug.


Version: unspecified
Severity: normal
URL: http://toolserver.org/~endumen/fileswithpagetype.txt

attachment ignored as obsolete

Details

Reference
bz35609

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:14 AM
bzimport set Reference to bz35609.
bzimport added a subscriber: Unknown Object (MLST).

Moving attachment contents to url field.

The content of attachment 10352 has been deleted by

Mark A. Hershberger <mah@everybody.org>

who provided the following reason:

should be in url field

The token used to delete this attachment was generated at 2012-04-02 16:58:03 UTC.

All of these rows appear to have the cl_collation field set to ''. Running updateCollation.php should therefore fix the issue.

My query on the toolserver database found 48331 affected rows on Commons, and they seem to date from July 7, 2011. I don't know what happened then; the pages I looked at weren't edited at that time. In the server admin log, the only thing I saw happening to Commons at that time was that CleanupTitles.php and NamespaceDupesWT.php were running. I'm not sure how that could cause this.

Anyhow, running updateCollation.php should fix the issue, as it will treat each of these as an old-style row and update it appropriately.
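
To illustrate the kind of repair meant here, the sketch below picks out rows with an empty cl_collation and rewrites cl_type from the namespace of the linked page. It is not the actual updateCollation.php logic, which also recomputes sort keys. It assumes the usual MediaWiki mapping (namespace 14 gives "subcat", namespace 6 gives "file", anything else gives "page"), uses an in-memory SQLite stand-in with invented rows, and the function names are hypothetical.

```python
import sqlite3

NS_FILE = 6       # File namespace
NS_CATEGORY = 14  # Category namespace


def type_for_namespace(namespace):
    """Map a page namespace to the cl_type value expected for its category links."""
    if namespace == NS_CATEGORY:
        return "subcat"
    if namespace == NS_FILE:
        return "file"
    return "page"


def repair_old_style_rows(conn, collation="uppercase"):
    """Fix cl_type and cl_collation on rows whose cl_collation is ''. Returns the row count."""
    rows = conn.execute(
        "SELECT cl.rowid, p.page_namespace FROM categorylinks AS cl "
        "JOIN page AS p ON p.page_id = cl.cl_from WHERE cl.cl_collation = ''"
    ).fetchall()
    for rowid, namespace in rows:
        conn.execute(
            "UPDATE categorylinks SET cl_type = ?, cl_collation = ? WHERE rowid = ?",
            (type_for_namespace(namespace), collation, rowid),
        )
    return len(rows)


def demo_connection():
    """Tiny illustrative database (assumed, simplified schema; invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT, cl_collation TEXT);
        INSERT INTO page VALUES (1, 6, 'Example.jpg'), (2, 14, 'Photos'), (3, 0, 'Article');
        INSERT INTO categorylinks VALUES
            (1, 'Photos', 'page', ''),
            (2, 'Media', 'page', ''),
            (3, 'Topics', 'page', 'uppercase');
        """
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(repair_old_style_rows(conn), "rows repaired")
```

The default collation name "uppercase" is taken from the query used later in this thread; a wiki configured with a different collation would need a different value.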

Note, there are also 15695 such rows on enwiki.

mysql> explain select count(*) from categorylinks where cl_collation = '';
+----+-------------+---------------+------+---------------+--------------+---------+-------+--------+--------------------------+
| id | select_type | table         | type | possible_keys | key          | key_len | ref   | rows   | Extra                    |
+----+-------------+---------------+------+---------------+--------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | categorylinks | ref  | cl_collation  | cl_collation | 34      | const | 115866 | Using where; Using index |
+----+-------------+---------------+------+---------------+--------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)

Running updateCollation.php against commonswiki currently...

mysql> explain select count(*) from categorylinks where cl_collation != 'uppercase';
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+--------------------------+
| id | select_type | table         | type  | possible_keys | key          | key_len | ref  | rows  | Extra                    |
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | categorylinks | range | cl_collation  | cl_collation | 34      | NULL | 60341 | Using where; Using index |
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+--------------------------+
1 row in set (0.02 sec)

Enwiki is clean now.

Running it via foreachwiki. It seems most wikis are clean, but not all: I just noticed that cawiki and dewiki weren't, to give a couple of examples.