Page MenuHomePhabricator

Specified key was too long error in update.php script while altering categorylinks table which has myisam engine
Closed, ResolvedPublic

Description

Author: cnit

Description:
I am trying to upgrade 1.15.4 to 1.17 trunk, while running update.php I've got the following error message:

...doing rev_id from 59003 to 59202
...doing rev_id from 59203 to 59402
...doing rev_id from 59403 to 59602
...doing rev_id from 59603 to 59802
rev_len population complete ... 3701 rows changed (0 missing)
Creating iwlinks table...ok
...iwl_prefix_title_from key already set on iwlinks table.
Adding ul_value field to table updatelog...ok
Adding iw_api field to table interwiki...ok
...iwl_prefix key doesn't exist.
...iwl_prefix_from_title key doesn't exist.
Adding cl_collation field to table categorylinks...A database query syntax error has occurred.
The last attempted database query was:
"ALTER TABLE wiki_categorylinks
CHANGE COLUMN cl_sortkey cl_sortkey varbinary(255) NOT NULL default '',
ADD COLUMN cl_sortkey_prefix varchar(255) binary NOT NULL default '',
ADD COLUMN cl_collation varbinary(32) NOT NULL default '',
ADD COLUMN cl_type ENUM('page', 'subcat', 'file') NOT NULL default 'page',
ADD INDEX (cl_collation),
DROP INDEX cl_sortkey,
ADD INDEX cl_sortkey (cl_to, cl_type, cl_sortkey, cl_from)
"
from within function "DatabaseBase::sourceFile( /var/www/wiki/phase3/maintenance/archives/patch-categorylinks-better-collation.sql )".
Database returned error "1071: Specified key was too long; max key length is 1000 bytes (localhost)"

Shouldn't the length of field indexes be restricted in myisam mode (a different patch file)? The "source" wiki MW 1.15.4 DB has myisam tables.

If that is not a bug, is there any way to fix it not having to patch maintenance/tables.sql and maintenance/patch-categorylinks-better-collation.sql ?

Should I patch them temporarily and which key limitations would you suggest?


Version: 1.17.x
Severity: enhancement

Details

Reference
bz25503

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:17 PM
bzimport set Reference to bz25503.

cnit wrote:

By the way, that wiki was happily going 1.12 to 1.14 then to 1.15 (in myisam mode) and I don't recall such errors while running update.php (hope my memory is not too short).

cnit wrote:

I've changed maintenance/patch-categorylinks-better-collation.sql like this (see an attachment) and the problem went away. Is it OK to use cl_sortkey(1) in cl_sortkey index? Usually only first letter of sortkeys matters. Perhaps the value () can be increased, before key length limit is reached - I haven't checked it. But anyway, myisam upgrade now works and 1.17 trunk wiki is functional.

Maybe it would be a good idea for wikimedia to have a myisam-based setup at some of it's testing hosts?

cnit wrote:

Fixes index of categorylinks of 1.17 schema in myisam mode

Attached:

ayg wrote:

(In reply to comment #0)

I am trying to upgrade 1.15.4 to 1.17 trunk, while running update.php I've got
the following error message:

...doing rev_id from 59003 to 59202
...doing rev_id from 59203 to 59402
...doing rev_id from 59403 to 59602
...doing rev_id from 59603 to 59802
rev_len population complete ... 3701 rows changed (0 missing)
Creating iwlinks table...ok
...iwl_prefix_title_from key already set on iwlinks table.
Adding ul_value field to table updatelog...ok
Adding iw_api field to table interwiki...ok
...iwl_prefix key doesn't exist.
...iwl_prefix_from_title key doesn't exist.
Adding cl_collation field to table categorylinks...A database query syntax
error has occurred.
The last attempted database query was:
"ALTER TABLE wiki_categorylinks
CHANGE COLUMN cl_sortkey cl_sortkey varbinary(255) NOT NULL default '',
ADD COLUMN cl_sortkey_prefix varchar(255) binary NOT NULL default '',
ADD COLUMN cl_collation varbinary(32) NOT NULL default '',
ADD COLUMN cl_type ENUM('page', 'subcat', 'file') NOT NULL default 'page',
ADD INDEX (cl_collation),
DROP INDEX cl_sortkey,
ADD INDEX cl_sortkey (cl_to, cl_type, cl_sortkey, cl_from)
"
from within function "DatabaseBase::sourceFile(
/var/www/wiki/phase3/maintenance/archives/patch-categorylinks-better-collation.sql
)".
Database returned error "1071: Specified key was too long; max key length is
1000 bytes (localhost)"

Please post the output of "SHOW CREATE TABLE wiki_categorylinks\G". I have a guess as to how this could happen, but would like to confirm it.

Should I patch them temporarily and which key limitations would you suggest?

Make the cl_sortkey column (*not* the index) shorter until it works. Offhand, I'd think

ALTER TABLE wiki_categorylinks CHANGE COLUMN cl_sortkey cl_sortkey varbinary(230) NOT NULL default '';

should do it, because I'm guessing it's 255*3 = 765 bytes for cl_to in utf8, one byte for cl_type, four bytes for cl_from, 765 + 1 + 4 = 770, leaving 230 bytes for cl_sortkey. But I'd like that SHOW CREATE TABLE from you to confirm it.

Ideally, please try different sizes and tell me the biggest that works. My first guess is 230 should work but 231 shouldn't, but try various sizes and tell me what you get.

(In reply to comment #2)

I've changed maintenance/patch-categorylinks-better-collation.sql like this
(see an attachment) and the problem went away. Is it OK to use cl_sortkey(1) in
cl_sortkey index?

No. That destroys the point of the index, you may as well just drop the index entirely if you do that. The field itself has to be shortened, not the index. The index has to cover the whole field to work correctly. (Although if your wiki is small enough, you might not notice the difference if the index isn't working.)

Maybe it would be a good idea for wikimedia to have a myisam-based setup at
some of it's testing hosts?

This is not related to MyISAM, as far as I know -- it's probably related to the use of utf8 collation instead of binary.

cnit wrote:

Please post the output of "SHOW CREATE TABLE wiki_categorylinks\G". I have a guess as to how this could happen, but would like to confirm it.

Here's the output:

CREATE TABLE wiki_categorylinks (

`cl_from` int(10) unsigned NOT NULL DEFAULT '0',
`cl_to` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
`cl_sortkey` varbinary(255) NOT NULL DEFAULT '',
`cl_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`cl_sortkey_prefix` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
`cl_collation` varbinary(32) NOT NULL DEFAULT '',
`cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
UNIQUE KEY `cl_from` (`cl_from`,`cl_to`),
KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
KEY `cl_collation` (`cl_collation`),
KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`(1),`cl_from`)

) ENGINE=MyISAM DEFAULT CHARSET=utf8

Then, I've tried field length of 231 utf8 chars:
ALTER TABLE wiki_categorylinks CHANGE COLUMN cl_sortkey cl_sortkey
varbinary(231) NOT NULL default '';

caused the same error when altering index, while

230 utf8 chars seems to be OK:
ALTER TABLE wiki_categorylinks CHANGE COLUMN cl_sortkey cl_sortkey
varbinary(230) NOT NULL default '';

fixed the error:
mysql> ALTER TABLE wiki_categorylinks DROP INDEX cl_sortkey, ADD INDEX cl_sortkey (cl_to, cl_type, cl_sortkey, cl_from);
Query OK, 11510 rows affected (1.06 sec)
Records: 11510 Duplicates: 0 Warnings: 0

No. That destroys the point of the index, you may as well just drop the index

entirely if you do that. The field itself has to be shortened, not the index.
The index has to cover the whole field to work correctly. (Although if your
wiki is small enough, you might not notice the difference if the index isn't
working.)
Thanks for info, my mistake.

This is not related to MyISAM, as far as I know -- it's probably related to the use of utf8 collation instead of binary.

I've switched another wiki from 1.16 trunk (with InnoDB tables) to 1.17 trunk just few days before and there was no warnings or errors during execution of update.php. Has that patch-categorylinks-better-collation.sql already been incorporated into pre-release 1.16 ? It seems not - cannot find such patch in old backup. So I am not absolutely sure that it's unrelated to MyISAM (however this is another 1.15.4 wiki).

ayg wrote:

Fixed in r74798. Thanks for the report and testing.

ayg wrote:

(and thanks to Max for CC'ing me)

cnit wrote:

Aryeh, I've decided not to open new bug but reopen this one instead, because in r75939 with MyISAM utf8 tables 'php update.php' gave me another "key too long" message (this time in Windows, but that probably makes no difference):
.........
Fixing collation for 11510 rows.
1000 done.
2000 done.
3000 done.
4000 done.
5000 done.
6000 done.
7000 done.
8000 done.
9000 done.
10000 done.
11000 done.
11491 done.
Creating msg_resource table...A database query syntax error has occurred.
The last attempted database query was:
"CREATE UNIQUE INDEX mrl_message_resource ON wiki_msg_resource_links (mrl_mess
age, mrl_resource)
"
from within function "DatabaseBase::sourceFile( C:\www\phase3/maintenance/archiv
es/patch-msg_resource.sql )".
Database returned error "1071: Specified key was too long; max key length is 100
0 bytes (127.0.0.1)"

SHOW CREATE TABLE gives the following output:
CREATE TABLE wiki_msg_resource_links (

`mrl_resource` varchar(255) NOT NULL,
`mrl_message` varchar(255) NOT NULL

) ENGINE=MyISAM DEFAULT CHARSET=utf8

I still haven't tried to correct the length of fields manually, so I don't know if that is a last error of such kind. I hope so.

ayg wrote:

Please open a new bug, this is a separate issue. Product is MediaWiki, component is Resource Loader. Maybe there should be a primary key on (mrl_message) instead of a unique key on (mrl_resource, mrl_message), but perhaps you want duplicate mrl_messages sometimes, I don't know. Possibly the columns could just be shortened here too. I don't know what this table does offhand, so I can't say. Someone like Trevor or Roan will have to look at it.

(In reply to comment #9)

Please open a new bug, this is a separate issue. Product is MediaWiki,
component is Resource Loader. Maybe there should be a primary key on
(mrl_message) instead of a unique key on (mrl_resource, mrl_message), but
perhaps you want duplicate mrl_messages sometimes, I don't know.

Yes, that's the entire purpose of the table, you have (resource, message) pairs. It's an n-to-m relationship.

Possibly the
columns could just be shortened here too.

I guess they could be. How short would they have to be?

ayg wrote:

It's 1000 bytes total at three bytes per characters, so you've got 333 characters total for the two columns. Alternatively, you could make them varbinary (or somehow force ASCII encoding for varchar?) so that they're one byte per character for everyone.

(In reply to comment #11)

It's 1000 bytes total at three bytes per characters, so you've got 333
characters total for the two columns. Alternatively, you could make them
varbinary (or somehow force ASCII encoding for varchar?) so that they're one
byte per character for everyone.

varbinary sounds good, I'll do that

(In reply to comment #12)

varbinary sounds good, I'll do that

Done in r76150.

ayg wrote:

Dmitriy, does it work now?

cnit wrote:

Aryeh, yes, I've re-imported old 1.15 sql dump, updated local trunk, and re-run php update.php and it seems to work fine. Thank you, Roan.
Next time, if I'll spot the bug, I'll open a new one instead of re-opening this one. I just didn't knew these new tables are belonging to ResourceLoader (now I do).