Page MenuHomePhabricator

Update FlaggedRevs for new categorylinks schema
Closed, ResolvedPublic

Description

Author: ayg

Description:
FlaggedRevs uses cl_sortkey, so it might be broken by the categorylinks changes http://lists.wikimedia.org/pipermail/wikitech-l/2010-July/048399.html. I looked at the code and wasn't immediately sure what it was doing, so I wasn't sure how to fix it. If someone who knows more about FlaggedRevs could either fix it or say what needs to be done or say what FlaggedRevs uses categorylinks for, please do.

Important changes are:

  1. The (cl_to, cl_sortkey, cl_from) index was changed to (cl_to, cl_type, cl_sortkey, cl_from). This means that SELECT ... WHERE cl_to = 'Foo' ORDER BY cl_sortkey; will cause a filesort. It would need to be changed to ORDER BY cl_type, cl_sortkey, or have AND cl_type = 'page' added, or something. (cl_type is a new ENUM equal to either 'page', 'subcat', or 'file' depending on the namespace of cl_from.) Likewise if you have offsets, those would need to include cl_type too to use the index properly.
  1. cl_sortkey is now a binary string, at least in principle. Right now it's mostly text but could have embedded nulls; in the future it will likely be changed to pure binary sortkeys generated by CLDR or something. If you're printing these or anything, don't.

Possibly no changes are needed, I'm not sure.


Version: unspecified
Severity: normal

Details

Reference
bz25084

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:13 PM
bzimport set Reference to bz25084.

(In reply to comment #0)

  1. The (cl_to, cl_sortkey, cl_from) index was changed to (cl_to, cl_type,

cl_sortkey, cl_from). This means that SELECT ... WHERE cl_to = 'Foo' ORDER BY
cl_sortkey; will cause a filesort. It would need to be changed to ORDER BY
cl_type, cl_sortkey, or have AND cl_type = 'page' added, or something.
(cl_type is a new ENUM equal to either 'page', 'subcat', or 'file' depending on
the namespace of cl_from.) Likewise if you have offsets, those would need to
include cl_type too to use the index properly.

We only use the sortkey in two places. One of them isn't affected. The second was, and is fixed in r72509.

  1. cl_sortkey is now a binary string, at least in principle. Right now it's

mostly text but could have embedded nulls; in the future it will likely be
changed to pure binary sortkeys generated by CLDR or something. If you're
printing these or anything, don't.

We don't :)