Page MenuHomePhabricator

CentralNotice banner filter is case sensitive
Open, MediumPublic2 Estimated Story Points

Description

If I go to https://meta.wikimedia.org/wiki/Special:CentralNoticeBanners and enter the text "rory" in the filter field, I get no results.

If I enter the text "Rory" in the filter field, I get one result:

  • PrivacyPolicyDiscussion_Rory1 (Preview on-wiki)

Search should be case insensitive.


Version: unspecified
Severity: normal

Details

Reference
bz53751

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:49 AM
bzimport set Reference to bz53751.

mwalker wrote:

Fascinating. The prototype SQL schema and the one on metawiki differ! We will have to run an alter on the table to get it to be case insensitive.

Metawiki cn_templates.tmp_name is varbinary(255)
CentralNotice.sql cn_templates.tmp_name is varchar(255)

So the alter would be:
ALTER TABLE cn_templates MODIFY COLUMN tmp_name varchar(255);

Wonder if I can just do this tomorrow -- I'll ask Asher.

The cn_templates table is small, so the above ALTER would be OK to just run on metawiki master. Thank you for asking first :-)

Although, I notice that other metawiki cn_% tables with VARCHAR fields in the schema show as VARBINARY too. Does this deserve a more thorough investigation?

mwalker wrote:

(In reply to comment #2)

Does this deserve a more thorough investigation?

So, in doing a more thorough look through I noticed that in fact:

  • All table rows in CentralNotice are binary types
  • The default character set and collation for all the wikis that I checked is binary
    • As CN does not specify a table charset/collation we take the databases which is binary -- which is why all our columns are binary.

The answer to the historical question of "Why are all our tables by default binary charset/collation instead of utf8?" is apparently:

(05:00:55 PM) ori-l: mysql's 'utf8' wasn't real utf8 until a fairly recent version (5.1 iirc)
(05:00:55 PM) ori-l: it was restricted to the basic multilingual plane
(05:01:24 PM) ori-l: which is a problem if you want to support certain languages
(05:01:31 PM) ori-l: and we do

  • so --

I'm going to send an email to the wikitech list about how to resolve this problem.

mwalker wrote:

(In reply to comment #3)

I'm going to send an email to the wikitech list about how to resolve this
problem.

For posterity -- http://www.gossamer-threads.com/lists/wiki/wikitech/394239

Any alterations to the database schema should be wrapped in a maintenance script and/or committed to a Git repo as an SQL patch. Other people presumably use this extension. :-)

Well my understanding is the listed ALTER was actually to bring the schema back inline with the schema already committed to the repo. Not an SQL patch per se; the production schema is simply wrong.

But I agree it should become a maintenance script if the problem is more widespread or indicative of a design flaw with regard to character sets. Switching wholesale to utf8 has many implications.

Pcoombe subscribed.

The same applies to the filter for available banners on the campaign page (https://meta.wikimedia.org/w/index.php?title=Special:CentralNotice&subaction=noticeDetail&notice=test#cn-template-searchbox). I assume the same change would fix that too.

awight set the point value for this task to 2.Apr 19 2016, 8:24 PM

While certainly not a Q2/short-term priority, I am commenting simply to add a vote for how useful it'd be for this search to be case-insensitive.