Case insensitivity in the search box does not function for Titles with Mixed Case
Closed, Resolved (Public)

Description

Case insensitivity doesn't work for Titles with Mixed Case (see [[WP:MIXEDCAPS]]).

We had a bot creating thousands upon thousands of redirects from the lowercase versions of mixed-case article titles: http://en.wikipedia.org/wiki/User:BOTijo

This is obviously inferior to simply fixing case insensitivity in the search box so that it finds the mixed-case article automatically.

We've revoked the bot's authorization in hopes this bug can be fixed.


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/wiki/WP:MIXEDCAPS

Details

Reference
bz19882

Event Timeline

bzimport raised the priority of this task to High (Nov 21 2014, 10:40 PM).
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz19882.

"Go" searches to titles with mixed capitalization should work just fine on Wikipedia for the last year or two thanks to the TitleKey extension; there's no need to create redirects for that.

Can you provide some sample searches that fail? We might have had a regression in functionality or a break in indexing.

See any Page with Mixed Caps, e.g.

"Congolese Union of Republicans" (this is a new page I just found, might not be around when you get to this)

Typing "congolese union of republicans" in the search box does not get you to the article.
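The behaviour expected here can be sketched as a lookup keyed on a case-folded form of the title. This is only a minimal illustration in Python, not the TitleKey extension's actual code; the names `title_key`, `build_index` and `go_lookup` are hypothetical, and the normalization (plain case folding) is an assumption.

```python
def title_key(title: str) -> str:
    """Fold a title to a case-insensitive key (assumed normalization)."""
    return title.casefold()


def build_index(titles):
    """Map each folded key to the stored, correctly capitalized title."""
    index = {}
    for title in titles:
        # The first title wins on a key collision; a real index would
        # need an explicit policy for titles that differ only by case.
        index.setdefault(title_key(title), title)
    return index


def go_lookup(index, query):
    """Return the stored title matching the query regardless of case, or None."""
    return index.get(title_key(query.strip()))


index = build_index(["Congolese Union of Republicans"])
print(go_lookup(index, "congolese union of republicans"))
```

With an index like this, the lowercase query resolves to the mixed-case article without any redirect having to exist.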

(In reply to comment #1)

> "Go" searches to titles with mixed capitalization should work just fine on Wikipedia for the last year or two thanks to the TitleKey extension; there's no need to create redirects for that.

[[en:Special:Version]] doesn't list TitleKey at this time.

rainman wrote:

We are using lucene as a prefix backend on en.wp at the moment.

catlow wrote:

Has TitleKey recently been removed? I'm sure this used to work fine. Will it be restored?

(In reply to comment #4)

> We are using lucene as a prefix backend on en.wp at the moment.

Lucene doesn't seem to use the SearchGetNearMatch hook, which AFAICT is what is needed to affect the "Go" button.
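As a loose analogy for why the hook matters (this is not MediaWiki's API; `near_match_hooks`, `register_near_match_hook` and `get_near_match` are hypothetical names), a "Go" lookup can be pictured as an exact-title check followed by a chain of handlers. If no case-insensitive handler is consulted, the lowercase query finds nothing:

```python
near_match_hooks = []  # hypothetical registry, consulted in registration order


def register_near_match_hook(handler):
    """Add a handler that maps a search term to a title, or returns None."""
    near_match_hooks.append(handler)


def get_near_match(term, titles):
    """Resolve a "Go" search: exact title first, then each hook in turn."""
    if term in titles:
        return term
    for handler in near_match_hooks:
        title = handler(term)
        if title is not None:
            return title  # the first hook that finds a page wins
    return None  # the caller would fall back to a results page


titles = {"Congolese Union of Republicans"}
folded = {t.casefold(): t for t in titles}

before = get_near_match("congolese union of republicans", titles)
register_near_match_hook(lambda term: folded.get(term.casefold()))
after = get_near_match("congolese union of republicans", titles)
print(before, "->", after)
```

In this sketch the lookup fails until the case-folding handler is registered, which mirrors the symptom reported above.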

rainman wrote:

It would be good if both could coexist. lucene.php should be loaded after TitleKey in CommonSettings.php, and $wmgUseTitleKey = false should be removed from lucene.php. Also, the TitleKey index might need rebuilding to cover the past couple of months.

I think I found our problem:

    } elseif ( in_array( $wgDBname, array( 'enwiki' ) ) ) {
        # Big RAM pool 1, via LVS
        $wgLuceneHost = '10.2.1.11';
        $wgLuceneSearchVersion = 2.1;
        $wgEnableLucenePrefixSearch = true;
        $wgLucenePrefixHost = '10.0.3.8'; #search8
        $wmgUseTitleKey = false;
    }

For some mysterious reason the Lucene configuration disables TitleKey on enwiki. Ouch! Removing this...

Ok, TitleKey is reenabled and I'm rebuilding the index.

Ok, me & Robert worked out the compat issue between TitleKey and MWSearch; should now be fixed with the adjustment from r54533.

TitleKey is still on to handle the "go" search, but it no longer interferes with MWSearch's Lucene prefix search when that is enabled, as long as we load them in the right order.

Ok, TitleKey index rebuild is now done and we have the best of both worlds. :) Case-insensitive match on 'go' searches works and we have the more advanced drop-down ajax search with the Lucene backend.