
Search: Add ability to search for special chars
Closed, Resolved; Public

Description

I mean characters like '<'. See the short entry on the wiki: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Search_for_special_chars


Version: 1.25-git
Severity: normal

Details

Reference
bz72381

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:58 AM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz72381.
bzimport added a subscriber: Unknown Object (MLST).

Technically this is already implemented with Cirrus's source regex queries, but it's super duper slow in production right now. The default implementation brute-forces the regex over all the pages. That takes, like, 10 minutes on enwiki if you can't reduce the set of considered pages some other way (title filter, other required text, smaller namespace, etc). After about a minute of waiting on the search, Varnish normally chops the request and sends you a timeout, which is pretty lame. So 10 minutes of compute time get wasted (kinda; we mitigate it a bit, but it's still lame).

Anyway, we're in the process of deploying trigram-accelerated regex searches, so we only actually have to run the regexes on pages that have a chance of matching in the first place. In the common case it's something like 60 times faster than brute force, so that 10 minutes becomes about 10 seconds, which is ok to wait if not great. In the worst case we cut the query off at some point and don't let it take any more time. This can cause weird results (Bug 72128), but at least you get results at all rather than waiting forever.
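
To make the trigram idea concrete, here is a rough Python sketch. It is not the Cirrus implementation, just an illustration of the technique. The function names and the `required_literal` parameter are made up for this example. A real implementation would derive the required trigrams from the regex itself, which this sketch skips by asking the caller for a literal substring that every match must contain.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(pages):
    """Map each trigram to the set of page titles whose text contains it."""
    index = defaultdict(set)
    for title, body in pages.items():
        for gram in trigrams(body):
            index[gram].add(title)
    return index


def accelerated_search(pages, index, pattern, required_literal):
    """Run the regex only on pages that could possibly match.

    required_literal is an assumption of this sketch: a substring that
    every match of `pattern` must contain, supplied by the caller.
    Returns (sorted matching titles, number of candidate pages checked).
    """
    grams = trigrams(required_literal)
    if not grams:
        # Literal shorter than 3 chars: nothing to filter on, brute force.
        candidates = set(pages)
    else:
        # A page is a candidate only if it contains every required trigram.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    regex = re.compile(pattern)
    matches = sorted(t for t in candidates if regex.search(pages[t]))
    return matches, len(candidates)
```

Searching for `<ref[^>]*>` with the required literal `<ref` only runs the regex on pages containing both `<re` and `ref`. Pages that pass the trigram filter can still fail the regex, so the regex always gets the final say; the index just shrinks the set it has to look at.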

The trigram searches aren't the default yet because we haven't built the trigram index for all the wikis. The plan is to make them the default once that's done, which will take another few days.