Lucene should appropriately highlight text in quotes
Closed, Resolved (Public)

Description

Example URL: http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%22peace+be+upon+him%22&ns0=1&fulltext=Search

Ideally, information inside quotes would be interpreted as a single unit, similar to the behavior of Google and many other search engines.


Version: unspecified
Severity: minor

Details

Reference
bz15573

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:19 PM)
bzimport set Reference to bz15573.
bzimport added a subscriber: Unknown Object (MLST).

Upon discussion with rainman-sr, it would seem that the results are accurate, but the highlighting is intentionally handicapped for performance reasons.

I've adjusted the bug summary accordingly (under the assumption that this isn't a duplicate bug).

We can't keep trying to actively encourage people to use the internal MW search if we're going to handicap things like proper highlighting. The results become entirely context-less and almost entirely useless.

Basic search result highlighting is done by building a regular expression from the search terms which _more or less_ matches what the search engine will do.

The MySQL search engine class was fixed some time ago to handle quoted phrases as a single chunk, but the Lucene extension (MWSearch) wasn't checking for quotes in its parsing. I've ripped the parsing code from SearchMySQL and copied it to MWSearch to handle this better for now.

It would probably be good to do this with some common code to avoid the duplication (though in SearchMySQL this is combined with the actual generation of the MySQL boolean search query, making it a bit more complicated). Might also be good to handle a wider range of whitespace, etc.

It's still not perfect, but it now handles these common cases pretty much as expected.
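For illustration, the approach described above (build a regular expression from the search terms, keeping each quoted phrase as a single chunk) might look roughly like the sketch below. It is written in Python for brevity and is not the actual SearchMySQL or MWSearch code. The function names `split_terms`, `build_highlight_regex`, and `highlight`, and the `<b>` markers, are all hypothetical. It also accepts any run of whitespace between the words of a phrase, per the note above.

```python
import re


def split_terms(query):
    """Split a query into terms, keeping each "quoted phrase" as one unit."""
    # Each match is either a quoted phrase (group 1) or a bare word (group 2).
    pairs = re.findall(r'"([^"]+)"|(\S+)', query)
    # Strip stray quotes left over from an unbalanced query and drop empties.
    terms = [(phrase or word).strip('"') for phrase, word in pairs]
    return [t for t in terms if t]


def build_highlight_regex(terms):
    """Build one case-insensitive pattern that matches any of the terms."""
    parts = []
    # Longest terms first, so a phrase wins over a shorter word it contains.
    for term in sorted(terms, key=len, reverse=True):
        # Allow any run of whitespace between the words of a phrase.
        words = [re.escape(w) for w in term.split()]
        parts.append(r"\s+".join(words))
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


def highlight(text, query, open_tag="<b>", close_tag="</b>"):
    """Wrap every match of the query terms in the given markers."""
    terms = split_terms(query)
    if not terms:
        return text
    pattern = build_highlight_regex(terms)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)


if __name__ == "__main__":
    sample = "Peace be upon him is a phrase; peace alone is not."
    print(highlight(sample, '"peace be upon him"'))
```

With the quoted query, only the full phrase is marked and the later standalone "peace" is left alone, which is the behavior this bug asks for. Like the real highlighter, this only approximates what the search engine matches. The `\b` anchors, for instance, assume terms begin and end with word characters.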

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326]