
full text search phrases
Closed, Resolved (Public)

Description

Author: blentz

Description:
hack to add double quote search phrase functionality

I have a pool of users who are nervous about adopting our MediaWiki installation as a corporate document repository because of a problem with search phrases. Our user community wants to enter double-quote-enclosed search strings to find multiple words that are adjacent to each other, rather than entering multiple search terms that may each appear separately in different parts of the document.

I was surprised to find on the http://www.mediawiki.org/wiki/Help:Searching and http://en.wikipedia.org/wiki/Help:Searching pages that this isn't a supported function:

Even if you enclose a phrase in quotes, the search looks for each word individually. e.g. if you enter "world war 2" it will return pages that contain "world" and "war" and "2".

Phrase: There is no method for searching for a phrase. Contrary to what you might expect, enclosing phrases in double quotation marks such as "can of tuna" will retrieve all pages containing "of" "tuna" and "can".

I was even more surprised when I realized that my installation had full text support in MySQL 4.1+, using the Boolean Full-Text Searches option. From http://dev.mysql.com/doc/refman/4.1/en/fulltext-boolean.html:

A phrase that is enclosed within double quote (‘"’) characters matches only rows that contain the phrase literally, as it was typed.

Through some digging, I found that MediaWiki is actually stripping the double-quote characters out, even though the user performing the search intended to have them in place and the underlying database search functions support them.
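
To illustrate the stripping behaviour described here, a minimal Python sketch follows. MediaWiki itself is written in PHP, and none of this is the actual MediaWiki code: the function names and the exact character classes are hypothetical, chosen only to show how dropping double quotes turns a phrase query into an all-words query before it reaches MySQL's boolean mode.

```python
import re


def strip_filter(query):
    """Hypothetical filter: replace everything except word chars and whitespace."""
    return re.sub(r'[^\w\s]', ' ', query)


def quote_preserving_filter(query):
    """Hypothetical filter that also lets double quotes through."""
    return re.sub(r'[^\w\s"]', ' ', query)


def to_boolean_mode(filtered):
    """Mark every token as required (+); a quoted phrase stays one token."""
    tokens = re.findall(r'"[^"]+"|\S+', filtered)
    return ' '.join('+' + token for token in tokens)


# Quotes stripped: the phrase degrades into three independent required words.
print(to_boolean_mode(strip_filter('"world war 2"')))
# Quotes preserved: the phrase reaches the boolean-mode query intact.
print(to_boolean_mode(quote_preserving_filter('"world war 2"')))
```

With the first filter the query becomes `+world +war +2`; with the second it becomes `+"world war 2"`, which is the form the MySQL manual quoted above describes as matching the phrase literally.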

I have hacked together a patch to make this work for the user community, but it's got some limitations (obvious upon review to those who know the code). If there's any chance that search phrases will be implemented in a future version of MediaWiki, please let me know. Our user base thinks they've got a real need for this feature.

Thanks in advance


Version: 1.10.x
Severity: enhancement

Attached:

Details

Reference
bz10699

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 9:52 PM
bzimport set Reference to bz10699.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

Note that Wikipedia doesn't actually use MySQL fulltext search; it uses the Lucene extension. The latest version of the LuceneSearch extension appears to support this: compare a search for 'lower learning' and '"lower learning"' on Wikipedia. The former returns vastly more results, 7844 instead of 4.

This seems reasonable as a patch to the default fulltext search, but I don't know much of anything about MySQL fulltext search or our support for it, so I'll ask someone else to look at it.

Fixed in r25794, along with bug 4021.

Phrase search was always meant to work, but never quite made it through the filtering stage, whoops. :P

The filter now uses an expanded set of chars, while the original set is still used when parsing through the query to get the regexes for result highlighting.
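
As a rough illustration of the highlighting side of this, here is a Python sketch of parsing a query into words and quoted phrases and building a highlighting regex from them. It is not the code from r25794 (MediaWiki is PHP); the function names, the `<b>` markup, and the tokenizing pattern are all assumptions made for the example.

```python
import re


def parse_terms(query):
    """Split a query into quoted phrases and bare words (hypothetical parser)."""
    terms = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        terms.append(phrase or word)
    return terms


def highlight(text, query):
    """Wrap each matched term or phrase in <b>...</b>, case-insensitively."""
    terms = parse_terms(query)
    if not terms:
        return text
    # Longest terms first so a phrase wins over a word it contains.
    terms.sort(key=len, reverse=True)
    pattern = '|'.join(r'\b' + re.escape(term) + r'\b' for term in terms)
    regex = re.compile(pattern, re.IGNORECASE)
    return regex.sub(lambda match: '<b>' + match.group(0) + '</b>', text)


print(highlight('Lower learning at school', '"lower learning" school'))
```

A quoted phrase is highlighted only where its words appear adjacent and in order, so `learning lower` would be left untouched by the query `"lower learning"`.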