
String "PO:BAR:AZU" is not searched as a whole
Closed, Declined (Public)

Description

Author: massimo.palmieri1

Description:
When I search for the string PO:BAR:AZU, I want to retrieve only the string "PO:BAR:AZU", not the string "PO:BAR" close to the string "AZU". Thank you


Version: unspecified
Severity: normal
Whiteboard: cirrus-fixed

Details

Reference
bz54669

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:19 AM
bzimport set Reference to bz54669.
bzimport added a subscriber: Unknown Object (MLST).

Do you have a concrete example, maybe a link to a search that isn't returning what you need?

The general problem with this request is that search (mostly) breaks text into terms using the Unicode word boundary rules (http://www.unicode.org/reports/tr29/#Word_Boundaries) when articles are changed. Changing that requires rebuilding the index of all articles, _and_ most users actually want words to be split on colons.

An option that you have now is to search for "PO:BAR:AZU"~0, which matches only pages where PO, BAR, and AZU appear _right_ next to each other.
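
To illustrate the difference, here is a toy Python sketch. It is not CirrusSearch or Elasticsearch code: it assumes a simplified tokenizer that always splits on colons, and the names tokenize, contains_all_terms and contains_exact_phrase are made up for this example. It only shows why a plain search can match pages where the pieces are scattered, while a phrase search with no slop requires them to be adjacent.

```python
import re


def tokenize(text):
    # Toy tokenizer (an assumption, not the real analyzer): lowercase the
    # text and split on every run of characters that is not an ASCII letter
    # or digit, so "PO:BAR:AZU" becomes ["po", "bar", "azu"].
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def contains_all_terms(doc, query):
    # Loose match: every query term occurs somewhere in the document.
    doc_terms = set(tokenize(doc))
    return all(t in doc_terms for t in tokenize(query))


def contains_exact_phrase(doc, query):
    # Phrase match with no slop: the query terms must occur consecutively
    # and in the same order as in the query.
    d, q = tokenize(doc), tokenize(query)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical page texts, invented for this example.
docs = [
    "Shield coded PO:BAR:AZU on a red field",
    "PO:BAR shown next to AZU",
]

loose = [contains_all_terms(d, "PO:BAR:AZU") for d in docs]
strict = [contains_exact_phrase(d, "PO:BAR:AZU") for d in docs]
```

In this sketch both sample texts pass the loose check, but only the first one passes the phrase check, which is the behavior the "PO:BAR:AZU"~0 query is meant to give.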

Despite my dithering I'll prioritize this to low for now and see what I can do about it when I get to it.

So I had a look at this, and the default text segmentation algorithm seems to do exactly what you want. Have a look here:

https://test2.wikipedia.org/w/index.php?search=PO%3ABAR%3AAZU&title=Special%3ASearch&fulltext=1

Can you give another example? Closing as WORKSFORME for now.

massimo.palmieri1 wrote:

If I search (in it:wiki) for the string PO:BAR:AZU I retrieve 34 entries (see https://it.wikipedia.org/w/index.php?search=PO%3ABAR%3AAZU&title=Speciale%3ARicerca), but only 7 of these entries contain the string PO:BAR:AZU. The remaining entries contain fragments of this string, but not the entire contiguous string.
I can't use the search string "PO:BAR:AZU", because the search tool is not able to retrieve this string, and I can't use "PO:BAR:AZU"~0 either, because the search tool gives the same result (no entries).
A search for the string PO:BAR:AZU~0 retrieves 3 entries, but they are not useful at all, and I don't know why.
Sorry for insisting, but these strings are used to identify heraldic forms in descriptions, like mathematical formulas, and I need to use the entire string in my searches.
Thanks
Massimo Palmieri

it:wiki isn't using CirrusSearch! it:wiktionary does, but not it:wiki. For what it is worth, I checked the Italian analyzer that we use for it:wiktionary and it left PO:BAR:AZU as a single token, which is what you want.

I'm sorry I can't be of more help with the other search tool.

This bug report is INVALID for CirrusSearch as CirrusSearch is not involved in the problem.
Changing component from CirrusSearch to MWSearch (though MWSearch will not receive much attention anymore).

Marking WONTFIX as MWSearch/lsearch has reached end of life and is replaced by Cirrus/Elasticsearch.