
Page shows up in search results though it does not include a search term
Closed, Resolved (Public)

Description

Author: sumanah

Description:
To reproduce:

  1. Go to https://test2.wikipedia.org
  2. In the upper-right-hand search box, search for "love message" and hit Enter.
  3. The result page, https://test2.wikipedia.org/w/index.php?search=love+message&title=Special%3ASearch, lists https://test2.wikipedia.org/wiki/Love_issue as the top result.

Problem: the [[Love issue]] page does not include the word "message", so it shouldn't be in the results.

This may be related to bug 40210.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=53799

Details

Reference
bz52904

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 1:49 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz52904.
bzimport added a subscriber: Unknown Object (MLST).

I don't believe this is related to bug 40210, but it is complicated because it highlights three things:

It looks like lsearchd finds documents that contain ALL the terms and then ranks them, whereas CirrusSearch finds all the documents that contain ANY OF the terms and then ranks them. What CirrusSearch does is actually normal behaviour for a search engine: both Google and Bing do it. It wouldn't be hard for me to change CirrusSearch to work just like lsearchd, but I'm not sure it'd be right.

This may be related to 52906. I don't think so, though.

This page is ranked so highly because it contains a single-word title match and no other pages do. Right now CirrusSearch is configured to weight any title match very highly compared to text matches.
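
To make the interaction between the first and third points concrete, here is a minimal sketch in Python. It is not CirrusSearch or lsearchd code: the `search` function, the `title_weight` value, and the page contents are all invented for illustration. It only shows how ANY-OF matching combined with a heavy title boost can put a partial match such as [[Love issue]] on top, while ALL-terms matching drops it.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(query, pages, operator="OR", title_weight=10.0):
    """Return (score, title) pairs, best first.

    operator="AND" keeps only pages containing every query term;
    operator="OR" keeps pages containing at least one query term.
    title_weight is a hypothetical boost applied per title match.
    """
    terms = tokenize(query)
    results = []
    for title, body in pages.items():
        title_terms = tokenize(title)
        body_terms = tokenize(body)
        matched = terms & (title_terms | body_terms)
        if not matched:
            continue
        if operator == "AND" and matched != terms:
            continue
        score = title_weight * len(terms & title_terms) + len(terms & body_terms)
        results.append((score, title))
    return sorted(results, reverse=True)

# Hypothetical page contents, invented for this example.
PAGES = {
    "Love issue": "A page about an issue.",
    "Lorem": "love message lorem ipsum",
    "Ipsum": "a message in a bottle",
}

or_results = search("love message", PAGES, operator="OR")
and_results = search("love message", PAGES, operator="AND")
```

Under these assumptions, the OR search ranks "Love issue" first on the strength of its title match alone, and the AND search returns only the page that contains both words.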

All in all, I'm not sure what action to take on this bug, if any.

sumanah wrote:

(Just a tip: thank you for mentioning bug 52906. It's great to say "bug 52906" or "bug # 52906" if you mention a bug number, because then BZ automatically links to it! Magic.)

I think we'd want to distinguish among these cases:

  • there are no results that include all the search terms, and some results that are partial matches
  • there are very few results that include all the search terms, and then more partial matches
  • there are lots of results that include all the search terms

I agree that if there are NO results that include all the search terms, then we should offer partial matches (and say that we are doing so, and why).
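
A rough Python sketch of that "all terms first, partial matches only as a fallback" idea follows. The function name `search_with_fallback`, the notice text, and the sample pages are assumptions made up for illustration, not anything taken from CirrusSearch.

```python
def search_with_fallback(query, pages):
    """Return (titles, notice).

    Pages containing every query term are returned with notice=None.
    If there are none, pages containing any term are returned together
    with a message explaining that partial matches are shown.
    """
    terms = set(query.lower().split())
    # Index each page as the set of words in its title and body.
    indexed = {
        title: set((title + " " + body).lower().split())
        for title, body in pages.items()
    }
    full = sorted(t for t, words in indexed.items() if terms <= words)
    if full:
        return full, None
    partial = sorted(t for t, words in indexed.items() if terms & words)
    notice = ("No pages contain all of your search terms; "
              "showing partial matches instead.")
    return partial, notice

# Hypothetical pages, invented for this example.
SAMPLE_PAGES = {
    "Love issue": "a page about an issue",
    "Lorem": "love message lorem ipsum",
}

full_titles, full_notice = search_with_fallback("love message", SAMPLE_PAGES)
partial_titles, partial_notice = search_with_fallback("love bottle", SAMPLE_PAGES)
```

Here "love message" has a full match, so only "Lorem" comes back and no notice is set. "love bottle" has no full match, so both pages containing "love" are returned along with the notice.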

And if there are FEW full matches and a lot of partial matches, I believe that we would generally want to rank full matches above partial matches. I personally prefer to rank full matches above partial matches. On real wikis with non-loremipsum content, I'm sure that page title matches are pretty important and should be ranked pretty high. But maybe we should just test that again after rollout to mediawiki.org.
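
One possible way to keep title matches important while still ranking full matches above partial ones is to sort on "matches every term" first and on the weighted score second. The Python below is only a sketch under that assumption; `rank`, `title_weight`, and the sample pages are hypothetical.

```python
def rank(query, pages, title_weight=10.0):
    """Return titles ordered with full matches first, then by score."""
    terms = set(query.lower().split())
    ranked = []
    for title, body in pages.items():
        title_terms = set(title.lower().split())
        body_terms = set(body.lower().split())
        matched = terms & (title_terms | body_terms)
        if not matched:
            continue
        is_full = matched == terms
        # Title matches still count for more than body matches.
        score = title_weight * len(terms & title_terms) + len(terms & body_terms)
        ranked.append((is_full, score, title))
    # Full matches (True) sort ahead of partial ones (False); score breaks ties.
    ranked.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return [title for _, _, title in ranked]

# Hypothetical pages, invented for this example.
RANK_PAGES = {
    "Love issue": "a page about an issue",
    "Lorem": "love message lorem ipsum",
    "Ipsum": "a message in a bottle",
}

ranked_titles = rank("love message", RANK_PAGES)
```

With these sample pages, "Lorem" (the only full match) comes first, and the title-boosted partial match "Love issue" still outranks the body-only partial match "Ipsum".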

I agree with testing again after we roll out to mediawiki.org. We may not be able to be truly happy with testing until we deploy this to enwiki as the non-default search backend.

I'm setting the priority to high so I'm sure to look at it again.

So after some more research, it looks like Google and Bing do default to AND and I was just mistaken. I'll switch it in the morning. I would like to one day do the whole "we couldn't find enough matches with your query so we tried some other queries for you" thing, but right now defaulting to AND is probably the right thing to do.