
Valid Page name resulting in backend search error
Closed, Declined, Public

Description

Author: protonk

Description:
Searching for "Wikipedia talk:Articles for creation/2012: A Year Of No Significance ... Colonial Penetrations in the First World - Part 1 in the Sound and Light of India series" (yes, that's the exact title, see here https://en.wikipedia.org/wiki/Wikipedia talk:Articles for creation/2012: A Year Of No Significance ... Colonial Penetrations in the First World - Part 1 in the Sound and Light of India series) results in the following error:

  • "An error has occurred while searching: The search backend returned an error: Internal error in SearchEngine: Trying to extract field from zero-length list of terms"

STR:

  1. Paste "Wikipedia talk:Articles for creation/2012: A Year Of No Significance ... Colonial Penetrations in the First World - Part 1 in the Sound and Light of India series" (without quotes) into the search box and click Search (or hit Enter)
  2. There is no step 2

I don't know how Cirrus is implemented, so I'm not sure what is causing this, though I'd guess it treats ... as a range operator.
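To script the reproduction instead of pasting into the search box, you can send the same query through the MediaWiki Action API (`action=query&list=search`). The sketch below only builds the request URL. It assumes the `https://en.wikipedia.org/w/api.php` endpoint. It does not establish which backend (lsearchd or Cirrus) serves the API request, or how a backend failure would show up in the JSON response.

```python
from urllib.parse import urlencode

# Assumption: the standard MediaWiki Action API endpoint for English Wikipedia.
API = "https://en.wikipedia.org/w/api.php"


def search_url(query: str, api: str = API) -> str:
    """Build a list=search request for the same text typed into the search box."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,  # raw query text, punctuation included
        "format": "json",
    }
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return api + "?" + urlencode(params)


if __name__ == "__main__":
    title = (
        "Wikipedia talk:Articles for creation/2012: A Year Of No Significance ... "
        "Colonial Penetrations in the First World - Part 1 in the Sound and Light of India series"
    )
    print(search_url(title))
```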


Version: unspecified
Severity: normal

Details

Reference
bz66259

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 3:21 AM
bzimport set Reference to bz66259.
bzimport added a subscriber: Unknown Object (MLST).

That's a notorious lsearchd error that I've spent a day or two trying to track down, only to come up empty-handed. It's probably caching-related and will probably go away with time. I've tried restarting the services to clear that cache in the past, but that just causes lsearchd to break across the site and doesn't fix the problem.

Cirrus doesn't spit out any errors when I search with that string, but it doesn't find the page because the page has been deleted. Searching for <all:Articles for creation/2012: A Year Of No Significance ... Colonial Penetrations in the First World - Part 1 in the Sound and Light of India series> does bring up discussion about it.

Cirrus doesn't use ... as a range operator. Cirrus's syntax is documented here: https://www.mediawiki.org/wiki/Search/CirrusSearchFeatures . It's reasonably up to date.

If you are sure you got the error from Cirrus, feel free to swap the component back. I did try it myself and got it on lsearchd, not Cirrus.

protonk wrote:

Hey Nik,

I'm indeed not sure I got it from Cirrus. Until I convince myself otherwise, I'll stick with that change.

As some additional info, it appears that searching for "Wikipedia talk:Articles for creation/2012: A Year Of No Significance ." will also trigger the error, with the minimum case so far being something like

"Wikipedia:A/b: c ."

Which is still (AFAIK) a valid page name. Interestingly (or maybe not), removing the period still returns an error, but a different one with no detail after the colon: "An error has occurred while searching: The search backend returned an error:"
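The manual narrowing described above, from the full title down to "Wikipedia:A/b: c .", can be sketched as a greedy token-removal loop. This is a minimal sketch under stated assumptions. `fails` is a hypothetical predicate that stands in for "run the search and check for the backend error"; it is not part of any MediaWiki API. The loop also removes only whole whitespace-separated tokens, so it will not shorten individual words the way "A/b" was shortened by hand.

```python
from typing import Callable


def minimize_query(query: str, fails: Callable[[str], bool]) -> str:
    """Greedily drop whitespace-separated tokens while the query still fails.

    `fails` is a hypothetical predicate (e.g. "did the search page show the
    backend error for this query?") supplied by the caller.
    """
    if not fails(query):
        raise ValueError("starting query does not reproduce the error")
    tokens = query.split()
    changed = True
    while changed:
        changed = False
        for i in range(len(tokens)):
            candidate = tokens[:i] + tokens[i + 1:]
            if candidate and fails(" ".join(candidate)):
                # Token i was not needed to trigger the error: drop it and rescan.
                tokens = candidate
                changed = True
                break
    return " ".join(tokens)
```

Each accepted removal restarts the scan. The result is therefore 1-minimal: removing any single remaining token no longer reproduces the error.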

HTH,

-Adam

When I've seen these in the past, they were caused by the search finding a certain result, not by the query itself. I could find the code actually throwing the exception but couldn't really do anything about it :(

Unfortunately, I'd suggest checking back in a week and seeing if it's still broken. This is one of the weird bugs that come up from time to time and that made us really want to build Cirrus. As flawed as Cirrus is right now, we're able to debug it much more easily for a bunch of reasons. I'm honestly afraid that anything I do to fix this will break lsearchd further.

protonk wrote:

Unfortunately, I'd suggest checking back in a week and seeing if its still broken.

Can do.

Dumb question: if I set up a local install on a VM, how would I get at the Lucene errors being dumped out?

(In reply to Adam Hyland from comment #6)

Dumb question, if I set up a local install on a VM, how would I get at the lucerne errors getting dumped out?

Setting up Lucene search is a bit difficult. I've actually never gotten it 100% working in a development environment. I think you are even less likely to see these errors, because you'd need to set up replication.
Setting up Cirrus is easier, and I'll volunteer the steps here even if they aren't helpful! Because posterity!

  1. Set this up: https://www.mediawiki.org/wiki/MediaWiki-Vagrant
  2. vagrant enable-role cirrussearch
  3. vagrant provision

Sorry I couldn't be more help.

protonk wrote:

Just a minor update. Searching again for "Wikipedia:A/b: c ." results in the same error. That's not really dispositive as it could be the same caching issue cropping up again, but I said I'd check back later. :)

It's an issue the old search is having with punctuation in search queries. It's been like this for weeks (months?) now and I've never managed to run it down. Nothing to do with caching.
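Here is one mechanism that fits both Chad's diagnosis and the wording of the original error. Suppose each word passes through an analysis step that discards punctuation. A word made only of punctuation, like the trailing "." in "Wikipedia:A/b: c .", would then analyze to zero terms, and code that reads the field name from the first term would fail. The sketch below is a hypothetical Python illustration, not lsearchd's actual (Java) code. The analyzer, the `contents` field name, and all function names are assumptions. It also does not explain the detail-less error seen after removing the period.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str
    text: str


def analyze(field: str, word: str) -> list[Term]:
    # Hypothetical analyzer: keep runs of word characters, drop punctuation.
    return [Term(field, t.lower()) for t in re.findall(r"\w+", word)]


def field_of(terms: list[Term]) -> str:
    # Reading the field from the first term fails when analysis produced nothing.
    if not terms:
        raise ValueError("Trying to extract field from zero-length list of terms")
    return terms[0].field


def naive_parse(query: str, field: str = "contents") -> list[str]:
    # Fails on any whitespace-separated word that is pure punctuation.
    return [field_of(analyze(field, word)) for word in query.split()]


def guarded_parse(query: str, field: str = "contents") -> list[list[Term]]:
    # Defensive variant: skip words that analyze to no terms instead of failing.
    analyzed = (analyze(field, word) for word in query.split())
    return [terms for terms in analyzed if terms]
```

With this model, `naive_parse("Wikipedia:A/b: c")` succeeds and `naive_parse("Wikipedia:A/b: c .")` raises. The guarded variant simply ignores the punctuation-only word.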

protonk wrote:

(In reply to Chad H. from comment #9)

It's an issue the old search is having with punctuation in search queries. It's been like this for weeks (months?) now and I've never managed to run it down. Nothing to do with caching.

So Nik notes that setting up Lucene plus replication in a development environment would be prohibitively difficult. Is there anything else I can do to help track this down?

  • Bug 71722 has been marked as a duplicate of this bug.
  • Bug 69730 has been marked as a duplicate of this bug.
demon claimed this task.

Poor lsearchd. RIP.