RegEx-style Cirrus searches are ignored on en.wikipedia
Closed, ResolvedPublic

Description

Author: SpontaneousGrumbler

Description:
A search for "wyoming"insource:/wyoming/ should find only lower-case occurrences, but it returns over 45,000 articles, most of them containing only the capitalized form. This worked 2 days ago.
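
To illustrate the expected behaviour, here is a minimal Python sketch. It assumes that insource:/.../ matches case-sensitively, the way a plain regex does by default; the sample strings are made up for the example and are not real search results.

```python
import re

# Hypothetical sample snippets, invented for illustration only.
samples = ["Wyoming is a state", "the wyoming valley", "WYOMING"]

# Python regexes are case-sensitive unless re.IGNORECASE is passed.
pattern = re.compile(r"wyoming")

def lowercase_matches(texts):
    """Return only the texts that contain the lower-case pattern."""
    return [t for t in texts if pattern.search(t)]

print(lowercase_matches(samples))
```

Only the snippet containing the lower-case form should come back; the capitalized ones should not.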


Version: unspecified
Severity: normal

Details

Reference
bz72894

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:58 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz72894.

Yes, they're disabled for the time being as they were causing downtime.

SpontaneousGrumbler wrote:

And how were the users of WP notified that the feature was turned off? How long is "the time being"?

Hi Grumbler!

I sent an email to wikitech-ambassadors@lists.wikimedia.org on October 31st at 11:30 US Eastern time. It's how I've done the bulk of the Cirrus communication at this point and I thought it was the right way to announce it.

I should have updated a recent conversation on en:wp's Village Pump and did when someone pinged me. I should have remembered to do it earlier but I was trying to fix the problem.

As for how long "the time being" is: I'm working on it now. I have a hacked-together solution that demonstrably works, but that is only the first step. The remaining steps are mostly these:

  1. Work with upstream (Lucene) to get a version of the hack that'd be acceptable for them to merge.
  2. Backport those Lucene changes to the plugins we use for regex search and highlighting.
  3. Release new versions of those.
  4. Deploy them to our search cluster.
  5. Reenable the feature.

You can sort of fudge step 1 and just backport something more hacky, but that is somewhat more dangerous from a stability standpoint. But that is life and we'll do it if upstream drags their feet. So far they've been reasonably responsive though.

Estimating a time from that is tricky. A week from today?

Quick update:
Step 1 is moving along pretty quickly. I'm communicating closely with a Lucene committer on getting a patch merged for this. It feels like we'll get something merged today, which means it's worth waiting for that before moving on to step 2.

Also there is a step 2.5: Update Cirrus to catch the new error message from step #2 and produce a useful message for the user.

SpontaneousGrumbler wrote:

CirrusSearch is set to be rolled out to en.wikipedia on November 19. Please tell me that when that date was set they knew about this outage and were confident that this will be fixed in time for The Grand Opening.

(In reply to SpontaneousGrumbler from comment #6)

CirrusSearch is set to be rolled out to en.wikipedia on November 19. Please
tell me that when that date was set they knew about this outage and were
confident that this will be fixed in time for The Grand Opening.

I would hope that Nik and I knew about the outage when we picked that date.

(In reply to SpontaneousGrumbler from comment #6)

CirrusSearch is set to be rolled out to en.wikipedia on November 19. Please
tell me that when that date was set they knew about this outage and were
confident that this will be fixed in time for The Grand Opening.

Yup.

Here is the status:
The fix for the cause of the outage is live in beta. When I tried it on Saturday I found an error where sometimes the right error message isn't shown when the regex is too complex to use. I've *just* finished the fix for that.

The plan right now is to get that to beta today and validate it.

We'll deploy the fix to production on Wednesday. Today or Tuesday would have been better, but Tuesday is a US holiday and we'll have fewer people on hand in the unlikely event that something goes wrong.

That puts us reenabling regex search on Thursday. It's longer than we'd thought/hoped.

SpontaneousGrumbler wrote:

Thanks for the update.

Plugins deployed. We'll be pushing code to reenable the searches in our general window which starts in an hour and a half.

SpontaneousGrumbler wrote:

Thanks. It appears to be working now.

Hey, glad it's working for you. I keep getting:
An error has occurred while searching: Too many regular expression searches currently running. Please try again later.
which is a pain. I think something is up with the counter because I totally don't see that many regex searches.

SpontaneousGrumbler wrote:

Yes, well, it worked for one search. Ever since then, I keep getting the same error message, "Too many regular expression searches". Yes, something is wrong; failing to decrement the counter seems a likely cause.
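
As a generic illustration of that failure mode, here is a minimal Python sketch of a limit counter whose slots leak when a search errors out before releasing them. Everything in it (`SlotCounter`, `leaky_search`, `safe_search`, the limit of 2) is hypothetical and is not the real PoolCounter code or its actual bug; it only shows why a missed decrement would produce this error even when few searches are running.

```python
import threading
from contextlib import contextmanager

class TooManySearches(Exception):
    pass

class SlotCounter:
    """Hypothetical stand-in for a pool counter; not the real PoolCounter."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self.in_use >= self.limit:
                raise TooManySearches(
                    "Too many regular expression searches currently running."
                )
            self.in_use += 1

    def release(self):
        with self._lock:
            self.in_use -= 1

    @contextmanager
    def slot(self):
        # Release in a finally block so the slot is returned on every path.
        self.acquire()
        try:
            yield
        finally:
            self.release()

def leaky_search(counter, fail):
    counter.acquire()
    if fail:
        raise ValueError("regex too complex")  # slot is never released
    counter.release()

def safe_search(counter, fail):
    with counter.slot():
        if fail:
            raise ValueError("regex too complex")
```

With a limit of 2, two failed calls to `leaky_search` leave both slots occupied forever, so every later search is rejected; `safe_search` returns its slot even when the search fails.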

Yeah. Something. I'll be able to spend some time with it in the morning. I've had a suspicion for a while now that that counter is lying. It's actually the same counter that we use for all kinds of stuff and it's pretty hard *not* to decrement it.

I'm at least glad it doesn't just hate me.

OK! I found the problem. Our pool counter works differently than I thought it did. I've prepared a patch to deploy and we'll sync it out during the Monday morning deploy.

https://gerrit.wikimedia.org/r/#/q/I7586162cfb32ddbe460a25c956c845f2f4a49b0f,n,z

SpontaneousGrumbler wrote:

Good news!

SpontaneousGrumbler wrote:

I would say it has been much better for the last few days. If you want to mark this bug as "fixed", I won't disagree.

Between Nik's fixes and the fact that we segmented enwiki's PoolCounter traffic to its own key I think we're in a way better spot than before.
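
A rough sketch of why a separate key helps, in Python. It assumes a limiter that counts running searches per key; the class name, key names, and limit are all hypothetical and do not reflect the actual PoolCounter configuration.

```python
from collections import defaultdict

class KeyedLimiter:
    """Hypothetical per-key limiter; not the real PoolCounter."""

    def __init__(self, limit_per_key):
        self.limit = limit_per_key
        self.in_use = defaultdict(int)

    def try_acquire(self, key):
        # Each key has its own budget of concurrent searches.
        if self.in_use[key] >= self.limit:
            return False
        self.in_use[key] += 1
        return True

    def release(self, key):
        if self.in_use[key] > 0:
            self.in_use[key] -= 1

# Shared key: a busy wiki uses up the budget for everyone.
shared = KeyedLimiter(limit_per_key=2)
shared.try_acquire("regex")  # busy wiki
shared.try_acquire("regex")  # busy wiki
other_wiki_ok_shared = shared.try_acquire("regex")

# Separate keys: the busy wiki only exhausts its own budget.
split = KeyedLimiter(limit_per_key=2)
split.try_acquire("regex:busywiki")
split.try_acquire("regex:busywiki")
other_wiki_ok_split = split.try_acquire("regex:otherwiki")
```

Under the shared key the third request is refused, while with separate keys the other wiki still gets a slot.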

Resolving FIXED. Please reopen if this becomes a problem again.