
Search index not updating on en.wikipedia
Closed, Resolved, Public

Description

Author: SpontaneousGrumbler

Description:
Search for "2014 ATP World Tour is the global elite" – results show "2014 ATP World Tour" article was indexed early on 29 May 2014, but its history shows it was updated several times later that day and over the next several days.
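The staleness check described in this report can be sketched as a comparison between the time the indexed copy was taken and the page's revision history. This is only an illustration: the function name `revisions_missing_from_index` and all timestamps below are hypothetical, not taken from the actual page history or from any search backend API.

```python
from datetime import datetime, timezone


def revisions_missing_from_index(indexed_at, revision_times):
    """Return, oldest first, the revision timestamps newer than the indexed copy."""
    return sorted(t for t in revision_times if t > indexed_at)


# Hypothetical values for illustration only (not the real page history).
indexed_at = datetime(2014, 5, 29, 3, 0, tzinfo=timezone.utc)
revision_times = [
    datetime(2014, 5, 28, 22, 15, tzinfo=timezone.utc),  # already in the index
    datetime(2014, 5, 29, 14, 40, tzinfo=timezone.utc),  # made after indexing
    datetime(2014, 6, 1, 9, 5, tzinfo=timezone.utc),     # made after indexing
]

missing = revisions_missing_from_index(indexed_at, revision_times)
print(len(missing), "revision(s) newer than the indexed copy")
```

A non-empty result means the index is behind the page history, which is the symptom reported here.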


Version: wmf-deployment
Severity: normal
Whiteboard: cirrus-fixed

Details

Reference
bz66011

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 3:22 AM
bzimport set Reference to bz66011.

SpontaneousGrumbler wrote:

So, you are saying that we will not update the production search index until CirrusSearch goes into production? CirrusSearch is not ready for prime time.

(In reply to SpontaneousGrumbler from comment #2)

So, you are saying that we will not update the production search index until
CirrusSearch goes into production? CirrusSearch is not ready for prime time.

It's not as cut and dried as that. The production search index is updating, but slowly and, so far as I can tell, spottily. We're not going to fix it because every time we try it causes more problems. We don't have the expertise to even deploy an update any more, much less track down what's going on in this case.

So we focus on Cirrus because the hard bit (Elasticsearch) has significantly more folks using it so the knowledge gap isn't so bad. Also the documentation is better.

As far as Cirrus not being ready for prime time, are you referring to any particular feature shortcoming or "just" the performance aspect? I'd love to know so I can, well, fix it.

SpontaneousGrumbler wrote:

Chris the speller has posted on mediawiki.org/wiki/Talk:Search about hyphens being ignored. Also, I have seen many cases where some hiccup caused CirrusSearch to miss an update; these apparently will never get fixed, whereas the current updater for lsearchd will provide a completely updated index whenever it runs to completion, even if that is not every day, as it should be.

Yes, there are some bugs, which will be solved. Cf. https://www.mediawiki.org/wiki/Thread:Talk:Search/LiquidThreads_archive/%27Old_search%27_is_better

(In reply to SpontaneousGrumbler from comment #4)

Also, I have seen many cases

{{vague}}

where some hiccup caused
CirrusSearch to miss an update; these apparently will never get fixed,

It's enough to dummy edit the page (perhaps even null edit, I don't remember).

whereas the current updater for lsearchd will provide a completely updated
index whenever it runs to completion, even if that is not every day, as it
should be.

Sounds like "whenever the gates of heaven open on earth, even if that is not every day, as it should be". AFAIK this has not happened in years.

It's useless for us editors to pretend otherwise: this bug will not be fixed in this component (lsearchd). Please keep testing CirrusSearch; feedback and criticism are very useful to the devs.

If we're going to wontfix this bug, it seems like we should really have cirrus deployed as primary. Or at least deployed in the very near future...


For reference, people at commons are also complaining that new files (within the last 4 days) aren't being indexed by Lucene.

(In reply to Bawolff (Brian Wolff) from comment #6)

For reference, people at commons are also complaining that new files (within
the last 4 days) aren't being indexed by Lucene.

They should probably have a brief discussion at village pump and then file a site request for Cirrus to be primary.
There is a timeline at https://www.mediawiki.org/wiki/Search#Wikis but it's slightly out of date.

*** Bug 70984 has been marked as a duplicate of this bug. ***

dempsey-roll wrote:

Pardon me, as I am not particularly technically inclined. As I understand it, the current primary search engine/search engine backend for English and other Wikipedias is buggy, the bug(s) is/are not going to be corrected any time soon if at all, a new search engine/search engine backend is supposed to be made the primary sometime soon, and in the meantime the users and editors of English Wikipedia have to wait. Oh, and the update time for the search engine index is at least four days. Is that a fair assessment of the situation?

It looks almost fair to me:

  • As for "English and other Wikipedias", there are only four projects left where CirrusSearch (the new search engine, which *doesn't* have problems with updates stalling and *doesn't* need days to update) isn't default: de.wp, en.wp, fr.wp and zh.wp. The remaining 881 (sic) are using the new search engine already.
  • As for "soon", [[mw:Search#Timeline]] says that "Our general goal is to deploy CirrusSearch as the primary search backend for all wikis by the end of September 2014", and this seems realistic to me based on the table of deployments completed there. Two more weeks of waiting sounds reasonable to me.

(In reply to John C. Watson from comment #9)

Is that a fair assessment of
the situation?

The general idea is correct. On the bright side, [[mw:Search#Wikis]] is now up to date and it's probably just a matter of weeks before they're able to enable Cirrus on the last 4 wikis.

SpontaneousGrumbler wrote:

Well, LuceneSearch's index got updated today (24 September 2014). Thanks to the person who got it running!

Chad got it! He fixed it yesterday but it takes a day to really be sure it worked.

dempsey-roll wrote:

How long should it take for the index of the new primary search engine to update? (I made some edits over 24 hours ago, but the searches I just made have yet to "notice" them.)

I'm not entirely sure as I hadn't managed to track down *all* of the problems with indexing, just some. There are many :(

Some pages are definitely being updated: [[2014]] is now showing the version from the 27th of September (it was stuck at the 11th!), other articles at a cursory glance do have some recent timestamps as well.

So my advice, much as it's lame, is to be patient and things will continue to slowly update.

dempsey-roll wrote:

Okay—understood. (This issue is important to me because I "patrol" a list of frequently/occasionally misspelled words, some instances of which are valid for one reason or another, so a frequently updated index is very helpful in finding new mistakes and making sure that I really did correct old ones.)