CirrusSearch: Index not updated automatically
Closed, Resolved, Public

Description

For the past couple of months, the search indexes of our wikis have no longer been updated on the fly. I have to run the maintenance script "maintenance/runJobs.php" by hand.
We use MW 1.23.2. The "jobrate" is not set to 0 in the LocalSettings.php file ;-)
Help would, as always, be appreciated.


Version: REL1_23-branch and now also REL1_24-branch
Severity: normal
OS: Linux
Platform: Other

Details

Reference
bz69387

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 3:37 AM)
bzimport added a project: CirrusSearch.
bzimport set Reference to bz69387.

Are other jobs running? Like, does runJobs.php pick up lots of other, non-cirrus stuff?

Hallo Nik, no it does not.
runJobs.php just picks up CirrusSearch. I am on MW 1.23.2, and on these wikis I use the CirrusSearch extension for 1.23.x (not master).

This is what the runJobs.php output looks like:

2014-08-14 06:05:30 htmlCacheUpdate Windows_7_Pro pages=array(1) rootJobSignature=76d97e7bf1ad98b6ba40570536cb2558b9fc50da rootJobTimestamp=20140811054116 STARTING
2014-08-14 06:05:30 htmlCacheUpdate Windows_7_Pro pages=array(1) rootJobSignature=76d97e7bf1ad98b6ba40570536cb2558b9fc50da rootJobTimestamp=20140811054116 t=2 good
2014-08-14 06:05:30 htmlCacheUpdate Windows_7_Pro pages=array(1) rootJobSignature=6eda7dd7967f9cb9c2b12e2b73cfd21c84f71958 rootJobTimestamp=20140811054116 STARTING
2014-08-14 06:05:30 htmlCacheUpdate Windows_7_Pro pages=array(1) rootJobSignature=6eda7dd7967f9cb9c2b12e2b73cfd21c84f71958 rootJobTimestamp=20140811054116 t=0 good
2014-08-14 06:05:30 cirrusSearchLinksUpdatePrioritized Wien addedLinks=array(0) removedLinks=array(0) prioritize=1 STARTING
2014-08-14 06:05:32 cirrusSearchLinksUpdatePrioritized Wien addedLinks=array(0) removedLinks=array(0) prioritize=1 t=1748 good
2014-08-14 06:05:32 cirrusSearchLinksUpdatePrioritized Website addedLinks=array(0) removedLinks=array(0) prioritize=1 STARTING
2014-08-14 06:05:32 cirrusSearchLinksUpdatePrioritized Website addedLinks=array(0) removedLinks=array(0) prioritize=1 t=344 good
2014-08-14 06:05:32 cirrusSearchLinksUpdatePrioritized Wien addedLinks=array(0) removedLinks=array(0) prioritize=1 STARTING
2014-08-14 06:05:32 cirrusSearchLinksUpdatePrioritized Wien addedLinks=array(0) removedLinks=array(0) prioritize=1 t=252 good
2014-08-14 06:05:32 cirrusSearchLinksUpdatePrioritized Windows_7_Pro addedLinks=array(0) removedLinks=array(0) prioritize=1 STARTING
2014-08-14 06:05:32 cirrusSearchLinksUpdatePrioritized Windows_7_Pro addedLinks=array(0) removedLinks=array(0) prioritize=1 t=158 good
2014-08-14 06:05:32 cirrusSearchLinksUpdatePrioritized Windows_7_Pro addedLinks=array(1) removedLinks=array(0) prioritize=1 STARTING
2014-08-14 06:05:33 cirrusSearchLinksUpdatePrioritized Windows_7_Pro addedLinks=array(1) removedLinks=array(0) prioritize=1 t=164 good
2014-08-14 06:05:33 cirrusSearchLinksUpdatePrioritized Kategorie:Wien addedLinks=array(0) removedLinks=array(0) prioritize=1 STARTING
2014-08-14 06:05:33 cirrusSearchLinksUpdatePrioritized Kategorie:Wien addedLinks=array(0) removedLinks=array(0) prioritize=1 t=241 good
2014-08-14 06:05:33 cirrusSearchLinksUpdateSecondary Windows_7_Pro addedLinks=array(1) removedLinks=array(0) STARTING
2014-08-14 06:05:33 cirrusSearchLinksUpdateSecondary Windows_7_Pro addedLinks=array(1) removedLinks=array(0) t=11 good

Those first few jobs, the htmlCacheUpdate ones, aren't part of Cirrus. It really looks like jobs in general just aren't being picked up, and Cirrus is the most noticeable casualty.

I think you can get better logging out of jobs running with this:
$wgDebugLogGroups['runJobs'] = "some_absolute_path_to_a_log_file.log";

Does that file grow during normal browsing?

It might be worth cronning runJobs.php every minute or something rather than digging into it.... I've never liked how we kick off jobs - it's so breaky.
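For anyone who wants to go the cron route, here is a minimal sketch of a wrapper script. The paths, the lock location, and the limits are all assumptions for illustration, not anything from this task; --maxjobs and --maxtime are existing runJobs.php options. The mkdir-based lock is only there so that a slow run does not overlap with the next one.

```shell
#!/bin/sh
# Sketch of a cron wrapper for runJobs.php. All paths below are assumptions;
# adjust them to your installation.
WIKI_DIR="${WIKI_DIR:-/var/www/wiki}"            # hypothetical MediaWiki root
PHP_BIN="${PHP_BIN:-php}"
LOG_FILE="${LOG_FILE:-/tmp/runJobs-cron.log}"    # hypothetical log location
LOCK_DIR="${LOCK_DIR:-/tmp/runJobs-cron.lock}"   # hypothetical lock location
MAX_JOBS="${MAX_JOBS:-100}"
MAX_TIME="${MAX_TIME:-50}"                       # seconds, under the one-minute cron interval

run_jobs_once() {
    # mkdir is atomic, so it doubles as a simple lock against overlapping runs.
    # A run that gets killed leaves the directory behind; remove it by hand then.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        return 0
    fi
    "$PHP_BIN" "$WIKI_DIR/maintenance/runJobs.php" \
        --maxjobs "$MAX_JOBS" --maxtime "$MAX_TIME" >> "$LOG_FILE" 2>&1
    status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}

# Run only when called with "run", so sourcing the file just defines the function.
if [ "${1:-}" = "run" ]; then
    run_jobs_once
fi
```

Saved as, say, /usr/local/bin/wiki-run-jobs.sh (a made-up name), a crontab line such as "* * * * * /usr/local/bin/wiki-run-jobs.sh run" would start it once a minute.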

oh ok then.... thank you Nik!

I just wanted to make sure everything is working as it should be, and thanks to your help I saw this in the log:

2014-08-14 13:13:32 server21 wiki_23: Running 10 job(s) via '/wiki/index.php?title=Special%3ARunJobs&tasks=jobs&maxjobs=10&sigexpiry=1408022016&signature=ddc8289c816abe9ad80f61671de94a2fed0cbae8'
2014-08-14 13:13:32 kc-server21 wiki_23: Failed to start cron API: received 'HTTP/1.1 404 Not Found
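A quick way to see how often this happens is to count the failure lines in the debug log. This is only a sketch: the function name is made up, and the log path is whatever was configured in $wgDebugLogGroups['runJobs'].

```shell
#!/bin/sh
# Count failed attempts to trigger Special:RunJobs in a runJobs debug log.
# count_async_failures is a hypothetical helper name.
count_async_failures() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when that number is zero.
    grep -c "Failed to start cron API" "$1" || true
}
```

Calling count_async_failures with the log file as its only argument prints the number of failed triggers. A count that grows during normal browsing suggests the web-triggered job runs are not getting through.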

OK, now I found the problem. All jobs were having problems, not just the CirrusSearch ones.
This did the magic in LocalSettings.php:

$wgRunJobsAsync = false;

Thank you Nik for pointing me in the right direction. Still strange that MW1.22 and MW1.23 have this "job" problem on some wikis...

Ugh..... I'm not a happy camper about the way jobs are run..... And we (the foundation) are moving further and further from what other folks will be doing: we're migrating to a permanently resident hhvm server. It's way faster, but it can't be used at all without hhvm.

I'll mark the issue resolved though.

Could you put this "workaround" into the README? I am very sure this will happen to more and more CirrusSearch users with MW 1.22, 1.23 and so on.

Good call. Reopening and assigning to myself to do just that.

gerritadmin wrote:

Change 154289 had a related patch set uploaded by Manybubbles:
Add quick and dirty job queue fix

https://gerrit.wikimedia.org/r/154289

gerritadmin wrote:

Change 154289 merged by jenkins-bot:
Add quick and dirty job queue fix to README

https://gerrit.wikimedia.org/r/154289

Sorry to bring this up again, but with the new MediaWiki 1.24.0 release and the up-to-date version (master) of CirrusSearch, one still has to insert this into LocalSettings.php for the job queue to work:

$wgRunJobsAsync = false;

The runJobs log otherwise outputs this:

wiki_test: Running 1 job(s) via '/wiki_test/index.php?title=Special%3ARunJobs&tasks=jobs&maxjobs=1&sigexpiry=1417696777&signature=ae2adc6151fd213f25d16a8d9d67219d1cbb4c31'
Failed to start cron API: received 'HTTP/1.1 404 Not Found
SmartK lowered the priority of this task from Unbreak Now! to Medium.
SmartK updated the task description.
SmartK set Security to None.

Sorry guys, I now know this is not a CirrusSearch problem but a MediaWiki "job" problem, described here:
https://www.mediawiki.org/wiki/Extension_talk:Replace_Text#MediaWiki_1.22_and_newer_do_not_execute_jobs

I _know_ it's not a CirrusSearch problem, but that still doesn't make it right....