
JobQueue does not respect --maxtime
Closed, Resolved (Public)

Description

translatewiki.net suffered degraded service today after changes to Template:Identical caused a huge number of jobs to be generated.

The issue was that I had 10+ parallel runJobs.php processes running for many minutes. At first I thought that --exclusive should have prevented this, but on further inspection that option never stayed in the code: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/73198

Then my second thought was that --maxtime=50 should prevent this, but it didn't. It looks like $sTime is correctly set at the beginning, but then it is overwritten again for each job, which means it actually functions like "stop if the last job took longer than maxtime":
// Run the job...
wfProfileIn( __METHOD__ . '-' . get_class( $job ) );
$sTime = microtime( true );
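
The effect of resetting the start time inside the loop can be illustrated with a small simulation. This is only a sketch in Python, not MediaWiki's PHP code and not the patch that was merged: the function names, the simulated clock, and the job durations are all hypothetical, and the exact comparison used by the job runner is assumed.

```python
def run_jobs_buggy(durations, maxtime):
    """Simulate a runner that resets its start time before every job.

    `durations` is a list of simulated job run times in seconds.
    Returns (jobs_run, total_elapsed_seconds).
    """
    clock = 0          # simulated wall clock
    start = clock      # set correctly once at the beginning...
    ran = 0
    for duration in durations:
        start = clock  # ...but overwritten for each job (the reported bug)
        clock += duration
        ran += 1
        # Only the last job's duration is ever compared against maxtime.
        if maxtime and clock - start > maxtime:
            break
    return ran, clock


def run_jobs_fixed(durations, maxtime):
    """Same loop, but the start time is recorded once for the whole run."""
    clock = 0
    start = clock      # never reassigned inside the loop
    ran = 0
    for duration in durations:
        clock += duration
        ran += 1
        # Total elapsed time since the run began is compared against maxtime.
        if maxtime and clock - start > maxtime:
            break
    return ran, clock


if __name__ == "__main__":
    jobs = [10] * 20   # twenty hypothetical jobs of 10 seconds each
    print(run_jobs_buggy(jobs, 50))
    print(run_jobs_fixed(jobs, 50))
```

With twenty 10-second jobs and a limit of 50, the first variant works through the whole list because no single job exceeds the limit, while the second stops soon after the total passes 50 seconds. Combined with a cron entry that fires every minute, the first behaviour lets runs pile up in parallel.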

My CRON entry is:

  * * * * * betawiki nice php /www/translatewiki.net/w/maintenance/runJobs.php --exclusive --maxtime=50 --procs=1 --memory-limit=250M >> /www/translatewiki.net/w/logs/jobqueue 2> /dev/null

For now I have disabled the CRON entry and am running the job queue manually until it drains.


Version: 1.24rc
Severity: major

Details

Reference
bz71073

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:58 AM
bzimport set Reference to bz71073.
bzimport added a subscriber: Unknown Object (MLST).

Change 161618 had a related patch set uploaded by Aaron Schulz:
Fixed --maxtime handling by JobRunner

https://gerrit.wikimedia.org/r/161618

Change 161621 had a related patch set uploaded by Aaron Schulz:
Fixed --maxtime handling by JobRunner

https://gerrit.wikimedia.org/r/161621

Change 161618 had a related patch set uploaded by Nikerabbit:
Fixed --maxtime handling by JobRunner

https://gerrit.wikimedia.org/r/161618

Change 161618 merged by jenkins-bot:
Fixed --maxtime handling by JobRunner

https://gerrit.wikimedia.org/r/161618

Change 161626 had a related patch set uploaded by Legoktm:
Fixed --maxtime handling by JobRunner

https://gerrit.wikimedia.org/r/161626

Marking as fixed; Niklas said it worked on twn. I backported it to REL1_24.

Change 161626 merged by jenkins-bot:
Fixed --maxtime handling by JobRunner

https://gerrit.wikimedia.org/r/161626

Change 161621 merged by jenkins-bot:
Fixed --maxtime handling by JobRunner

https://gerrit.wikimedia.org/r/161621