Page MenuHomePhabricator

Special:RunJobs recursively calling Special:RunJobs again (?)
Closed, ResolvedPublic

Description

We had a problem where the WMES wiki was painfully slow. The access log showing thousands of Special%3ARunJobs entries (NOT one per request).

$ grep -c Special%3ARunJobs access.log.2014-06-03
63116
$ grep -c -v Special%3ARunJobs access.log.2014-06-03
472

(that would be 133 RunJobs per request‼)

Changing $wgJobRunRate to 0 from the default of 1 “fixed” the slowness.

From the external appearance, it looks like each Special:RunJobs then called another Special:RunJobs with an infinite recursion until the time limit.

There are 4 jobs with old dates in the queue, but even if they are not completable, they shouldn't have produced so many POSTs.

At the point $wgJobRunRate was disabled, the wiki was running 00f91912c64 (master), but it had been ‘broken’ for weeks.


Version: 1.24rc
Severity: normal

Details

Reference
bz66225

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 3:19 AM
bzimport set Reference to bz66225.
bzimport added a subscriber: Unknown Object (MLST).

Maybe the jobs where really slow and the run rate of 1 kept letting them build up. They could really some sort of bucketing scheme to set a configurable limit on the number of concurrent requests somehow...

Same here with v1.23.0: for every request, the following gets logged into access.log:

::ffff:11.22.33.44 - - [18/Jun/2014:09:58:14 +0200] "POST /mediawiki/index.php?title=Special%3ARunJobs&tasks=jobs&maxjobs=1&sigexpiry=1403078299&signature=6da0cc79966609c2b51937b9175d7728b290ff0a HTTP/1.1" 411 357 "-" "-"

192.168.0.10 www.example.org - [18/Jun/2014:09:58:23 +0200] "GET /wiki/Main_Page HTTP/1.1" 200 18203 "-" "curl/7.37.0"

Setting wgJobRunRate to "0" seems to "fix" this. I don't understand what "jobs" are now not being run, but if it has any negative impact I shall change it to "0.01" or something to lessen the POST requests.

Aklapper lowered the priority of this task from Medium to Lowest.Apr 9 2015, 12:35 PM

Does this still happen in 1.27?

I had the $wgJobRunRate=0 setting active ever since. I've just upgraded to 1.27.1 and have unset it now -- so far, no Special:RunJobs&tasks... calls were logged in access.log.

Krinkle changed the task status from Open to Stalled.Jul 4 2017, 1:57 AM
Krinkle raised the priority of this task from Lowest to Low.
Krinkle subscribed.

@Platonides @Evilninja The job runner in triggering of it post-shutdown for web requests has changed quite a bit over the past year. Can you try again and let us know if this is still an issue? If not, I propose to close this issue for now.

Evilninja claimed this task.

I'm at MW 1.28.1 now without $wgJobRunRate set and no Special:RunJobs&tasks were logged to the access log in ages. Also, the initial sympton described here ("wiki was painfully slow") is long gone. Closing sounds like the sensible thing to do.