
GWToolset Jobs are not properly picked-up by runJobsLoopService
Closed, ResolvedPublic

Description

expected behavior

when a gwtoolset batch job is created the runner should pick it up and run the gwtoolset* jobs.

actual behavior

• the MetadataJob goes into the queue.
• the MetadataJob is never picked up
• if another MetadataJob is added within approximately 5 minutes, that one is also added to the queue, but not picked up
• if another MetadataJob is added to the queue after approximately 5 minutes, the MetadataJobs waiting in the queue are picked up, run and their MediaFileJobs and CleanUp jobs are run. the new MetadataJob waits in the queue and is not picked up until yet another gwtoolset MetadataJob is created.

additional research by hashar

the beta cluster shows gwtoolset* jobs pending:
$ mwscript showJobs.php --wiki=commonswiki --group
cirrusSearchUpdatePages: 0 queued; 7 claimed (0 active, 7 abandoned)
gwtoolsetGWTFileBackendCleanupJob: 0 queued; 1 claimed (1 active, 0 abandoned)
gwtoolsetUploadMediafileJob: 0 queued; 5 claimed (0 active, 5 abandoned)
gwtoolsetUploadMetadataJob: 1 queued; 0 claimed (0 active, 0 abandoned)
webVideoTranscode: 4 queued; 0 claimed (0 active, 0 abandoned)
$
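To check this output for stuck gwtoolset jobs without reading it by eye, a small parser along these lines could help. It is only a sketch: the line pattern is inferred from the sample output above, and the helper names `parse_show_jobs` and `pending_types` are made up for illustration and are not part of MediaWiki.

```python
import re

# Line shape inferred from the showJobs.php output quoted above:
#   <type>: <n> queued; <n> claimed (<n> active, <n> abandoned)
LINE_RE = re.compile(
    r"^(?P<type>\S+): (?P<queued>\d+) queued; (?P<claimed>\d+) claimed "
    r"\((?P<active>\d+) active, (?P<abandoned>\d+) abandoned\)$"
)


def parse_show_jobs(output):
    """Return {job_type: {"queued": int, "claimed": int, ...}}."""
    stats = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip shell prompts, blank lines, anything unexpected
        fields = match.groupdict()
        job_type = fields.pop("type")
        stats[job_type] = {name: int(value) for name, value in fields.items()}
    return stats


def pending_types(stats, prefix="gwtoolset"):
    """Job types starting with `prefix` that still have queued jobs."""
    return sorted(
        job_type
        for job_type, counts in stats.items()
        if job_type.startswith(prefix) and counts["queued"] > 0
    )


# Sample copied from the beta cluster output in this report.
SAMPLE = """\
cirrusSearchUpdatePages: 0 queued; 7 claimed (0 active, 7 abandoned)
gwtoolsetGWTFileBackendCleanupJob: 0 queued; 1 claimed (1 active, 0 abandoned)
gwtoolsetUploadMediafileJob: 0 queued; 5 claimed (0 active, 5 abandoned)
gwtoolsetUploadMetadataJob: 1 queued; 0 claimed (0 active, 0 abandoned)
webVideoTranscode: 4 queued; 0 claimed (0 active, 0 abandoned)
"""

if __name__ == "__main__":
    print(pending_types(parse_show_jobs(SAMPLE)))
```

Run against the sample, this reports gwtoolsetUploadMetadataJob as the only gwtoolset type still waiting in the queue.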

in this case a gwtoolsetUploadMetadataJob is queued; however, nextJobDB.php does not find it.

it looks like this is because nextJobDB.php calls JobQueueAggregator::singleton()->getAllReadyWikiQueues(), which does not list gwtoolset jobs.


Version: unspecified
Severity: normal
URL: http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset

Details

Reference
bz58692

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:36 AM
bzimport set Reference to bz58692.
bzimport added a subscriber: Unknown Object (MLST).

Change 102675 had a related patch set uploaded by Dan-nl:
gwtoolset-runners

https://gerrit.wikimedia.org/r/102675

Change 102675 abandoned by Dan-nl:
gwtoolset-runners

Reason:
per aaron’s confirmation, having the jobs listed in one runJobsLoopService statement is fine. we still need to resolve the issue of the job runner not picking up the gwtoolset jobs.

https://gerrit.wikimedia.org/r/102675

Following a conversation with Aaron, that is apparently fixed by https://gerrit.wikimedia.org/r/#/c/102749/ Make executeReadyPeriodicTasks() notify the aggregator when jobs are released/recycled

(In reply to comment #3)

Following a conversation with Aaron, that is apparently fixed by https://gerrit.wikimedia.org/r/#/c/102749/ Make executeReadyPeriodicTasks() notify the aggregator when jobs are released/recycled

I couldn't really reproduce problems like this report. That was just some related fix of something I noticed while looking at this.

Created attachment 14144
rijks-10-items

ssh-wikilabs

  1. ssh deployment-bastion.pmtpa.wmflabs
  2. tail -f /data/project/logs/runJobs.log | grep 'gwtoolset'

gwtoolset form

  1. go to http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=Special:GWToolset&gwtoolset-form=metadata-detect
  2. select mediawiki template artwork
  3. place, Metadata Mappings/Dan-nl/Rijksmuseum.json, in the metadata mapping field.
  4. use this attached xml file for the metadata file upload.
  5. click the submit button
  6. add a summary if you wish
  7. click preview batch
  8. click process batch


Change 102955 had a related patch set uploaded by Dan-nl:
removing initial delay

https://gerrit.wikimedia.org/r/102955

Change 102955 merged by jenkins-bot:
removing initial delay

https://gerrit.wikimedia.org/r/102955

• at Fri, 20 Dec 2013 10:31:18 GMT i set up a gwtoolset batch job. it never ran.

• at 2013-12-20 16:04:41 we uploaded a test patch to see if removing the initial MetadataJob delay would help.

• at 2013-12-20 16:09:38 added another gwtoolset batch job. that one kicked off the earlier batch job plus itself.

• at 2013-12-20 16:16:04 added another gwtoolset batch job that had a throttle of 3 so that it would be forced to create another MetadataJob with a delay. the initial MetadataJob ran and ran the first 3 MediaFileJobs.

there is another MetadataJob in the queue that can now be seen with mwscript showJobs.php --wiki=commonswiki --group, but it's never picked up.

so the question now is why does the puppet runner script not pick up the delayed job?

Could it be that JobQueueAggregator::singleton()->getAllReadyWikiQueues() used by nextJobDB.php does not return delayed jobs?
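To make that hypothesis concrete, here is a toy model of how a delayed job could get stranded. It is not MediaWiki code. `ToyQueue`, `ToyAggregator` and `run_once` are hypothetical, the 300 second delay is an assumption based on the "approximately 5 minutes" in the description, and the rule that the ready flag is only set when a job is pushed is exactly the part being guessed at.

```python
import heapq

DELAY = 300  # assumed MetadataJob delay in seconds (roughly 5 minutes)


class ToyQueue:
    """Hypothetical job queue: a job is runnable once `now >= ready_at`."""

    def __init__(self):
        self._jobs = []  # min-heap of (ready_at, name)

    def push(self, name, ready_at):
        heapq.heappush(self._jobs, (ready_at, name))

    def pop_ready(self, now):
        ready = []
        while self._jobs and self._jobs[0][0] <= now:
            ready.append(heapq.heappop(self._jobs)[1])
        return ready


class ToyAggregator:
    """Hypothetical ready flag that is only set when a job is pushed."""

    def __init__(self):
        self.ready = False

    def notify_push(self):
        self.ready = True

    def clear(self):
        self.ready = False


def run_once(queue, aggregator, now):
    """One runner pass: it consults the aggregator, never the queue itself."""
    if not aggregator.ready:
        return []
    ran = queue.pop_ready(now)
    aggregator.clear()  # assumption: the flag is reset after every pass
    return ran


def simulate():
    queue, aggregator = ToyQueue(), ToyAggregator()
    results = []

    queue.push("metadata-1", ready_at=0 + DELAY)
    aggregator.notify_push()
    results.append(run_once(queue, aggregator, now=0))     # job still delayed
    results.append(run_once(queue, aggregator, now=600))   # runnable, but no flag

    queue.push("metadata-2", ready_at=600 + DELAY)
    aggregator.notify_push()
    results.append(run_once(queue, aggregator, now=600))   # older job finally runs
    results.append(run_once(queue, aggregator, now=1000))  # newer job is stranded
    return results


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions the model shows the same pattern as the report: the first job only runs once a second push sets the flag again, and the second job is then left waiting in turn. Nothing in the model notifies the aggregator when a delay expires, which is the gap the hypothesis points at.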

https://gerrit.wikimedia.org/r/#/c/103524/ and possibly other commits seem to have resolved the job runner issue; i was able to successfully test a small dataset on beta.

• will wait for david haskiya to try a test dataset on production before closing this bug.

(In reply to dan from comment #10)

• will wait for david haskiya to try a test dataset on production before closing this bug.

dan: Any way to follow up on this?

• david was able to run an initial upload on production, so i think we can consider this ticket closed.

http://commons.wikimedia.org/wiki/Category:GWToolset_Batch_Upload