
Should be able to run some jobs from an XML dump at the same time
Closed, Declined · Public

Description

Until now, all jobs from a dump have been run in sequence: first dump each table, then the abstracts, then the stubs, and so on. None of the table dumps depend on each other, the abstracts could be run independently, and so on. The dump script should be able to run any or all of these jobs at the same time, logging the results appropriately.
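
A rough sketch of the requested behaviour, assuming a hypothetical per-job wrapper script (`dump-job.sh`) and illustrative job names; the real dump runner would drive its own job classes rather than a shell wrapper:

```
# Minimal sketch, not the actual dump scripts: run the independent jobs
# concurrently, each writing to its own log file. The job names, wrapper
# script name and its arguments are illustrative assumptions.
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

JOBS = ["tables", "abstracts", "stubs"]  # jobs with no dependency on each other

def run_job(job):
    """Run one dump job, capturing its output in a job-specific log."""
    with open(f"dump-{job}.log", "w") as log:
        proc = subprocess.run(["./dump-job.sh", job],  # hypothetical wrapper
                              stdout=log, stderr=subprocess.STDOUT)
    return job, proc.returncode

with ThreadPoolExecutor(max_workers=len(JOBS)) as pool:
    futures = [pool.submit(run_job, job) for job in JOBS]
    for future in as_completed(futures):
        job, rc = future.result()
        status = "done" if rc == 0 else f"failed (exit {rc})"
        print(f"{job}: {status}")
```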


Version: unspecified
Severity: enhancement

Details

Reference
bz27123

Event Timeline

bzimport raised the priority of this task to Medium. · Nov 21 2014, 11:23 PM
bzimport set Reference to bz27123.

Do the snapshot hosts currently have enough horsepower to manage further parallelisation?

No, they don't, because we're now being cleverer about it. But there are other occasions where this could be useful. It would need to be a 2-step procedure: 1) run the parallel jobs with no locks, writing status and output to separate files per job; 2) update the state, md5 sums and the rest, tossing the job-specific logs. I've often wanted to rerun one broken dump step while another is in progress; this would be the way to make that happen.
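
A sketch of what step 2 could look like under assumed file conventions (per-job status files `status-*.txt`, per-job logs `log-*.txt`, a combined `md5sums.txt`); none of these names reflect the actual dump directory layout:

```
# Hypothetical step 2: merge the per-job results once the lock-free parallel
# jobs have finished. File names and formats here are assumptions.
import glob
import hashlib
import os

def merge_results(dump_dir):
    # Recompute md5 sums for every output file the parallel jobs produced.
    with open(os.path.join(dump_dir, "md5sums.txt"), "w") as sums:
        for path in sorted(glob.glob(os.path.join(dump_dir, "*.gz"))):
            md5 = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    md5.update(chunk)
            sums.write(f"{md5.hexdigest()}  {os.path.basename(path)}\n")

    # Fold the per-job status files into the shared status, then drop them.
    with open(os.path.join(dump_dir, "status.txt"), "w") as combined:
        for path in sorted(glob.glob(os.path.join(dump_dir, "status-*.txt"))):
            with open(path) as per_job:
                combined.write(per_job.read())
            os.remove(path)

    # The job-specific logs were only needed while the jobs ran; toss them.
    for path in glob.glob(os.path.join(dump_dir, "log-*.txt")):
        os.remove(path)
```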

Declining; pretty much no new features unless it's something we can directly steal for the dumps rewrite. The entire way jobs are scheduled and managed will change, as will the hash summing of files.