
Job queue refreshLinks2 duplicate removal
Closed, Resolved (Public)

Description

The introduction of refreshLinks2 in r40741, while certainly useful, broke the duplicate removal code in Job::pop(). Duplicate removal still works as long as the page ID partitions are precisely the same. But for templates which are very heavily used, it's normal for pages to start or stop using the given template in between closely-spaced template edits. This means that the partitioning changes, so job_params is not the same, so duplicate removal is not done.
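
To illustrate the failure mode, here is a minimal Python sketch (not MediaWiki code). It assumes jobs are compared by exact match on command, title and a params string, and that each refreshLinks2 job covers a start/end range of page IDs. The helper names, the batch size and the params format are hypothetical and chosen only for the example.

```python
def partition(page_ids, batch_size):
    """Split sorted page IDs into (start, end) ranges of at most batch_size pages."""
    ids = sorted(page_ids)
    return [
        (ids[i], ids[min(i + batch_size, len(ids)) - 1])
        for i in range(0, len(ids), batch_size)
    ]


def make_jobs(title, page_ids, batch_size=3):
    """Build hypothetical (cmd, title, params) tuples, one per partition."""
    return [
        ("refreshLinks2", title, f"start={start} end={end}")
        for start, end in partition(page_ids, batch_size)
    ]


# First template edit: pages 1..6 use the template.
first = make_jobs("Template:Foo", [1, 2, 3, 4, 5, 6])

# Second edit, same set of pages: the partitions (and params) are identical.
second_same = make_jobs("Template:Foo", [1, 2, 3, 4, 5, 6])

# Second edit after page 3 stopped and page 7 started using the template:
# every partition boundary shifts, so no params string matches.
second_changed = make_jobs("Template:Foo", [1, 2, 4, 5, 6, 7])

# Exact-match duplicate detection, as assumed above.
duplicates_same = set(first) & set(second_same)
duplicates_changed = set(first) & set(second_changed)
```

In this sketch both jobs of the second edit are detected as duplicates when the page set is unchanged, and none are detected once a single page joins or leaves the set.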

This has been observed to cause a lot of extra work for the Wikimedia job runners. Although English Wikipedia administrators are aware of the performance effects of editing heavily-used templates, they often make mistakes and end up doing several edits in a row.

Duplicate removal of this kind should be a design requirement for a rewritten job queue system. While we are waiting for that, a maintenance script which traverses the job table and removes duplicate jobs would be a useful stopgap measure.
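
One possible rule for such a stopgap script, sketched in Python under stated assumptions: a refreshLinks2 job is treated as redundant when newer jobs for the same title together cover its whole page ID range. The row layout (job_id, title, start, end), the function names, and the idea that a higher job_id means a newer job are all assumptions made for illustration. This is not the actual job table schema and not necessarily the approach taken in any merged patch.

```python
def covered(span, others):
    """Return True if every integer in span=(start, end) lies in some range in others."""
    start, end = span
    pos = start  # lowest page ID not yet known to be covered
    for s, e in sorted(others):
        if s > pos:
            break  # gap: pos is not covered by any remaining range
        pos = max(pos, e + 1)
        if pos > end:
            return True
    return pos > end


def redundant_job_ids(jobs):
    """jobs: list of hypothetical (job_id, title, start, end) rows.

    A job is reported as redundant if newer jobs (higher job_id) for the
    same title jointly cover its page ID range. Quadratic, for clarity only.
    """
    redundant = []
    for job_id, title, start, end in jobs:
        newer = [(s, e) for jid, t, s, e in jobs if t == title and jid > job_id]
        if covered((start, end), newer):
            redundant.append(job_id)
    return redundant


jobs = [
    (1, "Template:Foo", 1, 3),  # from the first edit
    (2, "Template:Foo", 4, 6),  # from the first edit
    (3, "Template:Foo", 1, 4),  # from the second edit, shifted partitions
    (4, "Template:Foo", 5, 7),  # from the second edit
    (5, "Template:Bar", 1, 3),  # unrelated template, left alone
]
to_delete = redundant_job_ids(jobs)
```

Here `to_delete` contains the two jobs from the first edit. A real script would also need to handle jobs that are already being run and open-ended ranges, which this sketch ignores.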


Version: 1.18.x
Severity: enhancement

Details

Reference
bz27914

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:23 PM
bzimport set Reference to bz27914.

The merged Gerrit change 32488 links here, so this bug may be resolved.

Aaron: Is this fixed by your patch in Gerrit change #32488?