Silence the qacct transfer jobs and monitor them with Icinga instead
Closed, Invalid (Public)

Description

During the NFS outage, the qacct transfer jobs pestered the roots' mailboxes every five minutes. Though such an outage of course will never ever happen again :-), it sucked nonetheless.

The transfer job is a service, and if we monitored it as one, we would get better behaviour as well: a nice green or red icon on a web dashboard, and only one (or none?) ping by mail when the status *changes*.

So we should set up Icinga monitoring for that:

a) The transfer job directs all stdout/stderr to one file and saves its exit code in another, and Icinga periodically queries these files.

b) The transfer job passes its output and exit code directly to an Icinga sentinel that passes it somewhere up the chain.

Whether a) or b) is preferable (or possible, for that matter), I haven't figured out yet, but this bug will track the progress on that.
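To make option a) more concrete, here is a minimal Python sketch, not a tested deployment. It assumes Icinga calls a check that follows the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, the file arguments and the 600-second staleness threshold are all hypothetical and chosen only for illustration.

```python
import subprocess
import time
from pathlib import Path

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_and_record(command, output_file, status_file):
    """Run the job and save its combined stdout/stderr and exit code to files.

    Hypothetical wrapper: cron would call this instead of the job itself,
    so the job's output no longer ends up in mail.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    Path(output_file).write_text(result.stdout)
    # The status file is written last, so its mtime marks the end of the run.
    Path(status_file).write_text(f"{result.returncode}\n")
    return result.returncode


def check_recorded_status(output_file, status_file, max_age_seconds=600):
    """Turn the recorded files into a (plugin exit code, message) pair."""
    status_path = Path(status_file)
    try:
        age = time.time() - status_path.stat().st_mtime
        exit_code = int(status_path.read_text().strip())
    except (OSError, ValueError) as error:
        return UNKNOWN, f"UNKNOWN: cannot read job status ({error})"

    # A stale status file means the job has stopped running at all.
    if age > max_age_seconds:
        return CRITICAL, f"CRITICAL: last run finished {int(age)}s ago"

    if exit_code != 0:
        try:
            lines = Path(output_file).read_text().splitlines()
        except OSError:
            lines = []
        last_line = lines[-1] if lines else "no output"
        return CRITICAL, f"CRITICAL: job exited with {exit_code}: {last_line}"

    return OK, "OK: last run succeeded"
```

The cron entry would call `run_and_record` with the real transfer command, and the check command on the Icinga side would print the message from `check_recorded_status` and exit with the returned code. Icinga then only notifies when that code changes, which is the behaviour asked for above.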


Version: unspecified
Severity: enhancement

Details

Reference
bz50585

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 2:02 AM)
bzimport added a project: Toolforge.
bzimport set Reference to bz50585.

The qacct transfer cron job has been replaced with symlinks that make copying the accounting file unnecessary (cf. Gerrit change #114950 and Gerrit change #118120).