Silence the qacct transfer jobs and monitor them with Icinga instead
Closed, Invalid | Public

Assigned To

Authored By

	scfc
	Jul 2 2013, 3:13 PM

Description

During the NFS outage, the qacct transfer jobs pestered the roots' mailboxes every five minutes. Though such an outage of course will never ever happen again :-), it sucked nonetheless.

The transfer job is a service, and if we monitored it as one, we would get better behaviour as well: a nice green or red icon on a web dashboard, and only one (or none?) ping by mail when the status *changes*.

So we should set up Icinga monitoring for that:

a) The transfer job directs all stdout/stderr to a file and saves its exit code in another, and Icinga periodically queries these files.

b) The transfer job passes its output and exit code directly to an Icinga sentinel that passes it somewhere up the chain.

Whether a) or b) is preferable (or possible for that matter), I haven't figured out yet, but this bug will track the progress on that.
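
To make the two options concrete, here is a rough sketch, not a worked-out design. All function names, file paths, host names and service names are hypothetical and do not come from the actual transfer job. For a), the job side records output and exit code in files, and a plugin-style check maps the recorded exit code to the usual OK/CRITICAL/UNKNOWN states, treating a stale file as a failure too. For b), the sketch only formats a passive check result in the `PROCESS_SERVICE_CHECK_RESULT` external-command format; how that line reaches the Icinga server (command file, NSCA or similar) is left open here.

```python
import subprocess
import time
from pathlib import Path

# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_and_record(cmd, out_path, rc_path):
    """Option a), job side: run cmd, save its output and exit code to files."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    Path(out_path).write_bytes(result.stdout)
    Path(rc_path).write_text(f"{result.returncode}\n")
    return result.returncode


def check_recorded_status(rc_path, max_age_seconds, now=None):
    """Option a), monitoring side: turn the recorded exit code into a plugin result."""
    now = time.time() if now is None else now
    path = Path(rc_path)
    try:
        age = now - path.stat().st_mtime
        code = int(path.read_text().strip())
    except (OSError, ValueError) as exc:
        # Missing, unreadable or malformed file: we cannot tell what happened.
        return UNKNOWN, f"UNKNOWN - cannot read {rc_path}: {exc}"
    if age > max_age_seconds:
        # The job has not reported for too long, e.g. because cron stopped running it.
        return CRITICAL, f"CRITICAL - last run finished {int(age)}s ago"
    if code != 0:
        return CRITICAL, f"CRITICAL - last run exited with {code}"
    return OK, "OK - last run exited with 0"


def passive_result_line(host, service, code, output, now=None):
    """Option b): format a passive check result as an external command line."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"
```

With a), a small check script would call `check_recorded_status()`, print the message and exit with the returned code, so mail only goes out when Icinga sees the state change. With b), the cron job itself would produce the line from `passive_result_line()` after each run and hand it to whatever transport the Icinga setup accepts.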

Version: unspecified
Severity: enhancement

Details

Reference: bz50585

Related Objects

		Status	Subtype	Assigned	Task
		Invalid		scfc	T52585 Silence the qacct transfer jobs and monitor them with Icinga instead
		Resolved		RyanLane	T54560 icinga.wmflabs.org is down: "Error: Could not read host and service status information!"

Event Timeline

• bzimport raised the priority of this task to Low. Nov 22 2014, 2:02 AM

• bzimport added a project: Toolforge.

• bzimport set Reference to bz50585.

scfc created this task. Jul 2 2013, 3:13 PM

http://blog.endpoint.com/2012/04/monitoring-cronjob-exit-codes-with.html has an example of how to monitor cron jobs.

The qacct transfer cron job has been replaced with symlinks that make copying the accounting file unnecessary (cf. Gerrit change #114950 and Gerrit change #118120).
