
Icinga should notify people when the /home partition on stat1002 fills up
Closed, DeclinedPublic

Description

It seems the /home partition on stat1002 filled up between 2014-04-03 and 2014-04-04,
but Icinga did not notify anyone.

I noticed this while going through cron mail: on 2014-04-04 at 04:30,
one of my jobs failed with

No space left on device

for /home/qchris on stat1002.

I have freed a few GB for now, but $SOME_SERVICE (Icinga?) should warn about
disks filling up before they are actually full.

Let's get $SOME_SERVICE to alert about disks getting full.
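As a minimal sketch of the kind of check being asked for, the Python below follows the Nagios/Icinga plugin convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The 80% and 90% thresholds and the names `classify`, `check_path` and `main` are assumptions for illustration, not the actual stat1002 or Icinga configuration. In practice the stock `check_disk` plugin from the monitoring-plugins project covers this case.

```python
import shutil

# Exit codes follow the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical thresholds (percent of the partition in use); not taken
# from any real stat1002 or Icinga configuration.
WARN_PCT = 80.0
CRIT_PCT = 90.0


def classify(used_bytes, total_bytes, warn_pct=WARN_PCT, crit_pct=CRIT_PCT):
    """Map a usage figure to a (status code, status line) pair."""
    if total_bytes <= 0:
        return UNKNOWN, "DISK UNKNOWN - partition reports zero size"
    pct = 100.0 * used_bytes / total_bytes
    if pct >= crit_pct:
        return CRITICAL, f"DISK CRITICAL - {pct:.1f}% used"
    if pct >= warn_pct:
        return WARNING, f"DISK WARNING - {pct:.1f}% used"
    return OK, f"DISK OK - {pct:.1f}% used"


def check_path(path="/home"):
    """Check the partition holding `path`; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    # Like df, measure against used + available space, so blocks reserved
    # for root do not hide how close ordinary users are to "no space left".
    return classify(usage.used, usage.used + usage.free)


def main(argv):
    """Plugin entry point: print the status line and return the exit code."""
    path = argv[0] if argv else "/home"
    code, message = check_path(path)
    print(message)
    return code
```

A thin wrapper script would call `sys.exit(main(sys.argv[1:]))`, so Icinga sees the exit code and can notify when it turns 1 or 2, well before jobs start failing with "No space left on device".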


Version: unspecified
Severity: normal

Details

Reference
bz63522

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:13 AM
bzimport set Reference to bz63522.
bzimport added a subscriber: Unknown Object (MLST).

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1527

I think this was also me; not quite sure how. I was messing around with the sampled logs in my home directory, but I'm not sure how that would explain it. I'm going to investigate now that I'm conscious.

Evidently this is the week of Oliver Accidentally Revealing Oversight Issues With Our Cluster :D

(In reply to Oliver Keyes from comment #2)

I think this was also me

Hahaha.
Sorry to disappoint you again, but a "du" on /home showed that it was
not you :-D
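The `du` check mentioned above amounts to ranking the directories under /home by size. A rough Python equivalent is sketched below; it sums apparent file sizes (closer to `du --apparent-size` than to plain `du`, which counts allocated blocks) and does not follow symlinks. The function names `tree_size` and `largest_subdirs` are hypothetical.

```python
import os


def tree_size(root):
    """Sum apparent file sizes under root, without following symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_subdirs(parent="/home", top=5):
    """Return [(name, bytes), ...] for the `top` largest subdirectories."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.name))
    sizes.sort(reverse=True)  # largest first
    return [(name, size) for size, name in sizes[:top]]
```

Calling `largest_subdirs("/home")` lists the biggest home directories first, which answers "who filled up the disk" but, as noted below, not why nobody was warned.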

But this bug is not about “Who filled up the disk”. Disks will always
get full. Analyses start small, and grow ... and grow ... and
grow. And then the disk is full. Meh.
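Because usage tends to grow gradually, a check could also estimate how long remains until the partition is full, not just how full it is now. The sketch below is an assumption, not something proposed in this task: `hours_until_full` is a hypothetical helper that linearly extrapolates between the oldest and newest of some (hour, used bytes) samples.

```python
def hours_until_full(samples, total_bytes):
    """Project hours until full from the first and last (hour, used_bytes) samples.

    Returns None when usage is flat or shrinking (no projected fill time).
    """
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need samples spanning a positive time interval")
    rate = (u1 - u0) / (t1 - t0)  # bytes per hour
    if rate <= 0:
        return None
    return (total_bytes - u1) / rate
```

For example, going from 400 to 500 bytes used over 10 hours on a 1000-byte partition gives `hours_until_full([(0, 400), (10, 500)], 1000) == 50.0`. A straight line is a crude model for bursty analysis jobs, so this would complement a plain percentage threshold rather than replace it.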

Much rather, this bug is about “Why did no service warn about disks
getting full?”.

Let's see if we can prioritize this in the next sprint.

Another issue where we need ops support.