[OPS] Add disk I/O to ganglia reports
Closed, Resolved, Public

Description

Ganglia on wmflabs is missing disk I/O reporting. We want it so that we can tell which instance is doing heavy I/O that might kill GlusterFS (see bug 36993).

There is a Gmetric plugin, based on /proc/diskstats, which we might want to use:

https://github.com/ganglia/gmetric/blob/master/disk/diskio.pl/ganglia_disk_stats.pl
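For illustration, here is a minimal Python sketch of how per-device I/O rates could be derived from /proc/diskstats. It is not the code of the plugin linked above; the function names `parse_diskstats` and `io_rates` and the selection of counters are assumptions made for this example. It relies on the documented Linux layout where the third column is the device name and sector counters are in 512-byte units.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: {counter: int}} from the content of /proc/diskstats.

    Hypothetical helper for this example; only a few counters are kept.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        device = fields[2]
        stats[device] = {
            "reads": int(fields[3]),            # reads completed
            "sectors_read": int(fields[5]),     # sectors read
            "writes": int(fields[7]),           # writes completed
            "sectors_written": int(fields[9]),  # sectors written
            "io_ms": int(fields[12]),           # time spent doing I/O (ms)
        }
    return stats


def io_rates(before, after, interval_s):
    """Compute read/write bytes per second between two parsed samples."""
    rates = {}
    for device, now in after.items():
        prev = before.get(device)
        if prev is None:
            continue  # device appeared between samples, no baseline yet
        read_delta = now["sectors_read"] - prev["sectors_read"]
        write_delta = now["sectors_written"] - prev["sectors_written"]
        rates[device] = {
            "read_bytes_per_s": read_delta * SECTOR_BYTES / interval_s,
            "write_bytes_per_s": write_delta * SECTOR_BYTES / interval_s,
        }
    return rates
```

On a Linux instance, one would read /proc/diskstats twice, a known number of seconds apart, and pass both parsed samples to `io_rates`. The resulting values could then be reported to Ganglia, for example through gmetric.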

We used to have a homegrown ganglia-metrics Debian package in /trunk/ganglia_metrics, which is probably obsolete nowadays. Anyway, there was a Python script there:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/ganglia_metrics/DiskStats.py?view=markup&pathrev=69278

Or maybe Ganglia already provides the metrics and it is just a matter of enabling them?


Version: unspecified
Severity: enhancement

Details

Reference
bz36994

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:25 AM
bzimport set Reference to bz36994.

Ryan: Do you plan to work on this (as you're set as assignee)?

I'm the default assignee. I added this for anyone to work on.

Change 85669 had a related patch set uploaded by Hashar:
ganglia wrapper for py plugins (and add diskstat plugin)

https://gerrit.wikimedia.org/r/85669

I wrote a puppet patch which is now pending review/merge by ops.

Change 91351 had a related patch set uploaded by Hashar:
ganglia: diskstat.py plugin

https://gerrit.wikimedia.org/r/91351

Change 91352 had a related patch set uploaded by Hashar:
contint: monitor CI server diskstats in Ganglia

https://gerrit.wikimedia.org/r/91352

Change 91351 merged by Ori.livneh:
ganglia: diskstat.py plugin

https://gerrit.wikimedia.org/r/91351

Change 91352 had a related patch set uploaded by Ori.livneh:
contint: monitor CI server diskstats in Ganglia

https://gerrit.wikimedia.org/r/91352

Change 91352 merged by Ori.livneh:
contint: monitor CI server diskstats in Ganglia

https://gerrit.wikimedia.org/r/91352

We now have disk stats on the production continuous integration servers (gallium and lanthanum). That was the purpose of this bug, and it was solved by the changes above.