
[OPS] gerrit & gitblit need process monitoring in Icinga
Closed, Resolved (Public)

Description

gitblit is hosted on antimony.wikimedia.org, which only has the default checks: puppet freshness, NTP and SSH.

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=antimony

It needs a check that monitors whether gitblit is running.

templates/icinga/nrpe_local.cfg.erb has a bunch of examples if you look for 'java'. An example for Jenkins:

command[check_jenkins]=/usr/lib/nagios/plugins/check_procs -w 1:1 -c 1:1 --ereg-argument-array '^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war'

This makes sure there is one and only one java process running jenkins.war.
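
To make the `-w 1:1 -c 1:1` semantics concrete, here is a rough Python sketch of the counting logic. It is not the real check_procs implementation: the helper names and the sample command lines are made up for illustration, and only the colon forms of a range (`min:max`, `min:`, `:max`) are handled.

```python
import re

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def in_range(count, spec):
    """Return True if count is inside an inclusive 'min:max' range such as '1:1'.

    Simplification: only colon forms are supported; an empty side is unbounded.
    """
    low, high = spec.split(":")
    if low and count < int(low):
        return False
    if high and count > int(high):
        return False
    return True


def check_procs(command_lines, pattern, warn="1:1", crit="1:1"):
    """Count command lines matching pattern and map the count to a status."""
    regex = re.compile(pattern)
    count = sum(1 for line in command_lines if regex.search(line))
    # The critical range is evaluated first, so it wins over warning.
    if not in_range(count, crit):
        return CRITICAL, count
    if not in_range(count, warn):
        return WARNING, count
    return OK, count


# Same regex as the Jenkins NRPE command above.
JENKINS = r"^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war"

# Hypothetical process list; the flags are illustrative only.
sample = [
    "/usr/bin/java -Xmx2g -jar /usr/share/jenkins/jenkins.war --httpPort=8080",
    "/usr/sbin/sshd -D",
]

status, matched = check_procs(sample, JENKINS)
print(status, matched)
```

With both ranges set to `1:1`, zero matching processes or two matching processes both come out as CRITICAL, which is the "one and only one" behaviour wanted for gitblit and Gerrit as well.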


Version: wmf-deployment
Severity: enhancement

Details

Reference
bz51983

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 2:10 AM
bzimport added projects: Gerrit, acl*sre-team.
bzimport set Reference to bz51983.
bzimport added a subscriber: Unknown Object (MLST).

Widening summary, want this for Gerrit too.

Change 75777 had a related patch set uploaded by Demon:
Add icinga monitoring for Gerrit and Gitblit

https://gerrit.wikimedia.org/r/75777

Setting importance to High because, well, Gerrit and Gitblit have needed a bit of hand-holding lately.

Pinged Chad / Leslie by email to move this forward.

Unassigning from Chad. We need someone with Puppet and firewall rule writing expertise to finish this off.

Filed https://rt.wikimedia.org/Ticket/Display.html?id=6342 to apply the ferm system on the servers hosting Gerrit/Gitblit (antimony and manganese) and enable monitoring ( https://gerrit.wikimedia.org/r/#/c/75777/ ).

Change 75777 merged by Akosiaris:
Add icinga monitoring for Gerrit and Gitblit

https://gerrit.wikimedia.org/r/75777

This is now in place, all checks look green. Should get appropriate alerts now when things go down badly :)

Thank you everyone!

Alexandros, Ariel and David Zahn have been very helpful adding the ferm firewall configuration. Hurrah!

Ideally we would validate that the monitoring is working properly by shutting down Gitblit and Gerrit and confirming that warnings are issued. But I might be too meticulous.