
ganglia.wmflabs.org unreachable
Closed, Declined (Public)

Description

The Ganglia instance in Labs is no longer responding: http://ganglia.wmflabs.org/ It is a collector for all Labs instances.


Version: unspecified
Severity: normal

Details

Reference
bz62729

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:56 AM
bzimport set Reference to bz62729.
bzimport added a subscriber: Unknown Object (MLST).

Apparently the whole DNS system is dead :-] Closing this bug.

Dzahn subscribed.

Recycling this ticket to report that ganglia.wmflabs.org is down again.

http://ganglia.wmflabs.org/

502 Bad Gateway
nginx/1.5.0

yuvipanda claimed this task.
yuvipanda subscribed.

Closing because we don't use ganglia at all on labs. Use graphite.wmflabs.org instead.

We should kill the domain too.

Cf. T85318#1054268.

hashar changed the task status from Invalid to Declined. Apr 3 2015, 2:55 PM

Ganglia has been phased out of labs, please keep this task closed. If there is some more cleanup work to handle, just file subtasks :) Thx!

Just like with nagios.wmflabs, these monitoring projects in labs can have 2 different purposes.

One would be something that monitors labs instances and services on them. I understand we don't use Nagios/Icinga or Ganglia for this purpose anymore, got it.

But the other purpose, and imho the original idea when labs was started, is that they would be staging for production, where we actually apply the same puppet roles and can test changes before making them in production. Unfortunately we never got to a state where we had puppetized instances using the same roles.

Since we still use ganglia and icinga in production, deleting everything related to them also means giving up on having that staging environment. Just my 2 cents though.

The problem with Icinga is that it depends on exported Puppet resources, and in Labs, where these can be influenced by the project admins, this may lead to someone taking over the puppet master. IIRC this is a rock-solid problem, and nobody even has an idea how to solve it (the problem being not "a user may take over the puppet master", but "we have audited all x million lines of code in our repository and can rule out that any user can take over the puppet master, and we have measures in place so that no such code will ever be merged in the future").

Regarding Ganglia, my understanding is that production plans to move away from Ganglia as well and that Labs in that respect paves the way.
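
To illustrate the exported-resources concern raised above, here is a toy Python model, not Puppet code and not the actual Labs setup. In Puppet, a node exports a resource with the `@@` prefix and another node collects it with `<<| |>>`; the sketch imitates that with a shared store. All class, function, and host names below are hypothetical and chosen only for the example. The point it shows is that a collector without a trust filter picks up whatever any node exported, including a node controlled by a project admin.

```python
# Toy model of exported resources. This is NOT Puppet; every name here
# (ExportedResource, ResourceStore, the host names) is hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportedResource:
    exporter: str   # node that exported the resource
    rtype: str      # resource type, e.g. "nagios_service"
    title: str
    params: tuple   # tuple of (key, value) pairs chosen by the exporter


class ResourceStore:
    """Stand-in for the shared store that holds exported resources."""

    def __init__(self):
        self._resources = []

    def export(self, resource):
        # Any node may export; nothing checks who the exporter is.
        self._resources.append(resource)

    def collect(self, rtype):
        # Unfiltered collector: returns every resource of the given type,
        # regardless of which node exported it.
        return [r for r in self._resources if r.rtype == rtype]


def build_monitoring_config(store):
    """Render collected service checks as the monitoring host would."""
    lines = []
    for res in store.collect("nagios_service"):
        params = dict(res.params)
        lines.append(f"{res.title}: {params['check_command']}")
    return lines


store = ResourceStore()
store.export(ExportedResource(
    "web1.example", "nagios_service", "http on web1",
    (("check_command", "check_http"),),
))
# A node run by a project admin exports content of its own choosing.
store.export(ExportedResource(
    "untrusted.example", "nagios_service", "surprise",
    (("check_command", "anything-the-project-admin-chose"),),
))

config = build_monitoring_config(store)
for line in config:
    print(line)
```

In this model the monitoring host ends up with both entries. Filtering on `exporter` inside `collect` would be one possible mitigation in the toy, but whether anything comparable is sufficient in a real Puppet setup is exactly the audit problem described in the comment.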