Easy way to define alerts for ganglia data
Closed, Invalid (Public)

Assigned To

None

Authored By

	• GWicke
	Dec 2 2013, 8:43 PM

Description

We collect a lot of useful information in ganglia, but currently can't set up alerts based on this information. This means that we often discover issues only after user reports as manual checking does not scale.

Version: wmf-deployment
Severity: normal
See Also:
https://rt.wikimedia.org/Ticket/Display.html?id=6955

Details

Reference: bz57882

Related Objects

		Status	Subtype	Assigned	Task
		Declined		Arlolra	T59265 Set up better monitoring / alerts for Parsoid varnishes
		Invalid		None	T59882 Easy way to define alerts for ganglia data

Event Timeline

• bzimport raised the priority of this task to High. Nov 22 2014, 2:39 AM

• bzimport added projects: WMF-General-or-Unknown, acl*sre-team.

• bzimport set Reference to bz57882.

• bzimport added a subscriber: Unknown Object (MLST).

• GWicke created this task. Dec 2 2013, 8:43 PM

Giuseppe owns the corresponding RT ticket, but no news there.

We have check_graphite now, which is used heavily for labs. I'm improving that constantly, so I'm tempted to consider this done.

As for Ganglia, however, I'm not sure at all. Let me re-title this to cover only ganglia.

yuvipanda renamed this task from "Easy way to define alerts for ganglia and graphite data" to "Easy way to define alerts for ganglia data". Nov 24 2014, 12:49 PM

yuvipanda set Security to None.

• Gage added a project: observability. Dec 19 2014, 6:28 PM

drive-by comment: long term we'd like to have checks based only on graphite and let ganglia be only for aggregates / long term views (or no ganglia at all even)

I believe check_ganglia exists, but it is not really advisable; check_graphite is used extensively in labs and somewhat in prod. I am going to close this as not actionable. No further effort is really being put into ganglia. I think the spirit of this task could continue, but it is probably covered by Grafana or observability work.

@chasemp, are we going to switch the default node monitoring (cpu, memory, network, disk space, IO etc) to graphite any time soon? If not, then I'd propose to keep this ticket open until we have a reasonably easy way to set up alerts on such information.

sure -- I think that's the plan but @fgiunchedi could provide more details, but I have no problem with that

• chasemp lowered the priority of this task from High to Medium. Jan 6 2015, 11:37 PM

In T59882#958528, @GWicke wrote:

@chasemp, are we going to switch the default node monitoring (cpu, memory, network, disk space, IO etc) to graphite any time soon? If not, then I'd propose to keep this ticket open until we have a reasonably easy way to set up alerts on such information.

Just a note: I believe those things are monitored now, but we do not currently alert on them from graphite data. The data should exist, though.

• chasemp removed • chasemp as the assignee of this task. Jan 7 2015, 10:49 PM

• chasemp unsubscribed.

In T59882#958530, @chasemp wrote:

sure -- I think that's the plan but @fgiunchedi could provide more details, but I have no problem with that

@fgiunchedi: Is this still a valid task, 22 months after setting this to stalled status?

Aklapper removed a subscriber: • chasemp. Nov 3 2016, 1:12 PM

@Aklapper no, we're deprecating ganglia so this is invalid now

• Phabricator_maintenance removed a subscriber: yuvipanda. Jun 7 2017, 7:01 PM
