
WMFLabs: Ganglia down / needs reinstall
Closed, Declined (Public)

Description

The Ganglia installation at http://ganglia.wmflabs.org/ seems to be missing.
I see only a single file there, http://ganglia.wmflabs.org/latest/conf.php, and nothing more.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=46475

Details

Reference
bz63362

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:03 AM
bzimport added a project: Cloud-VPS.
bzimport set Reference to bz63362.

Indeed. It broke down after migration to eqiad. I think andrewbogott did some work on it to restore it, but it went from a non-responding server to a server with an empty directory listing with conf.php. Not sure what's going on.

I am volunteering for this. Bryan "bd808" Davis seems interested and Andrew Bogott already proposed to review changes.

I did some config tweaks with: https://gerrit.wikimedia.org/r/#/c/122790/

Figured out how to install the PHP material (we need to git clone to /usr/share/ganglia-webfrontend because the Debian package we ship does not have any PHP material).

Change 123040 had a related patch set uploaded by Hashar:
ganglia: fix some missing paths for labs

https://gerrit.wikimedia.org/r/123040

Change 123040 merged by Andrew Bogott:
ganglia: fix some missing paths for labs

https://gerrit.wikimedia.org/r/123040

Change 123044 had a related patch set uploaded by Hashar:
ganglia: graphdir must be an absolute path

https://gerrit.wikimedia.org/r/123044

Change 123044 merged by Andrew Bogott:
ganglia: graphdir must be an absolute path

https://gerrit.wikimedia.org/r/123044

We need a more powerful instance in labs, since Ganglia is a bit CPU-heavy. I will attempt to get the instance resized; otherwise I will rebuild it from scratch again =)

Will poke at it again next week with Andrew Bogott. I would like us to attempt to resize the instance to a bigger profile (1 cpu -> 4 cpu). Else we will spawn a new instance and update the configuration to point all gmond to the new IP.

Seems to be down still (or again).

There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection refused
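
"Connection refused" from fsockopen means nothing was accepting TCP connections on 127.0.0.1:8654 when the web frontend tried to read its data. A minimal sketch for checking this from the instance itself, assuming Python is available there; the helper name `port_is_open` is made up for illustration, and the host and port are simply the ones from the error message:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port copied from the error message in this comment.
    print(port_is_open("127.0.0.1", 8654))
```

If this prints False on the Ganglia instance, the daemon the frontend expects on that port is most likely not running or is listening elsewhere.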

(In reply to Antoine "hashar" Musso from comment #9)

Will poke at it again next week with Andrew Bogott. I would like us to
attempt to resize the instance to a bigger profile (1 cpu -> 4 cpu). Else
we will spawn a new instance and update the configuration to point all gmond
to the new IP.

(That was from April 11th, 2014)

Andrew: Can we get some help fixing this Ganglia issue on wmflabs? Maybe Coren or Yuvi?

Ganglia is dead, long live Graphite.

We had a working graphite.wmflabs.org instance for a while, but we ran into the same problems with Graphite that Ganglia had run into.

So we have provisioned a 'real' machine (labmon1001) that will collect the stats. Graphite and txstatsd are now provisioned on that machine, and I'm awaiting some network config to be completed (RT #8163) before I can turn on stats collection.

After that I'll have to write some way of autogenerating a nominal set of graphs in a Graphite-like way by default for all machines (see http://tools.wmflabs.org/giraffe/index.html#dashboard=ToolLabs+Basics&timeFrame=1h for a prototype for toollabs only).
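
One way such autogeneration could look, as a rough sketch only: build a Graphite render API URL per host for a small default set of metrics. The metric paths in `DEFAULT_GRAPHS`, the function name `render_urls`, and the base URL used in the usage line are all hypothetical placeholders; the real metric names depend on how the collector is configured.

```python
from urllib.parse import urlencode

# Hypothetical metric templates; real paths depend on the collector setup.
DEFAULT_GRAPHS = {
    "cpu": "{host}.cpu.total.user",
    "memory": "{host}.memory.MemFree",
    "load": "{host}.loadavg.01",
}


def render_urls(base_url, hosts, time_from="-1h"):
    """Map (host, graph name) to a Graphite /render URL for that metric."""
    urls = {}
    for host in hosts:
        for name, template in DEFAULT_GRAPHS.items():
            query = urlencode({
                "target": template.format(host=host),
                "from": time_from,
                "format": "png",
            })
            urls[(host, name)] = f"{base_url}/render?{query}"
    return urls


if __name__ == "__main__":
    # graphite.example.org is a placeholder, not the real labs host.
    for key, url in render_urls("http://graphite.example.org", ["host1"]).items():
        print(key, url)
```

A dashboard such as the Giraffe prototype could then be fed from the same host list instead of being written by hand per project.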

For history purposes:

Ganglia on labs dies because it is on a small instance. Andrew attempted a resize via nova but that definitely does not work.

Since:

  • ganglia is not fully puppetized
  • changing the IP is not straightforward (update all manifests, make sure puppet runs on all instances)

I happily gave up to focus ™ on other things.

Yuvi essentially took over, as he explained, and that includes ditching Ganglia in favor of some real hardware and diamond -> graphite.

Yuvi: +1 on Giraffe :-)