
Labs: Ganglia down due to aggregator1 OOMing
Closed, DeclinedPublic

Description

Half the time the redirect from http://ganglia.wmflabs.org/ to http://ganglia.wmflabs.org/latest/ doesn't work.

And when it does, http://ganglia.wmflabs.org/latest/ doesn't respond. It either times out or I get the browser's built-in error:

Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.

Also, the one time I did get it to load and searched for something, I got a 404 Page Not Found (http://ganglia.wmflabs.org/search/).


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=63362

Details

Reference
bz46475

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 1:41 AM)
bzimport added a project: Cloud-VPS.
bzimport set Reference to bz46475.

The aggregator1 instance is apparently too small to run everything properly now; it's OOMing. I'll need to resize it. I've been working on resize support in OpenStack, but it isn't terribly reliable, so maybe I'll use this instance as a guinea pig. If it doesn't work, I'll rebuild it and tell all the instances to update their IP for aggregation.

Use graphite.wmflabs.org and http://tools.wmflabs.org/nagf instead.