
Set up site service status/uptime/downtime dashboard similar to Google Apps status page
Closed, Resolved (Public)

Description

This has been on the ops todo list for a while but we haven't quite gotten to it.

We want something with the following characteristics:

  • Lightweight, easy to load
  • Probably a static file that's updated periodically
  • Will show current and recent past state with automatic downtime detection of key services
  • Ability for admins to easily add annotations for particular events
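A minimal sketch of how the characteristics above could fit together, assuming a cron-style job that probes each service over HTTP and rewrites one static HTML file. The service name, the example.org URL, the annotation format, and the output filename are all hypothetical placeholders, not an existing setup.

```python
import html
import http.client
import urllib.request
from datetime import datetime, timezone


def probe(url, timeout=5.0):
    """Return True if the URL answers successfully within the timeout."""
    try:
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx replies.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        return False


def render_status_page(states, annotations, generated_at):
    """Build a small static page.

    states: dict mapping service name -> bool (True means up)
    annotations: list of (timestamp, text) pairs written by admins
    generated_at: human-readable timestamp string
    """
    rows = "\n".join(
        f"<li>{html.escape(name)}: {'up' if up else 'DOWN'}</li>"
        for name, up in sorted(states.items())
    )
    notes = "\n".join(
        f"<li>{html.escape(when)}: {html.escape(text)}</li>"
        for when, text in annotations
    )
    return (
        "<!DOCTYPE html>\n<title>Service status</title>\n"
        f"<p>Generated {html.escape(generated_at)}</p>\n"
        f"<ul>\n{rows}\n</ul>\n<ul>\n{notes}\n</ul>\n"
    )


if __name__ == "__main__":
    # Hypothetical service list; a real deployment would load this from config.
    services = {"example": "https://example.org/"}
    states = {name: probe(url) for name, url in services.items()}
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with open("status.html", "w", encoding="utf-8") as out:
        out.write(render_status_page(states, [], now))
```

Because the output is a single static file, it can be served from a standalone host and stays cheap to load even when the primary sites are down. Recent-past state would additionally need the job to keep a history of earlier results, which this sketch leaves out.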

We can then link this prominently from our error messages (replacing the old "go check in IRC" links), from the tech blog, etc.

Might want to have it hosted separately from primary sites, but if so we need to make sure it'll handle the traffic in a downtime. ;) Hosting on a standalone server within Tampa, with a backup in Amsterdam, would probably be an acceptable compromise for now.


Version: unspecified
Severity: enhancement
URL: http://www.google.com/appsstatus#hl=en

Details

Reference
bz20083

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:51 PM
bzimport set Reference to bz20083.
bzimport added a subscriber: Unknown Object (MLST).

Note -- pulling in our wikitech twitter/identi.ca data stream might be fun. ;) That'll show a continuous log of our admin activity notes, which might be handy during an event when we're doing our as-we-go notes but might not remember to make human-targeted updates all the time.

Let's also learn from the tool server community about their use of http://status.toolserver.org/ to keep the interface simple and quick.

mike.lifeguard+bugs wrote:

Do you care about keeping old data like the Google dashboard does?

How would you want it to be different from

(In reply to comment #1)

Note -- pulling in our wikitech twitter/identi.ca data stream might be fun. ;)
That'll show a continuous log of our admin activity notes, which might be handy
during an event when we're doing our as-we-go notes but might not remember to
make human-targeted updates all the time.

I'm not sure how to pull those - but I can listen in IRC (already doing that for http://toolserver.org/~lifeguard/docs/statusbot ).

How would you want to do human-targeted updates? Admin interface on the status site, command in shell...?

http://status.wikimedia.org/ is currently serving an experimental prototype that would likely resolve this bug.

(In reply to comment #4)

http://status.wikimedia.org/ is currently serving an experimental prototype
that would likely resolve this bug.

That was a partnership announced for this, so FIXED.