
[ops] Monitor that LVS config and mw_install are in sync
Closed, Invalid (Public)

Description

Author: jeluf

Description:
In order to prevent out-of-date servers from delivering corrupted pages, provide a monitoring script that checks whether every server in the LVS config is also listed in mw_install.
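A minimal sketch of the requested check, written as an Icinga/Nagios-style plugin. It assumes both host lists can be read as plain text with one hostname per line. That format, the inline sample data and the function names `parse_hosts` and `check_sync` are all hypothetical and are not taken from the real LVS or mw_install files. The only real convention used is the plugin exit code (0 = OK, 2 = CRITICAL).

```python
from collections.abc import Iterable

# Nagios/Icinga plugin exit codes.
OK, CRITICAL = 0, 2


def parse_hosts(text: str) -> set[str]:
    """Parse a (hypothetical) one-hostname-per-line list, ignoring blanks and # comments."""
    hosts = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.add(line)
    return hosts


def check_sync(lvs_hosts: Iterable[str], install_hosts: Iterable[str]) -> tuple[int, str]:
    """Return an Icinga-style (exit code, message) pair.

    CRITICAL if any host pooled in LVS is absent from mw_install, since such a
    host would miss code/config syncs and could serve stale pages.
    """
    missing = sorted(set(lvs_hosts) - set(install_hosts))
    if missing:
        return CRITICAL, "CRITICAL: pooled but not in mw_install: " + ", ".join(missing)
    return OK, "OK: all pooled servers are in mw_install"


# Hypothetical inline data standing in for the real LVS and mw_install lists.
lvs = parse_hosts("mw1\nmw2\nmw3  # recently repooled\n")
install = parse_hosts("mw1\nmw2\n")
code, message = check_sync(lvs, install)
print(code, message)  # a real plugin would print the message and sys.exit(code)
```

The check is a plain set difference: a host that is pooled but not in the install list is the case the task describes. Hosts that are in mw_install but not pooled are harmless, so they are not reported.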


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=66050

Details

Reference
bz23662

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:12 PM
bzimport set Reference to bz23662.
bzimport added a subscriber: Unknown Object (MLST).

there's another related problem: out-of-date servers have outdated /tmp/mw-cache/conf* files, as these don't get rebuilt.

this happens when:

  1. server goes down
  2. new configuration is synced
  3. server goes up
  4. server builds cache based on old configuration
  5. server gets synced new configuration
  6. because the new configuration's timestamps are older than the cache built in step 4, the cache is not rebuilt and the server stays out of sync
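The six steps above can be replayed in a small simulation. It shows why a "rebuild the cache only if the config file is newer" rule never fires in this sequence. This is a model, not MediaWiki's actual cache code. The freshness rule, the `Server` class and the integer timestamps are assumptions. The sketch also assumes that the sync keeps the config file's original modification time (rsync does this with `-t`/`-a`), so the file arrives carrying a time earlier than the cache.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    config_version: str
    config_mtime: int
    cache_version: Optional[str] = None
    cache_mtime: int = -1

    def maybe_rebuild_cache(self, now: int) -> None:
        # Assumed freshness rule: rebuild only if the config is newer than the cache.
        if self.config_mtime > self.cache_mtime:
            self.cache_version = self.config_version
            self.cache_mtime = now


def replay_steps() -> Server:
    srv = Server(config_version="old", config_mtime=0)
    # 1. server goes down (nothing happens on it while it is down)
    # 2. new configuration is synced to the other hosts at t=10; the file's mtime is 10
    new_version, new_mtime = "new", 10
    # 3-4. server comes back at t=20 and builds its cache from the old configuration
    srv.maybe_rebuild_cache(now=20)
    # 5. server receives the new configuration, with its original mtime (10) preserved
    srv.config_version, srv.config_mtime = new_version, new_mtime
    # 6. freshness check: 10 > 20 is False, so the stale cache is kept
    srv.maybe_rebuild_cache(now=30)
    return srv


srv = replay_steps()
print(srv.config_version, srv.cache_version)
```

At the end the server has the new configuration on disk but still serves a cache built from the old one, which is the desync described in step 6.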

there are multiple possible fixes, such as having 'sync-common' clean /tmp/mw-cache/*, but detecting this situation is important too...
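Here is a sketch of both ideas. The first function clears the cached conf* files so they are rebuilt regardless of timestamps, as suggested for 'sync-common'. The second detects a desync by comparing content rather than mtimes. The functions `clear_config_cache`, `config_digest` and `cache_is_stale` are hypothetical. So is the assumption that the cache stores a digest of the config it was built from. None of this describes how sync-common or MediaWiki actually work.

```python
import glob
import hashlib
import os


def clear_config_cache(cache_dir: str = "/tmp/mw-cache") -> list[str]:
    """Remove cached conf* files so the next request rebuilds them from the
    freshly synced configuration, whatever the file timestamps say.
    Intended (hypothetically) to run right after a sync."""
    removed = []
    for path in glob.glob(os.path.join(cache_dir, "conf*")):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return sorted(removed)


def config_digest(config_bytes: bytes) -> str:
    """Content fingerprint of a configuration file."""
    return hashlib.sha256(config_bytes).hexdigest()


def cache_is_stale(config_bytes: bytes, digest_recorded_with_cache: str) -> bool:
    """Hypothetical detection: the cache records the digest of the config it was
    built from. A mismatch means a desync even when the mtimes look fine."""
    return config_digest(config_bytes) != digest_recorded_with_cache
```

Comparing digests sidesteps the timestamp ordering problem entirely. A monitoring check could run `cache_is_stale` on each host and alert on mismatches.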

JeLuF / Domas: Is this still an issue?

Mark: Asking you as JeLuF and Domas are not around anymore: Is this still valid or is this obsolete nowadays?

That is still valid. I heard very recently of an application server being pooled back with the wrong wmf version :-/ This bug probably needs to be filed in RT to attract ops attention.

fgiunchedi subscribed.

we have Icinga checks for hosts missing from dsh groups for scap, plus pybal talking to etcd, conftool, and all the rest