
Story: Event Logging writing throughput needs monitoring
Closed, Resolved (Public)

Description

Event Logging writing volume needs monitoring.

We had an incident last Friday in which the volume of EL events increased dramatically:

https://ganglia.wikimedia.org/latest/graph.php?r=month&z=xlarge&c=Miscellaneous+eqiad&h=vanadium.eqiad.wmnet&jr=&js=&v=391444071&m=eventlogging_all-events&vl=events&ti=all-events

We should have an alert in place for this. We even have a handy script whose results could be piped to the alert (thanks to ori):

https://gist.github.com/atdt/8deed4bc2d311ba0122f#file-el-status-py

The output of the script is as follows:

<output>
Listening for 60 seconds...
UniversalLanguageSelector-tofu 64.30% (72.80/sec)
PageContentSaveComplete 9.92% (11.23/sec)
NavigationTiming 5.14% (5.82/sec)
TrackedPageContentSaveComplete 3.86% (4.37/sec)

</output>
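
As a rough illustration of how output like the above could feed an alert, here is a minimal sketch. It is not the patch that was merged for this task. The names `parse_rates` and `check_rates` and the 30/sec and 60/sec thresholds are made up for the example. The return codes follow the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import re

# Matches lines such as "NavigationTiming 5.14% (5.82/sec)".
# The "Listening for 60 seconds..." banner and blank lines do not match.
LINE_RE = re.compile(r"^(\S+)\s+([\d.]+)%\s+\(([\d.]+)/sec\)$")


def parse_rates(output):
    """Return a dict mapping schema name to events per second."""
    rates = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rates[match.group(1)] = float(match.group(3))
    return rates


def check_rates(rates, warn=30.0, crit=60.0):
    """Return (status, offenders); thresholds are hypothetical values."""
    status = 0
    offenders = []
    for schema, rate in sorted(rates.items()):
        if rate >= crit:
            status = max(status, 2)
        elif rate >= warn:
            status = max(status, 1)
        else:
            continue  # below both thresholds, nothing to report
        offenders.append(f"{schema} ({rate:.2f}/sec)")
    return status, offenders


SAMPLE = """Listening for 60 seconds...
UniversalLanguageSelector-tofu 64.30% (72.80/sec)
PageContentSaveComplete 9.92% (11.23/sec)
NavigationTiming 5.14% (5.82/sec)
TrackedPageContentSaveComplete 3.86% (4.37/sec)
"""

status, offenders = check_rates(parse_rates(SAMPLE))
```

With the sample above and these assumed thresholds, only UniversalLanguageSelector-tofu crosses the critical line. Real thresholds would need to be chosen from the normal traffic seen in the Ganglia graph.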


Version: unspecified
Severity: enhancement
Whiteboard: u=nuria@wikimedia.org c=EventLogging p=5 s=2014-05-29

Details

Reference
bz65482

Event Timeline

bzimport raised the priority of this task from to Needs Triage. (Nov 22 2014, 3:22 AM)
bzimport set Reference to bz65482.
bzimport added a subscriber: Unknown Object (MLST).

Change 137280 had a related patch set uploaded by Nuria:
[WIP] monitoring: monitor eventlogging thresholds

https://gerrit.wikimedia.org/r/137280

Change 137280 merged by Giuseppe Lavagetto:
Monitoring: monitor eventlogging thresholds

https://gerrit.wikimedia.org/r/137280