[gdash] "(cdn) HTTP Error Rate" should use log scale for 5xx errors
Closed, ResolvedPublic

Description

Currently it's almost impossible to detect trends, as the rest of the 5xx errors are buried at the foot of the peaks made by gazillions of 500 errors.
Perhaps there should also be a graph covering the past month, but that's less important.
(I wanted to check the number of 504 errors over the last few days/weeks.)
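To illustrate why a log scale would help here, the sketch below compares where a small series lands on a linear versus a logarithmic y axis. The rates and the 200 px graph height are made-up numbers for illustration only, not values taken from the reqerror dashboard, and `y_position` is a hypothetical helper.

```python
import math

def y_position(value, axis_max, pixels=200, log_base=None):
    """Height in pixels at which `value` is drawn on an axis that tops out at `axis_max`.

    With log_base set, the axis is assumed to start at 1 (log = 0),
    so `value` and `axis_max` must both be >= 1.
    """
    if log_base is None:
        return pixels * value / axis_max
    return pixels * math.log(value, log_base) / math.log(axis_max, log_base)

# Hypothetical rates: a huge peak of 500s next to a modest number of 504s.
peak_500 = 100_000
typical_504 = 50

linear = y_position(typical_504, peak_500)                    # well under one pixel
logarithmic = y_position(typical_504, peak_500, log_base=10)  # roughly a third of the graph

print(f"linear: {linear:.2f}px, log10: {logarithmic:.2f}px")
```

On the linear axis the 504 series is squeezed into a fraction of a pixel, while on the log axis it takes up a readable share of the graph, which is the effect requested above.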


Version: unspecified
Severity: enhancement
URL: https://gdash.wikimedia.org/dashboards/reqerror/
See Also:
https://launchpad.net/bugs/925635

Details

Reference
bz41754

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:11 AM
bzimport set Reference to bz41754.

Created attachment 13762
Example (not representative) of reqerror graph

Note, today's graph is not representative because packet loss between esams and eqiad caused a huge peak of 503 errors.

Something tells me that I should try this myself. :)

17.58 < ori-l> yes, you can submit a patch yourself, let me point you to the right file
18.00 < ori-l> Nemo_bis: http://git.wikimedia.org/tree/operations%2Fpuppet.git/ca6fe4efc30c6a4b2606b13aab178b9e71914dca/files%2Fgraphite%2Fgdash%2Fdashboards%2Freqerror
18.01 < ori-l> the DSL gdash uses to describe graphs is at https://github.com/ripienaar/graphite-graph-dsl/wiki

Attached:

reqerror-2013-11.png (500×1 px, 49 KB)

Change 95064 had a related patch set uploaded by Nemo bis:
Use log scale for 5xx errors in "(cdn) HTTP Error Rate"

https://gerrit.wikimedia.org/r/95064

Change 95068 had a related patch set uploaded by Nemo bis:
Also add 2 months and 1 year graphs in "(cdn) HTTP Error Rate"

https://gerrit.wikimedia.org/r/95068

Change 95064 merged by Ori.livneh:
Use log scale for 5xx errors in "(cdn) HTTP Error Rate"

https://gerrit.wikimedia.org/r/95064

Change 95068 merged by Ori.livneh:
Also add 2 months and 1 year graphs in "(cdn) HTTP Error Rate"

https://gerrit.wikimedia.org/r/95068

Apart from the wrong log scale setting (which needs to be set per graph), I probably also have to fix the x axis scale: with the current one, it seems an extremely wide image is needed to actually show the data.
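As a rough sketch of what "per graph" means at the Graphite level (not the gdash DSL itself, whose exact property names are not shown in this task): each rendered graph is a separate request to Graphite's render API, so the log scale (`logBase`) and the image size (`width`, `height`) have to accompany every graph individually. The host name and metric name below are placeholders, and `render_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def render_url(base, target, since, log_base=None, width=800, height=400):
    """Build a Graphite render API URL for a single graph."""
    params = {"target": target, "from": since, "width": width, "height": height}
    if log_base is not None:
        # logBase applies only to the graph this URL renders.
        params["logBase"] = log_base
    return f"{base}/render?{urlencode(params)}"

GRAPHITE = "https://graphite.example.org"  # placeholder host, an assumption
TARGET = "reqstats.5xx"                    # placeholder metric name, an assumption

# One URL per time span; each one carries its own logBase and dimensions.
urls = {
    span: render_url(GRAPHITE, TARGET, span, log_base=10)
    for span in ("-1d", "-2mon", "-1y")
}

for span, url in urls.items():
    print(span, url)
```

Keeping `width` and `height` explicit on every request also gives one place to check if a longer time span produces an unusable image size.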

Change 101065 had a related patch set uploaded by Nemo bis:
Make logscale in reqerror graphs actually work

https://gerrit.wikimedia.org/r/101065

Change 101065 merged by Ori.livneh:
Make logscale in reqerror graphs actually work

https://gerrit.wikimedia.org/r/101065

Change 105614 had a related patch set uploaded by Nemo bis:
Also logbase 2 for the shorter reqerror graphs

https://gerrit.wikimedia.org/r/105614

Change 105614 merged by Ori.livneh:
Also logbase 2 for the shorter reqerror graphs

https://gerrit.wikimedia.org/r/105614

(In reply to comment #10)
> Change 105614 merged by Ori.livneh:
> Also logbase 2 for the shorter reqerror graphs
>
> https://gerrit.wikimedia.org/r/105614

That worked for a bit, but then regressed again. Will need to check.

Change 117021 had a related patch set uploaded by Nemo bis:
[gdash] Use logscale 10 for reqerror graph, again

https://gerrit.wikimedia.org/r/117021

I still have no idea where to start to find a way that works...

fgiunchedi claimed this task.

The spurious 500s have been fixed by https://phabricator.wikimedia.org/T88412, so the graph looks much more reasonable now. Closing.