Page MenuHomePhabricator

CI: mw-api-siteinfo.py should more clearly report site failure
Closed, ResolvedPublic

Description

When betalabs was completely down with all page and API requests giving a 404, browser tests failed, bug 70648. That's obviously expected, but the console log for failing tests ends with mw-api-siteinfo.py reporting an opaque json parsing error (below) which took developers time to translate into "site was down". The console log should contain

"ERROR 404 response from http://en.wikipedia.beta.wmflabs.org/w/api.php , aggghhh danger!"

The fix in jenkins.git bin/mw-api-siteinfo.py is to check the response object

response = requests.get(mw_api_url, params=API_QUERY)

for validity/errors and die with an informative failure before going on to

siteinfo = json.loads(response.content)

According to http://docs.python-requests.org/en/latest/user/quickstart/#response-status-codes , just adding

response.raise_for_status()

causes an exception4 when the API request fails.

From bug 70648, console log for the failed job
https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/36/console
contained:

03:37:02 + GEM_HOME=/mnt/jenkins-workspace/workspace/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/../gems
03:37:02 ++ /srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py http://en.wikipedia.beta.wmflabs.org/w/api.php git_branch
03:37:02 Traceback (most recent call last):
03:37:02 File "/srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py", line 90, in <module>
03:37:02 main()
03:37:02 File "/srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py", line 78, in main
03:37:02 siteinfo = json.loads(response.content)
03:37:02 File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
03:37:02 return _default_decoder.decode(s)
03:37:02 File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
03:37:02 raise ValueError(errmsg("Extra data", s, end, len(s)))
03:37:02 ValueError: Extra data: line 1 column 4 - line 1 column 18 (char 4 - 18)


Version: unspecified
Severity: normal

Details

Reference
bz70695

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:51 AM
bzimport set Reference to bz70695.

It would be nice if the script would have helpful error messages, but I do not think this script's job is to check if beta is down. We should have a separate job that checks that.

(In reply to Željko Filipin from comment #1)

I do not think this script's job is to check if beta is down.

I agree, but it should die when its request fails, not later when it json.loads() the bad response. I think my proposed 1-line fix will do this.

(In reply to spage from comment #2)

I agree, but it should die when its request fails, not later when it
json.loads() the bad response. I think my proposed 1-line fix will do this.

+1

Watch out, we have python requests 0.8.2 which might not have response.raise_for_status()

gerritadmin wrote:

Change 160616 had a related patch set uploaded by Hashar:
mw-api-siteinfo: raise on http error

https://gerrit.wikimedia.org/r/160616

gerritadmin wrote:

Change 160616 merged by jenkins-bot:
mw-api-siteinfo: raise on http error

https://gerrit.wikimedia.org/r/160616

Excellent finding S! I have copy pasted your one-liner fix and made you an Author: :]

Thank you!