
Tool Labs web servers give 503s
Closed, Resolved, Public

Description

No tools seem to be working anymore. All tools give a 503:

Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

Looking at the access log, I see that one of my tools last served a request at [30/Nov/2013:23:42:24 +0000].


Version: unspecified
Severity: major
URL: http://tools.wmflabs.org/

Details

Reference
bz57794

Event Timeline

bzimport raised the priority of this task from to High. (Nov 22 2014, 2:33 AM)
bzimport added a project: Toolforge.
bzimport set Reference to bz57794.

(In reply to comment #1)

Do you have a URL?

For me it just works fine:
http://tools.wmflabs.org/

(php)
https://tools.wmflabs.org/wikiviewstats/

I notice that some tools work and some don't. Maybe it's related to PHP?
My tool for example is written in PHP and doesn't work:
https://tools.wmflabs.org/intersect-contribs/

metatron wrote:

Mine is PHP too, and it works. There does seem to be an issue, though.

Grid-Status doesn't show up:
http://tools.wmflabs.org/?status

BTW: are you already using the new web service, and does your HTTP service show up in the list when you run »qstat«?

As I write this, qstat fails with an error. Maybe the grid is partially broken.

local-wikiviewstats@tools-login:~$ qstat
error: commlib error: got select error (No route to host)
error: unable to send message to qmaster using port 6444 on host "tools-master.pmtpa.wmflabs": got send error

(In reply to comment #3)

> Mine is php, too - and it works. But there seems to be an issue though.

Oops, that's true :)

> Grid-Status doesn't show up:
> http://tools.wmflabs.org/?status
>
> btw: Are you using new web already and is your http service in list when »qstat«?

My tools aren't using the new web service; "webservice start" just hangs.

> As writing these lines: qstat fails w/ error. Maybe the Grid is partially broken.
>
> local-wikiviewstats@tools-login:~$ qstat
> error: commlib error: got select error (No route to host)
> error: unable to send message to qmaster using port 6444 on host "tools-master.pmtpa.wmflabs": got send error

qstat fails with the same error for me.

From 208.80.153.237 icmp_seq=14 Destination Host Unreachable
^C
--- ee-dashboard.wmflabs.org ping statistics ---
14 packets transmitted, 0 received, +6 errors, 100% packet loss, time 13001ms

(In reply to comment #5)

> From 208.80.153.237 icmp_seq=14 Destination Host Unreachable
> ^C
> --- ee-dashboard.wmflabs.org ping statistics ---
> 14 packets transmitted, 0 received, +6 errors, 100% packet loss, time 13001ms

My tool's web interface seems to be working: https://tools.wmflabs.org/liangent-php/index.php/enwiki?title=Special:BlankPage

Looks like a number of hosts are down: http://ganglia.wmflabs.org/latest/?r=hour&cs=&ce=&s=by+name&c=tools&tab=m&vn=

I restarted the grid master and the webservers; let's see what happens.

(In reply to comment #6)

> My tool's web interface seems working:
> https://tools.wmflabs.org/liangent-php/index.php/enwiki?title=Special:BlankPage

Webgrid is working, so newweb tools will continue to work; you just can't submit any new ones. One of the webservers (and the proxy) is also operational, so there's a 1-in-3 chance of your tool working even if you are using Apache.

The instances I tried to reboot seem to be stuck in a 'rebooting' state, and I can't check what the console says either (wikitech just says 'failed to get console output'). Resolving this looks like it will need someone with more privileges than I have.

Looks like tools using the new webserver are immune. Xtools, all written in PHP btw, are running just fine.

(In reply to comment #10)

> Looks like tools using the new webserver are immune. Xtools, all written in PHP btw, are running just fine.

Or not. It just seems to hang now.

One of the hardware servers providing virtual servers had crashed during the night, disrupting service from those virtual environments running on it (and only those). It's in the process of returning to service now.

Something broke. Every internal link on xtools is being redirected elsewhere and doesn't work.

For example, when I hover over a link on the edit counter, it points to http://tools.wmflabs.org/xtools/ec/, but when I click on it, it goes to tools-webgrid-01:4040/xtools/ec/ instead.

Which virt box crashed? And any idea why?

Did you close this again on purpose or was it a mid-air collision?

(In reply to comment #14)

> which virt box crashed? And, any idea why?

virt10: https://ganglia.wikimedia.org/latest/?c=Virtualization%20cluster%20pmtpa&h=virt10.pmtpa.wmnet&m=cpu_report&r=day&s=by%20name&hc=4&mc=2