
"webservice stop" leaves blocking php-cgi processes behind
Closed, Resolved (Public)

Description

The webservice for my "commonshelper" tool is running, but I can't load the web page(s). Examples:

http://tools.wmflabs.org/commonshelper/index.php (tool)
http://tools.wmflabs.org/commonshelper/index_test.php (simple test page)

The pages just keep loading "forever".

I have received similar bug reports for multiple tools since yesterday. Those were resolved by restarting the web service, but apparently that does not help for this one.


Version: unspecified
Severity: blocker
URL: http://tools.wmflabs.org/commonshelper/

Details

Reference
bz64095

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 3:16 AM)
bzimport added a project: Toolforge.
bzimport set Reference to bz64095.

metatron wrote:

I've seen this problem before. The lighttpd webservice stops, but old php-cgi processes remain. Running "webservice start" then starts /one/ lighttpd process, but it can't start new php-cgi processes. So plain HTML or Python is served just fine, while PHP requests are "stuck".

This is the output from webgrid for commonshelper:

tools-webgrid-01: (13:51:40)

608 tools.co 20 0 48668 2116 1312 S 0 0.0 0:00.03 lighttpd

11144 tools.co 20 0 281m 11m 7748 S 0 0.1 0:00.03 php-cgi
11146 tools.co 20 0 288m 11m 4764 S 0 0.1 2:04.32 php-cgi
11147 tools.co 20 0 288m 11m 4680 S 0 0.1 0:36.61 php-cgi
11148 tools.co 20 0 288m 11m 4760 S 0 0.1 1:29.41 php-cgi
11149 tools.co 20 0 288m 11m 4756 S 0 0.1 2:42.74 php-cgi

tools-webgrid-02: (13:51:40)
19567 tools.co 20 0 281m 11m 7764 S 0 0.1 0:00.01 php-cgi
19575 tools.co 20 0 283m 9844 4320 S 0 0.1 0:35.07 php-cgi
19576 tools.co 20 0 283m 9836 4312 S 0 0.1 0:01.24 php-cgi
19577 tools.co 20 0 283m 9912 4272 S 0 0.1 0:34.99 php-cgi
19578 tools.co 20 0 283m 9796 4272 S 0 0.1 0:35.88 php-cgi

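I figured out this workaround. Save the following as a script and execute it:

#!/bin/bash
# Stop the webservice and give the grid job a moment to end.
webservice stop
sleep 5
# Kill any php-cgi processes of the tool that were left behind on the web grid hosts.
ssh tools-webgrid-01 'pkill -9 -U tools.commonshelper php-cgi'
ssh tools-webgrid-02 'pkill -9 -U tools.commonshelper php-cgi'
sleep 5
# Start a fresh webservice.
webservice start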
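
A gentler variant of the cleanup step could try SIGTERM first and fall back to SIGKILL only when a process does not exit in time. The following is a minimal sketch of that idea for a single PID. The function name terminate_then_kill and its timeout argument are hypothetical, and it assumes (this thread does not confirm it) that orphaned php-cgi processes exit on SIGTERM.

```shell
#!/bin/bash
# Hypothetical helper, not part of the webservice tooling:
# ask a process to exit with SIGTERM and escalate to SIGKILL only
# if it is still around after a timeout (in seconds, default 5).
terminate_then_kill() {
    pid=$1
    timeout=${2:-5}
    waited=0

    # If the signal cannot be delivered (typically because the process
    # is already gone), there is nothing to wait for.
    kill -TERM "$pid" 2>/dev/null || return 0

    # "kill -0" sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1    # had to escalate to SIGKILL
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0            # process exited after SIGTERM
}
```

On a grid host, the PIDs to pass in might come from something like "pgrep -U tools.commonshelper php-cgi"; that pairing is an illustration, not something tested here.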

metatron is correct; I recently had to purge some old processes (cf. [[wikitech:Nova Resource:Tools/SAL#April 10]]).

To fix Magnus' issue, I killed the blocking php-cgi processes; the tool should be working again.

The underlying problem is that "webservice stop" uses qdel, which by default sends SIGKILL. That kills the lighttpd process and its workers, but not the spawned php-cgi processes.

Testing shows that on SIGTERM lighttpd correctly ends its workers and the spawned php-cgi processes.
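
The difference between the two signals can be illustrated with a small stand-alone sketch. It is not lighttpd's actual code: "supervisor" stands in for lighttpd, "worker" for a spawned php-cgi process, and the log file argument is a hypothetical way to observe what happened. The point it shows is that SIGTERM can be caught and passed on to children, while SIGKILL cannot be caught, so no cleanup runs.

```shell
#!/bin/bash
# Stand-in for a php-cgi process: loops until it is told to terminate.
worker() {
    log_file=$1
    trap 'echo "worker stopped" >> "$log_file"; exit 0' TERM
    while :; do sleep 1; done
}

# Stand-in for lighttpd: starts one worker and cleans it up on SIGTERM.
supervisor() {
    log_file=$1
    worker "$log_file" &
    worker_pid=$!
    # Record the worker's PID so it can be inspected or cleaned up later.
    echo "$worker_pid" > "$log_file.pid"

    # SIGTERM can be caught: forward it to the worker, wait for the
    # worker to exit, then exit ourselves.
    # SIGKILL cannot be caught, so on SIGKILL this handler never runs
    # and the worker is left behind.
    trap 'kill -TERM "$worker_pid" 2>/dev/null; wait "$worker_pid"; echo "supervisor stopped" >> "$log_file"; exit 0' TERM

    while :; do sleep 1; done
}
```

Running "supervisor /tmp/demo.log &" and sending SIGTERM to it should leave both "worker stopped" and "supervisor stopped" in the log. Sending SIGKILL instead leaves the log empty and the worker still running under the PID stored in /tmp/demo.log.pid, which mirrors the leftover php-cgi processes described above.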

I recently filed bug #61102 to use SIGTERM for the general case of jsub; the same logic applies to this bug as well.

Thanks Tim, metatron, it works again!

It works for now :-), but the general problem hasn't been solved yet.

Ha! I knew I had jotted down something about the problem earlier.

*** This bug has been marked as a duplicate of bug 63878 ***