Page MenuHomePhabricator

Apache randomly dies using up all CPU cycles after upgrade from 1.8.2 to trunk
Closed, ResolvedPublic

Description

I've tested this on two different servers, both running Apache/2.0.58 and PHP 5.1.6.

The same MediaWiki version is also running on another server with a blank database and works correctly.

When clicking different wiki links very often apache dies (request times out) and the certain process consumes all
free CPU ressources until it is killed manually with SIGKILL (SIGTERM doesn't work). It seems to happen especially
if I click a link when the page is still loading. Non-wiki links don't lead to this problem.

This happens on both servers with two different MW 1.8.2 installations which were upgraded to 1.9svn.

I also tried different apache workers (mpm-prefork, mpm-worker), it happens both with threads enabled or without
threads, no luck so far.

As it only happens on updated systems I guess there is a database problem. To be sure I also wiped out one existing
installation and copied the files from the clean, working installation without luck.


Version: 1.9.x
Severity: critical
OS: Linux
Platform: PC

Details

Reference
bz8148

Event Timeline

bzimport raised the priority of this task from to High.Nov 21 2014, 9:30 PM
bzimport set Reference to bz8148.
bzimport added a subscriber: Unknown Object (MLST).

Maybe you can trace system calls ? Under linux:

strace -p <pid of the cpu eating apache>

That might help in finding the trouble.

interesting - it shows nothing.

I tried something new - I installed another MediaWiki instance with a newer revision - I also get the problems there even it is a blank installation, but not that often and -
after waiting around one or two minutes apache still lives and finally delivers a page (while using up all idle CPU cycles in between).

This is the first part of strace's output - as stated above there was no output as long as apache was "hnaging", but the "--- SIGPROF (Profiling timer expired) @ 0 (0) ---"
sound interesting to me.

-:[/var/www/localhost/htdocs/u00023/wiki]#> strace -p24333
Process 24333 attached - interrupt to quit

  • SIGPROF (Profiling timer expired) @ 0 (0) ---

setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={30, 0}}, NULL) = 0
rt_sigaction(SIGPROF, {0xb7655846, [PROF], SA_RESTART}, {0xb7655846, [PROF], SA_RESTART}, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [PROF], NULL, 8) = 0
stat64("/", {st_mode=S_IFDIR|0755, st_size=432, ...}) = 0
brk(0x8d4d000) = 0x8d4d000
mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb672e000
munmap(0xb672e000, 135168) = 0
brk(0x8d3d000) = 0x8d3d000
brk(0x8d2d000) = 0x8d2d000
pwrite64(242, "wsUserID|i:3;wsUserName|s:6:\"Man"..., 137, 0) = 137
close(242) =
0

I recompiled PHP without the concurrentmodphp patch.

The problem still exists, in fact the process is no really dying and not proceeding after some time.

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

19937 apache 22 0 274m 36m 6096 S 97.7 4.1 0:31.00 apache2

-:[/etc/apache2/modules.d]#> strace -p19937
Process 19937 attached - interrupt to quit
read(7,

That's all.

After sending SIGTERM:

-:[/etc/apache2/modules.d]#> strace -p19937
Process 19937 attached - interrupt to quit
read(7, 0xbff02437, 1) = ? ERESTARTSYS (To be restarted)

  • SIGTERM (Terminated) @ 0 (0) ---

sigreturn() = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
read(7,

I used the debug log to find out what MediaWiki is doing when apache dies:

Start request
GET /Netphone/TQM
User-Agent: Opera/9.02 (X11; Linux x86_64; U; de)
Host: wiki.fhbb-online.de
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: de,en;q=0.9
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Referer: http://wiki.fhbb-online.de/Netphone/TQM
If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT
Cookie: db00023_mw_UserID=3; db00023_mw_UserName=Manuel; db00023_mw_Token=c7cfad10e52f13a96da298abcac55d8d; db00023_mw__session=20e5734361bb46732dd03d2bdf12bfef
Cookie2: $Version=1
Connection: Keep-Alive, TE
TE: deflate, gzip, chunked, identity, trailers

Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
Cache miss for user 3
Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
Logged in from cookie
Unstubbing $wgOut on call of $wgOut->setSquidMaxage from MediaWiki::performAction
OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT
OutputPage::checkLastModified: -- we might send Last-Modified : Sun, 10 Dec 2006 13:04:30 GMT
OutputPage::sendCacheControl: private caching; Sun, 10 Dec 2006 13:04:30 GMT **
OutputPage::checkLastModified: CACHED client: 20061210130430 ; user: 20061210001119 ; page: 20061210090716 ; site 20061210130430

The next request killed apache again:

Start request
GET /Netphone
User-Agent: Opera/9.02 (X11; Linux x86_64; U; de)
Host: wiki.fhbb-online.de
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: de,en;q=0.9
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Referer: http://wiki.fhbb-online.de/Netphone/TQM
If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT
Cookie: db00023_mw_UserID=3; db00023_mw_UserName=Manuel; db00023_mw_Token=c7cfad10e52f13a96da298abcac55d8d; db00023_mw__session=20e5734361bb46732dd03d2bdf12bfef
Cookie2: $Version=1
Connection: Keep-Alive, TE
TE: deflate, gzip, chunked, identity, trailers

Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
Cache miss for user 3
Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
Logged in from cookie
Unstubbing $wgOut on call of $wgOut->setSquidMaxage from MediaWiki::performAction
OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT
OutputPage::checkLastModified: -- we might send Last-Modified : Sun, 10 Dec 2006 13:04:30 GMT
OutputPage::sendCacheControl: private caching; Sun, 10 Dec 2006 13:04:30 GMT **
OutputPage::checkLastModified: CACHED client: 20061210130430 ; user: 20061210001119 ; page: 20061210090716 ; site 20061210130430

Well, I found out that all requests with "OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT" let apache die.

If I wait a minute before sending another request I get "OutputPage::checkLastModified: client did not send If-Modified-Since header" and it works.

I found the line of code where it happens by adding some additional wfDebug() statements:

// Don't output compressed blob
while( $status = ob_get_status() ) {
        ob_end_clean();
}

This loop runs forever.

Any comments?

ehr, forgot the location:

includes/OutputPage.php line 139

geez, as soon as I add a var_dump( $status ); it works...

// Don't output compressed blob
while( $status = ob_get_status() ) {
var_dump( $status );
ob_end_clean();
}

That's not a good solution, but it works so far and maybe there is someone who can figure out where the problem is.

this seems to be a bug in php, or at least odd and unexpected behavior. Can you
please try replacing the loop with the followign code:

while (ob_get_level()) {
    ob_end_clean();
}
$status = ob_get_status();

Comments on http://www.php.net/manual/de/function.ob-get-level.php seem to
suggest that this sometimes may also fail, though. They suggest yet another way:

while (ob_get_length () !== FALSE) {
    ob_end_clean();
}
$status = ob_get_status();

It seems unclear if "internal" buffers of PHP are counted, and if they can be
"popped" using ob_end_clean(). If you want to do some further experiments, try
to disable all "internal" output buffereing done by php. The following ini
options come to mind: zlib.output_compression, tidy.clean_output,
session.use_trans_sid, and output_buffering (see
http://www.php.net/manual/de/ini.php for reference).

Ok, so both code snippets don't work for me.

Current PHP settings:
output_buffering = Off
zlib.output_compression = On
session.use_trans_sid = 0

After disabling zlib.output_compression it works - all solutions, including the original file as in r18220.

But it is stil unclear now if "disabling zlib" should be a requirement of MediaWiki or if there is a workaround.

Should be fixed on trunk with r18253, please confirm.