[upstream] Jenkins: MediaWiki unit tests segfault on gallium
Closed, Resolved, Public

Description

Change https://gerrit.wikimedia.org/r/#/c/43775/, made against mediawiki/core.git on branch 1.21wmf7, causes our PHPUnit tests to segfault (exit code 139).

Under the misc tests https://integration.mediawiki.org/ci/job/mediawiki-core-phpunit-misc/1244/console :

phpunit-misc:

[echo] Builddir: /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/workspace
[echo] Logdir..: /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/workspace/logs/
[echo] Indir...: /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/workspace/tests/phpunit
[echo] Opts....: --group Database --exclude-group API,Dump,Parser,Broken,ParserFuzz,Stub -- 
[exec] PHPUnit 3.7.10 by Sebastian Bergmann.
[exec]  
[exec] Configuration read from /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/workspace/tests/phpunit/suite.xml
[exec]  
[exec] .........................................
[exec] ....................   61 / 5298 (  1%)

BUILD FAILED
/var/lib/jenkins/jobs/_shared/build.xml:452: The following error occurred while executing this line:
/var/lib/jenkins/jobs/_shared/build.xml:473: exec returned: 139
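For readers wondering why 139 means a segfault: shells report "killed by signal N" as exit status 128 + N, and SIGSEGV is signal 11 on Linux. A minimal sketch of the decoding, assuming a POSIX shell on Linux (the variable names are only for illustration):

```shell
# Exit status reported by ant in the console output.
status=139

# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
if [ "$status" -gt 128 ]; then
    signum=$((status - 128))
    # kill -l maps a signal number to its name (without the SIG prefix).
    signame=$(kill -l "$signum")
    echo "exit status $status = signal $signum (SIG$signame)"
fi
```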

Tim ran the test under gdb and it showed a segfault in preg_match_all() in
PHPUnit_Util_Test::getRequirements(), when running
self::REGEX_REQUIRES. Since we don't seem to use @requires, I just
replaced getRequirements() with "return array()", and then my
changeset passed all tests.
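The "we don't seem to use @requires" assumption can be checked with a recursive grep. The sketch below runs against a throwaway directory holding a docblock like the one in the crash, and the pattern is only a rough ERE approximation of PHPUnit's REGEX_REQUIRES, not the exact regex. On a real checkout you would point grep at tests/phpunit instead of the temporary directory.

```shell
# Throwaway tree with one docblock that has no @requires annotation.
tmpdir=$(mktemp -d)
cat > "$tmpdir/ExampleTest.php" <<'EOF'
/**
 * These tests should work regardless of $wgCapitalLinks
 * @group Database
 */
EOF

# Rough approximation of "@requires function|extension <value>".
pattern='@requires[[:space:]]+(function|extension)[[:space:]]'

if grep -rqE "$pattern" "$tmpdir"; then
    uses_requires=yes
else
    uses_requires=no
fi
echo "uses @requires: $uses_requires"

rm -rf "$tmpdir"
```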

Here's the full backtrace:

Program received signal SIGSEGV, Segmentation fault.
zval_mark_grey (pz=0xa7f82a0) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:368
368 /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c: No such file or directory.
(gdb) bt
#0 zval_mark_grey (pz=0xa7f82a0)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:368
#1 0x00000000006b73ac in zval_mark_grey (pz=<optimized out>)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:379
#2 0x00000000006b7e75 in gc_mark_roots ()
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:435
#3 gc_collect_cycles ()
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:664
#4 0x00000000006b8174 in gc_zval_possible_root (zv=<optimized out>)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:166
#5 0x00000000006a7e30 in zend_hash_destroy (ht=0xa7f80f0)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_hash.c:729
#6 0x00000000006994df in _zval_dtor_func (zvalue=0xa7e7598)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_variables.c:46
#7 0x0000000000473c08 in _zval_dtor (zvalue=0xa7e7598)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_variables.h:35
#8 php_pcre_match_impl (pce=0x8fbfcb0, subject=0xa7aba48 "/**\n * These tests should work regardless of $wgCapitalLinks\n * @group Database\n */\n/**\n\t * Make sure MediaWikiTestCase extending classes have called their\n\t * parent setUp method\n\t */", subject_len=184, return_value=0xa7ec5e0, subpats=0xa7e7598, global=1, use_flags=0, flags=0, start_offset=0)
    at /root/wikimedia/php5/php5-5.3.10/ext/pcre/php_pcre.c:549
#9 0x0000000000473e6b in php_do_pcre_match (ht=3, return_value=0xa7ec5e0, global=1, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>)
    at /root/wikimedia/php5/php5-5.3.10/ext/pcre/php_pcre.c:519
#10 0x000000000070f80d in zend_do_fcall_common_helper_SPEC (execute_data=0x7ffff7ee1f00)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_vm_execute.h:320
#11 0x00000000006c037b in execute (op_array=0x1d5f6c0)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_vm_execute.h:107
#12 0x000000000068d8bc in zend_call_function (fci=0x7fffffffba60, fci_cache=<optimized out>)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_execute_API.c:969
#13 0x00000000005d0178 in zif_call_user_func_array (ht=<optimized out>, return_value=0xa722870, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>)
    at /root/wikimedia/php5/php5-5.3.10/ext/standard/basic_functions.c:4803
#14 0x000000000070f80d in zend_do_fcall_common_helper_SPEC (execute_data=0x7ffff7edf5c0)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_vm_execute.h:320
#15 0x00000000006c037b in execute (op_array=0x901c008)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend_vm_execute.h:107
#16 0x000000000069b8e0 in zend_execute_scripts (type=8, retval=0x0, file_count=3)
    at /root/wikimedia/php5/php5-5.3.10/Zend/zend.c:1308
#17 0x0000000000647f53 in php_execute_script (primary_file=0x7fffffffe1d0)
    at /root/wikimedia/php5/php5-5.3.10/main/main.c:2323
#18 0x000000000042c797 in main (argc=10, argv=0x7fffffffe3d8)
    at /root/wikimedia/php5/php5-5.3.10/sapi/cli/php_cli.c:1188

Version: unspecified
Severity: major

Details

Reference
bz43972

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 1:14 AM)
bzimport set Reference to bz43972.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 11627
backtrace by Tim of a unit test segfault

Above backtrace attached in a text file for convenience.

attachment unit-test-segfault.txt ignored as obsolete

We probably want to further isolate the unit test that causes this issue, make it easier to reproduce, and report it upstream (PHP and PHPUnit). If it still produces a backtrace, we might want to try another PHP version or a nightly build.

I think recompiling PHP with the bundled PCRE source rather than the system library would be the first thing to try. Faidon may be able to help with that. If that doesn't fix it, then you could try PHP 5.3.x git head.

It's probably best to try a different PHP version before you isolate and report the issue, since the folks at bugs.php.net are unlikely to be interested in a segfault in a package they don't maintain.

If the bug isn't present in the latest 5.3.x, then it will probably be our responsibility to fix or work around it.

High priority since this is blocking merge in wmf branches and several people complained about it since yesterday.

I have removed the hack in PHPUnit and upgraded it to 3.7.13. Running from a local copy works for me, as does using the workspace of change 44039, which did segfault :/

Ah, I managed to reproduce the segfault from time to time using Gerrit change 44221 patchset 1.

Command used:
WORKSPACE=/home/hashar/core JOB_NAME=testing_segfault_job_name ant -file /var/lib/jenkins/jobs/_shared/build.xml phpunit-databaseless
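Since the crash only shows up from time to time, a retry loop helps. Here is a rough sketch, with run_tests as a hypothetical stand-in for the real ant invocation; in this sketch it simply pretends to segfault on its third run, and max_attempts is an arbitrary value.

```shell
# Hypothetical stand-in for the real test command: "segfaults" on the third run.
runs=0
run_tests() {
    runs=$((runs + 1))
    [ "$runs" -lt 3 ] || return 139
}

max_attempts=10
attempt=0
crashed=no
while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    status=0
    run_tests || status=$?
    # 139 = 128 + 11, i.e. the process was killed by SIGSEGV.
    if [ "$status" -eq 139 ]; then
        crashed=yes
        echo "segfault (exit 139) on attempt $attempt"
        break
    fi
done
```

To use it for real, replace the body of run_tests with the ant command; the loop itself does not need to change.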

SELF NOTE:

On gallium I did:

My private clone of mediawiki

cd ~/core/tests/phpunit

Apply change 44221 patchset 1:

git fetch https://gerrit.wikimedia.org/r/mediawiki/core refs/changes/21/44221/1 && git checkout -b 44221/1 FETCH_HEAD

Change to the subdir; apparently running from the root directory of the working copy does not trigger the segfault (or I haven't tried enough):

cd tests/phpunit

run gdb:

gdb --args php phpunit.php --conf /home/hashar/core/LocalSettings.php --exclude-group Database,Broken,ParserFuzz,Stub --log-junit /home/hashar/core/logs/junit.xml --; echo $?
(gdb) run

wait for segfault

Program received signal SIGSEGV, Segmentation fault.
zval_mark_grey (pz=0xa196d18) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:368
368 /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c: No such file or directory.

Ask for a backtrace:

(gdb) bt

(backtrace snipped; it is already above)

Frame #2 of the backtrace references gc_mark_roots. Googling for "PHP segfault gc_mark_roots" turns up https://bugs.php.net/bug.php?id=63055, which has the same backtrace when running the test suite for Drupal and/or Symfony2.

Laruence (php.net) says:
any usage of zval_dtor with recursive array may trigger this segfault.

We indeed see a call to _zval_dtor in our backtrace (line #7).

Created attachment 11638
backtrace with Zend functions shown

Using the .gdbinit from PHP, I found what Tim found ages ago, namely that the crash is caused by a preg_match_all() call:

(gdb) source /home/hashar/gdbinit
(gdb) zbacktrace

[0x7ffff7ee37c8] preg_match_all("/@requires\s+(?P<name>function|extension)\s(?P<value>([^\40]+))\r?$/m", "\12/**\12\11\40*\40@dataProvider\40provideWfMatchesDomainList\12\11\40*/", array(7)[0xa196c78]) /usr/share/php/PHPUnit/Util/Test.php:125
[0x7ffff7ee32d8] PHPUnit_Util_Test::getRequirements("GlobalTest", "testWfMatchesDomainList") /usr/share/php/PHPUnit/Framework/TestCase.php:557
[0x7ffff7ee2a00] PHPUnit_Framework_TestCase->setRequirementsFromAnnotation() /usr/share/php/PHPUnit/Framework/TestCase.php:585
[0x7ffff7ee12c0] PHPUnit_Framework_TestCase->checkRequirements() /usr/share/php/PHPUnit/Framework/TestCase.php:822
[0x7fffffffbab0] PHPUnit_Framework_TestCase->runBare()
[0x7ffff7ee0e88] call_user_func_array(array(2)[0x9ad9a28], array(0)[0xa195ff8]) /usr/share/php/PHP/Invoker.php:93
[0x7ffff7edf4c0] PHP_Invoker->invoke(array(2)[0x9ad9a28], array(0)[0xa195ae0], 2) /usr/share/php/PHPUnit/Framework/TestResult.php:646
[0x7ffff7ede140] PHPUnit_Framework_TestResult->run(object[0x2334d50]) /usr/share/php/PHPUnit/Framework/TestCase.php:769
[0x7ffff7edd438] PHPUnit_Framework_TestCase->run(object[0x9225990]) /home/hashar/core/tests/phpunit/MediaWikiTestCase.php:116
[0x7ffff7edd320] MediaWikiTestCase->run(object[0x9225990]) /usr/share/php/PHPUnit/Framework/TestSuite.php:775
[0x7ffff7edbb10] PHPUnit_Framework_TestSuite->runTest(object[0x2334d50], object[0x9225990]) /usr/share/php/PHPUnit/Framework/TestSuite.php:745
[0x7ffff7eda2e8] PHPUnit_Framework_TestSuite->run(object[0x9225990], false, array(0)[0x9225d70], array(4)[0x9225d20], false) /usr/share/php/PHPUnit/Framework/TestSuite.php:705
[0x7ffff7ed8ac0] PHPUnit_Framework_TestSuite->run(object[0x9225990], false, array(0)[0x9225d70], array(4)[0x9225d20], false) /usr/share/php/PHPUnit/Framework/TestSuite.php:705
[0x7ffff7ed7298] PHPUnit_Framework_TestSuite->run(object[0x9225990], false, array(0)[0x9533340], array(4)[0x95334f0], false) /usr/share/php/PHPUnit/Framework/TestSuite.php:705
[0x7ffff7ed45b0] PHPUnit_Framework_TestSuite->run(object[0x9225990], false, array(0)[0x95343a0], array(4)[0x9534550], false) /usr/share/php/PHPUnit/TextUI/TestRunner.php:346
[0x7ffff7ed39f8] PHPUnit_TextUI_TestRunner->doRun(object[0x1b4adf8], array(7)[0x9535338]) /usr/share/php/PHPUnit/TextUI/Command.php:176
[0x7ffff7ed3800] PHPUnit_TextUI_Command->run(array(10)[0x3638e90], false) /home/hashar/core/tests/phpunit/MediaWikiPHPUnitCommand.php:61
[0x7ffff7ed34b0] MediaWikiPHPUnitCommand->run(array(10)[0x3639e68], true) /home/hashar/core/tests/phpunit/MediaWikiPHPUnitCommand.php:47
[0x7ffff7ed3068] MediaWikiPHPUnitCommand::main() /home/hashar/core/tests/phpunit/phpunit.php:107

attachment unit-test-zbacktrace.txt ignored as obsolete

Tim proposed using a different PHP version and/or PCRE version. According to upstream bug 63055, the bug is in PHP 5.4.x as well, so I have reinstated Tim's live hack to PHPUnit:

vim /usr/share/php/PHPUnit/Util/Test.php

public static function getRequirements($className, $methodName)
{
    // HASHAR TIM hack bug https://bugzilla.wikimedia.org/43972
    return array();
    ...
}

That is a workaround for the bug.

*** Bug 43390 has been marked as a duplicate of this bug. ***

Lowering priority since we have applied a workaround.

*** Bug 44306 has been marked as a duplicate of this bug. ***

The upstream bug apparently got solved: http://git.php.net/?p=php-src.git;a=commit;h=ccc519b7a92bfe4b191c0e2e3869516171247ac2

That commit is in:

$ git branch -r --contains ccc519b7a92bfe4b191c0e2e3869516171247ac2

origin/HEAD -> origin/master
origin/PHP-5.4
origin/PHP-5.4.10
origin/PHP-5.4.11
origin/PHP-5.4.9
origin/PHP-5.5
origin/immutable-date
origin/master

So I guess PHP >= 5.4.9 is fine :-)

Moving the bug back into the pool. This will be fixed whenever we upgrade to PHP 5.3.19+.
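To check an installed version string against that 5.3.19 cutoff, something like the following could work. It is only a sketch: version_ge is a hypothetical helper, it assumes a sort(1) with -V support (as in GNU coreutils), and it compares version strings only; it does not know which branch a fix actually landed in.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes sort supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# 5.3.19 is the cutoff mentioned in this thread; the list is just sample input.
for v in 5.3.10 5.3.19 5.4.11; do
    if version_ge "$v" 5.3.19; then
        echo "$v: at or past 5.3.19"
    else
        echo "$v: older than 5.3.19"
    fi
done
```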

Got another occurrence when running the full test suite:

https://integration.wikimedia.org/ci/job/mediawiki-core-master-phpunit-all/1454/consoleFull

/var/lib/jenkins/jobs/_shared/build.xml:437: The following error occurred while executing this line:
/var/lib/jenkins/jobs/_shared/build.xml:482: exec returned: 139

I can confirm the workaround described in Comment #11 is still present. So we must have yet another segfault issue :(

*** Bug 47069 has been marked as a duplicate of this bug. ***

Pinged the ops-l list about it. Seems to me we want to cherry-pick the upstream change into our PHP package.

No activity on RT, I have pinged it.

Alexandros provided some new packages. I have manually installed them on gallium:

dpkg -i \
libapache2-mod-php5_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-cli_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-common_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-curl_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-dbg_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-dev_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-gd_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-intl_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-mysql_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-pgsql_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-sqlite_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
php5-tidy_5.3.10-1ubuntu3.7+wmf1_amd64.deb

Created attachment 13224
backtrace with PHP 5.3.10-1ubuntu3.7+wmf1 provided by Alexandros

Created attachment 13225
2nd backtrace with PHP 5.3.10-1ubuntu3.7+wmf1

Another backtrace. zbacktrace has no clue; phpbt yields:

No symbol "execute_data" in current context.

PHPUnit 3.7.24 was deployed on gallium last week.

I am upgrading the PHP packages to keep them in sync with production. That gets rid of Alexandros's PHP patches, but since PHPUnit has a workaround, that should be fine.

Retriggering the coverage job at https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/

Change 83940 had a related patch set uploaded by Hashar:
disable suoshin mem handler for code coverage

https://gerrit.wikimedia.org/r/83940

Created attachment 13273
3rd backtrace with suhosin canary mm disabled

After running the job with SUHOSIN_MM_USE_CANARY_PROTECTION=0, which disables suhosin's mm canary protection, there was a different backtrace. Attaching it here.

PHP still segfaults but it happens very late in PHP execution (during shutdown), so the HTML is actually generated and published at https://integration.wikimedia.org/cover/mediawiki-core/master/php/

Change 83940 abandoned by Hashar:
disable suoshin mem handler for code coverage

Reason:
does not prevent PHP from segfaulting ..

https://gerrit.wikimedia.org/r/83940

Just want to note that Wikibase also has troubles with phpunit on PHP 5.3.27 (on travis-ci).

Backtrace:
http://pastebin.com/Me7zsvmk

Created attachment 14213
backtrace of Wikibase tests on travis

Attaching to bug the backtrace pasted at http://pastebin.com/Me7zsvmk

Change 116093 had a related patch set uploaded by Hashar:
Coverage now ignore phpunit ignores

https://gerrit.wikimedia.org/r/116093

Change 116093 merged by jenkins-bot:
Coverage now ignore phpunit ignores

https://gerrit.wikimedia.org/r/116093

All patches merged; resetting ticket status

There are no more segfaults happening.

Quite an old bug that has been worked around by ignoring the PHP exit code while generating the code coverage (ex: php phpunit <coverage args> || :).

The upstream bug is https://bugs.php.net/bug.php?id=63055. @akosiaris is apparently working on including that patch in the Wikimedia PHP 5.3 package.

s/is/has been/. I've been wondering about the future of the Wikimedia PHP 5.3 package in general, which is why this got some more info again.
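For anyone unfamiliar with the `|| :` idiom: `:` is the shell's no-op builtin that always succeeds, so the whole command list reports success even when the left-hand side crashed. A minimal sketch, with run_coverage as a hypothetical stand-in for the real coverage command:

```shell
# Hypothetical stand-in for the coverage run; it exits 139 like the segfaulting PHP.
run_coverage() {
    return 139
}

# The no-op ':' runs only when run_coverage fails, and it always returns 0.
run_coverage || :
status=$?
echo "status seen by the build: $status"
```

The trade-off is that any failure of the coverage run is hidden, not only the late segfault.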

The two patches included in the Wikimedia PHP 5.3 package (for precise only) are:

https://bugs.php.net/bug.php?id=63055 as mentioned by @hashar
https://bugs.php.net/bug.php?id=65873 that could probably find its way into Ubuntu security packages

No new actionables to add here, though.