Page MenuHomePhabricator

RequestContextTest::testImportScopedSession fails on Travis CI's hhvm
Closed, ResolvedPublic

Description

Cf. https://travis-ci.org/wikimedia/mediawiki/jobs/40335649:

1) RequestContextTest::testImportScopedSession
Correct restored IP address.
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'127.0.0.1'
+'192.0.2.0'
/home/travis/build/wikimedia/mediawiki/tests/phpunit/includes/RequestContextTest.php:91
/home/travis/build/wikimedia/mediawiki/tests/phpunit/MediaWikiTestCase.php:141

Version: unspecified
Severity: normal
See Also:
https://github.com/hhvm/packaging/issues/81
https://github.com/travis-ci/travis-ci/issues/2942

Details

Reference
bz73177

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 3:59 AM
bzimport set Reference to bz73177.

This test expects unset( $sc ); to trigger running the __destruct() method of the ScopedCallback object held in $sc. It appears that under the Travis-CI this destructor is not being called.

There is an ini setting (hhvm.enable_obj_destruct_call) to ensure that destructors are called which we set to "true" in the MediaWiki-Vagrant and WMF cluster config files.

gerritadmin wrote:

Change 172100 had a related patch set uploaded by BryanDavis:
travis-ci: Force object destructors to run under HHVM

https://gerrit.wikimedia.org/r/172100

Is this something restricted to the unit test, or do we need to add a note to $somewhere (INSTALL?) that "in order to (experimentally) run MediaWiki with hhvm, you need to set hhvm.enable_obj_destruct_call to $something"? Also, the current patch set's .travis.hhvm.ini has "hhvm.enable_obj_destruct_call = false", while above you wrote that it is set to true in MediaWiki-Vagrant and at the WMF cluster.

(In reply to Tim Landscheidt from comment #3)

Is this something restricted to the unit test, or do we need to add a note
to $somewhere (INSTALL?) that "in order to (experimentally) run MediaWiki
with hhvm, you need to set hhvm.enable_obj_destruct_call to $something"?

When we figure out the settings that are required I think it would deserve a release note of some kind and optimally a startup warning if the settings are incorrect.

Also, the current patch set's .travis.hhvm.ini has
"hhvm.enable_obj_destruct_call = false", while above you wrote that it is
set to true in MediaWiki-Vagrant and at the WMF cluster.

I intended to set it to true but apparently got some additional testing settings included. This points more strongly to disabling JIT as the thing that fixes the test. I'll run a few more iterations on Travis to see if I can narrow down what actually fixes the failure there.

(In reply to Bryan Davis from comment #4)

(In reply to Tim Landscheidt from comment #3)

Also, the current patch set's .travis.hhvm.ini has
"hhvm.enable_obj_destruct_call = false", while above you wrote that it is
set to true in MediaWiki-Vagrant and at the WMF cluster.

I intended to set it to true but apparently got some additional testing
settings included. This points more strongly to disabling JIT as the thing
that fixes the test. I'll run a few more iterations on Travis to see if I
can narrow down what actually fixes the failure there.

It turns out that disabling JIT is all that is needed to make the test past (https://travis-ci.org/bd808/mediawiki/jobs/40476541).

I'm hoping this is just an artifact of something in the cli process and not a systemic problem in a running wiki. HHMV without JIT is not highly compelling speed improvement for the investment we have made in it.

Does anyone have an example of a page/feature in a live MediaWiki instance that would be highly sensitive to this sort of scoped destructor behavior? I think we need to figure out if the problem is isolated to the unit test or more systemic.

(In reply to Bryan Davis from comment #5)

It turns out that disabling JIT is all that is needed to make the test past
(https://travis-ci.org/bd808/mediawiki/jobs/40476541).

I'm hoping this is just an artifact of something in the cli process and not
a systemic problem in a running wiki. HHMV without JIT is not highly
compelling speed improvement for the investment we have made in it.

Is it possible that this is a bug in HHVM's JIT?

I've run numerous experiments on Travis in an attempt to isolate the setting changes that made the test pass [0]. At first I thought that disabling the JIT fixed things [1], but in additional tests using the -v Eval.Jit=false form of this configuration i found that Travis continued to fail the test.

After many variations I settled on a suite of tests that isolate the required change [2]:

  • [passes] hhvm -c empty.ini tests/phpunit/phpunit.php
  • [passes] hhvm --no-config tests/phpunit/phpunit.php
  • [fails] hhvm --php tests/phpunit/phpunit.php
  • [fails] hhvm -vEval.Jit=false tests/phpunit/phpunit.php
  • [fails] hhvm tests/phpunit/phpunit.php

The empty.ini content is:

; this file intentionally empty

The --no-config flag basically does the same thing as passing an empty config, namely it disables loading the system default configuration files. This is the real cause of the test failure on Travis. The evidence can be seen in the original test run given in comment #1 [3]:

Destructor threw an object exception: open(/var/lib/php5/sess_d612ee607c87e749ef14da4983a702cd, O_RDWR) failed: No such file or directory (2)
/home/travis/build/wikimedia/mediawiki/includes/context/RequestContext.php:520
/home/travis/build/wikimedia/mediawiki/includes/context/RequestContext.php:555
/home/travis/build/wikimedia/mediawiki/includes/libs/ScopedCallback.php:70
/home/travis/build/wikimedia/mediawiki/tests/phpunit/includes/RequestContextTest.php:88
/home/travis/build/wikimedia/mediawiki/tests/phpunit/MediaWikiTestCase.php:141

This is evidence that the destructor is actually being called that I missed. The real problem is that the callback invoked by the destructor is failing part way through which leaves the environment only partially rolled back. The results of the --no-config and empty config tests indicate that the crash is triggered by something in the default Travis configuration.

Using another testing environment running on Travis I was able to dump the default configuration and found that it sets "session.save_path = /var/lib/php5" in addition to a few other settings. I also confirmed that the /var/lib/php5 path does not exist under the hhvm runtime configuration [4].

The default ini file comes from the upstream HHVM nightly package [5]. The have apparently setup some parts of the default configuration to mirror the php5 package, but they have neglected to ensure that the package creates the directory they have named. I think this is really a bug in the HHVM nightly package [6], but it could be worked around in either our test environment configuration or by Travis in their configuration [7].

[0]: https://travis-ci.org/bd808/mediawiki/jobs/40440444
[1]: https://travis-ci.org/bd808/mediawiki/jobs/40476541
[2]: https://travis-ci.org/bd808/mediawiki/builds/40509957
[3]: https://travis-ci.org/wikimedia/mediawiki/jobs/40335649#L120-L126
[4]: https://travis-ci.org/bd808/travis-hhvm-experiments/jobs/40511484#L30-L47
[5]: https://github.com/hhvm/packaging/blob/9bd53edcfd853efc65fe0062c7f5f5bdd73b5079/hhvm/deb/skeleton/etc/hhvm/php.ini#L3https://github.com/hhvm/packaging/blob/master/hhvm/deb/skeleton/etc/hhvm/php.ini#L3
[6]: https://github.com/hhvm/packaging/issues/81
[7]: https://github.com/travis-ci/travis-ci/issues/2942

Thanks for the detailed track-down! I assume, with the nature of nightlies and the general Travis CI configuration, it is worth to wait a few days for upstream to fix this.

gerritadmin wrote:

Change 172100 abandoned by BryanDavis:
travis-ci: Configure HHVM runtime

Reason:
Let's get this fixed properly upstream rather than with a local hack.

https://gerrit.wikimedia.org/r/172100

I have submitted a pull request upstream to Travis that attempts to work around the problem there (https://github.com/travis-ci/travis-build/pull/327). Let's see what happens.

(In reply to Bryan Davis from comment #10)

I have submitted a pull request upstream to Travis that attempts to work
around the problem there
(https://github.com/travis-ci/travis-build/pull/327). Let's see what
happens.

This pull request has been merged (https://github.com/travis-ci/travis-build/pull/327#event-192671549). I'm not sure how long it takes before the changes are deployed.