Jenkins job fails when all tests pass
Closed, ResolvedPublic

Description

From time to time a Jenkins job fails even though all tests pass. The console log does not contain any information on why the job failed.

Example:

https://wmf.ci.cloudbees.com/job/MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox/275/console

...
Build step 'Execute shell' marked build as failure
...


Version: unspecified
Severity: normal

Details

Reference
bz60037

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:53 AM
bzimport set Reference to bz60037.
bzimport added a subscriber: Unknown Object (MLST).

Cloudbees support suggested adding --backtrace to "bundle exec cucumber":

https://gerrit.wikimedia.org/r/#/c/106260

It did not help: no additional information was displayed in the Jenkins console log.

The next advice was to add this:

echo "Failure in cucumber"

to the end of every "bundle exec cucumber":

https://gerrit.wikimedia.org/r/#/c/107164/

It did not help. It actually made things worse. When a test failed, the job was marked as unstable instead of failed, so no e-mail notifications were sent.

Change 107363 had a related patch set uploaded by Zfilipin:
Send e-mail for every unstable Jenkins job

https://gerrit.wikimedia.org/r/107363

Change 107363 merged by Cmcmahon:
Send e-mail for every unstable Jenkins job

https://gerrit.wikimedia.org/r/107363

Change 108493 had a related patch set uploaded by Zfilipin:
removed debugging code from Jenkins jobs

https://gerrit.wikimedia.org/r/108493

  • ask cucumber people if anything else but verbose and backtrace would give more information
  • ask cloudbees support if we could get shell script exit code
  • parse jenkins logs for jobs that fail with "Build step 'Execute shell' marked build as failure" but no failed tests
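The third item above could be sketched roughly as follows. This is only an illustration, assuming that the console log and the JUnit XML reports of a build have already been saved locally; the function name is_silent_failure and both path arguments are hypothetical and not part of any existing job configuration.

```shell
# Marker that Jenkins prints when the shell build step exits non-zero.
MARKER="Build step 'Execute shell' marked build as failure"

# is_silent_failure LOG REPORT_DIR  (hypothetical helper)
# Succeeds when LOG contains the marker but no JUnit XML file in
# REPORT_DIR records a <failure> or <error> element.
is_silent_failure() {
    log=$1
    reports=$2
    # No marker in the log: the shell step did not fail.
    grep -qF "$MARKER" "$log" || return 1
    # A recorded failure or error explains the failed build.
    if grep -qE '<(failure|error)[ />]' "$reports"/*.xml 2>/dev/null; then
        return 1
    fi
    return 0
}
```

Calling it once per saved build, for example is_silent_failure console.log reports/junit && echo "silent failure", would list only the builds that need a closer look.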

Change 108493 merged by jenkins-bot:
removed debugging code from Jenkins jobs

https://gerrit.wikimedia.org/r/108493

Cloudbees support said this will print the shell exit code and still fail the build step:

do_something_that_may_fail || (echo failed with $?; false)
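To see why the trailing false matters, here is a small sketch that compares a bare echo with the form suggested by support. The may_fail function is a stand-in for "bundle exec cucumber" and its exit status of 3 is an arbitrary assumption.

```shell
# Hypothetical stand-in for "bundle exec cucumber ..."; always exits with 3.
may_fail() { return 3; }

# Variant A: echo succeeds, so the line as a whole reports status 0.
variant_a() { may_fail || echo "Failure in cucumber"; }

# Variant B: the subshell prints the original code, then false keeps
# the line failing (with status 1).
variant_b() { may_fail || (echo "failed with $?"; false); }

status=0; variant_a || status=$?
echo "variant A status: $status"

status=0; variant_b || status=$?
echo "variant B status: $status"
```

With variant A the shell step succeeds, so Jenkins has to rely on the test report alone to grade the build. Variant B keeps the step failing, although the original exit code is only printed and the line itself exits with 1.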

You might want to run cucumber with increased verbosity. In its current mode it might be hiding a stack trace and exiting with status 1 without any message.

Looking at the failing job:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/

The console does not show anything relevant besides the job failing:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/console

The test report gives far more detail:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/testReport/

With a nice stack trace and far more detail for each test:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/testReport/(root)/Page/Move_existing_page_dialog/

config/cucumber.yml of qa/browsertests.git has:

ci: --format Cucumber::Formatter::Sauce --out reports/junit

That defines a profile named 'ci' whose options are passed to cucumber. With that configuration, cucumber writes its output using the Sauce format to reports/junit and prints nothing to the console.

It might be possible to add a second formatter and have it write to stdout, which would show up on the console.

Another possibility is to intercept the cucumber exit code, echo a clear message saying the job has failed, and craft a URL pointing to the test report. Something like:

bundle exec cucumber --backtrace --verbose --profile ci --tags @en.wikipedia.beta.wmflabs.org || (echo -e "\nJob has failed (exit code: $?).\nSee test report at $BUILD_URL/testReport/\n"; false)
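A variant of the same idea, written as a small wrapper that also passes the original exit code on to Jenkins, might look like the sketch below. The function name run_and_report is hypothetical; BUILD_URL is assumed to be set by Jenkins and is left empty otherwise.

```shell
# run_and_report CMD [ARGS...]  (hypothetical helper)
# Runs the command; on failure prints its exit code and a link to the
# Jenkins test report, then returns that same exit code.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        printf '\nJob has failed (exit code: %s).\nSee test report at %s/testReport/\n' \
            "$status" "${BUILD_URL:-}"
    fi
    return "$status"
}

# Demonstration with a stand-in command that exits with status 2.
run_and_report sh -c 'exit 2' || echo "wrapper returned $?"
```

In a job it would be called as run_and_report bundle exec cucumber --backtrace --verbose --profile ci --tags @en.wikipedia.beta.wmflabs.org, so the build step still fails when cucumber does.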

Jenkins env variables are listed at:

https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables

Change 110345 had a related patch set uploaded by Zfilipin:
Increases verbosity of Cucumber output

https://gerrit.wikimedia.org/r/110345

Change 110345 merged by Cmcmahon:
Increases verbosity of Cucumber output

https://gerrit.wikimedia.org/r/110345

Chris has noticed this:

...
@en.wikipedia.beta.wmflabs.org @test2.wikipedia.org
Feature: File

Scenario: Anonymous goes to file that does not exist         # features/file.feature:15
  Given I am at file that does not exist                     # features/step_definitions/file_steps.rb:12
  Then page text should contain No file by this name exists. # features/step_definitions/page_steps.rb:123

@login
Scenario: Logged-in user goes to file that does not exist   # features/file.feature:20

too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 70344678247960, last used 120.314153617 seconds ago (Net::HTTP::Persistent::Error)
...

1: https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-firefox/622/console

The next step is to download all XML files from the ws/reports/junit/ folder whenever a job fails but no tests fail. We should inspect the XML files and figure out whether the failure is recorded there and Jenkins is not seeing it, or whether the failure is not reported there at all.
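Once the XML files are downloaded, a first pass over them could be as simple as counting elements per file. The sketch below assumes the usual JUnit-style element names (testcase, failure, error, skipped) and a grep that supports -o; the function names are hypothetical.

```shell
# count_tag TAG FILE: count opening TAG elements, e.g. <skipped/> or <failure ...>.
count_tag() {
    n=$(grep -oE "<$1[ />]" "$2" | wc -l)
    echo $((n))
}

# summarize_junit DIR: print one summary line per XML report in DIR.
summarize_junit() {
    for f in "$1"/*.xml; do
        [ -e "$f" ] || continue   # directory holds no XML files
        printf '%s testcases=%s failures=%s errors=%s skipped=%s\n' \
            "${f##*/}" \
            "$(count_tag testcase "$f")" "$(count_tag failure "$f")" \
            "$(count_tag error "$f")" "$(count_tag skipped "$f")"
    done
}
```

A report that shows failures=0 and errors=0 for a failed build, possibly with a non-zero skipped count, would suggest that the failure never made it into the XML rather than that Jenkins misread it.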

The latest example:

https://wmf.ci.cloudbees.com/view/uls/job/UniversalLanguageSelector-language-browsertests.wmflabs.org-linux-firefox/40/console

...
Scenario: Applying the live preview of interface font # features/font_selection_default_enabled.feature:44

bad URI(is not URI?):  (URI::InvalidURIError)

...

The scenario is marked "skipped" instead of "failed":

https://wmf.ci.cloudbees.com/view/uls/job/UniversalLanguageSelector-language-browsertests.wmflabs.org-linux-firefox/40/testReport/(root)/Font%20selection/Applying_the_live_preview_of_interface_font/

From TEST-features-font_selection_default_enabled.xml (attached):

...
<testcase classname="Font selection" name="Applying the live preview of interface font" time="65.901432">

<skipped/>
<system-out/>
<system-err/>

</testcase>
...

Created attachment 14821
junit xml file generated by cucumber

(In reply to Nemo from comment #22)

Same, right?
https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-databaseless/23010/consoleFull

That one is a PHP segfault tracked by bug 62623 and is completely unrelated.

Chris, I do not remember seeing this since we moved to wikimedia jenkins. Can this be resolved as fixed?

Looks like this is fixed. Please reopen if the problem happens again.