Maniphest T62037

Jenkins job fails when all tests pass
Closed, Resolved, Public

Assigned To

None

Authored By

	zeljkofilipin
	Jan 14 2014, 12:56 PM

Description

From time to time a Jenkins job fails even though all tests pass. The console log does not contain any information about why the job failed.

Example:

https://wmf.ci.cloudbees.com/job/MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox/275/console

...
Build step 'Execute shell' marked build as failure
...

Version: unspecified
Severity: normal

Details

Reference: bz60037

Related Objects

		Status	Subtype	Assigned	Task
		Invalid		None	T62338 Investigate how often browser tests fail because of Jenkins performance problems (tracking)
		Resolved		None	T62037 Jenkins job fails when all tests pass

Event Timeline

• bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:53 AM

• bzimport added a project: Quality-Assurance.

• bzimport set Reference to bz60037.

• bzimport added a subscriber: Unknown Object (MLST).

zeljkofilipin created this task. Jan 14 2014, 12:56 PM

This is tracked as Cloudbees ticket #15192:

https://cloudbees.zendesk.com/requests/15192

Cloudbees support suggested adding --backtrace to "bundle exec cucumber":

https://gerrit.wikimedia.org/r/#/c/106260

It did not help: no additional information was displayed in the Jenkins console log.

The next advice was to add this:

echo "Failure in cucumber"

to the end of every "bundle exec cucumber":

https://gerrit.wikimedia.org/r/#/c/107164/

It did not help and actually made things worse. When a test failed, the job was marked as unstable instead of failed, so no e-mail notifications were sent.

Example:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/602/console
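
The "unstable" result is consistent with how a shell computes the exit status of an `||` list: the status is that of the last command that ran, so a trailing echo reports success even when the command before it failed. A build step that exits with 0 is not a failure as far as the shell step is concerned, so the job status would then presumably come only from the published test report. A minimal sketch, assuming the message was appended with `||` (the exact form used in the Gerrit change is not shown in this task); `failing_step` is a hypothetical stand-in for a "bundle exec cucumber" run with a failing test:

```shell
#!/bin/sh
# Hypothetical stand-in for "bundle exec cucumber" with a failing test.
failing_step() {
    return 1
}

# Variant 1: echo is the last command that runs, so the list exits with 0
# and the original failure is lost.
swallowed() {
    failing_step || echo "Failure in cucumber"
}

# Variant 2: the subshell ends with "false", so the list exits non-zero.
# Inside the subshell, $? still holds the exit status of failing_step.
preserved() {
    failing_step || (echo "Failure in cucumber (exit code: $?)"; false)
}

swallowed_status=0
swallowed || swallowed_status=$?
echo "swallowed exit status: $swallowed_status"

preserved_status=0
preserved || preserved_status=$?
echo "preserved exit status: $preserved_status"
```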

Change 107363 had a related patch set uploaded by Zfilipin:
Send e-mail for every unstable Jenkins job

https://gerrit.wikimedia.org/r/107363

Change 107363 merged by Cmcmahon:
Send e-mail for every unstable Jenkins job

https://gerrit.wikimedia.org/r/107363

Change 108493 had a related patch set uploaded by Zfilipin:
removed debugging code from Jenkins jobs

https://gerrit.wikimedia.org/r/108493

ask cucumber people if anything else but verbose and backtrace would give more information
ask cloudbees support if we could get shell script exit code
parse jenkins logs for jobs that fail with "Build step 'Execute shell' marked build as failure" but no failed tests
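
The log-parsing item could be sketched roughly as below. This is only an illustration: it assumes the console logs have already been saved as local text files, and it assumes that failed tests would appear in the log as a Cucumber summary line such as "3 scenarios (1 failed, 2 passed)", which may not hold when output goes to a report file instead of the console. The function names are hypothetical.

```shell
#!/bin/sh
# Marker line quoted in the task description.
MARKER="Build step 'Execute shell' marked build as failure"

# Succeeds (exit 0) when the console log in file $1 contains the failure
# marker but no Cucumber summary line that reports failed scenarios.
# Assumption: failed tests show up as "N scenarios (M failed, ...)".
is_unexplained_failure() {
    grep -qF "$MARKER" "$1" || return 1
    if grep -Eq '^[0-9]+ scenarios? \(.*[0-9]+ failed' "$1"; then
        return 1
    fi
    return 0
}

# Print the name of every log file given as an argument that looks like
# a failed job without failed tests.
list_unexplained_failures() {
    for log in "$@"; do
        if is_unexplained_failure "$log"; then
            echo "$log"
        fi
    done
}
```

With the logs saved under a hypothetical logs/ directory, `list_unexplained_failures logs/*.txt` would print the candidates worth a closer look.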

Change 108493 merged by jenkins-bot:
removed debugging code from Jenkins jobs

https://gerrit.wikimedia.org/r/108493

Created attachment 14398
diff of a successful and a failing console

Diff of two consoles:
https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/618/consoleText
https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/consoleText

Attached:

file_60037.txt (1 KB)

Cloudbees support said this will return shell error code:

do_something_that_may_fail || (echo failed with $?; false)

You might want to run cucumber with some increased verbosity. In its current mode it might be hiding a stacktrace and exit 1 without any message.

Looking at the failing job:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/

The console does not show anything relevant besides the job failing:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/console

Looking at the test report we get way more details:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/testReport/

With a nice stack trace and way more details for each test:

https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-chrome/619/testReport/(root)/Page/Move_existing_page_dialog/

config/cucumber.yml of qa/browsertests.git has:

ci: --format Cucumber::Formatter::Sauce --out reports/junit

That defines a profile named 'ci' whose options are passed to cucumber. With that configuration, cucumber writes its output in the Sauce format to a file (reports/junit) and nothing to the console.

It might be possible to add a second formatter and have it write to stdout, which would then show up in the console.

Another possibility is to intercept the cucumber exit code, echo a clear message saying the job has failed, and craft a URL pointing to the test report, while still exiting non-zero so the build step is marked as failed. Something like:

bundle exec cucumber --backtrace --verbose --profile ci --tags @en.wikipedia.beta.wmflabs.org || (echo -e "\nJob has failed (exit code: $?).\nSee test report at $BUILD_URL/testReport/\n"; false)

Jenkins env variables are listed at:

https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables
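
The same idea can be written as a small wrapper function that returns the original exit code rather than a generic failure. This is a sketch, not what the Jenkins jobs actually run: `run_and_report` is a hypothetical name, and BUILD_URL is the variable Jenkins sets for the build (the sketch strips a trailing slash, if any, before appending the report path).

```shell
#!/bin/sh
# Run the given command. On failure, print the exit code and a link to
# the Jenkins test report, then return the original exit code so the
# build step still fails.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        # BUILD_URL is provided by Jenkins; drop a trailing slash if present.
        printf '\nJob has failed (exit code: %s).\nSee test report at %s/testReport/\n' \
            "$status" "${BUILD_URL%/}"
    fi
    return "$status"
}

# Intended use in a job (not executed here):
#   run_and_report bundle exec cucumber --backtrace --verbose --profile ci \
#       --tags @en.wikipedia.beta.wmflabs.org
```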

Change 110345 had a related patch set uploaded by Zfilipin:
Increases verbosity of Cucumber output

https://gerrit.wikimedia.org/r/110345

Change 110345 merged by Cmcmahon:
Increases verbosity of Cucumber output

https://gerrit.wikimedia.org/r/110345

Chris has noticed this:

...
@en.wikipedia.beta.wmflabs.org @test2.wikipedia.org
Feature: File

Scenario: Anonymous goes to file that does not exist         # features/file.feature:15
  Given I am at file that does not exist                     # features/step_definitions/file_steps.rb:12
  Then page text should contain No file by this name exists. # features/step_definitions/page_steps.rb:123

@login
Scenario: Logged-in user goes to file that does not exist   # features/file.feature:20

too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 70344678247960, last used 120.314153617 seconds ago (Net::HTTP::Persistent::Error)
...

1: https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-firefox/622/console

Apparently this is also the source of our intermittent "Unable to pick a platform" test failures, see https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-firefox/625/testReport/(root)/AFTv5/Check_if_AFTv5_is_on_the_page/ from 5 February

The next step is to download all XML files from the ws/reports/junit/ folder when a job fails but no tests fail. We should inspect the XML files and figure out whether the failure is recorded there and Jenkins is not seeing it, or whether the failure is not reported there for some reason.

The XML files in question are located at e.g. https://wmf.ci.cloudbees.com/job/browsertests-en.wikipedia.beta.wmflabs.org-linux-firefox/ws/reports/junit/
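
Once the files are downloaded, a first pass over them could simply count the result elements. A minimal sketch, assuming the reports sit in a local directory as *.xml files and use the usual JUnit element names (failure, error, skipped); the function names are hypothetical:

```shell
#!/bin/sh
# Count opening or self-closing <tag> elements across all *.xml files in
# a directory. $1 = directory, $2 = tag name. Closing tags such as
# </failure> do not match because they start with "</".
count_tag() {
    cat "$1"/*.xml 2>/dev/null | grep -o "<$2[ />]" | wc -l | tr -d ' '
}

# Print a one-line summary of the JUnit reports in directory $1.
summarize_junit() {
    printf 'failures=%s errors=%s skipped=%s\n' \
        "$(count_tag "$1" failure)" \
        "$(count_tag "$1" error)" \
        "$(count_tag "$1" skipped)"
}
```

For a failed job, a summary with failures=0 and errors=0 would suggest the failure never made it into the XML, while a non-zero skipped count points at scenarios worth opening by hand.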

The latest example:

https://wmf.ci.cloudbees.com/view/uls/job/UniversalLanguageSelector-language-browsertests.wmflabs.org-linux-firefox/40/console

...
Scenario: Applying the live preview of interface font # features/font_selection_default_enabled.feature:44

bad URI(is not URI?):  (URI::InvalidURIError)

...

The scenario is marked "skipped" instead of "failed":

https://wmf.ci.cloudbees.com/view/uls/job/UniversalLanguageSelector-language-browsertests.wmflabs.org-linux-firefox/40/testReport/(root)/Font%20selection/Applying_the_live_preview_of_interface_font/

From TEST-features-font_selection_default_enabled.xml (attached):

...
<testcase classname="Font selection" name="Applying the live preview of interface font" time="65.901432">

<skipped/>
<system-out/>
<system-err/>

</testcase>
...

Created attachment 14821
junit xml file generated by cucumber

Attached:

TEST-features-font_selection_default_enabled.xml (1 KB)

Same, right? https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-databaseless/23010/consoleFull

(In reply to Nemo from comment #22)

Same, right?
https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-databaseless/23010/consoleFull

That one is a PHP segfault tracked by bug 62623 and is completely unrelated.

Chris, I do not remember seeing this since we moved to wikimedia jenkins. Can this be resolved as fixed?

Looks like this is fixed. Please reopen if the problem happens again.

	F12898: TEST-features-font_selection_default_enabled.xml
	Nov 22 2014, 2:53 AM
