
Jenkins merged a faulty change
Closed, ResolvedPublic

Description

Aaron Schulz wrote:

I noticed that https://gerrit.wikimedia.org/r/#/c/33971/ passed the tests, but after it was merged the new tests started failing for everything. The commit to revert it also failed, so I overrode Jenkins and merged it anyway, and the failures went away for new commits. This indicates that something is broken, possibly Jenkins running tests against master alone rather than master + the patch, which would explain this problem.


Version: unspecified
Severity: normal

Details

Reference
bz46723

Event Timeline

bzimport raised the priority of this task to Unbreak Now! Nov 22 2014, 1:30 AM
bzimport set Reference to bz46723.

Related URL: https://gerrit.wikimedia.org/r/58283 (Gerrit Change I4b3fadccaae9c35964a0c47d63b22c4f35148a24)

From bug 47031 : https://gerrit.wikimedia.org/r/#/c/57436/ has been merged although it is faulty.

The unit tests run on patchset upload did catch the issue:

https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-misc/5222/console : FAILURE

But the gating run after CR+2 did not catch it:

https://integration.wikimedia.org/ci/job/mediawiki-core-phpunit-misc/5223/console : SUCCESS

The root cause is that, although ZUUL_REF points to the proper merge commit, the Jenkins Git plugin seems to build the current origin/master.

build #5223

Workspace did get wiped:
02:46:53 Wiping out workspace first.

It checked out the revision:
02:46:56 Checking out Revision 4c69569db71d149feff6c4b10ea7a493425d67fd (origin/master)

That is the master revision NOT the change. The commit should have been
7dd3356a51951f8cdfe463552b5e5aae272e8e60


The related merge job
https://integration.wikimedia.org/ci/job/mediawiki-core-merge/11333/console

02:44:17 Commencing build of Revision 7dd3356a51951f8cdfe463552b5e5aae272e8e60 (origin/master)
02:44:17 Checking out Revision 7dd3356a51951f8cdfe463552b5e5aae272e8e60 (origin/master)


The ZUUL_REF has probably not been resolved properly, and the git plugin fell back to master.

There is also the possibility that the mediawiki-core-phpunit-misc job was using ZUUL_COMMIT as a refspec instead of ZUUL_REF. That might prevent the plugin from fetching the revision. The job history is no longer accessible due to an unexpected upgrade (see bug 47040).

Created attachment 12065
python script parsing build logs to find Zuul commit vs Git plugin checkout


Created attachment 12066
output of checkbug46723.py

The script output shows that some builds are not testing what they should be testing because they check out a parent commit. Judging from the Jenkins Git plugin source code, it seems that whenever the reference cannot be parsed (i.e. git rev-parse $ZUUL_REF fails), the plugin falls back to master or some parent commit.

I need to improve the script to find out if that happens in a specific pipeline or for some specific refs.


Extract for the two builds referenced somewhere above:

Verifying /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/builds/5222/log
Zuulcommit: 8cc0b601aa2db6db09ac0e4d70847293d75875aa
Checkedout: 8cc0b601aa2db6db09ac0e4d70847293d75875aa
Verifying /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/builds/5223/log
Zuulcommit: 7dd3356a51951f8cdfe463552b5e5aae272e8e60
Checkedout: 4c69569db71d149feff6c4b10ea7a493425d67fd (MISMATCH)

We can see that build 5223 did not use the proper commit :-]
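For readers without access to the attachment, the comparison behind the extract above can be sketched roughly as follows. This is not the attached checkbug46723.py, only a minimal illustration. The "Checking out Revision" line format is taken from the console output quoted in this report; the "ZUUL_COMMIT=<sha1>" line format and the name check_log are assumptions.

```python
import re

# The "Checking out Revision" format is quoted in this report. The
# "ZUUL_COMMIT=<sha1>" format is an assumption about how the parameter
# appears in the console log.
ZUUL_RE = re.compile(r"ZUUL_COMMIT=([0-9a-f]{40})")
CHECKOUT_RE = re.compile(r"Checking out Revision ([0-9a-f]{40})")


def check_log(text):
    """Return (zuul_commit, checked_out, mismatch) for one console log."""
    zuul = ZUUL_RE.search(text)
    checkout = CHECKOUT_RE.search(text)
    zuul_sha = zuul.group(1) if zuul else None
    checkout_sha = checkout.group(1) if checkout else None
    # Only report a mismatch when both commits were found and they differ.
    mismatch = (
        zuul_sha is not None
        and checkout_sha is not None
        and zuul_sha != checkout_sha
    )
    return zuul_sha, checkout_sha, mismatch


# Hypothetical log excerpt assembled from the values reported for build 5223.
sample = """\
ZUUL_COMMIT=7dd3356a51951f8cdfe463552b5e5aae272e8e60
02:46:53 Wiping out workspace first.
02:46:56 Checking out Revision 4c69569db71d149feff6c4b10ea7a493425d67fd (origin/master)
"""
zuul_sha, checkout_sha, mismatch = check_log(sample)
print("Zuulcommit:", zuul_sha)
print("Checkedout:", checkout_sha, "(MISMATCH)" if mismatch else "")
```

Running the same function over every build log of a job and counting the mismatches would give a per-job summary.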

I suspect the git plugin does not fetch the proper references or cannot find them. That results internally in an unknown sha1, and the git plugin then falls back to master or something else.

I will try to reproduce the issue in labs with the git plugin set to verbose. That requires starting Jenkins with -Dhudson.plugins.git.GitSCM.verbose=true

I have traced the issue as far back as mediawiki-core-lint build #19 (made on November 22nd 2012).

MISMATCH in /var/lib/jenkins/jobs/mediawiki-core-lint/builds/19/log
Pipeline: gate
Zuulcommit: 76606b66b006ac0e62087e6d00b1e4bdd56fff09
Checkedout: 232e34733fc68739ba96cccc31d3ff88f9484a23

Verbose mode for the git plugin is unavailable in production because of a bug, which is fixed by https://gerrit.wikimedia.org/r/58489 . That will help find out what the plugin is doing internally.

ZUUL_COMMIT=76cb37f0c69dcd69884fc6e66681e77c8045a08e

but it fetched origin/master instead :-(

The branch specifier in the git plugin is set to ZUUL_BRANCH which is 'master'.

In the git plugin (at git-plugin/src/main/java/hudson/plugins/git/util/DefaultBuildChooser.java ), the getCandidateRevisions() will recognize whether the branch looks like a sha1 (if it matches /[0-9a-f]{6,40}/) and in such a case will create a detached branch using that commit.
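To illustrate why a branch specifier of 'master' never reaches the detached-commit path while a ZUUL_COMMIT value does, here is a small Python sketch of the sha1 check described above. The pattern is the one quoted from the plugin; matching it against the whole string and the name looks_like_sha1 are assumptions made for this example, not the plugin's actual Java code.

```python
import re

# Pattern quoted in this report. Anchoring it to the whole string is an
# assumption made for this sketch.
SHA1_LIKE = re.compile(r"[0-9a-f]{6,40}")


def looks_like_sha1(branch_spec):
    """Return True when the branch specifier looks like a (possibly short) sha1."""
    return SHA1_LIKE.fullmatch(branch_spec) is not None


print(looks_like_sha1("master"))  # False
print(looks_like_sha1("76cb37f0c69dcd69884fc6e66681e77c8045a08e"))  # True
```

With ZUUL_BRANCH the specifier is a plain branch name, so the plugin would resolve it against origin; with ZUUL_COMMIT it is a full sha1 and would take the detached-commit path.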

Seems the Jenkins job macro should then use ZUUL_COMMIT as a branch specifier.

Related URL: https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc)

Related URL: https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc)

I have updated the mediawiki-core-whitespaces job to use ZUUL_COMMIT as a refspec specifier. The job is non-voting, so that is not going to do any harm.

The experimental change is https://gerrit.wikimedia.org/r/58865

(In reply to comment #13)

Related URL: https://gerrit.wikimedia.org/r/58865 (Gerrit Change
Iafebfffe480886fc8956e56517291b1b3b1fc0cc)

Why is this comment duplicated?

*** Bug 47208 has been marked as a duplicate of this bug. ***

https://gerrit.wikimedia.org/r/58865 (Gerrit Change Iafebfffe480886fc8956e56517291b1b3b1fc0cc) | change APPROVED and MERGED [by Hashar]

https://gerrit.wikimedia.org/r/#/c/58865/ has been deployed.

I am now manually updating the jobs which are not under JJB:

analytics-libanon
analytics-udp-filters
analytics-webstatscollector
analytics-wikistats
mwext-PoolCounter-pep8
mwext-VisualEditor-docgen
operations-debs-python-voluptuous-debbuild
parsoid-parse-tool-check
parsoid-roundtrip-test-check
parsoid-runTests
test-mediawiki-merge

Will monitor over the next few days. Lowering priority for now.

hashar@gallium:~$ ./checkbug46723.py mediawiki-core-phpunit-api --filter 2013-04-16*
Found 0 mismatches in 29 log files.
hashar@gallium:~$ ./checkbug46723.py mediawiki-core-phpunit-misc --filter 2013-04-16*
Found 0 mismatches in 29 log files.
$

Seems it got fixed :-] Will verify again during the week, but so far that looks good.

I have verified the jobs triggered over the past few days. Seems to work fine now :-) The root cause was using ZUUL_BRANCH as a branch specifier instead of ZUUL_COMMIT.

Change 117045 had a related patch set uploaded by Hashar:
Parsoid: uses ZUUL_COMMIT as a git refspec to build

https://gerrit.wikimedia.org/r/117045

Change 117045 merged by jenkins-bot:
Parsoid: uses ZUUL_COMMIT as a git refspec to build

https://gerrit.wikimedia.org/r/117045

hashar lowered the priority of this task from Unbreak Now! to Medium. Mar 3 2015, 10:26 AM
hashar raised the priority of this task from Medium to Unbreak Now!.