Page MenuHomePhabricator

Race condition in Parsoid test suite
Closed, ResolvedPublic

Description

This one seems to happen consistently whenever a patch is +2ed. So, we suspect this is some misconfiguration or bad checkout or missing submodule update or something else.

https://integration.wikimedia.org/ci/job/parsoidsvc-parsertests-run-harder/622/console
https://integration.wikimedia.org/ci/job/parsoidsvc-parsertests-run-harder/601/console


Version: unspecified
Severity: normal

Details

Reference
bz61351

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 2:52 AM
bzimport added a project: Parsoid-Tests.
bzimport set Reference to bz61351.

Both jobs above have been triggered by changes made against the deploy repository:

https://integration.wikimedia.org/ci/job/parsoidsvc-parsertests-run-harder/601/console
https://gerrit.wikimedia.org/r/#/c/112945/

The job is triggered by changes made against either parsoid or parsoid/deploy repository. It fetches both repositories master branch and apply the patch that triggered the job. The src directory of the deploy repository is NOT used.

If one look at the full console for the job, a summary of actions is:

Refreshing mediawiki/services/parsoid
Your branch is behind 'origin/master' by 12 commits, and can be fast-forwarded.
+ git reset --hard remotes/origin/master
HEAD is now at 028e9de Merge "Bug 61243: Catch errors sent by the Parsoid server in roundtrip-test.js"

Refreshing mediawiki/services/parsoid/deploy
Applying patch refs/zuul/master/Z84552b78386c434d99b94f881fbe96dc

+ git reset --hard FETCH_HEAD
HEAD is now at 77f4aaf... Bump src to 96c127472 for deploy
Refreshing mediawiki/services/parsoid/deploy submodules...
Submodule 'src' () registered for path 'src'
From https://gerrit.wikimedia.org/r/p/mediawiki/services/parsoid

4b751fd..028e9de  master     -> origin/master

Submodule path 'src': checked out '96c1274722bce3b3326954b60cc65ec84c389cea'

You end up with the following hierarchy:

/parsoid @ whatever master branch was at, i.e. 028e9de
/deploy @ 028e9de (which is https://gerrit.wikimedia.org/r/#/c/112945/ patch applied on top of master)
/deploy/src Parsoid code @ 96c12747 (since that is what the submodule configuration points to).

The tests are run using:

NODE_PATH=/deploy/node_modules
node parsoid/tests/parserTests...

Which mean the Parsoid code being used is the latest from Parsoid when we should test the one mentioned as a submodule :-( That probably cause the issues mentioned.

I have to write down somewhere the different use cases and find out what code we actually want to exercice.

Change 113779 had a related patch set uploaded by Hashar:
Rework parsoidsvc jobs

https://gerrit.wikimedia.org/r/113779

Change 113781 had a related patch set uploaded by Hashar:
Rework parsoidsvc jobs (non voting)

https://gerrit.wikimedia.org/r/113781

Change 113781 merged by jenkins-bot:
Rework parsoidsvc jobs (non voting)

https://gerrit.wikimedia.org/r/113781

Change 113782 had a related patch set uploaded by Hashar:
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/113782

Change 113783 had a related patch set uploaded by Hashar:
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/113783

Change 113779 merged by jenkins-bot:
Rework parsoidsvc jobs

https://gerrit.wikimedia.org/r/113779

I found out a rather nasty race condition which occurs when running the job "parsertests-run-harder". That basically run the command:

node parserTests --wt2html --wt2wt --html2wt --html2html \
  --selser --no-color --quiet --blacklist

I have triggered three occurrences of the same patch to have them run concurrently and the two last build failed.

The job is:

http://integration.wikimedia.org/ci/job/parsoidsvc-deploy-parsertests-run-harder/

The three builds are #3 #4 and #5. The first passed, the last two failed:

http://integration.wikimedia.org/ci/job/parsoidsvc-deploy-parsertests-run-harder/3/console
http://integration.wikimedia.org/ci/job/parsoidsvc-deploy-parsertests-run-harder/4/console
http://integration.wikimedia.org/ci/job/parsoidsvc-deploy-parsertests-run-harder/5/console

Related bug is: https://bugzilla.wikimedia.org/61351

Would it make sense to split each action (selser, wt2wt..) in separated jobs?
I have zero clue what could be the race condition, any pointer would be nice.

Subbu wrote:

Theory confirmed. I ran two copies of parserTests in two terminals concurrently. The run that started later had additional failures/successes and they are all image tests which confirms the theory that this is the mock server that is causing this. Will either fix or file a bug report for this later today.

[subbu@earth tests] grep UNEXP /tmp/r1
UNEXPECTED FAIL: Image: New block level image should have \n before and after (existing content) (wt2html)
UNEXPECTED FAIL: Image: New block level image should have \n before and after (existing content) (html2html)
UNEXPECTED FAIL: Images: upright option (parsoid) (wt2html)
UNEXPECTED FAIL: Images: upright option is ignored on inline and frame images (parsoid) (wt2html)
UNEXPECTED PASS:Images: upright option is ignored on inline and frame images (parsoid) (wt2wt)
UNEXPECTED FAIL: Images: upright option is ignored on inline and frame images (parsoid) (html2html)

Moving bug to Parsoid, there is a race condition in the test suite. That follow up a mail conversation we had with Subbu a few weeks ago.

Change 113782 abandoned by Hashar:
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/113782

Change 113783 abandoned by Hashar:
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/113783

(In reply to ssastry from comment #11)

This has been fixed in https://gerrit.wikimedia.org/r/#/c/113989/

Awesome. Thank you guys!

Change 113782 restored by Hashar:
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/113782