
Automatic test runs and merging broken
Closed, ResolvedPublic

Description

Looks like some part of CI/Jenkins/Zuul is stuck again. I've been trying to merge ULS patches since yesterday afternoon; I got one merged this morning, and then it got stuck again.


Version: unspecified
Severity: critical

Details

Reference
bz49294

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 1:58 AM)
bzimport set Reference to bz49294.

Tentatively assigning to hashar - feel free to reassign.

Niklas reminded me that Hashar is probably on vacation...

Greg, any idea who else to assign this to?

Looks like the Jenkins slave on deployment-bastion is not reachable. I will take a look at the Jenkins issue as soon as I reach my laptop, hopefully in roughly an hour.

Not much I can do right now, I have no clue about what is wrong.

The deployment-bastion slave had less than 1 GB of disk space left. Jenkins therefore took it offline, and all jobs that were supposed to run on it were waiting for it to come back. I have lowered the disk threshold to 300 MB, which brought the slave back up and dequeued the pending jobs.
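For illustration, the kind of check described above can be sketched in Python. This is only a rough model of a free-disk-space threshold, not Jenkins' actual monitor; the function names, the 1 GiB starting threshold, and the 900 MiB example figure are assumptions made for the sketch.

```python
import shutil

MIB = 1024 * 1024


def below_threshold(free_bytes, threshold_bytes):
    """Return True when free space is under the threshold (node would go offline)."""
    return free_bytes < threshold_bytes


def check_path(path, threshold_bytes):
    """Return (free_bytes, is_below) for the filesystem that holds `path`."""
    free = shutil.disk_usage(path).free
    return free, below_threshold(free, threshold_bytes)


if __name__ == "__main__":
    # Hypothetical figure: a slave with roughly 900 MiB free.
    free = 900 * MIB
    # Assumed 1 GiB threshold: the node would be taken offline.
    print("offline at 1 GiB threshold:", below_threshold(free, 1024 * MIB))
    # Threshold lowered to 300 MiB: the node would come back.
    print("offline at 300 MiB threshold:", below_threshold(free, 300 * MIB))
```

Lowering the threshold only buys time; the disk still needs cleaning up for the slave to stay online.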

I retriggered some Gerrit changes by either rebasing them or commenting 'recheck', but the events do not reach Zuul on gallium.

I still have no ssh access. If someone sees this, please ask ops or demon to restart the zuul service on gallium, then rebase (or use 'recheck') a change to see if it triggers a job.

A possibility is that Gerrit no longer sends events to Zuul. That can be confirmed by tailing /var/log/zuul/debug.log on gallium. Whenever a comment is added in Gerrit, the file should show a bunch of JSON. If nothing is received on the Zuul side, we might want to restart Gerrit as well.
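A minimal sketch of that log check, in Python, for anyone with shell access. The exact layout of Zuul's debug log lines is not known here, so the heuristic (look for a JSON object starting at the first `{` on each line) and the function names are assumptions, not Zuul's own tooling.

```python
import json

_decoder = json.JSONDecoder()


def extract_json(line):
    """Return the JSON object embedded in a log line, or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        # raw_decode tolerates trailing text after the JSON object.
        obj, _end = _decoder.raw_decode(line[start:])
    except ValueError:
        return None
    return obj if isinstance(obj, dict) else None


def count_json_lines(lines):
    """Count the lines that carry a parseable JSON object."""
    return sum(1 for line in lines if extract_json(line) is not None)


if __name__ == "__main__":
    # Path taken from the comment above; reading it requires access to gallium.
    with open("/var/log/zuul/debug.log", errors="replace") as log:
        recent = log.readlines()[-500:]
    print("JSON payloads in the last 500 lines:", count_json_lines(recent))
```

If the count stays at zero after commenting on a change, that would point at events not arriving from Gerrit rather than at Zuul itself.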

(In reply to comment #4)

> I still have no ssh access. If someone sees this, please ask ops or demon to restart the zuul service on gallium, then rebase (or use 'recheck') a change to see if it triggers a job.

Done.

> A possibility is that Gerrit no longer sends events to Zuul. That can be confirmed by tailing /var/log/zuul/debug.log on gallium. Whenever a comment is added in Gerrit, the file should show a bunch of JSON. If nothing is received on the Zuul side, we might want to restart Gerrit as well.

Gerrit should be fine, haven't done any restarting or upgrading today.

Chad told me on IRC that the Gerrit replication was failing, which in turn filled the event queue, so no more events were being sent over gerrit stream-events, which Zuul uses.
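To see whether the stream itself is delivering anything, one can watch it directly. `gerrit stream-events` emits one JSON object per line with a `type` field; the small Python filter below is a sketch that prints the type of each event read from standard input. The function name is hypothetical, and the SSH invocation in the comment uses a placeholder host.

```python
import json
import sys


def event_type(line):
    """Return the "type" of one stream-events line, or None if it is not an event."""
    try:
        event = json.loads(line)
    except ValueError:
        return None
    if not isinstance(event, dict):
        return None
    return event.get("type")


if __name__ == "__main__":
    # Assumed usage, with <gerrit-host> as a placeholder:
    #   ssh -p 29418 <gerrit-host> gerrit stream-events | python watch_events.py
    for line in sys.stdin:
        kind = event_type(line)
        if kind is not None:
            print(kind, flush=True)
```

If nothing is printed after a 'recheck' comment, the stall is on the Gerrit side of the stream, which matches the diagnosis above.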

Thanks Chad!!

Happened again, see bug 49330.

Marking as dupe since it wasn't fixed. It is a bug between Gerrit and Zuul that just pops up again ~12 hours after Gerrit is restarted.

*** This bug has been marked as a duplicate of bug 49330 ***