
sync of integration/slave-scripts has minions stuck pending
Closed, ResolvedPublic

Description

Whenever I deploy integration/slave-scripts from tin.eqiad.wmnet, the minions are stuck in pending state.

ssh tin.eqiad.wmnet
$ cd /srv/deployment/integration/slave-scripts
$ git deploy start
$ git pull
$ git deploy sync
...

  1. INFO : created tag 'integration/slave-scripts-20140205-131314' ...

Running: sudo salt-call -l quiet publish.runner deploy.fetch 'integration/slave-scripts'
Repo: integration/slave-scripts; checking tag: integration/slave-scripts-20140205-131314

2 minions pending (2 reporting)
Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry):

Details show:

Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry): d

Repo: integration/slave-scripts; checking tag: integration/slave-scripts-20140205-131314

lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 3 mins, last-return: 2 mins])
lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 3 mins, last-return: 3 mins])

2 minions pending (2 reporting)

On the minion lanthanum.eqiad.wmnet, in /srv/deployment/integration/slave-scripts, the tag has been fetched:

lanthanum$ git tag |fgrep integration/slave-scripts-20131210-164114
integration/slave-scripts-20131210-164114
lanthanum$

I then continue:

Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry): y
Running: sudo salt-call -l quiet publish.runner deploy.checkout 'integration/slave-scripts,False'

Repo: integration/slave-scripts; checking tag: integration/slave-scripts-20140205-131314

2 minions pending (2 reporting)
Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry): d

Repo: integration/slave-scripts; checking tag: integration/slave-scripts-20140205-131314

lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 0 mins, last-return: 0 mins])
lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 0 mins, last-return: 0 mins])

2 minions pending (2 reporting)

Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry):

Looking on the minion, the tag has been checked out properly.

At that point I just validate (y).

I guess salt is somehow broken on lanthanum.eqiad.wmnet and is not reporting back properly.


Version: wmf-deployment
Severity: normal

Details

Reference
bz60891

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 2:57 AM
bzimport added a project: Deployments.
bzimport set Reference to bz60891.
bzimport added a subscriber: Unknown Object (MLST).

Is this still an issue? Was this on the initial deployment or on a subsequent deployment?

The issue has apparently been around since December (note the tag 2013-12-10). I must have done something wrong at that time which caused the minions to get out of sync :-/

I am not sure how to fix it, but it definitely still happens. I end up looking at each minion during the sync to make sure the tag is fetched and checked out properly.
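The manual check on each minion could be scripted roughly as below. This is only a sketch: check_deploy_tag is a hypothetical helper, not part of git-deploy or salt, and it assumes a git recent enough to support `git -C`. It reports whether the tag exists locally and whether HEAD points at the same commit as the tag.

```shell
#!/bin/sh
# Hypothetical helper (not part of git-deploy or salt): report whether a
# deploy tag has been fetched into a repository and whether HEAD is on it.
check_deploy_tag() {
    repo_dir=$1
    tag=$2

    # The tag counts as "fetched" if the tag ref exists locally.
    if git -C "$repo_dir" rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
        echo "fetched"
    else
        echo "missing"
        return 1
    fi

    # The tag counts as "checked out" if HEAD resolves to the same commit.
    head_commit=$(git -C "$repo_dir" rev-parse HEAD)
    tag_commit=$(git -C "$repo_dir" rev-parse "$tag^{commit}")
    if [ "$head_commit" = "$tag_commit" ]; then
        echo "checked-out"
    else
        echo "not-checked-out"
        return 1
    fi
}
```

On lanthanum this would be called as `check_deploy_tag /srv/deployment/integration/slave-scripts integration/slave-scripts-20140205-131314`. It only inspects the local repository, so it does not explain why salt keeps reporting the minions as pending.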

This was duplicated as bug 63029. It is indeed fixed.