
deployment-salt can't talk to itself, git deploy hangs
Closed, Resolved, Public

Description

cscott-free: git deploy sync is hanging (no console output at all) on beta.
bd808: lame. I wonder if salt is sad in beta
cscott-free: how can i tell?
bd808: "[WARNING ] Master hostname: salt not found. Retrying in 30 second"
bd808: cscott: I ran sudo salt-call saltutil.sync_all on deployment-bastion and got that error
bd808: Looks like salt is borked in beta
bd808: "This master address: 'salt' was previously resolvable but now fails to resolve! The previously resolved ip addr will continue to be used"
cscott-free: might be a nice entry for https://wikitech.wikimedia.org/wiki/Trebuchet once we figure out the problem
matanya: bd808: it is actually: Error: /Stage[main]/Role::Salt::Minions/Salt::Grain[instanceproject]/Exec[ensure_instanceproject_deployment-prep]/unless: Check "/usr/local/sbin/grain-ensure contains instanceproject deployment-prep" exceeded timeout
bd808: I get "Master hostname: salt not found. Retrying in 30 seconds" error on the salt master too :(
matanya: bd808: the root cause is: Warning: Unable to fetch my node definition, but the agent run will continue:
matanya: Warning: Connection refused - connect(2)
matanya: the host can't connect to puppet master
bd808: matanya: agreed
matanya: due to ferm rule changes i suspect
***bd808 shakes fist at ferm again
bd808: matanya: iptables -L is empty on deployment-salt and it can't talk to itself either
matanya: i rest my case :)
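
The warnings quoted in the log point at two separate symptoms: the master hostname `salt` no longer resolves, and connections to the puppet master are refused. A minimal sketch of how one might check both from an affected instance follows. The helper names `resolves`, `port_open` and `report` are hypothetical, and the ports in the usage comments (4505/4506 for Salt, 8140 for Puppet) are the upstream defaults, assumed rather than confirmed for beta. The hostname `puppet` is likewise an assumption. `port_open` relies on bash's `/dev/tcp` and on coreutils `timeout`.

```shell
# Hypothetical helpers; not part of salt, puppet, or Trebuchet.

# resolves NAME: succeed if NAME resolves through the system resolver (NSS).
resolves() {
    getent hosts "$1" >/dev/null
}

# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# report LABEL CMD...: run CMD with its arguments and print OK or FAIL.
report() {
    label=$1
    shift
    if "$@"; then
        echo "OK   $label"
    else
        echo "FAIL $label"
    fi
}

# Example usage (hostnames and ports are assumptions, see above):
#   report "resolve salt"        resolves salt
#   report "salt publish port"   port_open salt 4505
#   report "salt return port"    port_open salt 4506
#   report "puppet master port"  port_open puppet 8140
```

A FAIL on the first check matches the "Master hostname: salt not found" warning. A FAIL on a port check with a resolving name would instead suggest a firewall or a stopped service.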


Version: unspecified
Severity: normal

Details

Reference
bz70868

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:45 AM
bzimport set Reference to bz70868.
bzimport added a subscriber: Unknown Object (MLST).

On deployment-salt:

$ salt '*' cmd.run hostname
i-00000396.eqiad.wmflabs:
    deployment-pdf01
i-00000504.eqiad.wmflabs:
    deployment-mathoid
i-00000388.eqiad.wmflabs:
    deployment-stream

So only those 3 hosts are connected to the salt master.
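
To see which instances are missing rather than which are present, the output above can be compared against a list of expected hosts. This is a rough sketch, assuming the default nested output shown above (minion id on one line, indented hostname on the next). The function names and the file `expected-hosts.txt` (one hostname per line) are hypothetical.

```shell
# Hypothetical helpers; not part of salt.

# responding_hosts: read `salt '*' cmd.run hostname` output on stdin and
# print the indented hostname values, one per line, sorted.
responding_hosts() {
    awk '/^[ \t]+[^ \t]/ { sub(/^[ \t]+/, ""); print }' | sort
}

# missing_hosts FILE: read responding hostnames on stdin and print every
# line of FILE that is not among them.
missing_hosts() {
    responding=$(cat)
    # -x whole-line match, -F fixed strings, -v invert; grep exits 1 when
    # nothing is missing, which is not an error here.
    grep -vxF -e "$responding" "$1" || true
}

# Example usage on the salt master:
#   salt '*' cmd.run hostname | responding_hosts | missing_hosts expected-hosts.txt
```

If `expected-hosts.txt` listed deployment-bastion, it would be printed here, since it is not among the three responding hosts.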

I tried restarting the salt-master process on deployment-salt and the salt-minion process on deployment-bastion and this didn't seem to help anything.

2 prerequisites before booting salt-minion:

all are now salty except:

  • deployment-saio
  • deployment-parsoidcache02
  • deployment-soa-cache01

Those instances haven't been migrated to the beta cluster puppet/salt masters.