
scap not reporting rsync failures
Closed, ResolvedPublic

Description

During the 1.23wmf18 deploy, and again today, scap failed to update servers in row D of eqiad. Neither time did the scap UI report the failures. Running sync-common manually on one of the failing hosts shows that sync-common itself reports the error correctly:

bd808@mw1202:~$ sync-common mw1010.eqiad.wmnet mw1070.eqiad.wmnet
00:20:39 DEBUG - Copying to mw1202.eqiad.wmnet from mw1010.eqiad.wmnet
00:20:39 DEBUG - Started rsync common
@Error: access denied to common from mw1202.eqiad.wmnet (10.64.48.34)
rsync error: error starting client-server protocol (code 5) at main.c(1534) [Receiver=3.0.9]
00:20:39 INFO - Finished rsync common (duration: 00m 00s)
00:20:39 DEBUG - Unhandled error:
Traceback (most recent call last):
  File "/srv/scap/scap/cli.py", line 201, in run
    exit_status = app.main(extra_args)
  File "/srv/scap/scap/main.py", line 70, in main
    tasks.sync_common(self.config, self.arguments.servers)
  File "/srv/scap/scap/tasks.py", line 167, in sync_common
    subprocess.check_call(rsync)
  File "/usr/lib/python2.7/subprocess.py", line 511, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '('sudo', '-u', 'mwdeploy', '/usr/bin/rsync', '-a', '--delete-delay', '--delay-updates', '--compress', '--delete', '--exclude=/.svn/lock', '--exclude=/.git/objects', '--exclude=/.git//objects', '--exclude=**/cache/l10n/*.cdb', '--no-perms', 'mw1010.eqiad.wmnet::common', '/usr/local/apache/common-local')' returned non-zero exit status 5
00:20:39 ERROR - sync-common failed: <CalledProcessError> Command '('sudo', '-u', 'mwdeploy', '/usr/bin/rsync', '-a', '--delete-delay', '--delay-updates', '--compress', '--delete', '--exclude=/.svn/lock', '--exclude=/.git/objects', '--exclude=/.git//objects', '--exclude=**/cache/l10n/*.cdb', '--no-perms', 'mw1010.eqiad.wmnet::common', '/usr/local/apache/common-local')' returned non-zero exit status 5
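
The traceback shows where the failure originates: `subprocess.check_call` raises `CalledProcessError` when rsync exits non-zero (status 5 here), and sync-common logs it. The scap master can only count a host as failed if that error also ends up as a non-zero exit status of the sync-common process. A minimal sketch of that contract, assuming two hypothetical helpers that are not scap's actual code: `sync_main` stands in for sync-common on one host, and `failed_hosts` stands in for the master's per-host reporting. The rsync invocation is simulated with a child Python process that exits with status 5, and the host names are placeholders.

```python
import subprocess
import sys


def sync_main(cmd):
    """Hypothetical stand-in for sync-common: return a process exit status."""
    try:
        # Like tasks.sync_common: raises CalledProcessError on a non-zero exit.
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as exc:
        print("sync-common failed: %s" % exc, file=sys.stderr)
        # Propagate the child's status; fall back to 1 if it died from a signal.
        return exc.returncode if exc.returncode > 0 else 1
    return 0


def failed_hosts(results):
    """Hypothetical master-side check: hosts whose exit status was non-zero."""
    return sorted(host for host, status in results.items() if status != 0)


# Simulated rsync commands: one host is denied (exit 5), one succeeds.
RSYNC_DENIED = [sys.executable, "-c", "import sys; sys.exit(5)"]
RSYNC_OK = [sys.executable, "-c", "pass"]

results = {
    "host-in-row-d": sync_main(RSYNC_DENIED),
    "host-in-row-c": sync_main(RSYNC_OK),
}
print(failed_hosts(results))  # prints ['host-in-row-d']
```

If sync-common logged the error but still exited 0, which matches the symptom in this task, `failed_hosts` would come back empty even though rsync failed.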


Version: wmf-deployment
Severity: critical
See Also:
https://rt.wikimedia.org/Ticket/Display.html?id=7080

Details

Reference
bz62862

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 3:05 AM
bzimport added a project: Deployments.
bzimport set Reference to bz62862.

Change 121571 had a related patch set uploaded by BryanDavis:
Return exit_status from Application._before_exit

https://gerrit.wikimedia.org/r/121571

Change 121571 merged by jenkins-bot:
Return exit_status from Application._before_exit

https://gerrit.wikimedia.org/r/121571
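
The title of the merged change indicates that the exit status was being dropped in `Application._before_exit`. The patch itself is not quoted here, so the following is only a hypothetical illustration of that class of bug. It assumes that `run()` passes `main()`'s status through `_before_exit()` and that the result is handed to `sys.exit()`. Because `sys.exit(None)` exits with status 0, a hook that forgets to return the status turns every failure into a success. The class names and the `process_exit_code` helper are invented for the example.

```python
import subprocess
import sys


class BuggyApplication(object):
    """Hypothetical app whose exit hook swallows the exit status."""

    def main(self):
        # Pretend the sync failed the way rsync did in this task.
        return 5

    def _before_exit(self, exit_status):
        # Cleanup/logging would happen here, but nothing is returned,
        # so the caller receives None.
        pass

    def run(self):
        return self._before_exit(self.main())


class FixedApplication(BuggyApplication):
    """Same hook, but it returns the status it was given."""

    def _before_exit(self, exit_status):
        return exit_status


def process_exit_code(status):
    """Exit code a child process reports after calling sys.exit(status)."""
    code = "import sys; sys.exit(%r)" % (status,)
    return subprocess.call([sys.executable, "-c", code])


print(process_exit_code(BuggyApplication().run()))  # prints 0: failure hidden
print(process_exit_code(FixedApplication().run()))  # prints 5: failure reported
```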