
Exec['migrate legacy files'] failing repeatedly when both /mnt/vagrant and /srv/vagrant are present
Closed, Resolved, Public

Description

Author: physik

Description:
I get an error on my labs-vagrant instances mlp and math-prview when running sudo puppet agent -tv.
The error output is here:
https://gist.github.com/physikerwelt/f3e85373d1452ce7709f


Version: unspecified
Severity: normal

Details

Reference
bz72234

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 3:48 AM)
bzimport added a project: Labs-Vagrant.
bzimport set Reference to bz72234.

physik wrote:

After manually running rm -rf /srv/vagrant, one puppet run completes without problems. However, the next run fails with the same error again.

Bryan Davis wrote:

This is caused by a partial/failed migration of files from the legacy /mnt/vagrant install location to the new /srv/vagrant location. The instances are now in a partially migrated state which the puppet provisioning scripts are not equipped to correct. Renaming /mnt/vagrant will stop puppet from attempting further migrations.

If the instances are working as expected, the /mnt/vagrant directory and its contents can be removed entirely. If not, a manual migration from the old location to the new location may be attempted:

/bin/mkdir "/srv/vagrant" &&
(cd "/mnt/vagrant"; /bin/tar cf - .) |
(cd "/srv/vagrant"; /bin/tar xf -) &&
/bin/rm -rf "/mnt/vagrant"

This is the script that puppet itself is attempting to execute. The presence of /mnt/vagrant following a puppet run indicates that this command pipeline is failing for some reason. Without logging output from the failed run I can only speculate as to the cause of the failure.

physik wrote:

(In reply to Bryan Davis from comment #2)

Can you try to log into mlp.eqiad.wmflabs? There is nothing else installed on this brand new instance.

Investigation on mlp.eqiad.wmflabs shows:

$ mount|grep vd-second--local--disk
/dev/mapper/vd-second--local--disk on /mnt type ext4 (rw)
/dev/mapper/vd-second--local--disk on /srv type ext4 (rw)

So on this instance, the same disk is mounted at both /mnt and /srv. The instance seems to have only role::labs::vagrant applied via puppet. That role requires role::labs::lvm::srv, which would provision and mount /srv. It is not immediately obvious to me what would have added /mnt to /etc/fstab.
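To check another instance for the same condition, the mount table can be scanned for a device that appears under more than one mount point. This is a minimal sketch, assuming bash and awk and a table in /proc/mounts format (device, mount point, fstype, ...). The function name find_duplicate_mounts and its optional file argument are hypothetical, added only so the check can be run against a saved copy of the table. It looks only at sources under /dev/, so it will also report bind mounts of the same device, and it does not decode the octal escapes /proc/mounts uses for spaces in paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each /dev/ device that is mounted at more than
# one mount point, followed by those mount points in table order.
# Input is a mounts table in /proc/mounts format; defaults to /proc/mounts.
find_duplicate_mounts() {
    local table="${1:-/proc/mounts}"
    awk '
        $1 ~ "^/dev/" {
            count[$1]++                            # mounts seen for this device
            points[$1] = points[$1] " " $2         # remember each mount point
        }
        END {
            for (dev in count)
                if (count[dev] > 1)
                    print dev ":" points[dev]
        }
    ' "$table"
}

# Example: find_duplicate_mounts
# On mlp this would be expected to list vd-second--local--disk with /mnt and /srv.
```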

The duplicate mount is the source of the error. Initially neither /srv/vagrant nor /mnt/vagrant exist. Puppet then provisions /srv/vagrant as a git clone of mediawiki/vagrant. On a subsequent puppet run, puppet checks for the existence of /mnt/vagrant and finds it because the /srv and /mnt locations both mount the same disk. This causes puppet to run the shell script that was meant to migrate older labs_vagrant installs from the primary disk to the lvm volume mounted on /srv. This script then fails because the /srv/vagrant target directory already exists.
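One way the migration step could guard against this is to check whether the two paths are really the same directory before migrating. The shell's -ef test is true when both names refer to the same device and inode, which is the case for /mnt/vagrant and /srv/vagrant when one filesystem is mounted at both /mnt and /srv. The following is a minimal sketch, assuming bash; the function name migrate_legacy_vagrant is hypothetical and this is not the actual labs-vagrant puppet code. It reuses the pipeline quoted earlier in this task, but with && after each cd so that tar never runs in the wrong directory.

```shell
#!/usr/bin/env bash
# Hypothetical guarded version of the legacy /mnt/vagrant -> /srv/vagrant migration.
migrate_legacy_vagrant() {
    local old="$1" new="$2"

    # No legacy install present: nothing to migrate.
    [ -d "$old" ] || return 0

    # Both paths name the same directory (same device and inode), e.g. one
    # disk mounted at both /mnt and /srv. Copying would be a no-op and the
    # final rm -rf would delete the only copy, so refuse instead.
    if [ -e "$new" ] && [ "$old" -ef "$new" ]; then
        echo "refusing to migrate: $old and $new are the same directory" >&2
        return 1
    fi

    # Same pipeline as the original: mkdir fails (and stops the chain) if
    # the target already exists; the legacy copy is only removed on success.
    mkdir "$new" &&
        (cd "$old" && tar cf - .) |
        (cd "$new" && tar xf -) &&
        rm -rf "$old"
}

# Example: migrate_legacy_vagrant /mnt/vagrant /srv/vagrant
```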

I manually unmounted /mnt and removed the mount description for it from /etc/fstab. Then I forced a puppet run with debug level logging. /mnt was not remounted nor was it re-added to /etc/fstab.
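The manual fix of dropping the /mnt line from /etc/fstab can be done with a filter that removes entries by mount point (the second fstab field) and passes everything else through. This is a minimal sketch, assuming bash and awk; the function name drop_fstab_mountpoint is hypothetical. It only prints the filtered table, so the result can be reviewed before it replaces /etc/fstab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an fstab with every entry for the given mount
# point removed. Comments and blank lines are kept unchanged.
drop_fstab_mountpoint() {
    local mountpoint="$1" fstab="${2:-/etc/fstab}"
    awk -v mp="$mountpoint" '
        /^[ \t]*#/ || NF == 0 { print; next }   # keep comments and blank lines
        $2 != mp { print }                      # keep entries for other mount points
    ' "$fstab"
}

# Example (after "sudo umount /mnt"):
#   drop_fstab_mountpoint /mnt > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new   # review before installing it
```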

Looking through the instance configuration history, I think the likely cause of this "interesting" configuration is that role::labs::lvm::mnt was manually applied to the instance (https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:I-000006ac.eqiad.wmflabs&diff=131469&oldid=131468) and then later removed (https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:I-000006ac.eqiad.wmflabs&diff=next&oldid=131469). Since removing a role does not undo any configuration the role applied, this left behind the /etc/fstab line that mounts /dev/vd/second-local-disk on /mnt. The later application of role::labs::vagrant (https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:I-000006ac.eqiad.wmflabs&diff=next&oldid=131471) added a second /etc/fstab entry to mount the same /dev/vd/second-local-disk partition on /srv.