
debian-glue jobs broken: can't "create cowbuilder base"
Closed, Resolved (Public)

Description

I saw some IRC chatter about hashar still being in the process of rebuilding stuff in eqiad, but just so that this issue doesn't fall through the cracks: The job labs-toollabs-debian-glue fails with (cf. https://integration.wikimedia.org/ci/job/labs-toollabs-debian-glue/28/console):

[...]
18:33:35 * Creating cowbuilder base /var/cache/pbuilder/base-precise-amd64.cow for arch amd64 and distribution precise *
18:33:35 + sudo cowbuilder --create --basepath /var/cache/pbuilder/base-precise-amd64.cow --distribution precise --debootstrapopts --arch --debootstrapopts amd64 --debootstrapopts --variant=buildd --configfile=/tmp/tmp.xBNYHZb7uc --hookdir /usr/share/jenkins-debian-glue/pbuilder-hookdir/
18:33:36 mkdir: No such file or directory
18:33:36 + '[' 1 -eq 0 ']'
18:33:36 + bailout 1 'Error: Failed to create cowbuilder base /var/cache/pbuilder/base-precise-amd64.cow.'
18:33:36 + '[' -n 1 ']'
18:33:36 + EXIT=1
18:33:36 + '[' -n 'Error: Failed to create cowbuilder base /var/cache/pbuilder/base-precise-amd64.cow.' ']'
18:33:36 + echo 'Error: Failed to create cowbuilder base /var/cache/pbuilder/base-precise-amd64.cow.'
18:33:36 Error: Failed to create cowbuilder base /var/cache/pbuilder/base-precise-amd64.cow.
[...]

NB: The job itself (i.e. the test suite) isn't necessarily meant to succeed at the moment, but it would fail at a different point.


Version: unspecified
Severity: normal

Details

Reference
bz63232

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:54 AM
bzimport set Reference to bz63232.
bzimport added a subscriber: Unknown Object (MLST).

I forgot to mail about it, but the debian-glue jobs are indeed broken. The reason is that I deleted the instance in pmtpa labs, and the jobs are now running on eqiad instances.

On the eqiad instances, /var is a 2GB partition, so cowbuilder does not have enough disk space to create its image under /var/cache/pbuilder/.
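A quick way to confirm this on an instance is to look at the free space on the filesystem that holds the cache directory. The following is only a minimal sketch assuming a POSIX `df`; the helper names `avail_kb` and `has_room` and the 1GB threshold in the example are illustrative assumptions, not figures taken from the jobs:

```shell
#!/bin/sh
# Print the free space, in KB, of the filesystem that holds the given path.
# `df -P` forces the portable format with one line per filesystem.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding $1 has at least $2 KB free.
has_room() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Example (hypothetical threshold of about 1GB):
#   has_room /var/cache 1048576 || echo "not enough space for a cowbuilder base"
```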

I made a bunch of patches in operations/puppet.git to put the default cowbuilder images under /mnt/, but the jenkins-debian-glue scripts hardcode /var/cache/pbuilder everywhere, so that is not enough.

The real fix would be:

  • ensure /mnt/ is a mount
  • create a directory /mnt/pbuilder/
  • delete /var/cache/pbuilder/ and replace it with a symlink to /mnt/pbuilder/
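
Done by hand rather than through Puppet, those three steps could look roughly like the sketch below. It is not the actual Puppet change; the function name `relocate_cache` is hypothetical, and the example assumes `mountpoint` from util-linux is available on the instance:

```shell
#!/bin/sh
# Replace a cache directory with a symlink to a directory on a larger volume.
#   $1: directory to replace (on the instances: /var/cache/pbuilder)
#   $2: target directory     (on the instances: /mnt/pbuilder)
# Any existing content of $1 is deleted, as in the steps above.
relocate_cache() {
    link=$1
    target=$2
    mkdir -p "$target" || return 1
    # An existing symlink is simply removed; a real directory is deleted
    # together with its content.
    if [ -L "$link" ]; then
        rm -f "$link" || return 1
    else
        rm -rf "$link" || return 1
    fi
    ln -s "$target" "$link"
}

# Example, to be run as root; it does nothing when /mnt is not a mount point:
#   mountpoint -q /mnt && relocate_cache /var/cache/pbuilder /mnt/pbuilder
```

Running the function a second time only recreates the symlink, so it is safe to repeat.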

More Puppet tricks needed. Sorry for breaking the job; I will definitely try to get it fixed next week.

Looking at https://integration.wikimedia.org/ci/job/labs-toollabs-debian-glue/37/console, this seems to have been resolved for labs/toollabs. Okay to close this bug as FIXED?

Yup, that got fixed, sorry I forgot to ping you. The root cause is that jenkins-debian-glue runs cowbuilder, which creates images under /var/cache/pbuilder. On eqiad instances that is a 2GB partition which fills up quickly, so cowbuilder bails out when trying to create an image.

Instead, we mount the remaining instance disk space somewhere under /mnt and make /var/cache/pbuilder a symbolic link to it. We are no longer doing hardlinks, but at least the images are built.