
GlusterFS readonly on integration project
Closed, Resolved (Public)

Description

On the integration labs project, the GlusterFS volume for /home is broken and rejects writes as read-only:

  # touch /home/jenkins-deploy/foobar
  touch: cannot touch `foobar': Read-only file system

The mount point is:

projectstorage.pmtpa.wmnet:/integration-home on /home type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
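The mount options report rw even though the touch test fails, so the mount table alone does not reveal the problem. A minimal sketch of a write probe that performs the same check programmatically, assuming Python is available on the instance; the function name `probe_writable` and the `.write-probe-` file prefix are hypothetical and not part of any existing tooling:

```python
import errno
import os
import tempfile


def probe_writable(directory):
    """Try to create and then delete a scratch file in `directory`.

    Returns (True, None) on success, or (False, errno_name) on failure,
    for example (False, "EROFS") for "Read-only file system".
    """
    try:
        # mkstemp picks a unique name, so no existing file is overwritten.
        fd, path = tempfile.mkstemp(prefix=".write-probe-", dir=directory)
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    os.unlink(path)
    return True, None
```

Calling `probe_writable("/home/jenkins-deploy")` on an affected instance would be expected to report a failure with the errno name for a read-only file system, while the same call on a healthy directory returns `(True, None)`.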

/var/log/glusterfs/home.log shows a bunch of:

remote operation failed: Transport endpoint is not connected

On Sunday 9 Feb 2014, at around 06:41, the log shows:

[2014-02-09 06:41:01.523691] W [socket.c:1512:__socket_proto_state_machine] 0-integration-home-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.0.0.41:24448)
[2014-02-09 06:41:01.541383] I [client.c:2090:client_rpc_notify] 0-integration-home-client-0: disconnected
[2014-02-09 06:41:13.483376] E [socket.c:1715:socket_connect_finish] 0-integration-home-client-0: connection to 10.0.0.41:24448 failed (Connection refused)
[2014-02-09 06:47:17.486672] I [glusterfsd.c:889:reincarnate] 0-glusterfsd: Fetching the volume file from server...
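To pick the disconnect events out of a long home.log, the entries can be split into fields. The sketch below is a rough illustration only: the pattern is inferred from the four lines quoted above (timestamp, one-letter level, source file, line number, function, translator name, message) and may not match other GlusterFS versions. The names `LOG_LINE`, `parse_line` and `problems` are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines quoted in this report; it is an
# assumption, not an official description of the GlusterFS log format.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>[^\s:]+): (?P<msg>.*)$"
)


def parse_line(text):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_LINE.match(text.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f")
    entry["line"] = int(entry["line"])
    return entry


def problems(lines):
    """Yield parsed entries whose level letter is W or E."""
    for text in lines:
        entry = parse_line(text)
        if entry is not None and entry["level"] in ("W", "E"):
            yield entry
```

Feeding the file through `problems(open("/var/log/glusterfs/home.log"))` would keep the W and E entries, such as the socket read failure and the refused connection, and skip the I entries.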

Then we get messages saying the volume lacks quorum:

0-integration-home-replicate-0: failing truncate due to lack of quorum


Version: unspecified
Severity: major

Details

Reference
bz61141

Event Timeline

bzimport raised the priority of this task to Low (Nov 22 2014, 2:56 AM).
bzimport set Reference to bz61141.
bzimport added a subscriber: Unknown Object (MLST).

Although this is an important issue, it is not a high priority, since the only impact was that the jenkins-deploy user's home directory was not writable by Jenkins jobs.

I have mitigated the issue by moving the jenkins-deploy home directory under /mnt (bug 61144), which also removes a potential race condition between jobs on different instances attempting to write to a shared directory (/home/jenkins-deploy).

Fixed by the migration to EQIAD. We are now using NFS.