
/public/datasets/public missing on some instances
Closed, Resolved (Public)

Description

I noticed this on a few instances yesterday, and those got fixed, but I seem to have found some more.

Execution nodes

  • tools-exec-02
  • tools-exec-03
  • tools-exec-04
  • tools-exec-05
  • tools-exec-06
  • tools-exec-09

It might also be worth someone checking the webservers.
Also, is there any way we can get Ganglia to check this?


Version: unspecified
Severity: normal

Details

Reference
bz61803

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:04 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz61803.

I take the blame for fixing some last night, but because some hosts were not conveniently accessible, I gave up at some point :-) (pdsh is awesome if all hosts accept your credentials).

autofs sucks big time. Just now, on tools-exec-02, I ran "service autofs reload" and later "service autofs stop && service autofs start", but it neither mounts /public/datasets nor logs any information about why it failed.

So I've mounted manually:

sudo mkdir /public/datasets && sudo mount -t nfs -o nfsvers=3,ro labstore1.pmtpa.wmnet:/publicdata-project /public/datasets

on all the above hosts.
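
For anyone who has to repeat the manual mount above, here is a rough sketch of how the per-host commands could be generated. It is a dry run only: it prints one ssh command per host and executes nothing. The function name print_mount_commands and the use of plain ssh are assumptions for illustration, not what was actually run; it also swaps in "mkdir -p" so the command does not stop when the directory already exists.

```shell
# Hosts taken from the list in the task description.
HOSTS="tools-exec-02 tools-exec-03 tools-exec-04 tools-exec-05 tools-exec-06 tools-exec-09"

# Same mount as above, except mkdir -p (assumption: tolerating an
# existing directory is desired; plain mkdir would fail and skip the mount).
REMOTE_CMD='sudo mkdir -p /public/datasets && sudo mount -t nfs -o nfsvers=3,ro labstore1.pmtpa.wmnet:/publicdata-project /public/datasets'

# Hypothetical helper: prints one ssh command per host (dry run, nothing is executed).
print_mount_commands() {
    for host in $HOSTS; do
        printf 'ssh %s "%s"\n' "$host" "$REMOTE_CMD"
    done
}
```

Calling print_mount_commands and reviewing the output before pasting it into a shell keeps the step manual, which seems safer while some hosts do not accept the same credentials.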

We should use Icinga for monitoring & alerts; Ganglia is more for performance data. I'll add some checks to my personal poor-mans-icinga script for now.
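
As a rough idea of what such a check could look like, here is a minimal sketch in the style of a Nagios/Icinga plugin. It is not the poor-mans-icinga script mentioned above. The function name check_mounted and the optional second argument (a mounts file, defaulting to /proc/mounts) are assumptions for illustration; the return codes follow the usual plugin convention of 0 for OK and 2 for CRITICAL.

```shell
# Hypothetical check: is the given path listed as a mount point?
# $1 = mount point to look for
# $2 = mounts file to read (assumption: defaults to /proc/mounts on Linux)
check_mounted() {
    target=$1
    mounts_file=${2:-/proc/mounts}
    # The second field of each /proc/mounts line is the mount point.
    if awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts_file"; then
        echo "OK: $target is mounted"
        return 0
    else
        echo "CRITICAL: $target is not mounted"
        return 2
    fi
}
```

On an affected host, "check_mounted /public/datasets" would report CRITICAL until the NFS mount is in place. Note that this only checks that something is mounted at the path, not that the NFS server is responding.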

Thankfully, in eqiad we will get rid of autofs and use Puppet mounts instead. My € 0,02: We shouldn't wait that long, but use them in pmtpa as well.

I meant Icinga :)
Thanks for your work Tim!
*is looking forward to eqiad*