
Current puppet does not allow bringing up a cluster in labs
Closed, Declined (Public)

Description

When trying to bring up a namenode in labs, puppet fails with

Error: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist
Error: /Stage[main]/Cdh::Hadoop::Namenode/File[/var/lib/hadoop/name]/ensure: change from absent to directory failed: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist

With puppet at commit ebcbef50568960d424fcb95fc79ba3be945a905e,
everything is working, and setting up a cluster in labs works.

With 87bd718e678d290b80b0916d255f1bae8666e7d7 (i.e. the child commit
directly following ebcbef50568960d424fcb95fc79ba3be945a905e) plus
a38770013716dd39ee5df90380473b734e0cebbb cherry-picked on top [1], puppet
fails to set up the namenode. Puppet runs fail with the above error message.

So it seems 87bd718e678d290b80b0916d255f1bae8666e7d7 is the culprit.
But as this commit does a lot of reshuffling (~800 lines changed),
I'll leave it to CDH+puppet experts to dig deeper.
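
For context: Puppet's file resource does not create missing parent directories, so the error above usually means that nothing in the catalog manages /var/lib/hadoop (and it does not already exist on the instance). Below is a minimal sketch of the kind of declaration that avoids this. It is a hypothetical illustration, not the actual cdh module code, and ownership and mode settings are deliberately left out.

```puppet
# Hypothetical sketch, not taken from the cdh module.
# The parent directory has to be managed (or already exist) before
# Puppet can create anything beneath it.
file { '/var/lib/hadoop':
  ensure => directory,
}

file { '/var/lib/hadoop/name':
  ensure  => directory,
  # Explicit for clarity; Puppet also autorequires a managed parent directory.
  require => File['/var/lib/hadoop'],
}
```

If the reshuffling in 87bd718e678d290b80b0916d255f1bae8666e7d7 dropped or moved a declaration like the first one, that would be consistent with the error, but this is only a guess.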

  • Steps to Reproduce
    • Add a new instance 'demo-master' (m1.small, ubuntu-12.04-precise)
    • Wait for the instance to come up.
    • Configure the instance by adding role role::analytics::hadoop::master and setting hadoop_namenodes to demo-master.eqiad.wmflabs
    • Wait for the next puppet run
  • Expected result: Puppet passes without errors
  • Actual result: Puppet fails with
    Error: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist
    Error: /Stage[main]/Cdh::Hadoop::Namenode/File[/var/lib/hadoop/name]/ensure: change from absent to directory failed: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist

[1] Plain 87bd718e678d290b80b0916d255f1bae8666e7d7 fails with

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate parameter 'mapreduce_output_compression' for on Class[Cdh::Hadoop] at /etc/puppet/manifests/role/analytics/hadoop.pp:201 on node qchris-master-87bd718.eqiad.wmflabs

which was fixed upstream in commit
a38770013716dd39ee5df90380473b734e0cebbb.


Version: unspecified
Severity: normal

Details

Reference
bz68161

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:40 AM
bzimport set Reference to bz68161.
bzimport added a subscriber: Unknown Object (MLST).

By the way, bringing up hadoop workers with current puppet also fails,
because a (different) directory in /var/lib/hadoop does not exist.

(Again, when using puppet at ebcbef50568960d424fcb95fc79ba3be945a905e
hadoop workers are brought up by puppet without issues.)