This document discusses using Rex to install and configure a Hadoop cluster with minimal manual work. It begins by introducing Rex and its capabilities. Various preparation steps are described, such as installing Rex, generating SSH keys, and creating user accounts. Tasks are then defined with Rex to install software like Java, download the Hadoop source files, configure hosts files, and more. The goal is to automate the entire Hadoop setup process and eliminate manual configuration using Rex's simple yet powerful scripting abilities.
Hadoop meets Rex (How to construct a Hadoop cluster with Rex)
1. Hadoop meet (R)?ex
- How to use Rex for Hadoop cluster construction
Original Rex base image http://rexify.org
2013-08-26
Original Hadoop image http://hadoop.apache.org
3. Mission
• I'm not a S/W developer any more
• I'm not a system engineer
• But I had to construct a hadoop cluster
– Moreover, in various configurations...
http://www.gkworld.com/product/GKW49102/Simpsons-Cruel-Fate-Why-Mock-Me-Homer-Magnet-SM130.html
4. Hadoop is
• A hadoop cluster consists of many linux boxes
• Hadoop has many configuration files and parameters
• Besides hadoop itself, a variety of hadoop-ecosystem S/W has to be installed
• Beyond Hadoop & the Hadoop eco, many other kinds of S/W have to be installed & configured
– Tomcat, apache, a DBMS, other development tools, other utils/libs…
• And so on …
5. At first,
• I did it manually
– Install & Configure..
– Install & Configure
– Install & Configure
– Install & Configure
– ….
Img http://www.construire-en-vendee.fr/la-construction-dune-maison-de-a-a-z-les-fondations.html
6. Tiresome !!
• It is a really tedious & horrible job !!
Img http://cuteoverload.com/2009/08/17/your-story-has-become-tiresome/
7. Finding another way
• I decided to find another way!!
• I started to survey other solutions
Img http://www.101-charger.com/wallpapers/21526,jeux,gratuit,pathfinder,7.html
13. Examination(1/3)
• Hadoop Managers
↑ Specialized in hadoop
↑ Already proven
↑ Comfortable
↓ Commercial or restrictive license
↓ No support for other apps/libs beyond Java/Hadoop/the Hadoop eco
14. Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
http://www.bizbuilder.com/how-much-does-an-inexpensive-franchise-cost/
I have no money
I want to use more extra resources
※ Recently, there have been many changes in license policies.
Please check!!
16. Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
source: www.mbc.co.kr
I don't like to study
17. Examination(3/3)
• Other parallel ssh tools
↑ Simple
↑ Useful
↑ No need to install an extra agent
↓ Some features are insufficient
↓ All exceptional cases have to be handled by yourself
18. Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
http://bluebuddies.com/Smurfs_Panini_Smurf_Stickers-7.htm
Yes, I'm greedy
19. ● Simple &
● Powerful &
● No cost &
● Expandable &
● Smart way???
http://plug.hani.co.kr/heihei9999/459415
So, what is it?
22. Rex is
● uses just ssh
● no agent required
● seamless integration
● no conflicts
● easy to use
● easy to extend
● easy to learn
● can use advanced perl power
http://swapiinthehouse.blogspot.kr/2012/02/final-term-was-over-and-let-holiday.html
23. Rex options
[onycom@onydev: ~]$rex -h
(R)?ex - (Remote)? Execution
-b Run batch
-e Run the given code fragment
-E Execute task on the given environment
-H Execute task on these hosts
-G Execute task on these group
-u Username for the ssh connection
-p Password for the ssh connection
-P Private Keyfile for the ssh connection
-K Public Keyfile for the ssh connection
-T List all known tasks.
-Tv List all known tasks with all information.
-f Use this file instead of Rexfile
-h Display this help
-M Load Module instead of Rexfile
-v Display (R)?ex Version
-F Force. Don't regard lock file
-s Use sudo for every command
-S Password for sudo
-d Debug
-dd More Debug (includes Profiling Output)
-o Output Format
-c Turn cache ON
-C Turn cache OFF
-q Quiet mode. No Logging output
-Q Really quiet. Output nothing.
-t Number of threads to use
29. This presentation is
● How to easily install & configure Hadoop
– Not "How to optimize & performance-tune"
● For easy understanding,
– exceptional cases are excluded
● No explanation of OS installation
– no discussion of "PXE/kickstart"
● Reduced environment conditions
– ex) security, network, other servers/Apps, …
● I'll avoid talking about the perl language as much as possible
– It isn't needed
● TMTOWTDI
– Even if it's not refined, I'll show a variety of ways where possible
30. Network Topology
vmaster (Name node / Job Tracker)
vnode0 (Data node)
vnode1 (Data node)
vnode2 (Data node)
vmonitor (Monitoring Server)
Onydev (Provision Server)
– all connected through one L2 switch
[spec]
Machine : 6 ea (hadoop uses just 4 ea)
OS : CentOS 6.4 64bit
Memory : 32GB (NN), 16GB (DN)
CPU : 4 core (i7, 3.5GHz)
Interface : 1G Ethernet
Disk : 250G SSD, 1T HDD
※ I've configured the NN and JT on the same machine
31. Our hadoop Env. is
● There is one control account
– ‘hadoop-user’
● hadoop & the hadoop eco are installed under the ‘hadoop-user’ account
32. Prepare – All machines
● Each machine
– has the same OS version installed (at least within the hadoop cluster)
– has its own fixed IP address
– can be connected to with SSH
– has at least one normal user account & its sudoers entry edited (just optional)
34. Prepare – Provision Server(2/2)
● After executing the ‘cpan’ command
– cpan> install Rex
– You may get failures!!
– This whole story is based on CentOS 6.x
● So I recommend ‘perlbrew’
– If you want to use more perl power
※ My guess is that redhat may dislike the perl language
35. To Install Rex (1/3)
adduser brew-user
passwd brew-user
curl -L http://install.perlbrew.pl | bash
cd /home
chmod 755 brew-user
cd ~brew-user
chmod -R 755 ./perl5
echo 'export PERLBREW_ROOT="/home/brew-user/perl5/perlbrew"' >> /home/brew-user/.bashrc
## Append "$PERLBREW_ROOT/bin" to PATH in the .bashrc
source ~brew-user/.bashrc
36. To Install Rex (2/3)
## In the brew-user account,
perlbrew init
perlbrew available
### Choose the recommended stable perl 5.18.0 (as of 2013/07/11)
perlbrew install perl-5.18.0
perlbrew switch perl-5.18.0
[brew-user@onydev: ~]$perlbrew switch perl-5.18.0
Use of uninitialized value in split at /loader/0x1f2f458/App/perlbrew.pm line 34.
.........
A sub-shell is launched with perl-5.18.0 as the activated perl. Run 'exit' to finish it.
37. To Install Rex (3/3)
● cpanm Rex
● cpan
● http://rexify.org/get/
38. Test for Rex
[onycom@onydev: ~]$which rex
/home/brew-user/perl5/perlbrew/perls/perl-5.18.0/bin/rex
[onycom@onydev: ~]$rex -H localhost -u onycom -p blabla -e "say run 'hostname'"
[2013-10-08 15:36:06] INFO - Running task eval-line on localhost
[2013-10-08 15:36:06] INFO - Connecting to localhost:22 (onycom)
[2013-10-08 15:36:07] INFO - Connected to localhost, trying to authenticate.
[2013-10-08 15:36:07] INFO - Successfully authenticated on localhost.
onydev
[onycom@onydev: ~]$
● Rexfile
● plain text file
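A Rexfile is an ordinary Perl script that the `rex` command picks up from the current directory. A minimal sketch might look like this (the user name, password, and host names are placeholders, not values from the cluster above):

```perl
# Rexfile -- just a plain Perl script read by the 'rex' command
use Rex -base;

user "onycom";          # ssh user for all connections (placeholder)
password "blabla";      # ssh password (placeholder)

# a named group of target hosts
group "test_node" => "vnode[0..2]";

desc "Print each node's hostname";
task "check_host", group => "test_node", sub {
    say run "hostname";
};
```

With this file in place, `rex -T` lists the tasks and `rex check_host` runs the task on every host in the group.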
40. SSH connection
● Between
– the provision server and the other target servers
– the hadoop master node and the data nodes
41. Prepare SSH public key
[onycom@onydev: ~]$ ssh-keygen -t rsa
Enter file in which to save the key (/home/onycom/.ssh/id_rsa):
Created directory '/home/onycom/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/onycom/.ssh/id_rsa.
Your public key has been saved in /home/onycom/.ssh/id_rsa.pub.
42. Create User
use Rex::Commands::User;
group "hadoop_node" => "vmaster", "vnode[0..2]";
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
my $USER = "hadoop-user";
desc "Create user";
task "new_user", group => "all_vm_node", sub {
create_user "$USER",
home => "/home/$USER",
comment => "Account for hadoop",
password => "blabla";
};
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> new_user
43. Setup SSH for user
desc "setup ssh for user";
task "setup_ssh_user", group => "all_vm_node", sub {
run "mkdir /home/$USER/.ssh";
file "/home/$USER/.ssh/authorized_keys",
source => "/home/onycom/.ssh/id_rsa.pub",
owner => "$USER",
group => "$USER",
mode => 644;
run "chmod 700 /home/$USER/.ssh";
};
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u hadoop-user -p <pass> setup_ssh_user
※ Ok!! Done.
Now you can log in to each server without a password.
Then do the same thing between the hadoop NN/DN nodes.
44. Install packages
parallelism 4;
desc "Install packages for java";
task "install_java", group => "all_vm_node", sub {
install package => "java-1.6.*";
};
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> install_java
• Some packages should be installed globally (ex: java, wget, etc)
• For hadoop 1.1.x, java 1.6 is recommended.
• Use the parallelism keyword (if a long running time is expected)
45. Install hadoop(1/3)
user "hadoop-user";
private_key "/home/onycom/.ssh/id_rsa";
public_key "/home/onycom/.ssh/id_rsa.pub";
group "hadoop_node" => "vmaster", "vnode[0..2]";
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
desc "prepare_dir";
task "prepare_dir", group=>"hadoop_node", sub {
run "mkdir Work";
run "mkdir Download";
run "mkdir src";
run "mkdir tmp";
};
hd1.Rexfile
[onycom@onydev: Prov]$ rex -f ./hd1.Rexfile prepare_dir
46. Install hadoop(2/3)
desc "hadoop 1.1.2 download with wget";
task "get_hadoop", group=>"hadoop_node", sub {
my $f = run "wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz", cwd=>"/home/hadoop-user/src";
say $f;
};
...skip....
desc "pig 0.11.1 download with wget";
task "get_pig", group=>"hadoop_node", sub {
my $f = run "wget http://apache.tt.co.kr/pig/pig-0.11.1/pig-0.11.1.tar.gz", cwd=>"/home/hadoop-user/src";
say $f;
};
! The hadoop version & the hadoop eco S/W versions have to match.
That topic is outside the scope of this presentation.
47. Install hadoop(3/3)
my $HADOOP_SRC_DIR = "/home/hadoop-user/src";
desc "unzip hadoop source files";
task "unzip_src",group=>"hadoop_node", sub {
run "tar xvfz hadoop-1.1.2.tar.gz", cwd=>"$HADOOP_SRC_DIR";
run "tar xvfz hive-0.11.0.tar.gz", cwd=>"$HADOOP_SRC_DIR";
run "tar xvfz pig-0.11.1.tar.gz", cwd=>"$HADOOP_SRC_DIR";
};
desc "make link for hadoop source files";
task "link_src", group=>"hadoop_node", sub {
run "ln -s ./hadoop-1.1.2 ./hadoop", cwd=>$HADOOP_SRC_DIR;
run "ln -s ./hive-0.11.0 ./hive", cwd=>$HADOOP_SRC_DIR;
run "ln -s ./pig-0.11.1 ./pig", cwd=>$HADOOP_SRC_DIR;
};
49. Configuration files(2/3)
● Hadoop eco systems & other tools
– ex) Ganglia
– ex) Flume – agent/collector/master
– ex) Oozie or flamingo
– These are skipped in this PPT.
● User rc file
These are just defaults & don't consider optimization
51. Edit hosts file
my $target_file = "/etc/hosts";
my $host_list = <<'END';
192.168.2.51 vmaster
192.168.2.52 vnode0
192.168.2.53 vnode1
192.168.2.54 vnode2
192.168.2.59 vmonitor
END
desc "Add hosts";
task "add_host", group => "all_vm_node", sub {
my $exist_cnt = cat $target_file;
my $fh = file_write $target_file;
$fh->write( $exist_cnt );
$fh->write($host_list);
$fh->close;
};
※ You can consider the ‘Augeas’ tool to handle system files.
Please refer to ‘Rex::Augeas’ or ‘http://augeas.net’
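Rewriting /etc/hosts wholesale works, but Rex::Commands::File also provides `append_if_no_such_line`, which keeps the task idempotent: running it twice will not duplicate entries. A sketch using the same host list as above:

```perl
use Rex::Commands::File;

desc "Add hosts (idempotent variant)";
task "add_host2", group => "all_vm_node", sub {
    # each line is appended only if it is not already in the file
    for my $line ( "192.168.2.51 vmaster",
                   "192.168.2.52 vnode0",
                   "192.168.2.53 vnode1",
                   "192.168.2.54 vnode2",
                   "192.168.2.59 vmonitor" ) {
        append_if_no_such_line "/etc/hosts", $line;
    }
};
```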
52. Setup .bashrc for user(1/2)
... skip .....
my $hadoop_rc=<<'END';
#Hadoop Configuration
export JAVA_HOME="/usr/lib/jvm/jre-1.6.0-openjdk.x86_64"
export CLASSPATH="$JAVA_HOME/lib:$JAVA_HOME/lib/ext"
export HADOOP_USER="/home/hadoop-user"
export HADOOP_SRC="$HADOOP_USER/src"
export HADOOP_HOME="$HADOOP_USER/hadoop"
export PIG_HOME="$HADOOP_SRC/pig"
export HIVE_HOME="$HADOOP_SRC/hive"
END
... skip .....
53. Setup .bashrc for user(2/2)
desc "setup hadoop-user's .rc file";
task "setup_rc_def", group=>"hadoop_node", sub {
my $fh = file_append ".bashrc";
$fh->write($base_rc);
$fh->write($hadoop_rc);
$fh->close();
};
desc "setup hadoop master node .rc file";
task "setup_rc_master", "vmaster", sub {
my $fh = file_append ".bashrc";
$fh->write($master_rc);
$fh->close();
};
.......... skip ............
55. Configure Hadoop(2/6)
• hadoop-env.sh
... skip ...
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
#hadoop-user
#Remove the warning message for "HADOOP_HOME is deprecated"
export HADOOP_HOME_WARN_SUPPRESS=TRUE
56. Configure Hadoop(3/6)
• hdfs-site.xml
... skip ...
<configuration>
<!-- modified by hadoop-user -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop-user/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop-user/hdfs/data</value>
</property>
</configuration>
※ This ‘replication’ value depends on your env.
58. Configure Hadoop(5/6)
• mapred-site.xml
.. skip ..
<property>
<name>mapred.job.tracker</name>
<value>vmaster:9001</value>
</property>
<!-- 2013.9.11. Increase the timeout setting for the "fail to report status" error -->
<property>
<name>mapred.task.timeout</name>
<value>1800000</value>
<description>The number of milliseconds before
a task will be terminated if it neither reads an input, writes
an output, nor updates its status string.
</description>
</property>
※ This ‘timeout’ value just depends on your env.
59. Configure Hadoop(6/6)
my $CNF_REPO="hadoop_conf_repo";
... skip ...
my $MAPRED="mapred-site.xml";
task "upload_mapred", group=>"hadoop_node", sub {
file "$HD_CNF/$MAPRED",
owner => $HADOOP_USER,
group => $HADOOP_USER,
source => "$CNF_REPO/$MAPRED";
};
my $CORE_SITE="core-site.xml";
task "upload_core", group=>"hadoop_node", sub {
file "$HD_CNF/$CORE_SITE",
owner => $HADOOP_USER,
group => $HADOOP_USER,
source => "$CNF_REPO/$CORE_SITE";
};
... skip ....
60. Before going any further
● Stop selinux
– If it is enforcing
● Modify the iptables policy
– I recommend stopping it while the configuration work goes on
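Both steps can be scripted as one Rex task instead of being done by hand on every box. A sketch, assuming root access and CentOS 6 service names; note that `setenforce 0` only switches SELinux to permissive mode until the next reboot:

```perl
desc "Relax selinux & iptables while configuring";
task "relax_security", group => "all_vm_node", sub {
    run "setenforce 0";             # permissive mode, not persistent
    run "service iptables stop";    # stop the firewall for now
    run "chkconfig iptables off";   # optional: keep it off across reboots
};
```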
61. Let's start hadoop
● Log in to the master node as hadoop-user
– ssh -X hadoop-user@vmaster
● Format the hadoop namenode
– hadoop namenode -format
● Execute the start script
– ex) start-all.sh
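These steps can also be driven from the provision server with one more task. A sketch, assuming $HADOOP_HOME/bin is already on hadoop-user's PATH via the .rc setup above; the piped "Y" is a hypothetical way to answer the format confirmation prompt non-interactively:

```perl
desc "Format HDFS and start the cluster on vmaster";
task "start_hadoop", "vmaster", sub {
    # answer the confirmation prompt of -format with 'Y'
    run "echo Y | hadoop namenode -format";
    run "start-all.sh";
};
```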
62. Check hadoop status
[hadoop-user@vmaster: ~]$jps -l
22161 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
22260 org.apache.hadoop.mapred.JobTracker
21968 org.apache.hadoop.hdfs.server.namenode.NameNode
27896 sun.tools.jps.Jps
[hadoop-user@vmaster: ~]$hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop-user supergroup 0 2013-10-07 20:33 /tmp
※ It seems to be OK. Really?
63. But, life is not easy
http://www.trulygraphics.com/tg/weekend/
64. Check status for all DNs
task "show_jps", "vnode[0..2]", sub {
say run "hostname";
my $r = run "jps";
say $r;
};
[onycom@onydev: Prov]$rex -f ./hd2.Rexfile show_jps
vnode0
12682 Jps
12042 TaskTracker
11934 DataNode
vnode1
11669 DataNode
11778 TaskTracker
12438 Jps
vnode2
11128 DataNode
11237 TaskTracker
11895 Jps
65. If there is some problem,
http://blog.lib.umn.edu/isss/undergraduate/2011/11/you-do-have-any-tech-problem.html
● Check again (on each node)
– /etc/hosts
– selinux & iptables
– name & data dirs/permissions in hdfs
– and so on...
66. If you didn't meet any problems, or fixed them,
67. Now you have hadoop & an automatic MGM/Prov. solution
https://hadoopworld2011.eventbrite.com/
source: yonhap
69. What more can we do?(1/2)
● add/remove a data node
● add/remove storage
● Integrate with monitoring
– ex: Ganglia/Nagios
● Integrate with other hadoop eco
– Flume, flamingo, Oozie
● Integrate other devices or servers
– ex: Switch, DB server
70. What more can we do?(2/2)
● sophisticated hadoop parameter control
– ex: use XML parsing
● workflow control & batch
● backup
● periodic file system management
– ex: log files
● web GUI
● make a framework for your purpose
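As one concrete example of the "XML parsing" idea above, a script on the provision server could rewrite a single Hadoop property in the local config repo before uploading it. This is only a sketch using a core-Perl regex over the file shown earlier (XML::LibXML would be more robust; the path and property are just examples):

```perl
# bump dfs.replication to 3 inside a local copy of hdfs-site.xml
my $path = "hadoop_conf_repo/hdfs-site.xml";
my $xml  = do { local $/; open my $in, '<', $path or die $!; <$in> };

# match the <name>..</name><value>..</value> pair and replace the value
$xml =~ s{(<name>dfs\.replication</name>\s*<value>)\d+(</value>)}{${1}3${2}}s;

open my $out, '>', $path or die $!;
print $out $xml;
close $out;
```

After this, the upload tasks from slide 59 push the modified file to every node as before.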