Hadoop meet (R)?ex
- How to use (R)?ex to construct a Hadoop cluster
Original Rex base image http://rexify.org
2013-08-26
Original Hadoop image http://hadoop.apache.org
Background
Mission
• I’m not a S/W developer any more
• I’m not a system engineer
• But I had to construct a Hadoop cluster
– Moreover, in various configurations...
http://www.gkworld.com/product/GKW49102/Simpsons-Cruel-Fate-Why-Mock-Me-Homer-Magnet-SM130.html
Hadoop is
• A Hadoop cluster consists of many Linux boxes
• Hadoop has many configuration files and parameters
• Besides Hadoop itself, a variety of Hadoop ecosystem S/W must be installed
• Beyond Hadoop & the Hadoop eco, many other kinds of S/W must be installed & configured
– Tomcat, Apache, DBMS, other development tools, other utils/libs…
• And so on…
At first,
• I did it all manually
– Install & Configure..
– Install & Configure
– Install & Configure
– Install & Configure
– ….
Img http://www.construire-en-vendee.fr/la-construction-dune-maison-de-a-a-z-les-fondations.html
Tiresome!!
• It is a really tedious & horrible job!!
Img http://cuteoverload.com/2009/08/17/your-story-has-become-tiresome/
Finding another way
• I decided to find another way!!
• I started to survey other solutions
Img http://www.101-charger.com/wallpapers/21526,jeux,gratuit,pathfinder,7.html
Survey
A variety of solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
http://www.cbsnews.com/8301-505125_162-31042083/duke-research-monkeys-like-humans-want-variety/
Hadoop Managers
Hortonworks Management Center™
Cloudera’s CDH™
* Apache Ambari
Provisioning Tools
Fabric (Python)
Parallel SSH Tools
http://dev.naver.com/projects/dist/
https://code.google.com/p/parallel-ssh/
http://sourceforge.net/projects/clusterssh/
Examination(1/3)
• Hadoop Managers
↑ Specialized for Hadoop
↑ Already proven
↑ Comfortable
↓ Commercial or restrictive license
↓ No support for other apps/libs beyond Java/Hadoop/Hadoop Eco
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
http://www.bizbuilder.com/how-much-does-an-inexpensive-franchise-cost/
– I have no money
– I want to use more spare resources
※ Recently, there have been many changes in license policies. Please check them!!
Examination(2/3)
• Other provisioning tools
↑ Powerful
↑ Many features
↑ Detailed control
↓ Complicated
↓ Requires a lot of study
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
source: www.mbc.co.kr
– I don’t like to study
Examination(3/3)
• Other parallel SSH tools
↑ Simple
↑ Useful
↑ No extra agent to install
↓ Some features are missing
↓ Every exceptional case must be handled by hand
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
http://bluebuddies.com/Smurfs_Panini_Smurf_Stickers-7.htm
– Yes, I’m greedy
● Simple &
● Powerful &
● No cost &
● Expandable &
● Smart way???
http://plug.hani.co.kr/heihei9999/459415
So, what is it?
I have found a solution
http://rexify.org/
It is Rex!!
Rex is
● uses just ssh
● no agent required
● seamless integration
● no conflicts
● easy to use
● easy to extend
● easy to learn
● can use advanced Perl power
http://swapiinthehouse.blogspot.kr/2012/02/final-term-was-over-and-let-holiday.html
Rex options
[onycom@onydev: ~]$rex -h
(R)?ex - (Remote)? Execution
-b Run batch
-e Run the given code fragment
-E Execute task on the given environment
-H Execute task on these hosts
-G Execute task on these group
-u Username for the ssh connection
-p Password for the ssh connection
-P Private Keyfile for the ssh connection
-K Public Keyfile for the ssh connection
-T List all known tasks.
-Tv List all known tasks with all information.
-f Use this file instead of Rexfile
-h Display this help
-M Load Module instead of Rexfile
-v Display (R)?ex Version
-F Force. Don't regard lock file
-s Use sudo for every command
-S Password for sudo
-d Debug
-dd More Debug (includes Profiling Output)
-o Output Format
-c Turn cache ON
-C Turn cache OFF
-q Quiet mode. No Logging output
-Q Really quiet. Output nothing.
-t Number of threads to use
Basic Grammar - Authentication
From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3
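The referenced slide is an image, so here is a minimal sketch of the authentication grammar, assuming Rex’s stock DSL (the user, password and key paths are placeholders):
# password-based authentication
user "onycom";
password "blabla";
pass_auth;
# or key-based authentication
private_key "/home/onycom/.ssh/id_rsa";
public_key "/home/onycom/.ssh/id_rsa.pub";
key_auth;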
Basic Grammar - Server Group
From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3
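That slide is an image as well; a sketch of the server-group grammar, using the host names from this deck:
# a group bundles hosts; ranges like [0..2] expand to vnode0, vnode1, vnode2
group "hadoop_node" => "vmaster", "vnode[0..2]";
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";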
Basic Grammar - Task
From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3
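And a sketch of the task grammar (the task name and command are illustrative):
desc "Show uptime on all hadoop nodes";
task "show_uptime", group => "hadoop_node", sub {
    # run executes the command on each remote host and returns its output
    say run "uptime";
};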
Let’s get down to the main subject!
Constructing Hadoop with (R)?ex
This presentation is
● How to easily install & configure Hadoop
– Not “how to optimize & performance-tune”
● For easy understanding,
– exceptional cases are excluded
● No explanation of OS installation
– no discussion of “PXE/kickstart”
● Reduced environment conditions
– ex) security, network, other servers/apps, …
● I’ll avoid talking about the Perl language as much as possible
– It is not needed
● TMTOWTDI
– Even if it’s not refined, I’ll show a variety of ways where possible
Network
vmaster
(Name node/
Job Tracker)
L2 switch
Onydev
(Provision Server)
vnode0
(Data node)
vnode1
(Data node)
vnode2
(Data node)
vmonitor
(Monitoring Server)
Topology
[spec]
– Machines: 6 ea
(the Hadoop cluster itself uses just 4 ea)
– OS: CentOS 6.4 64bit
– Memory: 32GB (NN), 16GB (DN)
– CPU: 4 cores (i7, 3.5GHz)
– Interface: 1G Ethernet
– Disk: 250GB SSD, 1TB HDD
※ I’ve configured the NN and JT on the same machine
Our Hadoop env. is
● There is one control account
– ‘hadoop-user’
● Hadoop & the Hadoop eco are installed under the ‘hadoop-user’ account
Prepare – All machines
● Each machine
– has the same OS version installed
(at least within the Hadoop cluster)
– has its own fixed IP address
– can be connected to with SSH
– has at least one normal user account & (optionally) a sudoers entry for it
Prepare – Provision Server(1/2)
● Development tools & environment
– ex: gcc, glib, make/cmake, perl, etc...
● Install Perl modules
– yum install perl-ExtUtil*
– yum install perl-CPAN*
– execute the ‘cpan’ command
Prepare – Provision Server(2/2)
● After executing the ‘cpan’ command
– cpan 3> install Rex
– You may get a failure!!
– This whole story is based on CentOS 6.XX
● So, I recommend ‘perlbrew’
– if you want to use more of Perl’s power
※ My guess is that Red Hat may dislike the Perl language
To Install Rex (1/3)
adduser brew-user
passwd brew-user
curl -L http://install.perlbrew.pl | bash
cd /home
chmod 755 brew-user
cd ~brew-user
chmod -R 755 ./perl5
echo "export PERLBREW_ROOT="/home/brew-user/perl5/perlbrew"" >> /home/brew-user/.bashrc
##Append "$PERLBREW_ROOT/bin" to PATH on the .bashrc
source ~brew-user/.bashrc
To Install Rex (2/3)
## In the brew-user account,
perlbrew init
perlbrew available
### Choose the recommended stable perl 5.18.0 (as of 2013/07/11)
perlbrew install perl-5.18.0
perlbrew switch perl-5.18.0
[brew-user@onydev: ~]$perlbrew switch perl-5.18.0
Use of uninitialized value in split at /loader/0x1f2f458/App/perlbrew.pm line 34.
.........
A sub-shell is launched with perl-5.18.0 as the activated perl. Run 'exit' to finish it.
To Install Rex (3/3)
● cpanm Rex
● cpan
● http://rexify.org/get/
Test for Rex
[onycom@onydev: ~]$which rex
/home/brew-user/perl5/perlbrew/perls/perl-5.18.0/bin/rex
[onycom@onydev: ~]$rex -H localhost -u onycom -p blabla -e "say run 'hostname'"
[2013-10-08 15:36:06] INFO - Running task eval-line on localhost
[2013-10-08 15:36:06] INFO - Connecting to localhost:22 (onycom)
[2013-10-08 15:36:07] INFO - Connected to localhost, trying to authenticate.
[2013-10-08 15:36:07] INFO - Successfully authenticated on localhost.
onydev
[onycom@onydev: ~]$
● Rexfile
– a plain text file
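For reference, a minimal sketch of a complete Rexfile (user, password and host are placeholders):
# Rexfile
user "onycom";
password "blabla";
pass_auth;
desc "Print the remote hostname";
task "get_hostname", "localhost", sub {
    say run "hostname";
};
With this in place, ‘rex -T’ lists the task and ‘rex get_hostname’ runs it.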
/etc/hosts - Provision Server
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
... skip .................
192.168.2.100 onydev
... skip .................
192.168.2.51 vmaster
192.168.2.52 vnode0
192.168.2.53 vnode1
192.168.2.54 vnode2
192.168.2.59 vmonitor
SSH connection
● Between
– the provision server and the other target servers
– the Hadoop master node and the data nodes
[onycom@onydev: ~]$ ssh-keygen -t rsa
Enter file in which to save the key (/home/onycom/.ssh/id_rsa):
Created directory '/home/onycom/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/onycom/.ssh/id_rsa.
Your public key has been saved in /home/onycom/.ssh/id_rsa.pub.
Prepare SSH public key
Create User
use Rex::Commands::User;
group "hadoop_node" => "vmaster", "vnode[0..2]";
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
my $USER = "hadoop-user";
desc "Create user";
task "new_user", group => "all_vm_node", sub {
create_user "$USER",
home => "/home/$USER",
comment => "Account for _hadoop",
password => "blabla",
};
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> new_user
Setup SSH for user
desc "setup ssh for user";
task "setup_ssh_user", group => "all_vm_node", sub {
run "mkdir /home/$USER/.ssh";
file "/home/$USER/.ssh/authorized_keys",
source => "/home/onycom/.ssh/id_rsa.pub",
owner => "$USER",
group => "$USER",
mode => 644;
run "chmod 700 /home/$USER/.ssh";
};
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u hadoop-user -p <pass> setup_ssh_user
※ OK!! Done.
Now you can log in to each server without a password.
Then do the same thing for the Hadoop NN/DN nodes.
Install packages
parallelism 4;
desc "Install packages for java";
task "install_java", group => "all_vm_node", sub {
install package => "java-1.6.*";
};
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> install_java
• Some packages should be installed globally (ex: java, wget, etc.)
• For Hadoop 1.1.x, Java 1.6 is recommended.
• Use the parallelism keyword (when tasks take a long time)
Install hadoop(1/3)
user "hadoop-user";
private_key "/home/onycom/.ssh/id_rsa";
public_key "/home/onycom/.ssh/id_rsa.pub";
group "hadoop_node" => "vmaster", "vnode[0..2]";
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";
desc "prepare_dir";
task "prepare_dir", group=>"hadoop_node", sub {
run "mkdir Work";
run "mkdir Download";
run "mkdir src";
run "mkdir tmp";
};
hd1.Rexfile
[onycom@onydev: Prov]$ rex -f ./hd1.Rexfile prepare_dir
Install hadoop(2/3)
desc "hadoop 1.1.2 download with wget";
task "get_hadoop", group=>"hadoop_node", sub {
my $f = run "wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz",
cwd=>"/home/hadoop-user/src";
say $f;
};
...skip....
desc "pig 0.11.1 download with wget";
task "get_pig", group=>"hadoop_node", sub {
my $f = run "wget http://apache.tt.co.kr/pig/pig-0.11.1/pig-0.11.1.tar.gz",
cwd=>"/home/hadoop-user/src";
say $f;
};
! The Hadoop version & the Hadoop eco S/W versions must match.
That topic is beyond the scope of this presentation.
Install hadoop(3/3)
my $HADOOP_SRC_DIR = "/home/hadoop-user/src";
desc "unzip hadoop source files";
task "unzip_src",group=>"hadoop_node", sub {
run "tar xvfz hadoop-1.1.2.tar.gz", cwd=>"$HADOOP_SRC_DIR";
run "tar xvfz hive-0.11.0.tar.gz", cwd=>"$HADOOP_SRC_DIR";
run "tar xvfz pig-0.11.1.tar.gz", cwd=>"$HADOOP_SRC_DIR";
};
desc "make link for hadoop source files";
task "link_src", group=>"hadoop_node", sub {
run "ln -s ./hadoop-1.1.2 ./hadoop", cwd=>$HADOOP_SRC_DIR;
run "ln -s ./hive-0.11.0 ./hive", cwd=>$HADOOP_SRC_DIR;
run "ln -s ./pig-0.11.1 ./pig", cwd=>$HADOOP_SRC_DIR;
};
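As an aside, newer Rex releases also ship an extract helper in Rex::Commands::Fs, which could replace the tar calls; a hedged sketch, assuming that helper is available in your Rex version:
desc "unpack hadoop with Rex's extract helper";
task "unzip_src2", group=>"hadoop_node", sub {
    extract "$HADOOP_SRC_DIR/hadoop-1.1.2.tar.gz", to => $HADOOP_SRC_DIR;
};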
Configuration files(1/3)
● System
– /etc/hosts
● Hadoop(../hadoop/conf)
– masters & slaves
– hadoop-env.sh
– hdfs-site.xml
– core-site.xml
– mapred-site.xml
Configuration files(2/3)
● Hadoop eco systems & other tools
– ex) Ganglia
– ex) Flume – agent/collector/master
– ex) Oozie or Flamingo
– These are skipped in this PPT.
● User rc file
※ These are just defaults & do not consider optimization
Configuration files(3/3)
The Hadoop configuration files are kept on the Provision Server (../hadoop_conf_repo) and pushed to the Hadoop NN and every Hadoop DN (1 … n) over SSH/SCP by (R)?ex.
※ Of course, this is just my policy
Edit hosts file
my $target_file = "/etc/hosts";
my $host_list = <<'END';
192.168.2.51 vmaster
192.168.2.52 vnode0
192.168.2.53 vnode1
192.168.2.54 vnode2
192.168.2.59 vmonitor
END
desc "Add hosts";
task "add_host", group => "all_vm_node", sub {
my $exist_cnt = cat $target_file;
my $fh = file_write $target_file;
$fh->write( $exist_cnt );
$fh->write($host_list);
$fh->close;
};
※ You can consider the ‘Augeas tool’ to handle system files.
Please refer to ‘Rex::Augeas’ or ‘http://augeas.net’
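A hedged alternative sketch: append_if_no_such_line (from Rex::Commands::File, loaded by default in a Rexfile) makes the task idempotent, so re-running it never duplicates entries:
task "add_host2", group => "all_vm_node", sub {
    # append each mapping only if an identical line is not already present
    for my $line (grep { length } split /\n/, $host_list) {
        append_if_no_such_line $target_file, $line;
    }
};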
Setup .bashrc for user(1/2)
... skip .....
my $hadoop_rc=<<'END';
#Hadoop Configuration
export JAVA_HOME="/usr/lib/jvm/jre-1.6.0-openjdk.x86_64"
export CLASSPATH="$JAVA_HOME/lib:$JAVA_HOME/lib/ext"
export HADOOP_USER="/home/hadoop-user"
export HADOOP_SRC="$HADOOP_USER/src"
export HADOOP_HOME="$HADOOP_USER/hadoop"
export PIG_HOME="$HADOOP_SRC/pig"
export HIVE_HOME="$HADOOP_SRC/hive"
END
... skip .....
Setup .bashrc for user(2/2)
desc "setup hadoop-user's .rc file";
task "setup_rc_def", group=>"hadoop_node", sub {
my $fh = file_append ".bashrc";
$fh->write($base_rc);
$fh->write($hadoop_rc);
$fh->close();
};
desc "setup hadoop master node .rc file";
task "setup_rc_master", "vmaster", sub {
my $fh = file_append ".bashrc";
$fh->write($master_rc);
$fh->close();
};
.......... skip ............
Configure Hadoop(1/6)
● ‘masters’
[hadoop-user@vmaster: ~]$cd hadoop/conf
[hadoop-user@vmaster: conf]$cat masters
vmaster
● ‘slaves’
[hadoop-user@vmaster: conf]$cat slaves
vnode0
vnode1
vnode2
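Both files can be pushed from the provision server with the same upload pattern used in Configure Hadoop(6/6) below; a sketch, assuming the config repo layout from that slide:
my $CNF_REPO = "hadoop_conf_repo";
task "upload_masters_slaves", group=>"hadoop_node", sub {
    for my $f ("masters", "slaves") {
        file "/home/hadoop-user/hadoop/conf/$f",
            owner  => "hadoop-user",
            group  => "hadoop-user",
            source => "$CNF_REPO/$f";
    }
};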
Configure Hadoop(2/6)
• hadoop-env.sh
... skip ...
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
#hadoop-user
#Remove the warning message: "HADOOP_HOME is deprecated"
export HADOOP_HOME_WARN_SUPPRESS=TRUE
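The deck edits hadoop-env.sh by hand on the nodes; a hedged sketch of making the same edit from the provision server with Rex’s sed and append_if_no_such_line helpers (Rex::Commands::File), assuming this deck’s paths:
task "set_java_home", group=>"hadoop_node", sub {
    my $env = "/home/hadoop-user/hadoop/conf/hadoop-env.sh";
    # replace the commented-out default with our JAVA_HOME
    sed qr{^#?\s*export JAVA_HOME=.*},
        'export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64',
        $env;
    append_if_no_such_line $env, 'export HADOOP_HOME_WARN_SUPPRESS=TRUE';
};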
Configure Hadoop(3/6)
• hdfs-site.xml
... skip ...
<configuration>
<!-- modified by hadoop-user -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop-user/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop-user/hdfs/data</value>
</property>
</configuration>
※ This ‘replication’ value depends on your env.
Configure Hadoop(4/6)
• core-site.xml
... skip ...
<configuration>
<!--modified by hadoop-user -->
<property>
<name>fs.default.name</name>
<value>hdfs://vmaster:9000</value>
</property>
</configuration>
Configure Hadoop(5/6)
• mapred-site.xml
.. skip ..
<property>
<name>mapred.job.tracker</name>
<value>vmaster:9001</value>
</property>
<!-- 2013.9.11. Increased the timeout to address "failed to report status" errors -->
<property>
<name>mapred.task.timeout</name>
<value>1800000</value>
<description>The number of milliseconds before
a task will be terminated if it neither reads an input, writes
an output, nor updates its status string.
</description>
</property>
※ This ‘timeout’ value just depends on your env.
Configure Hadoop(6/6)
my $CNF_REPO="hadoop_conf_repo";
... skip ...
my $MAPRED="mapred-site.xml";
task "upload_mapred", group=>"hadoop_node", sub {
file "$HD_CNF/$MAPRED",
owner => $HADOOP_USER,
group => $HADOOP_USER,
source => "$CNF_REPO/$MAPRED";
};
my $CORE_SITE="core-site.xml";
task "upload_core", group=>"hadoop_node", sub {
file "$HD_CNF/$CORE_SITE",
owner => $HADOOP_USER,
group => $HADOOP_USER,
source => "$CNF_REPO/$CORE_SITE";
};
... skip ....
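The tasks above copy static files; as a variation, Rex’s template() (also from Rex::Commands::File) could substitute per-cluster values, assuming a hypothetical template file $CNF_REPO/mapred-site.xml.tpl with <%= $job_tracker %>-style placeholders:
task "upload_mapred_tpl", group=>"hadoop_node", sub {
    file "$HD_CNF/$MAPRED",
        owner   => $HADOOP_USER,
        group   => $HADOOP_USER,
        content => template("$CNF_REPO/mapred-site.xml.tpl",
                            job_tracker  => "vmaster:9001",
                            task_timeout => 1800000);
};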
Before going any further
● Stop SELinux
– if it is enforcing
● Modify the iptables policy
– I recommend stopping it during configuration work
Let’s start hadoop
● log in to the master node as hadoop-user
– ssh -X hadoop-user@vmaster
● format the HDFS namenode
– hadoop namenode -format
● execute the start script
– ex) start-all.sh
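The same steps could also be driven from the provision server; a sketch with a hypothetical task (it assumes $HADOOP_HOME/bin is on the PATH of non-interactive shells, and that the format step is only ever run once):
desc "format HDFS and start all daemons on the master (run once!)";
task "start_hadoop", "vmaster", sub {
    run "hadoop namenode -format";   # destructive: wipes existing HDFS metadata
    run "start-all.sh";
    say run "jps";                   # quick sanity check of the started daemons
};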
Check hadoop status
[hadoop-user@vmaster: ~]$jps -l
22161 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
22260 org.apache.hadoop.mapred.JobTracker
21968 org.apache.hadoop.hdfs.server.namenode.NameNode
27896 sun.tools.jps.Jps
[hadoop-user@vmaster: ~]$hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop-user supergroup 0 2013-10-07 20:33 /tmp
※ It seems to be OK. Really?
But, life is not easy
http://www.trulygraphics.com/tg/weekend/
Check status for all DNs
task "show_jps", "vnode[0..2]", sub {
say run "hostname";
my $r = run "jps";
say $r;
};
[onycom@onydev: Prov]$rex -f ./hd2.Rexfile show_jps
vnode0
12682 Jps
12042 TaskTracker
11934 DataNode
vnode1
11669 DataNode
11778 TaskTracker
12438 Jps
vnode2
11128 DataNode
11237 TaskTracker
11895 Jps
If there is some problem,
http://blog.lib.umn.edu/isss/undergraduate/2011/11/you-do-have-any-tech-problem.html
● Check again
– /etc/hosts
– selinux & iptables
– name & data dirs / permissions in hdfs
– and so on...
(on each node)
If you did not meet any problems, or fixed them,
now you have Hadoop
& an automatic MGM/Prov. solution
https://hadoopworld2011.eventbrite.com/
Img source: yonhap
Advanced Challenge
What more can we do?(1/2)
● add/remove a data node (see the sketch below)
● add/remove storage
● Integrate with monitoring
– ex: Ganglia/Nagios
● Integrate with other Hadoop eco tools
– Flume, Flamingo, Oozie
● Integrate other devices or servers
– ex: switches, DB servers
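For example, a hedged sketch of adding a data node (hypothetical tasks; vnode3 is a placeholder, and the paths follow this deck’s conventions):
desc "register a new data node on the master";
task "add_dn_master", "vmaster", sub {
    append_if_no_such_line "/home/hadoop-user/hadoop/conf/slaves", "vnode3";
};
desc "start the hadoop daemons on the new node";
task "add_dn_node", "vnode3", sub {
    run "hadoop-daemon.sh start datanode";
    run "hadoop-daemon.sh start tasktracker";
};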
What more can we do?(2/2)
● sophisticated Hadoop parameter control
– ex: use XML parsing
● workflow control & batch
● backup
● periodic file system management
– ex: log files
● web GUI
● make a framework for your purpose
Ref.
• http://hadoop.apache.org/
• http://pig.apache.org/
• http://hive.apache.org/
• http://confluence.openflamingo.org
• http://www.openankus.org
• http://www.rexify.org
• https://groups.google.com/forum/#!forum/rex-users
• http://modules.rexify.org/search?q=hadoop
http://www.projects2crowdfund.com/what-can-i-do-with-crowdfunding/
Thanks!
junkim@onycom.com / rainmk6@gmail.com