Nitesh Singh
WHAT IS BIG DATA?
• LARGE AMOUNT OF DATA
• Lots of data is being collected and warehoused
• Web data, e-commerce
• Purchases at department/grocery stores
• Bank/credit card transactions
• Social networks
HOW MUCH DATA?
• Google processes 20 PB a day (2008)
• Wayback Machine has 3 PB + 100 TB/month (3/2009)
• Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• CERN’s Large Hadron Collider (LHC) generates 15 PB a year
HOW CAN WE ANALYZE THIS MUCH DATA?
WHAT IS APACHE HADOOP
• Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.
• IT IS DEVELOPED BY THE APACHE SOFTWARE FOUNDATION.
• ITS STABLE RELEASE VERSION IS 2.4.1, RELEASED ON JUNE 30, 2014.
• IT IS WRITTEN IN JAVA.
• IT INCLUDES A DISTRIBUTED FILE SYSTEM (HDFS).
BASIC COMPONENTS OF THE HADOOP FRAMEWORK
• MAP REDUCE
• HIVE
• PIG
• SQOOP
• FLUME
INFRASTRUCTURE PROVIDERS
• AWS (AMAZON WEB SERVICES)
• CLOUDERA (LEADING PROVIDER)
• HORTONWORKS
• RACKSPACE
• MAPR
• SFDC (SALESFORCE.COM)
MAJOR ROLES OF HADOOP
• HADOOP ADMINISTRATOR
• HADOOP DEVELOPER.
HADOOP DEVELOPER
• RUNS QUERIES
• EXECUTES PROGRAMS
• MAINTAINS REPORTS
HADOOP ADMINISTRATOR
• MAINTAINS THE INFRASTRUCTURE
• DOES NOT WRITE PROGRAMS
• MONITORS MEMORY USAGE AND OTHER RESOURCES
• CHECKS WHETHER ANY NODE HAS FAILED
• DECIDES HOW MANY NODES THE CLUSTER NEEDS
HADOOP ARCHITECTURE
[Diagram: layered stack, top to bottom]
SQOOP | FLUME | HBASE
PIG | HIVE
MAP REDUCE
HDFS
HDFS (HADOOP DISTRIBUTED FILE SYSTEM)
[Diagram: CLUSTER, DATA CENTER, RACKS, NODE, BLOCKS]
WHAT IS HDFS
• HDFS IS A WAY TO STORE FILES.
• IT PREVENTS DATA FROM BEING LOST.
• IT IS THE BASIC COMPONENT OF THE HADOOP FRAMEWORK, WHERE ALL FILES ARE STORED.
• FILES ARE STORED IN A DISTRIBUTED MANNER, SPLIT INTO BLOCKS; HENCE THE NAME “HADOOP DISTRIBUTED FILE SYSTEM”.
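To make the block idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The 128 MB block size, the replication factor of 3, the node names, and the round-robin placement are all assumptions for illustration; real HDFS placement is configurable and rack-aware.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HdfsBlockSketch {
    // Number of blocks needed for a file: the ceiling of fileSize / blockSize.
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes == 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Assign every block to `replication` distinct nodes, round-robin.
    // This placement rule is hypothetical; it only shows that copies of one
    // block end up on different nodes.
    static List<List<String>> placeBlocks(long blocks, List<String> nodes, int replication) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("not enough nodes for the replication factor");
        }
        List<List<String>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(nodes.get((int) ((b + r) % nodes.size())));
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blocks = blockCount(300 * mb, 128 * mb); // a 300 MB file
        List<String> nodes = Arrays.asList("node1", "node2", "node3", "node4");
        List<List<String>> placement = placeBlocks(blocks, nodes, 3);
        for (int i = 0; i < placement.size(); i++) {
            System.out.println("block " + i + " -> " + placement.get(i));
        }
    }
}
```

Because each block lives on several nodes, losing one node does not lose the file, which is the "preventing data loss" point in the list above.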
MAP REDUCE ENGINE
• THE PROGRAMMING MODEL COMES FROM GOOGLE.
• A MAP REDUCE PROGRAM CONSISTS OF A MAP FUNCTION AND A REDUCE FUNCTION.
• A MAP REDUCE JOB IS BROKEN INTO TASKS THAT RUN IN PARALLEL.
• IN HDFS THE PROCESS IS CALLED “DATA DISTRIBUTION”.
• IN MAP REDUCE THE PROCESS IS CALLED “JOB DISTRIBUTION”.
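The map and reduce functions can be illustrated without a cluster. The sketch below is plain Java, not the Hadoop API: `mapLine`, `reduceCounts`, and `run` are hypothetical names, and a parallel stream stands in for tasks running in parallel on different nodes. It counts words, which is an assumed example task.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class MapReduceSketch {
    // Map function: one input line -> a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word.toLowerCase(), 1));
            }
        }
        return pairs;
    }

    // Reduce function: all values collected for one key -> a single total.
    static int reduceCounts(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Map phase: lines are independent, so they can be processed in parallel.
        List<Map.Entry<String, Integer>> mapped = lines.parallelStream()
                .flatMap(line -> mapLine(line).stream())
                .collect(Collectors.toList());

        // Shuffle phase: group the values by key (sorted, as a TreeMap keeps keys ordered).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Reduce phase: one call per key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduceCounts(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("I am learning hadoop", "I am learning hive")));
    }
}
```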
[Diagram: MAP REDUCE + HDFS = HADOOP]
TYPES OF NODE
MASTER
• NAME NODE
• JOB TRACKER
SLAVE
• TASK TRACKER
• DATA NODE
JOB TRACKER NODE
• TAKES JOB REQUESTS FROM THE CLIENT.
• ASKS THE NAME NODE WHERE THE REQUIRED DATA BLOCKS ARE STORED.
• IT IS PART OF THE MAP REDUCE ENGINE.
• EVERY CLUSTER HAS AT MOST ONE JOB TRACKER NODE.
TASK TRACKER
• MANY PER HADOOP CLUSTER.
• EXECUTES MAP AND REDUCE OPERATIONS.
• READS BLOCKS FROM THE DATA NODES.
NAME NODE
• ONE PER HADOOP CLUSTER.
• MAIN NODE OF THE HADOOP DISTRIBUTED FILE SYSTEM.
• IF THE NAME NODE OR THE JOB TRACKER NODE FAILS, THE WHOLE CLUSTER HALTS.
• IT IS SO IMPORTANT THAT WE ALSO NEED A SECONDARY NAME NODE.
• THE NAME NODE IS VERY IMPORTANT BECAUSE IT IS THE INTERMEDIARY BETWEEN THE JOB TRACKER NODE AND THE DATA NODES.
WRITING DATA TO HDFS
• WHEN THE USER SENDS A REQUEST TO WRITE A FILE INTO HDFS, THE REQUEST GOES TO THE NAME NODE (FOR A MAP REDUCE JOB, BY WAY OF THE JOB TRACKER).
• THE NAME NODE LOOKS FOR DATA NODES WITH VACANT BLOCKS SO THAT IT CAN ALLOCATE THOSE BLOCKS TO THE REQUESTED FILE.
• EACH DATA NODE SENDS AN ACKNOWLEDGEMENT TO THE NAME NODE REPORTING THE VACANT BLOCKS IT HAS.
• THE NAME NODE THEN ACKNOWLEDGES THE REQUEST WITH THE ALLOCATED BLOCKS, THE DATA NODES ACCEPT THE ACCESS, AND THE CONTENT IS WRITTEN INTO THE DATA NODE BLOCKS.
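The write steps above can be sketched in plain Java. `DataNode`, `NameNode`, and `write` below are hypothetical in-memory stand-ins, not Hadoop classes, and the "most vacant blocks first" rule is an assumption made for the example. The sketch also skips replication, which real HDFS performs for every block.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HdfsWriteSketch {
    // Stand-in for a data node: stores block contents keyed by block id.
    static final class DataNode {
        final String name;
        final int capacityBlocks;
        final Map<String, byte[]> blocks = new HashMap<>();

        DataNode(String name, int capacityBlocks) {
            this.name = name;
            this.capacityBlocks = capacityBlocks;
        }

        int vacantBlocks() {
            return capacityBlocks - blocks.size();
        }

        void store(String blockId, byte[] data) {
            blocks.put(blockId, data);
        }
    }

    // Stand-in for the name node: keeps metadata only, never file contents.
    static final class NameNode {
        final List<DataNode> dataNodes;
        final Map<String, List<String>> fileToBlocks = new LinkedHashMap<>();
        final Map<String, DataNode> blockToNode = new HashMap<>();

        NameNode(List<DataNode> dataNodes) {
            this.dataNodes = dataNodes;
        }

        // Pick the data node with the most vacant blocks and record the allocation.
        DataNode allocate(String file, String blockId) {
            DataNode best = null;
            for (DataNode dn : dataNodes) {
                if (dn.vacantBlocks() > 0
                        && (best == null || dn.vacantBlocks() > best.vacantBlocks())) {
                    best = dn;
                }
            }
            if (best == null) {
                throw new IllegalStateException("no vacant block in the cluster");
            }
            fileToBlocks.computeIfAbsent(file, f -> new ArrayList<>()).add(blockId);
            blockToNode.put(blockId, best);
            return best;
        }
    }

    // Client side: split the content into blocks, ask the name node where each
    // block goes, then write the bytes directly to that data node.
    static void write(NameNode nameNode, String file, byte[] content, int blockSize) {
        int n = 0;
        for (int off = 0; off < content.length; off += blockSize) {
            byte[] chunk = Arrays.copyOfRange(content, off, Math.min(off + blockSize, content.length));
            String blockId = file + "#" + n++;
            DataNode target = nameNode.allocate(file, blockId);
            target.store(blockId, chunk);
        }
    }

    public static void main(String[] args) {
        DataNode d1 = new DataNode("node1", 2);
        DataNode d2 = new DataNode("node2", 2);
        NameNode nameNode = new NameNode(Arrays.asList(d1, d2));
        write(nameNode, "/sales.csv", "0123456789".getBytes(StandardCharsets.US_ASCII), 4);
        for (String blockId : nameNode.fileToBlocks.get("/sales.csv")) {
            System.out.println(blockId + " -> " + nameNode.blockToNode.get(blockId).name);
        }
    }
}
```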
READING A FILE FROM HDFS
• WHEN THE USER SENDS A REQUEST, THE JOB TRACKER ACCEPTS IT AND SENDS A REQUEST TO THE NAME NODE.
• THE NAME NODE IDENTIFIES THE DATA NODES THAT HOLD THE REQUIRED DATA BLOCKS, AND THE DATA NODES CONFIRM THAT THEY HAVE THEM.
• THE NAME NODE SENDS AN ACKNOWLEDGEMENT TO THE JOB TRACKER NODE.
• THE JOB TRACKER PASSES THAT ACCESS INFORMATION TO A TASK TRACKER NODE.
• THE TASK TRACKER NODE READS THE DATA DIRECTLY FROM THE DATA NODE AND RETURNS IT TO THE USER.
• AFTER SUCCESS, THE PROCESS STOPS.
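The key point of the read path is that the name node only answers "where is each block", while the bytes come straight from the data nodes. A minimal sketch in plain Java follows; the three maps are a hypothetical layout for the metadata and storage, not Hadoop data structures, and the file name and contents are invented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HdfsReadSketch {
    // fileToBlocks and blockToNode play the role of name node metadata;
    // dataNodes maps a node name to the blocks stored on that node.
    static String read(String file,
                       Map<String, List<String>> fileToBlocks,
                       Map<String, String> blockToNode,
                       Map<String, Map<String, String>> dataNodes) {
        List<String> blockIds = fileToBlocks.get(file);
        if (blockIds == null) {
            throw new IllegalArgumentException("unknown file: " + file);
        }
        StringBuilder content = new StringBuilder();
        for (String blockId : blockIds) {
            String node = blockToNode.get(blockId);             // ask where the block lives
            content.append(dataNodes.get(node).get(blockId));   // read it directly from that node
        }
        return content.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> fileToBlocks = new HashMap<>();
        fileToBlocks.put("/sales.csv", Arrays.asList("b0", "b1"));

        Map<String, String> blockToNode = new HashMap<>();
        blockToNode.put("b0", "node1");
        blockToNode.put("b1", "node2");

        Map<String, Map<String, String>> dataNodes = new HashMap<>();
        dataNodes.put("node1", new HashMap<>());
        dataNodes.put("node2", new HashMap<>());
        dataNodes.get("node1").put("b0", "jan,rice,");
        dataNodes.get("node2").put("b1", "10");

        System.out.println(read("/sales.csv", fileToBlocks, blockToNode, dataNodes));
    }
}
```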
MAP REDUCE PROGRAM: MAPPER CLASS
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class mapper extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // The mapper input is a key/value pair: the key identifies the line and
        // the value is the line text, e.g. (0: "I am neeraj gupta"; 1: "I am learning hadoop").
        String s = value.toString();
        String[] s1 = s.split(" "); // split the line into an array of words
        // Each record spans 7 fields, so i steps through 3, 10, 17, ...
        // The bound check keeps s1[i + 3] inside the array.
        for (int i = 3; i + 3 < s1.length; i = i + 7) {
            // total of the three numeric fields that follow index i
            int w = Integer.parseInt(s1[i + 1]) + Integer.parseInt(s1[i + 2])
                    + Integer.parseInt(s1[i + 3]);
            // emit (name field, total) for this record
            output.collect(new Text(s1[i - 2]), new IntWritable(w));
        } // end of the for loop
    }
}
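The index arithmetic in the mapper is easier to follow outside Hadoop. The sketch below mirrors it in plain Java, assuming each record has seven space-separated fields with the emitted name at index 1 and three numeric amounts at indices 4 to 6. The class name, the `emit` helper, and the sample line are invented for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapperLogicSketch {
    // Returns the (key, total) pairs the mapper would emit for one input line.
    static List<Map.Entry<String, Integer>> emit(String line) {
        String[] f = line.split(" ");
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // i = 3, 10, 17, ...: one step of 7 fields per record.
        for (int i = 3; i + 3 < f.length; i += 7) {
            int total = Integer.parseInt(f[i + 1])
                    + Integer.parseInt(f[i + 2])
                    + Integer.parseInt(f[i + 3]);
            out.add(new SimpleEntry<>(f[i - 2], total));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical record: fname lname id dept amount1 amount2 amount3
        System.out.println(emit("neeraj gupta 101 sales 1000 2000 3000"));
    }
}
```

For the sample line the key is the field at index 1 and the value is the sum of the last three fields.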
REDUCER CLASS
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {
    int s = 0;        // highest total seen so far by this reducer
    String name = ""; // key that produced the highest total

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int w1 = 0;
        // add up every value collected for this key
        while (values.hasNext()) {
            IntWritable w = values.next();
            w1 += w.get();
        } // end of the while loop
        // emit the key only when its total beats the highest total so far
        if (s < w1) {
            s = w1;
            name = key.toString();
            output.collect(new Text(name), new IntWritable(s));
        }
    }
}
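The reducer keeps state between calls, so a key is written out only when its total exceeds every total seen before it. The plain Java sketch below reproduces that behaviour without Hadoop; the class name, the `emitted` list, and the sample keys and amounts are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReducerLogicSketch {
    private int best = 0;                                   // highest total seen so far
    private final List<String> emitted = new ArrayList<>(); // stands in for the output collector

    void reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        // Same rule as the reducer: emit only a new running maximum.
        if (best < sum) {
            best = sum;
            emitted.add(key + "\t" + sum);
        }
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        ReducerLogicSketch r = new ReducerLogicSketch();
        // Keys are passed in sorted order, as a reducer receives them.
        r.reduce("gupta", Arrays.asList(6000).iterator());
        r.reduce("sharma", Arrays.asList(4000, 1000).iterator());
        r.reduce("singh", Arrays.asList(7000).iterator());
        System.out.println(r.emitted());
    }
}
```

One consequence of this design is that the output holds every running maximum, not just the final one, so the last line written is the overall highest total for that reducer.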
JOB CONFIGURATION (DRIVER CLASS)
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.printf(
                    "Usage: %s [generic options] <input dir> <output dir>\n",
                    getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.out);
            return -1;
        }
        // The slides skip the statements between the usage check and the output
        // path; the next two lines are reconstructed from the imports above.
        JobConf conf = new JobConf(getConf(), WordCount.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(mapper.class);
        conf.setReducerClass(Reduce.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);
        return 0;
    }

    // Entry point, also reconstructed: ToolRunner handles the generic options.
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCount(), args));
    }
}
HIVE
• HIVE IS A DATA WAREHOUSE USED TO READ TABLES WITH ITS OWN LANGUAGE, “HQL” (HIVE QUERY LANGUAGE).
• BY DEFAULT IT READS FILES FROM HDFS.
• QUERIES ARE CONVERTED INTO MAP REDUCE PROGRAMS, BECAUSE THAT IS WHAT THE HADOOP CLUSTER RUNS.
• PERFORMANCE IS SLOWER THAN A HAND-WRITTEN MAP REDUCE PROGRAM BECAUSE OF THIS CONVERSION.
• OUTPUT IS STORED IN THE HIVE WAREHOUSE.
[Diagram: HIVE QL, HIVE WAREHOUSE, OUTPUT]
STEPS TO CREATE TABLES AND LOAD THE DATA
• STEP 1: TYPE hive ON THE CLI.
• STEP 2: create table sales(month string, product string, quan string, fc string, prot string, country string, exporter string, buyer string) row format delimited fields terminated by ',';
• STEP 3: load data inpath 'spp/docs' overwrite into table sales;
• STEP 4: select * from sales; TO SEE THE OUTPUT.
SALES DATA ANALYSIS PROJECT ON HADOOP
• WE HAVE 1500 ENTRIES IN A CSV FILE AND WE NEED TO ANALYZE ALL OF THAT DATA.
• WE HAVE MULTIPLE SHEETS OF DATA: 4 FILES, EACH HAVING 1500 ENTRIES.
• WE NEED TO PRODUCE THE REPORTS REQUESTED BY THE USER.
• WE CHOSE THE HIVE QUERY LANGUAGE TO DO SO.
• WE LOAD THE FILES INTO THE HIVE WAREHOUSE.
• WE CONSTRUCT THE TABLE IN HIVE.
• WE RUN SEVERAL QUERIES AND FIND THE DIFFERENT RESULTS REQUESTED BY THE USER.
• THIS IS VERY EFFICIENT AND FAST.
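As an illustration of the kind of report such a query produces, the plain Java sketch below totals the quantity per country from comma-separated rows. It assumes the eight-column layout of the sales table (month, product, quan, fc, prot, country, exporter, buyer) and that quan holds whole numbers; the class name and the sample rows are invented, and this is not how Hive itself executes a query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SalesReportSketch {
    // Sum the quan column (index 2) grouped by the country column (index 5).
    static Map<String, Long> quantityByCountry(List<String> csvLines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : csvLines) {
            String[] c = line.split(",", -1);
            if (c.length < 8) {
                continue; // skip rows that do not have all eight columns
            }
            totals.merge(c[5].trim(), Long.parseLong(c[2].trim()), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "jan,rice,10,fc1,port1,india,exporter1,buyer1",
                "feb,tea,5,fc1,port2,india,exporter1,buyer2",
                "feb,rice,7,fc2,port3,uae,exporter2,buyer3");
        System.out.println(quantityByCountry(rows));
    }
}
```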
THANK YOU!!!
BIG DATA ANALYSIS

  • 2. WHAT IS BIG DATA!!!!!! • LARGE AMOUNT OF DATA • Lots of data is being collected and warehoused • Web data, e-commerce • purchases at department/ grocery stores • Bank/Credit Card transactions • Social Network
  • 3. HOW MUCH DATA!! • Google processes 20 PB a day (2008) • Wayback Machine has 3 PB + 100 TB/month (3/2009) • Facebook has 2.5 PB of user data + 15 TB/day (4/2009) • eBay has 6.5 PB of user data + 50 TB/day (5/2009) • CERN’s Large Hydron Collider (LHC) generates 15 PB a year
  • 4. HOW CAN WE ANALYZE THIS MUCH DATA?
  • 5. WHAT IS APACHE HADOOP • Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. • IT IS DEVELOPED BY THE “APACHE SOFTWARE FOUNDATION”. • ITS STABLE RELEASE IS VERSION “2.4.1”, RELEASED ON JUNE 30, 2014. • IT IS WRITTEN IN “JAVA”. • IT INCLUDES A DISTRIBUTED FILE SYSTEM (HDFS).
  • 6. BASIC COMPONENTS OF THE HADOOP FRAMEWORK • MAP REDUCE • HIVE • PIG • SQOOP • FLUME
  • 7. INFRASTRUCTURE PROVIDERS • AWS (AMAZON WEB SERVICES) • CLOUDERA (LEADING PROVIDER) • HORTONWORKS • RACKSPACE • MAPR • SFDC (SALESFORCE.COM)
  • 8. MAJOR ROLES IN HADOOP • HADOOP ADMINISTRATOR • HADOOP DEVELOPER
  • 9. HADOOP DEVELOPER • RUNS QUERIES • EXECUTES PROGRAMS • MAINTAINS REPORTS
  • 10. HADOOP ADMINISTRATOR • MAINTAINS THE INFRASTRUCTURE • DOES NOT WRITE PROGRAMS • MONITORS MEMORY AND OTHER RESOURCES • CHECKS WHETHER ANY NODE HAS FAILED • DECIDES HOW MANY NODES THE CLUSTER NEEDS
  • 12. HDFS (HADOOP DISTRIBUTED FILE SYSTEM) • CLUSTER • DATA CENTER • RACKS • NODE • BLOCKS
  • 13. WHAT IS HDFS • HDFS IS A WAY TO STORE FILES • IT PREVENTS DATA FROM BEING LOST • IT IS THE BASIC COMPONENT OF THE HADOOP FRAMEWORK, WHERE ALL FILES ARE STORED • FILES ARE STORED IN A DISTRIBUTED MANNER, SPLIT INTO BLOCKS, WHICH IS WHY IT IS CALLED THE “HADOOP DISTRIBUTED FILE SYSTEM”
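
The idea of storing a file as blocks can be illustrated with a small plain-Java sketch. It is not Hadoop code: the class `BlockSplitSketch` and its method are hypothetical names, and the 300 MB file and 128 MB block size are assumed values chosen only for the example (the real block size is a cluster setting).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: this class is not part of the Hadoop API. */
class BlockSplitSketch {

    /** Returns the size in bytes of each block that a file of fileSize bytes would occupy. */
    static List<Long> splitIntoBlocks(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        List<Long> blocks = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > 0) {
            // Every block is full except possibly the last one.
            long size = Math.min(blockSize, remaining);
            blocks.add(size);
            remaining -= size;
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values: a 300 MB file and a 128 MB block size.
        System.out.println(splitIntoBlocks(300 * mb, 128 * mb));
    }
}
```

Each block in the returned list would then be stored, and usually replicated, on different nodes of the cluster.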
  • 14. MAP REDUCE ENGINE • BASED ON A PROGRAMMING MODEL PUBLISHED BY GOOGLE. • A MAP REDUCE PROGRAM CONSISTS OF A MAP FUNCTION AND A REDUCE FUNCTION • A MAP REDUCE JOB IS BROKEN INTO TASKS THAT RUN IN PARALLEL • IN HDFS THE PROCESS IS CALLED “DATA DISTRIBUTION” • IN MAP REDUCE THE PROCESS IS CALLED “JOB DISTRIBUTION”.
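
To make the map and reduce functions concrete, here is a minimal plain-Java sketch of the model using a word count. It runs in one process and in sequence, so it only shows the data flow (map, group by key, reduce), not the parallel execution that Hadoop provides. `MapReduceSketch` and its methods are hypothetical names and do not use the Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: a single-process imitation of the map/reduce data flow. */
class MapReduceSketch {

    /** Map step: turn one input line into (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Reduce step: add up all values that share one key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs map on every line, groups the pairs by key, then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }
}
```

In a real cluster each call to `map` could run as a separate task on the node that holds the input block, which is where the parallelism comes from.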
  • 16. TYPES OF NODE • MASTER: NAME NODE, JOB TRACKER • SLAVE: TASK TRACKER, DATA NODE
  • 17. JOB TRACKER NODE • TAKES THE REQUEST FROM THE CLIENT • PASSES THAT INFORMATION TO THE NAME NODE. • IT IS PART OF THE MAP REDUCE ENGINE. • EVERY CLUSTER HAS AT MOST ONE JOB TRACKER NODE
  • 18. TASK TRACKER • MANY PER HADOOP CLUSTER • EXECUTES MAP AND REDUCE TASKS • READS BLOCKS FROM THE DATA NODES.
  • 19. NAME NODE • ONE PER HADOOP CLUSTER. • MAIN NODE OF THE HADOOP DISTRIBUTED FILE SYSTEM • IF THE NAME NODE OR THE JOB TRACKER NODE FAILS, THEN THE WHOLE CLUSTER WILL HALT. • IT IS A VERY IMPORTANT NODE, SO WE ALSO RUN A SECONDARY NAME NODE (IT CHECKPOINTS THE NAME NODE'S METADATA; IT IS NOT AN AUTOMATIC STANDBY). • THE NAME NODE IS VERY IMPORTANT BECAUSE IT IS THE INTERMEDIARY BETWEEN THE JOB TRACKER NODE AND THE DATA NODES.
  • 20. WRITING THE DATA TO HDFS • WHEN THE USER SENDS A REQUEST TO WRITE A FILE INTO HDFS, THE REQUEST GOES TO THE NAME NODE (A PLAIN HDFS WRITE DOES NOT PASS THROUGH THE JOB TRACKER). • THE NAME NODE KNOWS FROM THE DATA NODES' REPORTS WHICH OF THEM HAVE VACANT SPACE, AND ALLOCATES DATA NODES FOR EACH BLOCK OF THE REQUESTED FILE. • THE CLIENT WRITES EACH BLOCK TO THE ALLOCATED DATA NODES, AND THE DATA NODES SEND AN ACKNOWLEDGEMENT WHEN THE BLOCK IS STORED. • THE NAME NODE THEN RECORDS WHICH DATA NODE BLOCKS HOLD THE CONTENT OF THE FILE.
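
The allocation step of a write can be sketched in plain Java as follows. This is a deliberately simplified model: it picks the data nodes with the most vacant blocks, whereas the real HDFS placement policy also takes racks into account. `WritePlacementSketch`, its methods, and the node names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy model of a name node choosing data nodes for a block. */
class WritePlacementSketch {

    /** Data node name -> number of vacant blocks it has reported. */
    private final Map<String, Integer> vacantBlocks = new LinkedHashMap<>();

    void registerDataNode(String name, int vacant) {
        vacantBlocks.put(name, vacant);
    }

    /** Chooses `replication` distinct data nodes for one block, fullest-free-space first. */
    List<String> allocateBlock(int replication) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> node : vacantBlocks.entrySet()) {
            if (node.getValue() > 0) {
                candidates.add(node.getKey());
            }
        }
        if (candidates.size() < replication) {
            throw new IllegalStateException("not enough data nodes with vacant blocks");
        }
        // Stable sort: nodes with equal free space keep their registration order.
        candidates.sort(Comparator.comparing((String n) -> vacantBlocks.get(n)).reversed());
        List<String> chosen = new ArrayList<>(candidates.subList(0, replication));
        for (String node : chosen) {
            vacantBlocks.merge(node, -1, Integer::sum); // one block is now in use
        }
        return chosen;
    }
}
```

The returned list is what the writer would receive: the data nodes to which the copies of that block should be sent.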
  • 21. READING THE FILE FROM HDFS • WHEN THE USER SENDS A REQUEST, THE JOB TRACKER ACCEPTS IT AND ASKS THE NAME NODE WHERE THE DATA IS • THE NAME NODE RETURNS THE DATA NODES THAT HOLD THE REQUESTED DATA BLOCKS (THE DATA NODES HAVE REPORTED WHICH BLOCKS THEY HAVE) • THE JOB TRACKER PASSES THAT ACCESS INFORMATION TO THE TASK TRACKER NODES • EACH TASK TRACKER NODE READS THE DATA DIRECTLY FROM THE DATA NODE AND THE RESULT IS RETURNED TO THE USER. • AFTER SUCCESS THE PROCESS STOPS.
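
The lookup and task assignment in the read path can be sketched in plain Java. The sketch assumes the name node keeps, for each file, the list of data nodes holding each block, and that the job tracker prefers a node that already holds the block and currently has the fewest tasks. `ReadPathSketch`, its methods, and the file and node names used with it are hypothetical; the real scheduler is more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy model of block lookup and data-local task assignment. */
class ReadPathSketch {

    /** Name-node-style metadata: file name -> for each block, the data nodes holding a copy. */
    private final Map<String, List<List<String>>> blockLocations = new LinkedHashMap<>();

    void addFile(String file, List<List<String>> holdersPerBlock) {
        blockLocations.put(file, holdersPerBlock);
    }

    /** What the name node answers: where each block of the file lives. */
    List<List<String>> locate(String file) {
        List<List<String>> locations = blockLocations.get(file);
        if (locations == null) {
            throw new IllegalArgumentException("no such file: " + file);
        }
        return locations;
    }

    /** One task per block, placed on the least-loaded node that holds that block. */
    List<String> assignTasks(String file) {
        Map<String, Integer> load = new HashMap<>();
        List<String> assignment = new ArrayList<>();
        for (List<String> holders : locate(file)) {
            String best = holders.get(0); // assumes every block has at least one holder
            for (String node : holders) {
                if (load.getOrDefault(node, 0) < load.getOrDefault(best, 0)) {
                    best = node;
                }
            }
            assignment.add(best);
            load.merge(best, 1, Integer::sum);
        }
        return assignment;
    }
}
```

Because each task runs where its block is stored, the block is read from the local data node rather than copied across the network.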
  • 24. MAP REDUCE PROGRAM MAPPER CLASS • import java.io.IOException; • import org.apache.hadoop.io.IntWritable; • import org.apache.hadoop.io.LongWritable; • import org.apache.hadoop.io.Text; • import org.apache.hadoop.mapred.MapReduceBase; • import org.apache.hadoop.mapred.Mapper; • import org.apache.hadoop.mapred.OutputCollector; • import org.apache.hadoop.mapred.Reporter; • public class mapper extends MapReduceBase implements • Mapper<LongWritable, Text, Text, IntWritable> {
  • 25. CONT…. • @Override • public void map(LongWritable key, Text value, • OutputCollector<Text, IntWritable> output, Reporter reporter) • throws IOException { • // 1 - the mapper input is a key-value pair such as (0: I am neeraj gupta; 1: I am learning hadoop) • //String fix="250000000"; • String s = value.toString(); • String s1[] = s.split(" "); // split the line into an array of words • // now I have the format fname lname id salary salary 3,7,11,15 • int w = 0; • int last = 0; • for (int i = 3; i + 3 < s1.length; i = i + 7) { // i + 3 must stay inside the array
  • 26. CONT…… • w += Integer.parseInt(s1[i+1]) + Integer.parseInt(s1[i+2]) + Integer.parseInt(s1[i+3]); // for total • //int w=Integer.parseInt(s1[i]); • output.collect(new Text(s1[i-2]), new IntWritable(w)); // emit (name, total) • } // end of the for loop • } // end of map • } // end of class mapper
  • 27. REDUCER CLASS • import java.io.IOException; • import java.util.Iterator; • import org.apache.hadoop.io.IntWritable; • import org.apache.hadoop.io.Text; • import org.apache.hadoop.mapred.OutputCollector; • import org.apache.hadoop.mapred.MapReduceBase; • import org.apache.hadoop.mapred.Reducer; • import org.apache.hadoop.mapred.Reporter;
  • 28. • public class Reduce extends MapReduceBase implements • Reducer<Text, IntWritable, Text, IntWritable> { • int s = 0; // largest total seen so far • // int count=6; • String name = ""; // key that produced the largest total • @Override • public void reduce(Text key, Iterator<IntWritable> values, • OutputCollector<Text, IntWritable> output, Reporter reporter) • throws IOException { • int w1 = 0; // total for this key • int i = 0; • while (values.hasNext()) {
  • 29. • IntWritable w = values.next(); • w1 += w.get(); • } // end of the while loop • if (s < w1) { // emit only when this key beats the running maximum • s = w1; • name = key.toString(); • output.collect(new Text(name), new IntWritable(s)); • } • } // end of reduce • } // end of class Reduce
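
What this reducer emits is easier to see in a plain-Java sketch without the Hadoop types. The sketch keeps the same rule as the slide: sum the values of a key and output the key only if its total is larger than every total seen before. `RunningMaxReducerSketch` and the sample keys used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: the reducer's rule restated without Hadoop classes. */
class RunningMaxReducerSketch {

    private int best = 0;                                // plays the role of field `s`
    private final List<String> emitted = new ArrayList<>();

    /** Called once per key, with all values collected for that key. */
    void reduce(String key, List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        if (best < total) {
            best = total;
            emitted.add(key + "\t" + total);             // stands in for output.collect(...)
        }
    }

    /** Everything emitted so far, in order. */
    List<String> output() {
        return emitted;
    }
}
```

Two consequences follow from the rule. The output holds one line for every key that set a new maximum, so only the last line is the overall maximum. The rule also depends on one reducer instance seeing all keys, so it assumes the job runs with a single reduce task.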
  • 30. CONFIGURATION FILE (JOB DRIVER) • import org.apache.hadoop.fs.Path; • import org.apache.hadoop.io.IntWritable; • import org.apache.hadoop.io.Text; • import org.apache.hadoop.mapred.FileInputFormat; • import org.apache.hadoop.mapred.FileOutputFormat; • import org.apache.hadoop.mapred.JobClient; • import org.apache.hadoop.mapred.JobConf; • import org.apache.hadoop.conf.Configured; • import org.apache.hadoop.util.Tool; • import org.apache.hadoop.util.ToolRunner;
  • 31. • public class WordCount extends Configured implements Tool { • @Override • public int run(String[] args) throws Exception { • if (args.length != 2) { • System.out.printf( • "Usage: %s [generic options] <input dir> <output dir>\n", getClass() • .getSimpleName()); • ToolRunner.printGenericCommandUsage(System.out); • return -1; • }
  • 32. • JobConf conf = new JobConf(getConf(), WordCount.class); // job configuration • FileInputFormat.setInputPaths(conf, new Path(args[0])); // input directory • FileOutputFormat.setOutputPath(conf, new Path(args[1])); // output directory • conf.setMapperClass(mapper.class); • conf.setReducerClass(Reduce.class); • conf.setMapOutputKeyClass(Text.class); • conf.setMapOutputValueClass(IntWritable.class); • conf.setOutputKeyClass(Text.class); • conf.setOutputValueClass(IntWritable.class); • JobClient.runJob(conf); // submit the job and wait for it to finish • return 0; • } • public static void main(String[] args) throws Exception { • System.exit(ToolRunner.run(new WordCount(), args)); • } • }
  • 38. HIVE • HIVE IS A DATA WAREHOUSE USED TO READ TABLES; ITS LANGUAGE IS “HQL” (HIVE QUERY LANGUAGE) • BY DEFAULT IT READS FILES FROM HDFS • QUERIES ARE TRANSLATED INTO MAP REDUCE PROGRAMS, BECAUSE THAT IS WHAT THE CLUSTER RUNS • PERFORMANCE IS SLOWER THAN A HAND-WRITTEN MAP REDUCE PROGRAM BECAUSE OF THIS TRANSLATION • OUTPUT IS STORED IN THE HIVE WAREHOUSE
  • 40. STEPS TO CREATE TABLES AND LOAD THE DATA • STEP 1: TYPE hive ON THE CLI • STEP 2: create table sales(month string,product string,quan string,fc string,prot string,country string,exporter string,buyer string) row format delimited fields terminated by ','; • STEP 3: load data inpath 'spp/docs' overwrite into table sales; • STEP 4: RUN select * from sales; TO SEE THE OUTPUT.
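
The clause "fields terminated by ','" means that each line of the loaded file is cut at the commas and the pieces are assigned to the columns in order. A minimal plain-Java sketch of that mapping for the eight columns of the sales table follows. `SalesRowSketch` is a hypothetical name, the sample row is invented for illustration, and rejecting a row with the wrong number of fields is a choice made for this sketch, not a statement about how Hive handles such rows.

```java
/** Illustrative only: one comma-delimited line mapped onto the columns of the sales table. */
class SalesRowSketch {

    final String month, product, quan, fc, prot, country, exporter, buyer;

    private SalesRowSketch(String[] f) {
        month = f[0];
        product = f[1];
        quan = f[2];
        fc = f[3];
        prot = f[4];
        country = f[5];
        exporter = f[6];
        buyer = f[7];
    }

    /** Splits the line at every comma; -1 keeps trailing empty fields. */
    static SalesRowSketch parse(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 8) {
            throw new IllegalArgumentException("expected 8 fields but found " + fields.length);
        }
        return new SalesRowSketch(fields);
    }

    public static void main(String[] args) {
        // Invented sample row, in the column order of the create table statement.
        SalesRowSketch row = parse("Jan,Rice,100,USD,Mumbai,India,ACME,Bob");
        System.out.println(row.product + " -> " + row.country);
    }
}
```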
  • 41. SALES DATA ANALYSIS PROJECT ON HADOOP • WE HAVE 1500 ENTRIES IN A CSV FORMAT FILE AND WE NEED TO ANALYZE THE WHOLE DATA SET. • WE HAVE MULTIPLE SHEETS OF DATA: 4 FILES, EACH HAVING 1500 ENTRIES • WE NEED TO PRODUCE THE REPORTS REQUESTED BY THE USER. • WE CHOSE THE HIVE QUERY LANGUAGE TO DO SO. • WE EXPORT THE FILES TO THE HIVE WAREHOUSE • WE CONSTRUCT THE TABLE IN HIVE • WE RUN SEVERAL QUERIES AND FIND THE DIFFERENT RESULTS REQUESTED BY THE USER • THIS IS VERY EFFICIENT AND FAST
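
As an illustration of the kind of report such queries produce, the plain-Java sketch below totals the quantity per country, which is roughly what a group-by query over the table would compute. It assumes the column order of the sales table from the create table step (quan is the third field, country the sixth) and that quan holds whole numbers. `SalesReportSketch` and the sample rows used with it are hypothetical; the deck does not say which reports were actually requested.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: a group-by-country total computed over CSV lines. */
class SalesReportSketch {

    private static final int QUAN = 2;     // third column of the sales table
    private static final int COUNTRY = 5;  // sixth column of the sales table

    /** Returns country -> total quantity, with countries in alphabetical order. */
    static Map<String, Long> quantityByCountry(List<String> csvLines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : csvLines) {
            String[] fields = line.split(",", -1);
            if (fields.length != 8) {
                continue; // skip rows that do not match the table layout
            }
            long quantity = Long.parseLong(fields[QUAN].trim());
            totals.merge(fields[COUNTRY].trim(), quantity, Long::sum);
        }
        return totals;
    }
}
```

In Hive the same grouping runs as map reduce tasks across the cluster, so the four files do not have to fit on one machine.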