Slide 1© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Web and Social Media
Analytics using Hadoop
Session Objectives
ᗍ Introduction to Big Data and Hadoop
ᗍ Understanding HDFS
ᗍ Introduction to MapReduce
ᗍ Social & Web Analytics via Hadoop
ᗍ BIG Data & Hadoop Course Syllabus
ᗍ Webinar by Skillspeed
Get Started with BIG Data & Hadoop
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
Systems / Enterprises generate huge amounts of data, from terabytes to even petabytes of information
Data at this scale is very difficult to manage
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing such BIG DATA is becoming a problem for all of us
Hadoop can be used to process such huge data
We will see how
Before that, let’s understand what Hadoop is
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of
commodity computers using a simple programming model
It is an Open-source Data Management technology with scale-out storage and distributed processing
Hadoop Characteristics
ᗍ Flexible
ᗍ Reliable
ᗍ Economical
ᗍ Scalable
Hadoop Ecosystem
[Diagram: components of the Hadoop ecosystem]
ᗍ Flume: import of unstructured or semi-structured data
ᗍ Sqoop: import or export of structured data
ᗍ Apache Oozie: workflow
ᗍ Pig Latin: data analysis
ᗍ Hive: DW system
ᗍ MapReduce framework
ᗍ HBase
ᗍ Other YARN frameworks (MPI, Giraph)
ᗍ YARN: cluster resource management
ᗍ HDFS (Hadoop Distributed File System)
Hive Architecture
[Diagram: Hive running on top of Hadoop]
ᗍ Interfaces: Command Line Interface, Web Interface, Thrift Server (JDBC, ODBC)
ᗍ Driver (Compiler, Optimizer, Executor)
ᗍ Metastore
ᗍ Hadoop (MapReduce + HDFS): Job Tracker, NameNode, Data Node + Task Tracker
Querying
Syntax:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number]
Example:
SELECT SUM(mt.Trade_Currency_Value), mt.Transaction_Date, mt.Office_ID
FROM share_trans4 mt
GROUP BY mt.Transaction_Date, mt.Office_ID
ORDER BY Transaction_Date ASC
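As a rough illustration of what the example query on this slide computes, the following Python sketch groups a few rows by transaction date and office, sums the trade value per group, and sorts by date. The rows are invented for illustration; only the table and column names come from the query, and Hive would run this as distributed jobs rather than in a single process.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the share_trans4 table.
rows = [
    {"Transaction_Date": "2015-01-02", "Office_ID": "B", "Trade_Currency_Value": 50.0},
    {"Transaction_Date": "2015-01-01", "Office_ID": "A", "Trade_Currency_Value": 100.0},
    {"Transaction_Date": "2015-01-01", "Office_ID": "A", "Trade_Currency_Value": 25.0},
    {"Transaction_Date": "2015-01-01", "Office_ID": "B", "Trade_Currency_Value": 10.0},
]


def total_by_date_and_office(rows):
    """GROUP BY (date, office), SUM(value), ORDER BY date ascending."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["Transaction_Date"], row["Office_ID"])
        totals[key] += row["Trade_Currency_Value"]
    # Sort by date only, as the query does; the sort is stable, so groups
    # with the same date stay in first-seen order.
    ordered = sorted(totals.items(), key=lambda item: item[0][0])
    return [(total, date, office) for (date, office), total in ordered]


result = total_by_date_and_office(rows)
for total, date, office in result:
    print(total, date, office)
```

Each output tuple mirrors the column order of the SELECT list: the summed value first, then the date, then the office.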
HDFS
HDFS and its Components
Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
HDFS is distributed, scalable, and portable, and is written in Java for the Hadoop framework
NameNode
ᗍ Storage side master of the system
ᗍ It maintains, manages, and administers the data blocks present on the DataNodes
DataNodes
ᗍ Slave machines which provide the actual and redundant storage
ᗍ End points for client read and write operations
HDFS Architecture
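[Diagram: clients send metadata ops to the NameNode, which holds the metadata (name, replicas, ...), for example /home/foo/data, 3,…; clients read blocks from and write blocks to DataNodes in Rack 1 and Rack 2; the NameNode sends block ops to the DataNodes; blocks are replicated between DataNodes]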
HDFS NameNode
Keeps Metadata in Main Memory
ᗍ The entire metadata is held in main memory
ᗍ FS metadata is served from memory; it is not paged in from hard disk on demand
Metadata type
ᗍ Files in HDFS
ᗍ Data Blocks for each file
ᗍ DataNodes for each block
ᗍ File attributes, e.g. access time, replication factor, access control
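The metadata types listed above can be pictured as a few in-memory maps. The following is a toy sketch only, not the NameNode's actual data structures: the class name, the round-robin placement, and the 128 MB block size are assumptions made for illustration, and the replication factor of 3 echoes the /home/foo/data, 3 entry on the architecture slide.

```python
import math
from itertools import cycle

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size for this sketch


class ToyNameNode:
    """Hypothetical in-memory metadata store: files -> blocks -> DataNodes."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self._next_node = cycle(self.datanodes)  # naive round-robin placement
        self.file_blocks = {}      # file path -> list of block ids
        self.block_locations = {}  # block id -> list of DataNode names
        self.file_attrs = {}       # file path -> attributes such as replication

    def add_file(self, path, size_bytes, replication=3):
        if replication > len(self.datanodes):
            raise ValueError("not enough DataNodes for the requested replication")
        n_blocks = max(1, math.ceil(size_bytes / BLOCK_SIZE))
        block_ids = [f"{path}#blk{i}" for i in range(n_blocks)]
        self.file_blocks[path] = block_ids
        self.file_attrs[path] = {"replication": replication}
        for block_id in block_ids:
            # Consecutive round-robin picks give distinct nodes per block.
            self.block_locations[block_id] = [
                next(self._next_node) for _ in range(replication)
            ]
        return block_ids

    def locate(self, path):
        """Answer a client's metadata request: where are this file's blocks?"""
        return {b: self.block_locations[b] for b in self.file_blocks[path]}


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
namenode.add_file("/home/foo/data", size_bytes=300 * 1024 * 1024)
locations = namenode.locate("/home/foo/data")
for block_id, nodes in locations.items():
    print(block_id, nodes)
```

The client would then read or write the blocks directly on the listed DataNodes; the toy NameNode only answers the lookup.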
Secondary NameNode
ᗍ In Hadoop 1.x, not a hot standby for the NameNode
ᗍ By default, connects to the NameNode every hour*
ᗍ Housekeeping, backup of NameNode metadata
ᗍ Saved metadata can be used to rebuild a failed NameNode
[Diagram: the Secondary NameNode pulls metadata from the NameNode: “I’ll take metadata every hour and will make it secure”]
Map Reduce
Map Reduce – Scenario
Let us consider a real-life scenario to understand the importance of “Map Reduce” in Hadoop
Suppose you are handling a project which has x tasks and takes 100 hours for one resource to complete
1 resource: 1 x 100 = 100 hours
10 resources working in parallel: 100 / 10 = 10 hours
Map Reduce – Scenario
Similarly, work that takes 100 hours in one piece takes 100/10 = 10 hours when divided ten ways
More Scenarios on Map-Reduce
Problem Statement:
Find maximum stock market levels recorded in a span of 5 years
Problem Statement:
De-identify personal identifier information
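For the first problem statement, a MapReduce job could emit one (year, level) pair per record in the map step and keep the maximum per year in the reduce step. The sketch below imitates that in plain Python on a handful of invented records; the record layout (date,level) and the values are assumptions, and a real job would run the same two functions across many nodes.

```python
from collections import defaultdict

# Invented sample records in an assumed "YYYY-MM-DD,level" layout.
records = [
    "2010-03-01,10450.5",
    "2010-11-17,11020.0",
    "2011-06-09,12100.25",
    "2011-01-04,11690.0",
    "2012-09-14,13593.0",
]


def map_record(line):
    """Map step: emit (year, level) for one input record."""
    date, level = line.split(",")
    return date[:4], float(level)


def reduce_max(year, levels):
    """Reduce step: keep the highest level seen for one year."""
    return year, max(levels)


def run_job(lines):
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for line in lines:
        year, level = map_record(line)
        grouped[year].append(level)
    return dict(reduce_max(year, levels) for year, levels in grouped.items())


max_by_year = run_job(records)
overall_max = max(max_by_year.values())
print(max_by_year, overall_max)
```

Because taking a maximum is associative, the per-year maxima can be combined again to get the maximum over the whole span.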
Traditional Solution
[Diagram: very big data is cut into pieces of split data; grep runs over each piece and produces matches; cat combines them into all matches]
MapReduce Solution
[Diagram: very big input is cut into pieces of split data; the MapReduce framework runs MAP over each piece and REDUCE combines the results into all matches]
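The traditional and MapReduce solutions describe the same computation: search each split for matches, then combine the per-split results. A minimal single-process sketch follows, with hypothetical helper names and sample lines. In the traditional solution you script the splitting and the final cat yourself, whereas a MapReduce framework takes over those steps and the scheduling.

```python
import re


def split_data(lines, n_splits):
    """Cut the input into roughly equal contiguous splits."""
    size = max(1, -(-len(lines) // n_splits))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(pattern, split):
    """Per-split work (the map side): keep lines matching the pattern."""
    regex = re.compile(pattern)
    return [line for line in split if regex.search(line)]


def cat(partial_results):
    """Combine step (the reduce side): concatenate per-split matches."""
    return [line for part in partial_results for line in part]


log_lines = [
    "INFO start",
    "ERROR disk full",
    "INFO heartbeat",
    "ERROR timeout",
    "INFO stop",
]
splits = split_data(log_lines, n_splits=3)
all_matches = cat(grep(r"ERROR", s) for s in splits)
print(all_matches)
```

Each call to grep depends only on its own split, which is what lets the per-split work run on different machines at the same time.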
MapReduce Advantages
Two biggest advantages:
ᗍ Takes processing to the data
ᗍ Allows processing data in parallel
[Diagram: map tasks a, b and c placed relative to their HDFS blocks across nodes and racks in a data center]
MapReduce Flow
1. Input data is present on the DataNodes
2. One map task runs per input split
3. Mappers produce intermediate data
4. Data is exchanged among nodes during “shuffling”
5. All data with the same key goes to the same reducer
6. Reducer output is stored at the output location
[Diagram: input data feeds Map tasks on Node 1 and Node 2, whose output is shuffled to Reduce tasks]
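The six steps of the MapReduce flow can be traced in a small single-process simulation. This is a sketch of the data flow only, not Hadoop's API: the function names, the word-count job, and the CRC-based partitioning rule are assumptions chosen for illustration.

```python
from collections import defaultdict
import zlib


def mapper(line):
    """Step 3: emit intermediate (key, value) pairs for one input line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, n_reducers):
    """Step 5: a deterministic hash sends equal keys to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def reducer(key, values):
    """Combine all values that arrived for one key."""
    return key, sum(values)


def run(input_splits, n_reducers=2):
    # Steps 1-2: one map task per input split.
    map_outputs = [
        [pair for line in split for pair in mapper(line)]
        for split in input_splits
    ]
    # Step 4: shuffle, moving each pair to the reducer that owns its key.
    reducer_inputs = [defaultdict(list) for _ in range(n_reducers)]
    for output in map_outputs:
        for key, value in output:
            reducer_inputs[partition(key, n_reducers)][key].append(value)
    # Step 6: each reducer produces its own part of the output.
    return [
        dict(reducer(key, values) for key, values in sorted(bucket.items()))
        for bucket in reducer_inputs
    ]


splits = [["big data big"], ["data hadoop"]]
parts = run(splits)
merged = {key: count for part in parts for key, count in part.items()}
print(merged)
```

Every key lands in exactly one reducer's part, so merging the parts never has to reconcile two counts for the same word.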
What is Expected?
In this section, we will discuss the questions on HDFS and MapReduce that are asked in interviews
This will help you gauge the importance of the topics under study
The Top 5 Interview Questions
What is the use of the NameNode in HDFS?
What is a DataNode in HDFS?
What is the Job Tracker in Hadoop?
What is MapReduce?
What are the basic components of a Hadoop application?
And many more
Job Trends – Hadoop
Why SkillSpeed?
ᗍ Course Curriculum from Industry Experts
ᗍ Instructor Led Live Virtual Sessions
ᗍ Lifetime access to Course Content via LMS
ᗍ 100% Placement Assistance
ᗍ 24x7 Support
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to Map Reduce
Module 4: Advanced Map Reduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Corporate Partners
Contact us
To know more about the course, please contact:
IND +91-90660-20904
USA 1866-607-6547 (Toll Free)
Or reach us at sales@skillspeed.com
Lines open 24/7
Editor's Notes

  1. SkillSpeed offers virtual instructor-led courses designed to bridge the time-to-competency gap experienced by technology companies. The USP of SkillSpeed is the subject matter expert (SME). SMEs are industry experts with a good understanding of, and hands-on industry experience with, the technology. These industry experts design, develop, and deliver the course. SkillSpeed provides you:
     - Course curriculum from industry experts
     - Instructor-led live virtual sessions
     - Real-life industry case studies
     - Live virtual interactions
     - Interaction with industry experts
     - Lifetime access to all course content via the LMS
     - 24*7 support
     - 100% placement assistance