HADOOP
Presentation by:
Sharad Pandey
Things included:
• History of data
• What is Big Data?
• Distributed Computing Vs Parallelization
• Google’s solution
• Hadoop
• HDFS
• MapReduce
• A video explaining MapReduce
HISTORY OF DATA!!!
• With the advent of new technologies, devices, and means of
communication such as social networking sites, the amount of
data produced by mankind is growing rapidly every year.
• The amount of data produced from the beginning of time until
2003 was 5 billion gigabytes. Piled up in the form of disks, it
could fill an entire football field.
• The same amount was created every two days in 2011, and every
ten minutes in 2013. This rate is still growing enormously.
Although all this information is meaningful and can be useful
when processed, it is being neglected.
What is Big Data?
• Big data is a collection of datasets so large that they cannot be
processed using traditional computing techniques. Big data is
not merely data; it has become a complete subject that involves
various tools, techniques, and frameworks.
• Black box data
• Social media data
• Power grid data
• Search engine data
Traditional Approach
What Caused The Problem?
Year    Data storage    Transfer rate of data
1990    1000 MB         4.4 MB/s
2010    1000 GB         100 MB/s
So What Is The Problem?
• The transfer speed is around 100 MB/s.
• A standard disk holds 1 terabyte.
• Time to read the entire disk = 10,000 seconds, or about 3 hours!
• An increase in processing speed may not help much, because:
  • Network bandwidth is now more of a limiting factor.
  • Physical limits of processor chips have been reached.
So What do We Do?
• The obvious solution is to use multiple processors on the same
problem, fragmenting it into pieces.
• Imagine if we had 100 drives, each holding one hundredth of the
data. Working in parallel, we could read the data in under two
minutes.
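The arithmetic behind these figures can be checked with a short Python sketch. It uses the slide's round numbers (a 1 TB disk and 100 MB/s per drive) and assumes decimal units and perfectly even, overhead-free parallel reads, which a real cluster would not achieve. The function name `read_time_seconds` is made up for this illustration.

```python
# Back-of-the-envelope read times, using the slide's round figures.
DISK_SIZE_MB = 1_000_000      # 1 terabyte expressed in megabytes (decimal units)
TRANSFER_RATE_MB_S = 100      # sustained transfer rate of one drive, in MB/s


def read_time_seconds(total_mb: float, rate_mb_s: float, drives: int = 1) -> float:
    """Time to read total_mb when it is split evenly across `drives` drives
    that are all read in parallel (idealised: no coordination overhead)."""
    return total_mb / (rate_mb_s * drives)


single = read_time_seconds(DISK_SIZE_MB, TRANSFER_RATE_MB_S)
parallel = read_time_seconds(DISK_SIZE_MB, TRANSFER_RATE_MB_S, drives=100)

print(f"1 drive:    {single:.0f} s (about {single / 3600:.1f} hours)")
print(f"100 drives: {parallel:.0f} s (about {parallel / 60:.1f} minutes)")
```

One drive needs 10,000 seconds (roughly 2.8 hours), while 100 drives working in parallel need 100 seconds, which is the "under two minutes" quoted above.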
Distributed Computing Vs Parallelization
[Diagram: Parallelization | Distributed Computing]
Distributed Computing
The key issues involved in this solution:
• Hardware failure
• Combining the data after analysis
• Network-associated problems
Google’s Solution
The diagram shows various commodity hardware, which could be
single-CPU machines or servers with higher capacity.
Mike Cafarella and Doug Cutting
Hadoop
• Doug Cutting and Mike Cafarella started an open-source project
called Hadoop in 2005.
• Apache Hadoop is now a registered trademark of the Apache
Software Foundation.
• Hadoop runs applications using the MapReduce programming model.
• Hadoop = HDFS + MapReduce.
Hadoop architecture
HDFS (Hadoop Distributed File System)
• Hadoop comes with a distributed file system called HDFS, which
stands for Hadoop Distributed File System.
• HDFS is designed to hold very large amounts of data (terabytes
or even petabytes) and to provide high-throughput access to this
information.
• HDFS is highly fault tolerant and is designed to run on low-cost
hardware.
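To make the fault-tolerance claim concrete, here is a rough Python sketch of the general idea: a file is cut into fixed-size blocks, and each block is copied to several machines so that losing one machine loses no data. The 128 MB block size and replication factor of 3 are assumed here as HDFS's usual defaults. The round-robin placement, the node names, and the functions `place_blocks` and `readable_after_failure` are invented for this illustration and are not HDFS's actual placement policy or API.

```python
BLOCK_SIZE_MB = 128   # assumed block size (HDFS's usual default)
REPLICATION = 3       # assumed replication factor (HDFS's usual default)


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes, round-robin. Toy policy for illustration."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a surviving node."""
    return all(any(n != failed_node for n in replicas) for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4"]   # hypothetical node names
placement = place_blocks(file_size_mb=500, nodes=nodes)
for block, replicas in placement.items():
    print(f"block {block}: {replicas}")
print("still readable if node2 fails:", readable_after_failure(placement, "node2"))
```

A 500 MB file becomes four blocks, and because every block lives on three different nodes, the whole file stays readable when any single low-cost machine fails.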
MapReduce
• It is a programming model.
• Programs written in this functional style are automatically
parallelized and executed on a large cluster of commodity
machines.
• Map: a map function processes a key/value pair to generate a set
of intermediate key/value pairs.
• Reduce: a reduce function merges all intermediate values
associated with the same intermediate key.
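The map and reduce roles described above can be illustrated with the classic word-count example. This is a single-process Python sketch of the programming model only, not Hadoop's Java API: the helper `run_mapreduce` stands in for the framework's grouping step, and all function names and the sample input are made up for illustration.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: take one (key, value) pair and emit intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share one key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


# Input records are (line number, line text) pairs.
lines = [(0, "big data needs big tools"), (1, "hadoop handles big data")]
result = run_mapreduce(lines, map_fn, reduce_fn)
print(result)
```

Because each call to `map_fn` looks at one record and each call to `reduce_fn` looks at one key, a real framework is free to run many of these calls at the same time on different machines, which is where the automatic parallelization comes from.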
Queries
Thank you
