Big Data
Where does Big Data come from?
• Web data
• Social media
• Clickstream data
• Sensor data
• Connected devices
Big Data Challenges
• The sheer size of the data.
• Unstructured or semi-structured data.
• Analyzing the data.
Cred_hadoop_presenatation
How Hadoop solves the Big Data Problem
• Hadoop runs on a cluster of machines.
• It handles unstructured and semi-structured data.
• A Hadoop cluster can scale horizontally to meet storage requirements.
• Hadoop clusters provide both storage and computation.
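As a rough illustration of horizontal scaling, the sketch below estimates how many machines a cluster needs for a given data volume. The function name `nodes_needed`, the per-node disk size, and the example figures are hypothetical; the replication factor of 3 is the HDFS default.

```python
import math

def nodes_needed(data_tb: float, disk_per_node_tb: float, replication: int = 3) -> int:
    """Estimate the number of storage nodes for `data_tb` terabytes of data.

    Every block is stored `replication` times, so raw storage is
    data_tb * replication. Illustrative only: ignores OS overhead,
    temporary space, and compression.
    """
    raw_tb = data_tb * replication
    return math.ceil(raw_tb / disk_per_node_tb)

# Hypothetical figures: 100 TB of data on nodes with 12 TB of disk each.
print(nodes_needed(100, 12))   # 300 TB raw -> 25 nodes
# Doubling the data is handled by adding machines, not by buying bigger ones.
print(nodes_needed(200, 12))   # 600 TB raw -> 50 nodes
```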
Solving Big Data problems with Hadoop
ENTERPRISE USE CASES
Retail
• Challenges:
  Were higher-priced items selling in certain markets?
  Should inventory be reallocated or prices optimized based on geography?
Manufacturing
• Challenges:
  Monitor and predict network failures.
Hadoop - Introduction
HADOOP = HDFS + MAPREDUCE
Services in Hadoop
• NameNode: Stores and maintains the metadata for HDFS.
• Secondary NameNode: Performs housekeeping functions for the NameNode.
• DataNode: Stores the actual HDFS data blocks.
• JobTracker: Manages MapReduce jobs and distributes individual tasks to TaskTrackers.
• TaskTracker: Instantiates and monitors individual map and reduce tasks.
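The split between metadata (NameNode) and block storage (DataNode) can be sketched with a toy in-memory model. Everything here is hypothetical and for illustration only: the class fields, the `write_file` and `read_file` helpers, the round-robin placement, and the 4-byte block size (real HDFS blocks are far larger).

```python
from dataclasses import dataclass, field

@dataclass
class DataNode:
    """Stores the actual bytes of each block (toy in-memory stand-in)."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes

@dataclass
class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""
    files: dict = field(default_factory=dict)      # path -> [block_id, ...]
    locations: dict = field(default_factory=dict)  # block_id -> DataNode name

def write_file(nn, datanodes, path, data, block_size=4):
    """Split `data` into blocks and spread them over the DataNodes round-robin."""
    nn.files[path] = []
    for i in range(0, len(data), block_size):
        block_id = f"{path}#{i // block_size}"
        node = datanodes[(i // block_size) % len(datanodes)]
        node.blocks[block_id] = data[i:i + block_size]
        nn.files[path].append(block_id)
        nn.locations[block_id] = node.name

def read_file(nn, datanodes, path):
    """Ask the NameNode for block locations, then fetch bytes from DataNodes."""
    by_name = {dn.name: dn for dn in datanodes}
    return b"".join(by_name[nn.locations[b]].blocks[b] for b in nn.files[path])

nn = NameNode()
dns = [DataNode("dn1"), DataNode("dn2")]
write_file(nn, dns, "/logs/a.txt", b"hello hadoop")
print(nn.files["/logs/a.txt"])            # block ids only, no file contents
print(read_file(nn, dns, "/logs/a.txt"))  # b'hello hadoop'
```

Note that the NameNode object never touches the file bytes; it only answers the question of which blocks exist and where they are.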
Hadoop Cluster Architecture
Hadoop job management
Hadoop Fault Tolerance
• Data stored in HDFS is replicated to more than one DataNode, so that if one DataNode goes down, a copy of the data is still available on another node.
• The replication factor is 3 by default and is configurable.
• The NameNode is a single point of failure in the cluster. The Secondary NameNode periodically merges the NameNode's edit log into a checkpoint of the metadata, which helps recovery, but it is not a hot standby.
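A minimal sketch of why replication protects against DataNode failures, assuming a simple rotation for placing replicas. Real HDFS placement is rack-aware; the helper names `place_replicas` and `readable_blocks`, the node names, and the block ids are hypothetical.

```python
import itertools

def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple rotation).

    Assumes replication <= len(nodes). Not the real HDFS placement policy.
    """
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {nodes[(i + k) % len(nodes)] for k in range(replication)}
    return placement

def readable_blocks(placement, failed):
    """Return blocks that still have at least one replica on a live node."""
    return {b for b, holders in placement.items() if holders - set(failed)}

nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["b0", "b1", "b2", "b3"], nodes)

# Losing any two DataNodes (and therefore any one) leaves every block readable...
for failed in itertools.combinations(nodes, 2):
    assert readable_blocks(placement, failed) == set(placement)

# ...but losing three nodes can take out all replicas of some block.
print(sorted(readable_blocks(placement, ["dn1", "dn2", "dn3"])))  # ['b1', 'b2', 'b3']
```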
HDFS – Hadoop Distributed File System
• HDFS is a distributed file system for storing huge data sets on a cluster of commodity hardware with a streaming data access pattern.
Map Reduce Concept
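The map, shuffle, and reduce phases can be sketched in plain Python without a cluster. This is a single-process illustration, not Hadoop's implementation; the `map_reduce` helper and the sales records are hypothetical, chosen to echo the retail question of which markets sell higher-priced items.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run a tiny in-memory MapReduce: map, shuffle (group by key), reduce."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: values with the same key are brought together.
            groups[key].append(value)
    # Reduce phase: each key's values are combined into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical sales records: (market, price).
sales = [("north", 120.0), ("south", 40.0), ("north", 80.0), ("south", 60.0)]

def mapper(record):
    market, price = record
    yield market, price

def reducer(market, prices):
    return sum(prices) / len(prices)  # average selling price per market

print(map_reduce(sales, mapper, reducer))  # {'north': 100.0, 'south': 50.0}
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves data across the network; the sketch only preserves the data flow.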
Hadoop Streaming API
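Hadoop Streaming lets the mapper and reducer be any executable that reads lines from standard input and writes tab-separated key/value lines to standard output. The sketch below shows word count in that style as two Python generators and dry-runs them locally; the function names and sample text are hypothetical, and `sorted()` stands in for the framework's sort between the two phases.

```python
import itertools

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum counts per word. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local dry run; sorted() stands in for the framework's shuffle-and-sort step.
text = ["big data big cluster", "data node"]
for out in reducer(sorted(mapper(text))):
    print(out)
```

In an actual job, each function would live in its own script that iterates over `sys.stdin` and prints its results, and the scripts would be handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.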
Hadoop technology stack
Hadoop Ecosystem Introduction
• Sqoop: Imports data from relational databases.
• Flume: Collects and imports log and event data.
• MapReduce: Parallel computation on server clusters.
• HDFS: Distributed, redundant file system for Hadoop.
• Pig: High-level programming language for Hadoop computations.
• Hive: Data warehouse with SQL-like access.
Data Processing Systems in Hadoop
Batch processing
• MapReduce
Stream processing
• Apache Spark (Spark Streaming)
• Apache Storm
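The difference between the two processing styles can be sketched in plain Python, without using MapReduce, Spark, or Storm themselves. The event list and both function names are hypothetical; the point is only that a batch job sees the whole data set at once, while a stream job updates its answer as each event arrives.

```python
from collections import Counter

events = ["click", "view", "click", "purchase", "click"]  # hypothetical event log

def batch_count(all_events):
    """Batch: the complete data set is available before processing starts."""
    return Counter(all_events)

def stream_count(event_source):
    """Stream: update the result after every event and emit a running snapshot."""
    running = Counter()
    for event in event_source:
        running[event] += 1
        yield dict(running)

print(batch_count(events)["click"])  # 3

snapshots = list(stream_count(iter(events)))
print(snapshots[0])   # {'click': 1}
print(snapshots[-1])  # {'click': 3, 'view': 1, 'purchase': 1}
```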
Thank you.

