2. Where does Big Data come from?
Web data
Social Media
Click stream data
Sensor data
Connected devices
3. Big Data Challenges
Sheer size of the data.
Unstructured or semi-structured data.
Analyzing data at that scale.
5. How Hadoop solves the Big Data Problem
Hadoop runs on a cluster of machines.
It handles unstructured and semi-structured data.
A Hadoop cluster scales horizontally: storage requirements are met by adding more machines.
Hadoop clusters provide both storage and computation.
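As a rough illustration of horizontal scaling, the sketch below estimates how many machines a cluster needs for a given amount of data. The function name `datanodes_needed` and the per-node disk figure are hypothetical, and the estimate assumes every byte is stored three times (the HDFS default replication factor) and ignores space needed for temporary and intermediate data.

```python
import math


def datanodes_needed(data_tb: float, disk_per_node_tb: float, replication: int = 3) -> int:
    """Estimate the number of storage nodes needed for data_tb of user data.

    Assumption: each block is stored `replication` times, so the raw
    capacity required is data_tb * replication.
    """
    raw_tb = data_tb * replication
    return math.ceil(raw_tb / disk_per_node_tb)


# Hypothetical cluster: 24 TB of usable disk per node.
print(datanodes_needed(100, 24))  # 100 TB of data
print(datanodes_needed(200, 24))  # data doubles, so more nodes are added
```

When the data doubles, the answer is to add nodes of the same kind rather than to buy a bigger machine.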
7. Retail
Challenges:
Were higher-priced items selling in certain markets?
Should inventory be re-allocated or prices optimized based on geography?
10. Services in Hadoop
NameNode: Stores and maintains the metadata for HDFS.
Secondary NameNode: Performs housekeeping functions for the NameNode.
DataNode: Stores the actual HDFS data blocks.
JobTracker: Manages MapReduce jobs and distributes individual tasks to TaskTrackers.
TaskTracker: Instantiates and monitors individual Map and Reduce tasks.
13. Hadoop Fault Tolerance
Data stored in HDFS is replicated to more than one DataNode, so if one DataNode goes down a copy of the data is still available on another node.
The replication factor is 3 by default and is configurable.
The NameNode is a single point of failure in the cluster. The Secondary NameNode periodically merges the NameNode's edit log into a checkpoint of the metadata; it holds a recent copy of the metadata but is not a hot standby.
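The effect of replication can be sketched with a small simulation. This is a toy model, not HDFS code: `place_replicas` and `readable_blocks` are hypothetical helpers, and replicas are placed on random distinct nodes, whereas real HDFS uses a rack-aware placement policy.

```python
import random


def place_replicas(block_ids, datanodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes (random placement, a simplification)."""
    rng = random.Random(seed)
    return {b: set(rng.sample(datanodes, replication)) for b in block_ids}


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed)
    return [b for b, nodes in placement.items() if nodes - failed]


datanodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(range(10), datanodes)

# With 3 replicas on distinct nodes, losing any 2 nodes cannot remove every copy of a block.
print(len(readable_blocks(placement, ["dn1"])))
print(len(readable_blocks(placement, ["dn1", "dn2"])))
```

With a replication factor of 3, a block only becomes unreadable if all three nodes holding it fail at once.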
14. HDFS – Hadoop Distributed File System
HDFS is a distributed file system for storing huge data sets on a cluster of commodity hardware, with a streaming data access pattern.
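HDFS stores a large file as a sequence of fixed-size blocks (the data blocks held by the DataNodes), which is what lets one file span many machines. A minimal sketch of that splitting follows; `split_into_blocks` is a hypothetical helper, and the 128 MB block size is an assumption here (it is the default in Hadoop 2.x, 64 MB in Hadoop 1.x, and configurable either way).

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return (offset, length) pairs, in MB, for each block of a file.

    Every block is block_size_mb long except possibly the last one,
    which only holds the remainder of the file.
    """
    blocks = []
    offset = 0
    while offset < file_size_mb:
        length = min(block_size_mb, file_size_mb - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# A 300 MB file becomes two full blocks and one partial block.
print(split_into_blocks(300))  # [(0, 128), (128, 128), (256, 44)]
```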
18. Hadoop Ecosystem Introduction
Sqoop: Transfers bulk data between relational databases and Hadoop.
Flume: Collects and imports log and event data.
MapReduce: Parallel computation on server clusters.
HDFS: Distributed, redundant file system for Hadoop.
Pig: High-level programming language for Hadoop computations.
Hive: Data warehouse with SQL-like access.
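To make the MapReduce entry above concrete, here is a single-process sketch of the map, shuffle, and reduce phases using word count. It is plain Python, not the Hadoop API; the function names are hypothetical, and on a real cluster the map and reduce calls would run as separate tasks on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'node': 1}
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across many machines.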
19. Data Processing systems in Hadoop
Batch processing: MapReduce
Stream processing: Apache Spark (Spark Streaming), Apache Storm