Webinar: Big Data & Hadoop - When not to use Hadoop
1. www.edureka.co/big-data-and-hadoop
When not to use Hadoop
View Big Data and Hadoop Course at: http://www.edureka.co/big-data-and-hadoop
For more details please contact us:
US : 1800 275 9730 (toll free)
INDIA : +91 88808 62004
Email Us : sales@edureka.co
For Queries:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
2. Slide2
Objectives
At the end of this module, you will be able to…
Understand When not to use Hadoop
»Real Time Analytics
»Not a Replacement
»Dataset Size
»Complexity
»Security
Understand When to use Hadoop
»Huge Unstructured Datasets
»Response Time is Not an Issue
»Future Planning
»Multiple Frameworks for Big Data
»Lifetime Data Availability
5. Slide5
If you need real-time analytics, where results are expected quickly, Hadoop should not be used directly
Hadoop works on batch processing, so its response time is high
[Figure: timeline from Day 1 to Day n. Input Data arrives each day; Processing Data using MR runs afterwards, so there is a Time Lag between input and results.]
Real Time Analytics
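The time lag of batch processing can be put into numbers with a small sketch. The model here is an assumption made for illustration, not something taken from the slides: records are collected in fixed windows, the job starts when a window closes, and it runs for a fixed time. The function name and default values are hypothetical.

```python
import math


def batch_result_lag(arrival_hour, batch_interval_hours=24.0, processing_hours=2.0):
    """Hours between a record arriving and its batch result being ready.

    Assumed model (illustrative only): records are collected in fixed
    windows, the job starts when the window closes, and the job takes
    processing_hours to finish.
    """
    window_index = math.floor(arrival_hour / batch_interval_hours)
    window_close = (window_index + 1) * batch_interval_hours
    return window_close + processing_hours - arrival_hour


if __name__ == "__main__":
    # A record arriving early in the window waits almost the whole window.
    for hour in (1.0, 12.0, 23.0):
        print(hour, batch_result_lag(hour))
```

Under these assumptions even the luckiest record waits for the full processing time, which is why a batch system alone is a poor fit when answers are needed within seconds.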
8. Slide8
Hadoop is not a replacement for your existing data processing infrastructure
After processing the data in Hadoop, you need to send the output to relational database technologies for BI, decision support, reporting, etc.
It is not going to replace your database, but your database isn’t likely to replace Hadoop either
Different tools for different jobs
Not a Replacement for Existing Infrastructure
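As a rough illustration of handing Hadoop output to a relational database, the sketch below loads tab-separated key/value lines (the layout Hadoop's default text output uses) into SQLite. SQLite stands in for whatever RDBMS is actually in place; the table name `job_output`, its columns, and the function name are hypothetical.

```python
import sqlite3


def load_job_output(lines, connection):
    """Load 'key<TAB>count' lines into a table and return the row count.

    The table layout is an assumption made for this example.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS job_output (item TEXT PRIMARY KEY, total INTEGER)"
    )
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        item, total = line.split("\t", 1)
        rows.append((item, int(total)))
    connection.executemany(
        "INSERT OR REPLACE INTO job_output (item, total) VALUES (?, ?)", rows
    )
    connection.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = ["hadoop\t3\n", "hive\t1\n"]  # made-up sample output
    load_job_output(sample, conn)
    # Reporting tools can now query the result with plain SQL.
    print(conn.execute("SELECT item, total FROM job_output ORDER BY total DESC").fetchall())
```

Once the results sit in a table, BI and reporting tools can query them with ordinary SQL, which is the division of labour the slide describes.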
9. Slide9
The Hadoop framework is not recommended for small structured datasets, because other tools on the market, such as MS Excel or an RDBMS, can do this work more easily and faster than Hadoop
For small-scale data analytics, Hadoop can be costlier than other tools
Merge all the small files into one
Multiple Smaller Datasets – Accepted Way
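One way to see why merging helps is to estimate how many map tasks a job would start. The sketch below uses a deliberately simplified model: every file gets at least one input split, and larger files get one split per block. It ignores details such as split slop and combining input formats. The function name is hypothetical; 128 MB is the default HDFS block size in Hadoop 2.x and later.

```python
import math


def estimate_map_tasks(file_sizes_mb, block_size_mb=128):
    """Rough number of map tasks for a set of non-empty input files.

    Simplified assumption: one split per block, and at least one split
    per file, so many tiny files mean many tiny tasks.
    """
    return sum(max(1, math.ceil(size / block_size_mb)) for size in file_sizes_mb)


if __name__ == "__main__":
    nine_small_files = [10] * 9  # nine files of 10 MB each (made-up sizes)
    one_merged_file = [90]       # the same data merged into one file
    print(estimate_map_tasks(nine_small_files))
    print(estimate_map_tasks(one_merged_file))
```

Each map task carries a fixed start-up cost, so fewer, fuller tasks usually mean less overhead for the same input.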
10. Slide10
Multiple Smaller Datasets – Accepted Way
Each file of x MB
Slow Execution – 10400 ms
All the above files merged into one file (9x MB)
Fast Execution – 6140 ms
Same Output (4225284)
Same Input
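Merging the small files can be done before the data is loaded into Hadoop. The following is a minimal sketch that assumes line-oriented text files which each end with a newline; the function name and the file names in the usage example are hypothetical.

```python
import shutil
from pathlib import Path


def merge_files(source_paths, destination):
    """Concatenate the source files, in order, into one destination file.

    Assumes every source file ends with a newline, so that records from
    neighbouring files do not run together.
    """
    destination = Path(destination)
    with destination.open("wb") as merged:
        for path in source_paths:
            with Path(path).open("rb") as part:
                shutil.copyfileobj(part, merged)  # streams without loading the whole file
    return destination


if __name__ == "__main__":
    # Hypothetical usage: merge every part-*.txt in the current directory.
    parts = sorted(Path(".").glob("part-*.txt"))
    if parts:
        print(merge_files(parts, "merged.txt"))
```

With the timings shown on the slide, the merged run takes 6140 ms against 10400 ms, roughly 41% less time for the same input and output.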
11. Slide11
Unless you have a good understanding of the Hadoop framework, it is not advisable to use Hadoop in production
Learning Hadoop and its ecosystem tools, and deciding which technology suits your needs, adds a further level of complexity
Novice Hadoopers
12. Slide12
Many enterprises, especially those in highly regulated industries dealing with sensitive data, aren’t able to move as quickly as they would like towards implementing Big Data projects and Hadoop
Example: health-care data used by insurance companies to calculate premiums
Where Security is the Primary Concern?
They don’t have to hesitate though, as many of the security and compliance challenges are continuously being worked on and can be overcome (for example, by using Apache Accumulo on top of Hadoop).
15. Slide15
You have different types of data: structured, semi-structured and unstructured
The dataset is huge, i.e. several terabytes or petabytes
You are not in a hurry for answers
Data Size and Data Diversity
16. Slide16
To implement Hadoop on your data, you should first understand the complexity of the data and the rate at which it is going to grow
This calls for cluster planning: you may begin by building a small or medium cluster in your organization, sized for the data available at present (in GBs or a few TBs), and scale the cluster up in future as your data grows
Future Planning
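A back-of-the-envelope calculation can support this kind of cluster planning. The sketch below assumes compound monthly growth and multiplies by the HDFS replication factor (3 by default); the function names, the growth rate, and the per-node capacity in the usage example are hypothetical figures chosen only for illustration.

```python
import math


def project_raw_storage_tb(current_tb, monthly_growth_rate, months, replication=3):
    """Raw disk space (TB) needed after `months` of compound growth.

    `replication` is the HDFS replication factor; each block is stored
    that many times across the cluster.
    """
    logical_tb = current_tb * (1 + monthly_growth_rate) ** months
    return logical_tb * replication


def nodes_needed(raw_tb, usable_tb_per_node):
    """Smallest number of data nodes that can hold raw_tb of data."""
    return math.ceil(raw_tb / usable_tb_per_node)


if __name__ == "__main__":
    # Made-up inputs: 2 TB today, 10% growth per month, a 24-month horizon.
    raw = project_raw_storage_tb(2, 0.10, 24)
    print(round(raw, 1), nodes_needed(raw, usable_tb_per_node=8))
```

A real plan would also reserve space for intermediate job output and operating-system overhead, which this sketch leaves out.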
17. Slide17
Hadoop can be integrated with multiple analytic tools, such as machine learning libraries, R, Python, Spark, MongoDB, etc., to get the best out of it
Multiple Frameworks for Big Data
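As one example of such integration, Hadoop Streaming lets the map and reduce steps be written in any language that reads standard input and writes standard output, including Python. The sketch below only imitates the map, sort and reduce phases locally with a word count; it is not a deployable streaming job, and all function names are hypothetical.

```python
from itertools import groupby


def map_lines(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_pairs(sorted_pairs):
    """Reduce phase: sum the counts of each word.

    Expects pairs sorted by word, as the shuffle step would deliver them.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_locally(lines):
    """Imitate map -> sort (shuffle) -> reduce in a single process."""
    return dict(reduce_pairs(sorted(map_lines(lines))))


if __name__ == "__main__":
    print(run_locally(["Hadoop Hive", "hadoop"]))
```

In a real streaming job the mapper and reducer would be separate scripts exchanging tab-separated lines, and the framework would perform the sort between them.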
20. Slide20
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
How it Works?
21. Slide21
Module 1
»Understanding Big Data and Hadoop
Module 2
»Hadoop Architecture and HDFS
Module 3
»Hadoop MapReduce Framework -I
Module 4
»Hadoop MapReduce Framework -II
Module 5
»Advance MapReduce
Course Topics
Module 6
»PIG
Module 7
»HIVE
Module 8
»Advance HIVE and HBase
Module 9
»Advance HBase
Module 10
»Oozie and Hadoop Project