Problem Analysis
• Experiments were run on the ICSI desktop cluster, but in reality a Big Data
workload has to handle on the order of 100 petabytes of data.
• Heavy network traffic is not considered.
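To put the scale gap in perspective, here is a back-of-the-envelope sketch that counts input splits (roughly, map tasks) for a dataset of a given size. It assumes one split per HDFS block, a 128 MiB block size, and treats "100 petabytes" as 100 PiB. All three are simplifying assumptions, not measurements from the experiments.

```python
# Rough scale estimate: number of input splits (≈ map tasks) for a dataset,
# ASSUMING one split per HDFS block and a 128 MiB block size.

PIB = 2**50                # one pebibyte in bytes
DEFAULT_BLOCK = 128 * 2**20  # assumed block size: 128 MiB

def num_splits(dataset_bytes: int, block_bytes: int = DEFAULT_BLOCK) -> int:
    """Ceiling division: a partially filled last block still needs a task."""
    return -(-dataset_bytes // block_bytes)

if __name__ == "__main__":
    print(num_splits(100 * PIB))  # prints 838860800
```

Hundreds of millions of map tasks is a regime a desktop cluster cannot exercise, which is why results from small clusters may not transfer directly.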
Problem Analysis
• MapReduce incurs latency because:
  • The peak rate of the map phase is not high.
  • Data must be bundled for fast mapping.
  • The number of reducers is limited, since each reducer writes a different
    output file.
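The pipeline behind these points can be sketched as an in-memory word count. This is plain Python, not Hadoop's Java API. The map output list stands in for the intermediate data a mapper writes. A hash partitioner sends each key to exactly one reducer, so results are spread across one output per reducer. The `part-r-NNNNN` labels mirror Hadoop's reducer output file naming and are only illustrative here.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]

def map_phase(lines: Iterable[str]) -> List[Pair]:
    """Emit (word, 1) per word; this list stands in for the intermediate
    data a real mapper would write out before the shuffle."""
    return [(word, 1) for line in lines for word in line.split()]

def partition(pairs: List[Pair], num_reducers: int) -> List[List[Pair]]:
    """Send every key to exactly one reducer, as a hash partitioner does.
    crc32 is used instead of hash() so the assignment is stable across runs."""
    buckets: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[zlib.crc32(key.encode()) % num_reducers].append((key, value))
    return buckets

def reduce_phase(bucket: List[Pair]) -> Dict[str, int]:
    """Sum values per key, visiting keys in sorted order as after the sort."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in sorted(bucket):
        totals[key] += value
    return dict(totals)

def run_job(lines: Iterable[str], num_reducers: int = 2) -> List[Dict[str, int]]:
    """Return one output 'file' (dict) per reducer."""
    buckets = partition(map_phase(lines), num_reducers)
    return [reduce_phase(bucket) for bucket in buckets]

if __name__ == "__main__":
    for i, out in enumerate(run_job(["a b a", "c b a"], num_reducers=2)):
        print(f"part-r-{i:05d}", out)
```

A downstream consumer that needs one combined result has to read and merge every reducer's output.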
Problem Analysis
• MapReduce also incurs latency because:
  • Hadoop does not support broadcasting parameter references to all map
    nodes, so every map node has to bundle the same parameters.
  • A secondary buffer is needed for swapping.
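The next sketch (plain Python, not Hadoop's API) shows the cost of lacking a broadcast primitive. Every map task must receive and deserialize its own copy of the same parameters, so bytes shipped and memory held grow linearly with the number of tasks. It models each task as holding a private copy, as separate task processes would. That modelling choice, `run_map_tasks`, and the example parameters are hypothetical.

```python
import pickle
from typing import Any, Dict, List, Tuple

def run_map_tasks(params: Dict[str, Any], num_tasks: int) -> Tuple[List[Dict[str, Any]], int]:
    """Simulate map tasks that cannot share one in-memory parameter object.

    The parameters are serialized once, but each task deserializes its own
    copy, so transfer and memory scale with the number of tasks."""
    blob = pickle.dumps(params)
    copies = [pickle.loads(blob) for _ in range(num_tasks)]  # one private copy per task
    bytes_shipped = num_tasks * len(blob)
    return copies, bytes_shipped

if __name__ == "__main__":
    model = {"weight": 0.5, "bias": 1.0}  # hypothetical shared parameters
    copies, shipped = run_map_tasks(model, num_tasks=4)
    print(len(copies), "copies,", shipped, "bytes shipped")
```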
Problem Analysis
• Hadoop has drawbacks in implementing DFS.
• The MapReduce framework performs very poorly with slot-based memory
allocation (one slot per task) and on iterative processing tasks such as
graph processing.
• MapReduce does not work when there are computational dependencies in the
data.
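Iterative graph processing can be illustrated with PageRank run as a chain of MapReduce-style jobs. The sketch below is plain Python, not Hadoop code. Each pass re-reads the previous pass's output from disk and writes its own, even though only the final ranks are needed. It assumes a graph with no dangling nodes, a damping factor of 0.85, and JSON files standing in for job output. For brevity the graph itself stays in memory, whereas a real job chain would typically re-read it every iteration too.

```python
import json
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]

def pagerank_pass(graph: Graph, ranks: Dict[str, float], damping: float = 0.85) -> Dict[str, float]:
    """One MapReduce-style job: 'map' sends each node's rank along its
    out-links, 'reduce' sums contributions per node (fused into one loop
    here for brevity). Assumes every node has at least one out-link."""
    contribs: Dict[str, float] = defaultdict(float)
    for node, out_links in graph.items():
        for dst in out_links:
            contribs[dst] += ranks[node] / len(out_links)
    n = len(graph)
    return {node: (1 - damping) / n + damping * contribs[node] for node in graph}

def iterate(graph: Graph, iterations: int, workdir: str) -> Tuple[Dict[str, float], str]:
    """Chain passes as separate jobs: each one reloads the previous output
    from disk and materializes its own output file."""
    n = len(graph)
    path = os.path.join(workdir, "iter-0.json")
    with open(path, "w") as f:
        json.dump({node: 1.0 / n for node in graph}, f)
    ranks: Dict[str, float] = {}
    for i in range(1, iterations + 1):
        with open(path) as f:
            ranks = json.load(f)
        ranks = pagerank_pass(graph, ranks)
        path = os.path.join(workdir, f"iter-{i}.json")
        with open(path, "w") as f:
            json.dump(ranks, f)
    return ranks, path

if __name__ == "__main__":
    demo = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    with tempfile.TemporaryDirectory() as tmp:
        final, last_path = iterate(demo, 5, tmp)
        print(sorted(os.listdir(tmp)))  # the seed file plus one file per iteration
        print(final)
```

The per-iteration disk round trip is the overhead that systems built for iteration try to avoid.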
Problem Analysis
• MapReduce makes implementing research suggestions less intuitive and more
complicated than necessary.
• If new data is added, the jobs need to run over the entire dataset again.
• A single failure kills all queued and running jobs.
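The rerun problem can be contrasted with an incremental update in a short sketch (plain Python, hypothetical helper names). A plain batch job rescans every record whenever new data arrives. Merging the previous result with counts from only the new records gives the same answer while scanning far less. That shortcut only works because word counts are associative and commutative, and stock MapReduce does not do this merge for you.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def count_words(records: Iterable[str]) -> Counter:
    return Counter(word for record in records for word in record.split())

def batch_rerun(all_records: List[str]) -> Tuple[Counter, int]:
    """Plain batch job: rescan every record, old and new."""
    return count_words(all_records), len(all_records)

def incremental_update(previous: Counter, new_records: List[str]) -> Tuple[Counter, int]:
    """Merge the previous result with counts from only the new records.
    Valid here because word counting is associative and commutative."""
    return previous + count_words(new_records), len(new_records)

if __name__ == "__main__":
    old, new = ["a b", "b c"], ["a d"]
    print(batch_rerun(old + new))
    print(incremental_update(count_words(old), new))
```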
Suggestion
• Augmenting MapReduce with ad hoc support may solve the problems of
iterative processing and random access to its dataset.
• Sampling may also be used to address the iteration problem.
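Sampling can be sketched with an iterative algorithm run on a random subset instead of the full data. The example below uses one-dimensional k-means (Lloyd's algorithm), where each pass would correspond to one MapReduce job. The choice of k-means, the 20% fraction, and the fixed seed are illustrative assumptions; the slides do not name a specific algorithm. The trade-off is fewer records per iteration in exchange for approximate centers.

```python
import random
from typing import List, Sequence

def kmeans_1d(points: Sequence[float], centers: Sequence[float], iterations: int = 10) -> List[float]:
    """Lloyd's algorithm in one dimension; each pass would be one job."""
    centers = list(centers)
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster received no points.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

def sampled_kmeans(points: Sequence[float], centers: Sequence[float], fraction: float,
                   seed: int = 0, iterations: int = 10) -> List[float]:
    """Run the same iterations on a random sample to cut per-pass work."""
    rng = random.Random(seed)
    size = max(1, int(len(points) * fraction))
    sample = rng.sample(list(points), size)
    return kmeans_1d(sample, centers, iterations)

if __name__ == "__main__":
    data = [i * 0.01 for i in range(100)] + [10.0 + i * 0.01 for i in range(100)]
    print(kmeans_1d(data, [1.0, 9.0]))
    print(sampled_kmeans(data, [1.0, 9.0], fraction=0.2))
```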
Review Questions
• Why is the peak rate of the map phase not high?
  → Because the map phase writes its output to an intermediate data file.
• Why does Hadoop not support broadcasting?
  → Because Java does not support sharing object references during map tasks.
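The intermediate-file cost can be sketched as map output that is buffered in memory, sorted, and spilled to disk whenever the buffer fills, followed by a merge of the sorted spill files. This is a simplified external sort in plain Python. The buffer limit is counted in records for simplicity, whereas Hadoop sizes its map-side sort buffer in megabytes (`mapreduce.task.io.sort.mb` in Hadoop 2 and later). The file names are hypothetical.

```python
import heapq
import os
import pickle
import tempfile
from typing import List, Tuple

Pair = Tuple[str, int]

def map_with_spills(pairs: List[Pair], buffer_limit: int, workdir: str) -> List[str]:
    """Buffer map output; when the buffer is full, sort it and spill it
    to an intermediate file. Returns the spill file paths."""
    buffer: List[Pair] = []
    spill_paths: List[str] = []

    def spill() -> None:
        path = os.path.join(workdir, f"spill{len(spill_paths)}.pkl")
        with open(path, "wb") as f:
            pickle.dump(sorted(buffer), f)
        spill_paths.append(path)
        buffer.clear()

    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= buffer_limit:
            spill()
    if buffer:  # flush the final partial buffer
        spill()
    return spill_paths

def merge_spills(spill_paths: List[str]) -> List[Pair]:
    """K-way merge of sorted runs. Runs are loaded whole for brevity;
    a real merge would stream them from disk."""
    runs = []
    for path in spill_paths:
        with open(path, "rb") as f:
            runs.append(pickle.load(f))
    return list(heapq.merge(*runs))

if __name__ == "__main__":
    out = [("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = map_with_spills(out, buffer_limit=2, workdir=tmp)
        print(len(paths), "spill files")
        print(merge_spills(paths))
```

Every spill is a disk write, and the merge reads it back, so the map phase's effective throughput is bounded by this I/O rather than by computation alone.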
Review Questions
• Why does MapReduce perform poorly on iterative tasks?
  → It does not merge iterations, and it materializes data even when this is
    not required.
• Why does adding new data force the whole job to run again?
  → Hadoop does not handle random access to its datasets well, but YARN
    promises to support that.
MapReduce - Hadoop - Big Data