Hadoop Architecture Overview Explaining HDFS, MapReduce and More

•Download as PPT, PDF•

14 likes•3,517 views

Hadoop is a distributed processing framework for large datasets. It stores data across clusters of commodity hardware in a Hadoop Distributed File System (HDFS) and provides tools for distributed processing using MapReduce. HDFS uses a master-slave architecture with a namenode managing metadata and datanodes storing data blocks. Data is replicated across nodes for reliability. MapReduce allows distributed processing of large datasets in parallel across clusters.

Technology

Hadoop architecture An overview Hari Shankar Sreekumar Software Engineer @Clickable

Ideas ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

What is Hadoop? HDFS Hadoop Common MapReduce Pig Hive HBase Zookeeper Avro Cassandra Mahout . . . . . . . . .

Hadoop Distributed File System A distributed filesystem designed for storing very large files with streaming data access running on clusters of commodity hardware . HDFS has been designed keeping MapReduce in mind Consists of a cluster of machines, each machine performing one or more of the following roles: Namenode (Only one per cluster) Secondary namenode (Checkpoint node) (Only one per cluster) Datanodes (Many per cluster)

HDFS Blocks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Namenode and Datanodes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Datanodes ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Secondary namenode/Checkpoint node ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Image: Hadoop, The definitive Guide (Tom White)

Replication and rack-awareness ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Reading from HDFS Image: Hadoop, The definitive Guide (Tom White) Failure=>Move to next 'closest' node with the block. Direct connection between client and datanode

Writing to HDFS Minimum replication for successful write: dfs.replication.min Files in HDFS are write-once and have strictly one writer at any time. Image: Hadoop, The definitive Guide (Tom White)

Hadoop Common File system abstraction: The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, HFTP FS, S3 FS, and others. Service-level authorization: Service Level Authorization is the initial authorization mechanism to ensure clients connecting to a particular Hadoop service have the necessary, pre-configured, permissions and are authorized to access the given service. For example, a MapReduce cluster can use this mechanism to allow a configured list of users/groups to submit jobs.

[object Object],[object Object],[object Object],[object Object],[object Object],Data Integrity

Compression utilities ,[object Object],[object Object],Ref: Hadoop, The definitive Guide (Tom White) Splittable LZO is available separately and is a good trade-off between compression speed and compressed size.

Serialization utilities ,[object Object],[object Object],[object Object]

MapReduce Framework ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

References http://hadoop.apache.org/common/docs/current/hdfs_design.html Hadoop: The Definitive Guide, by Tom White. Copyright 2009 Tom White, 978-0-596-52197-4

What's hot

A Basic Introduction to the Hadoop eco system - no animationSameer Tiwari

HDFSSteve Loughran

Hadoop Tutorialawesomesos

Hadoop - OverviewJay

Hadoop demo pptPhil Young

HDFS: Hadoop Distributed FilesystemSteve Loughran

Hadoop 1.x vs 2Rommel Garcia

Introduction to Hadoop and Hadoop component rebeccatho

Apache hadoop technology : BeginnersShweta Patnaik

Hadoop File system (HDFS)Prashant Gupta

Hadoop Overview kdd2011Milind Bhandarkar

July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox

Hadoop-IntroductionSandeep Deshmukh

Introduction to HadoopOvidiu Dimulescu

Basics of big data analytics hadoopAmbuj Kumar

Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...Simplilearn

Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Edureka!

Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen

Hadoop interview quations1Vemula Ravi

6.hivePrashant Gupta

What's hot (20)

A Basic Introduction to the Hadoop eco system - no animation

HDFS

Hadoop Tutorial

Hadoop - Overview

Hadoop demo ppt

HDFS: Hadoop Distributed Filesystem

Hadoop 1.x vs 2

Introduction to Hadoop and Hadoop component

Apache hadoop technology : Beginners

Hadoop File system (HDFS)

Hadoop Overview kdd2011

July 2010 Triangle Hadoop Users Group - Chad Vawter Slides

Hadoop-Introduction

Introduction to Hadoop

Basics of big data analytics hadoop

Hadoop Ecosystem | Hadoop Ecosystem Tutorial | Hadoop Tutorial For Beginners ...

Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |

Overview of Big data, Hadoop and Microsoft BI - version1

Hadoop interview quations1

6.hive

Viewers also liked

Introduction to Google App Enginerajdeep

Platform as a service google app engineDeepu S Nath

Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko

Unit i introduction to grid computingsudha kar

PaaS - google app engineJ Singh

5. the grid implementing production gridDr Sandeep Kumar Poonia

Hadoop Architecture and HDFSEdureka!

Google app engineSuraj Mehta

1. GRID COMPUTINGDr Sandeep Kumar Poonia

Hadoop Overview & Architecture EMC

To Serve and Protect: Making Sense of Hadoop Security Inside Analysis

Open-BDA - Big Data Hadoop Developer Training 10th & 11th JuneInnovative Management Services

Hadoop and Data Access SecurityCloudera, Inc.

Big Data, Big Content, and Aligning Your Storage StrategyHitachi Vantara

Big Data, Security Intelligence, (And Why I Hate This Title) Coastal Pet Products, Inc.

Hadoop Ecosystem Architecture Overview Senthil Kumar

Big Data Security and GovernanceDataWorks Summit/Hadoop Summit

"Big Data" in the Energy IndustryPaige Bailey

Demystify big data data scienceMahesh Kumar CV

Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBlue Coat

Viewers also liked (20)

Introduction to Google App Engine

Platform as a service google app engine

Distributed Computing with Apache Hadoop: Technology Overview

Unit i introduction to grid computing

PaaS - google app engine

5. the grid implementing production grid

Hadoop Architecture and HDFS

Google app engine

1. GRID COMPUTING

Hadoop Overview & Architecture

To Serve and Protect: Making Sense of Hadoop Security

Open-BDA - Big Data Hadoop Developer Training 10th & 11th June

Hadoop and Data Access Security

Big Data, Big Content, and Aligning Your Storage Strategy

Big Data, Security Intelligence, (And Why I Hate This Title)

Hadoop Ecosystem Architecture Overview

Big Data Security and Governance

"Big Data" in the Energy Industry

Demystify big data data science

Big Data Security Intelligence and Analytics for Advanced Threat Protection

Similar to Hadoop Architecture Overview Explaining HDFS, MapReduce and More

Big data with HDFS and Mapreducesenthil0809

Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Simplilearn

Hadoop HDFS Architeture and Designsudhakara st

Introduction to Hadoop Distributed File System(HDFS).pptxSakthiVinoth78

Unit 1SriKGangadharRaoAssi

Hadoop Distributed File System for Big Data AnalyticsDrPDShebaKeziaMalarc

Introduction to HDFSBhavesh Padharia

Hadoop and HDFSSatyaHadoop

module 2.pptxssuser6e8e41

Hadoop at a glanceTan Tran

Big data interview questions and answersKalyan Hadoop

Introduction_to_HDFS sun.pptxsunithachphd

Apache hadoopsheetal sharma

Apache Hadoop Big Data TechnologyJay Nagar

Hadoop professional-software-development-course-in-mumbaiUnmesh Baile

Hadoop-professional-software-development-course-in-mumbaiUnmesh Baile

Hadoop data managementSubhas Kumar Ghosh

Hadoop distributed file systemAnshul Bhatnagar

Lecture 2 part 1Jazan University

Hadoop overview.pdfSunil D Patil

Similar to Hadoop Architecture Overview Explaining HDFS, MapReduce and More (20)

Big data with HDFS and Mapreduce

Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...

Hadoop HDFS Architeture and Design

Introduction to Hadoop Distributed File System(HDFS).pptx

Unit 1

Hadoop Distributed File System for Big Data Analytics

Introduction to HDFS

Hadoop and HDFS

module 2.pptx

Hadoop at a glance

Big data interview questions and answers

Introduction_to_HDFS sun.pptx

Apache hadoop

Apache Hadoop Big Data Technology

Hadoop professional-software-development-course-in-mumbai

Hadoop-professional-software-development-course-in-mumbai

Hadoop data management

Hadoop distributed file system

Lecture 2 part 1

Hadoop overview.pdf

Recently uploaded

Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxnull - The Open Security Community

What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett

Training state-of-the-art general text embeddingZilliz

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106

Story boards and shot lists for my a level piececharlottematthew16

SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos

Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren

Artificial intelligence in cctv survelliance.pptxhariprasad279825

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays

Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz

Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada

Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited

Gen AI in Business - Global Trends Report 2024.pdfAddepto

"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays

AI as an Interface for Commercial BuildingsMemoori

Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays

Recently uploaded (20)

Vertex AI Gemini Prompt Engineering Tips

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx

What's New in Teams Calling, Meetings and Devices March 2024

Training state-of-the-art general text embedding

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics

Story boards and shot lists for my a level piece

SIP trunking in Janus @ Kamailio World 2024

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)

Advanced Test Driven-Development @ php[tek] 2024

Artificial intelligence in cctv survelliance.pptx

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...

Vector Databases 101 - An introduction to the world of Vector Databases

Designing IA for AI - Information Architecture Conference 2024

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024

Ensuring Technical Readiness For Copilot in Microsoft 365

Gen AI in Business - Global Trends Report 2024.pdf

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn

AI as an Interface for Commercial Buildings

Are Multi-Cloud and Serverless Good or Bad?

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack

Hadoop Architecture Overview Explaining HDFS, MapReduce and More

1. Hadoop architecture An overview Hari Shankar Sreekumar Software Engineer @Clickable

3. What is Hadoop? HDFS Hadoop Common MapReduce Pig Hive HBase Zookeeper Avro Cassandra Mahout . . . . . . . . .

4. What is Hadoop? HDFS Hadoop Common MapReduce Pig Hive HBase Zookeeper Avro Cassandra Mahout . . . . . . . . .

5. Hadoop Distributed File System A distributed filesystem designed for storing very large files with streaming data access running on clusters of commodity hardware . HDFS has been designed keeping MapReduce in mind Consists of a cluster of machines, each machine performing one or more of the following roles: Namenode (Only one per cluster) Secondary namenode (Checkpoint node) (Only one per cluster) Datanodes (Many per cluster)

10. Image: Hadoop, The definitive Guide (Tom White)

11.

12. Reading from HDFS Image: Hadoop, The definitive Guide (Tom White) Failure=>Move to next 'closest' node with the block. Direct connection between client and datanode

13. Writing to HDFS Minimum replication for successful write: dfs.replication.min Files in HDFS are write-once and have strictly one writer at any time. Image: Hadoop, The definitive Guide (Tom White)

14. Hadoop Common File system abstraction: The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, HFTP FS, S3 FS, and others. Service-level authorization: Service Level Authorization is the initial authorization mechanism to ensure clients connecting to a particular Hadoop service have the necessary, pre-configured, permissions and are authorized to access the given service. For example, a MapReduce cluster can use this mechanism to allow a configured list of users/groups to submit jobs.

15.

16.

17.

18.

19. Image: Hadoop, The definitive Guide (Tom White)

Hadoop Architecture Overview Explaining HDFS, MapReduce and More

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Hadoop Architecture Overview Explaining HDFS, MapReduce and More

Similar to Hadoop Architecture Overview Explaining HDFS, MapReduce and More (20)

Recently uploaded

Recently uploaded (20)

Hadoop Architecture Overview Explaining HDFS, MapReduce and More