SlideShare a Scribd company logo
1 of 26
 Cluster Architecture Web Search for a Planet and much more…. Abhijeet Desai desaiabhijeet89@gmail.com 1
2
Google Query Serving Infrastructure Elapsed time: 0.25s, machines involved: 1000s+ 3
PageRank PageRankTM is the core technology to measure the importance of a page Google's theory 	– If page A links to page B 		# Page B is important 		# The link text is irrelevant 	– If many important links point to page A 		# Links from page A are also important 4
5
Key Design Principles Software reliability Use replication for better request throughput and availability  Price/performance beats peak performance Using commodity PCs reduces the cost of computation  6
The Power Problem High density of machines (racks) 	– High power consumption 400-700 W/ft2 		# Typical data center provides 70-150 W/ft2 		# Energy costs 	– Heating 		# Cooling system costs Reducing power 	– Reduce performance (c/p may not reduce!) 	– Faster hardware depreciation (cost up!) 7
Parallelism Lookup of matching docs in a large index 	--> many lookups in a set of smaller indexes followed by a merge step A query stream 	--> multiple streams 		(each handled by a cluster) Adding machines to a pool increases serving capacity 8
Hardware Level Consideration  Instruction level parallelism does not help Multiple simple, in-order, short-pipeline core Thread level parallelism Memory system with moderate sized L2 cache is enough Large shared-memory machines are not required to boost the performance 9
GFS (Google File System) Design ,[object Object]
 Data transfers happen directly between clients/chunk servers
 Files broken into chunks (typically 64 MB)10
GFS Usage @ Google • 200+ clusters • Many clusters of 1000s of machines • Pools of 1000s of clients • 4+ PB Filesystems • 40 GB/s read/write load 	– (in the presence of frequent HW failures) 11
The Machinery 12
Architectural view of the storage hierarchy 13
Clusters through the years “Google” Circa 1997 (google.stanford.edu) Google (circa 1999) 14
Clusters through the years Google (new data center 2001) Google Data Center (Circa 2000) 3 days later 15
Current Design • In-house rack design • PC-class motherboards • Low-end storage and networking hardware • Linux • + in-house software 16
Container Datacenter  17
Container Datacenter  18
Multicore Computing 19
Comparison Between Custom built & High-end Servers  20
Implications of the Computing Environment Stuff Breaks • If you have one server, it may stay up three years (1,000 days) • If you have 10,000 servers, expect to lose ten a day • “Ultra-reliable” hardware doesn’t really help • At large scale, super-fancy reliable hardware still fails, albeit less often 	– software still needs to be fault-tolerant 	– commodity machines without fancy hardware give better performance/$ • Reliability has to come from the software • Making it easier to write distributed programs 21
Infrastructure for Search Systems Several key pieces of infrastructure: 	– GFS 	– MapReduce 	– BigTable 22
MapReduce • A simple programming model that applies to many large-scale computing problems • Hide messy details in MapReduce runtime library: 	– automatic parallelization 	– load balancing 	– network and disk transfer              optimizations 	– handling of machine failures 	– robustness 	– improvements to core library benefit all users of library! 23
Typical problem solved by MapReduce • Read a lot of data • Map: extract something you care about from each record • Shuffle and Sort • Reduce: aggregate, summarize, filter, or transform • Write the results • Outline stays the same, map and reduce change to fit the problem 24

More Related Content

What's hot (20)

Database management system
Database management systemDatabase management system
Database management system
 
Ccna ppt1
Ccna ppt1Ccna ppt1
Ccna ppt1
 
Chapter25
Chapter25Chapter25
Chapter25
 
CCNA IP Addressing
CCNA IP AddressingCCNA IP Addressing
CCNA IP Addressing
 
Linux Networking Commands
Linux Networking CommandsLinux Networking Commands
Linux Networking Commands
 
Distributed design alternatives
Distributed design alternativesDistributed design alternatives
Distributed design alternatives
 
Query decomposition in data base
Query decomposition in data baseQuery decomposition in data base
Query decomposition in data base
 
Lecture 10 distributed database management system
Lecture 10   distributed database management systemLecture 10   distributed database management system
Lecture 10 distributed database management system
 
Lect 2 i pv6-latest-rami
Lect 2 i pv6-latest-ramiLect 2 i pv6-latest-rami
Lect 2 i pv6-latest-rami
 
Network ppt
Network pptNetwork ppt
Network ppt
 
PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and Benchmarks
 
Network protocols
Network protocolsNetwork protocols
Network protocols
 
DAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraDAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon Aurora
 
Network Topologies in Simple (Logical, Physical and Types)
Network Topologies in Simple (Logical, Physical and Types)Network Topologies in Simple (Logical, Physical and Types)
Network Topologies in Simple (Logical, Physical and Types)
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Computer Networking 101
Computer Networking 101Computer Networking 101
Computer Networking 101
 
The basics of computer networking
The basics of computer networkingThe basics of computer networking
The basics of computer networking
 
Router configuration in packet tracer
Router configuration in packet  tracerRouter configuration in packet  tracer
Router configuration in packet tracer
 
Database fragmentation
Database fragmentationDatabase fragmentation
Database fragmentation
 
Routers.ppt
Routers.pptRouters.ppt
Routers.ppt
 

Viewers also liked

Google Architecture - Breaking it Open
Google Architecture - Breaking it OpenGoogle Architecture - Breaking it Open
Google Architecture - Breaking it OpenHARMAN Services
 
The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1Hassy Veldstra
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M usersJongyoon Choi
 
Google history nd architecture
Google history nd architectureGoogle history nd architecture
Google history nd architectureDivyangee Jain
 
Cluster computing pptl (2)
Cluster computing pptl (2)Cluster computing pptl (2)
Cluster computing pptl (2)Rohit Jain
 
OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.
OVERVIEW  OF FACEBOOK SCALABLE ARCHITECTURE.OVERVIEW  OF FACEBOOK SCALABLE ARCHITECTURE.
OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.Rishikese MR
 
Cluster Computers
Cluster ComputersCluster Computers
Cluster Computersshopnil786
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMJYoTHiSH o.s
 
Facebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeFacebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeCristina Munoz
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenHARMAN Services
 
It 4-yr-1-sem-digital image processing
It 4-yr-1-sem-digital image processingIt 4-yr-1-sem-digital image processing
It 4-yr-1-sem-digital image processingHarish Khodke
 
Digital image processing unit 1
Digital image processing unit 1Digital image processing unit 1
Digital image processing unit 1Anantharaj Manoj
 

Viewers also liked (20)

Google Architecture - Breaking it Open
Google Architecture - Breaking it OpenGoogle Architecture - Breaking it Open
Google Architecture - Breaking it Open
 
The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
Google history nd architecture
Google history nd architectureGoogle history nd architecture
Google history nd architecture
 
Cluster computing pptl (2)
Cluster computing pptl (2)Cluster computing pptl (2)
Cluster computing pptl (2)
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Cluster computer
Cluster  computerCluster  computer
Cluster computer
 
OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.
OVERVIEW  OF FACEBOOK SCALABLE ARCHITECTURE.OVERVIEW  OF FACEBOOK SCALABLE ARCHITECTURE.
OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.
 
Cluster Computers
Cluster ComputersCluster Computers
Cluster Computers
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEM
 
Facebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeFacebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challenge
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it Open
 
Computer cluster
Computer clusterComputer cluster
Computer cluster
 
It 4-yr-1-sem-digital image processing
It 4-yr-1-sem-digital image processingIt 4-yr-1-sem-digital image processing
It 4-yr-1-sem-digital image processing
 
Digital image processing unit 1
Digital image processing unit 1Digital image processing unit 1
Digital image processing unit 1
 
Dip Unit Test-I
Dip Unit Test-IDip Unit Test-I
Dip Unit Test-I
 
OGSA
OGSAOGSA
OGSA
 
Globus ppt
Globus pptGlobus ppt
Globus ppt
 
MPI
MPIMPI
MPI
 

Similar to Google cluster architecture

CENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.pptCENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.pptdhanasekarscse
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Google data centers
Google data centersGoogle data centers
Google data centersAli Al-Ugali
 
54665962-Nav-Cluster-Computing.pptx
54665962-Nav-Cluster-Computing.pptx54665962-Nav-Cluster-Computing.pptx
54665962-Nav-Cluster-Computing.pptxYashAhire28
 
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)Robert Grossman
 
Cloud computing overview
Cloud computing overviewCloud computing overview
Cloud computing overviewKHANSAFEE
 
The Rise of Cloud Computing Systems
The Rise of Cloud Computing SystemsThe Rise of Cloud Computing Systems
The Rise of Cloud Computing SystemsDaehyeok Kim
 
Architecting Cloudy Applications
Architecting Cloudy ApplicationsArchitecting Cloudy Applications
Architecting Cloudy ApplicationsDavid Chou
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用lantianlcdx
 
Cloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sureCloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sureNguyen Duong
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudKhazret Sapenov
 
Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and Saptak Sen
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Datawaheed751
 
Data Center of the Future v1.0.pptx
Data Center of the Future v1.0.pptxData Center of the Future v1.0.pptx
Data Center of the Future v1.0.pptxjuergenJaeckel
 
node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute EngineArun Nagarajan
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Crate.io
 

Similar to Google cluster architecture (20)

CENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.pptCENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Google data centers
Google data centersGoogle data centers
Google data centers
 
54665962-Nav-Cluster-Computing.pptx
54665962-Nav-Cluster-Computing.pptx54665962-Nav-Cluster-Computing.pptx
54665962-Nav-Cluster-Computing.pptx
 
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
 
CDP_2(1).pptx
CDP_2(1).pptxCDP_2(1).pptx
CDP_2(1).pptx
 
Cloud computing overview
Cloud computing overviewCloud computing overview
Cloud computing overview
 
The Rise of Cloud Computing Systems
The Rise of Cloud Computing SystemsThe Rise of Cloud Computing Systems
The Rise of Cloud Computing Systems
 
Architecting Cloudy Applications
Architecting Cloudy ApplicationsArchitecting Cloudy Applications
Architecting Cloudy Applications
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
 
Cloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sureCloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sure
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Data
 
Data Center of the Future v1.0.pptx
Data Center of the Future v1.0.pptxData Center of the Future v1.0.pptx
Data Center of the Future v1.0.pptx
 
node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute Engine
 
Cloud
CloudCloud
Cloud
 
Designing Scalable Applications
Designing Scalable ApplicationsDesigning Scalable Applications
Designing Scalable Applications
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Google cluster architecture

  • 1. Cluster Architecture Web Search for a Planet and much more…. Abhijeet Desai desaiabhijeet89@gmail.com 1
  • 2. 2
  • 3. Google Query Serving Infrastructure Elapsed time: 0.25s, machines involved: 1000s+ 3
  • 4. PageRank PageRankTM is the core technology to measure the importance of a page Google's theory – If page A links to page B # Page B is important # The link text is irrelevant – If many important links point to page A # Links from page A are also important 4
  • 5. 5
  • 6. Key Design Principles Software reliability Use replication for better request throughput and availability Price/performance beats peak performance Using commodity PCs reduces the cost of computation 6
  • 7. The Power Problem High density of machines (racks) – High power consumption 400-700 W/ft2 # Typical data center provides 70-150 W/ft2 # Energy costs – Heating # Cooling system costs Reducing power – Reduce performance (c/p may not reduce!) – Faster hardware depreciation (cost up!) 7
  • 8. Parallelism Lookup of matching docs in a large index --> many lookups in a set of smaller indexes followed by a merge step A query stream --> multiple streams (each handled by a cluster) Adding machines to a pool increases serving capacity 8
  • 9. Hardware Level Consideration Instruction level parallelism does not help Multiple simple, in-order, short-pipeline core Thread level parallelism Memory system with moderate sized L2 cache is enough Large shared-memory machines are not required to boost the performance 9
  • 10.
  • 11. Data transfers happen directly between clients/chunk servers
  • 12. Files broken into chunks (typically 64 MB)10
  • 13. GFS Usage @ Google • 200+ clusters • Many clusters of 1000s of machines • Pools of 1000s of clients • 4+ PB Filesystems • 40 GB/s read/write load – (in the presence of frequent HW failures) 11
  • 15. Architectural view of the storage hierarchy 13
  • 16. Clusters through the years “Google” Circa 1997 (google.stanford.edu) Google (circa 1999) 14
  • 17. Clusters through the years Google (new data center 2001) Google Data Center (Circa 2000) 3 days later 15
  • 18. Current Design • In-house rack design • PC-class motherboards • Low-end storage and networking hardware • Linux • + in-house software 16
  • 22. Comparison Between Custom built & High-end Servers 20
  • 23. Implications of the Computing Environment Stuff Breaks • If you have one server, it may stay up three years (1,000 days) • If you have 10,000 servers, expect to lose ten a day • “Ultra-reliable” hardware doesn’t really help • At large scale, super-fancy reliable hardware still fails, albeit less often – software still needs to be fault-tolerant – commodity machines without fancy hardware give better performance/$ • Reliability has to come from the software • Making it easier to write distributed programs 21
  • 24. Infrastructure for Search Systems Several key pieces of infrastructure: – GFS – MapReduce – BigTable 22
  • 25. MapReduce • A simple programming model that applies to many large-scale computing problems • Hide messy details in MapReduce runtime library: – automatic parallelization – load balancing – network and disk transfer optimizations – handling of machine failures – robustness – improvements to core library benefit all users of library! 23
  • 26. Typical problem solved by MapReduce • Read a lot of data • Map: extract something you care about from each record • Shuffle and Sort • Reduce: aggregate, summarize, filter, or transform • Write the results • Outline stays the same, map and reduce change to fit the problem 24
  • 27. Conclusions For a large scale web service system like Google – Design the algorithm which can be easily parallelized – Design the architecture using replication to achieve distributed computing/storage and fault tolerance – Be aware of the power problem which significantly restricts the use of parallelism 25
  • 28. References 1. Luiz André Barroso , Jeffrey Dean , UrsHölzle, Web Search for a Planet: The Google Cluster Architecture, IEEE Micro, v.23 n.2, p.22-28, March 2003 [doi>10.1109/MM.2003.1196112] 2. S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Proc. Seventh World Wide Web Conf. (WWW7), International World Wide Web Conference Committee (IW3C2), 1998, pp. 107-117. 3. “TPC Benchmark C Full Disclosure Report for IBM eserverxSeries 440 using Microsoft SQL Server 2000 Enterprise Edition and Microsoft Windows .NET Datacenter Server 2003, TPC-C Version 5.0,” http://www.tpc.org/results/FDR/TPCC/ibm.x4408way.c5.fdr.02110801.pdf. 4. D. Marr et al., “Hyper-Threading Technology Architecture and Microarchitecture: A Hypertext History,” Intel Technology J., vol. 6, issue 1, Feb. 2002. 5. L. Hammond, B. Nayfeh, and K. Olukotun, “A Single-Chip Multiprocessor,” Computer, vol. 30, no. 9, Sept. 1997, pp. 79-85. 6. L.A. Barroso et al., “Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing,” Proc. 27th ACM Int’l Symp. Computer Architecture, ACM Press, 2000, pp. 282-293. 7. L.A. Barroso, K. Gharachorloo, and E. Bugnion, “Memory System Characterization of Commercial Workloads,” Proc. 25th ACM Int’l Symp. Computer Architecture, ACM Press, 1998, pp. 3-14. 26