SlideShare a Scribd company logo
1 of 34
Facebook Architecture



Aditya Agarwal
Director of Engineering
11/22/2008
Agenda
 1   Architecture Overview

 2   PHP, MySQL, Memcache

 3   Thrift, Scribe, Tools

 4   News Feed Architecture
At a Glance
   The Social Graph
   120M+ active users
   50B+ PVs per month
   10B+ Photos
   1B+ connections
   50K+ Platform Apps
   400K+ App Developers
General Design Principles
▪   Use open source where possible
      ▪   Explore making optimizations where needed

▪   Unix Philosophy
      ▪   Keep individual components simple yet performant
      ▪   Combine as necessary
      ▪   Concentrate on clean interface points

▪   Build everything for scale
▪   Try to minimize failure points
▪   Simplicity, Simplicity, Simplicity!
Architecture Overview

        LAMP           +      Services
        PHP                    AdServer
                               Search
        Memcache               Network Selector
                               News Feed
        MySQL                  Blogfeeds
                               CSSParser
              php!             Mobile
                               ShareScraper


                                     !php
                     Thrift
                     Scribe
                     ODS
                     Tools
PHP

▪   Good web programming language
     ▪   Extensive library support for web development
     ▪   Active developer community


▪   Good for rapid iteration
     ▪   Dynamically typed, interpreted scripting language
PHP: What we Learnt
▪   Tough to scale for large code bases
      ▪   Weak typing
      ▪   Limited opportunities for static analysis, code optimizations


▪   Not necessarily optimized for large website use case
      ▪   E.g. No dynamic reloading of files on web server


▪   Linearly increasing cost per included file


▪   Extension framework is difficult to use
PHP: Customizations
▪   Op-code optimization
▪   APC improvements
     ▪   Lazy loading
     ▪   Cache priming
     ▪   More efficient locking semantics for variable cache data

▪   Custom extensions
     ▪   Memcache client extension
     ▪   Serialization format
     ▪   Logging, Stats collection, Monitoring
     ▪   Asynchronous event-handling mechanism
MySQL
▪   Fast, reliable


▪   Used primarily as <key,value> store
      ▪   Data randomly distributed amongst large set of logical instances
      ▪   Most data access based on global id


▪   Large number of logical instances spread out across physical nodes
      ▪   Load balancing at physical node level


▪   No read replication
MySQL: What We Learnt (ing)
▪   Logical migration of data is very difficult


▪   Create a large number of logical dbs, load balance them over varying
    number of physical nodes


▪   No joins in production
      ▪   Logically difficult (because data is distributed randomly)


▪   Easier to scale CPU on web tier
MySQL: What we Learnt (ing)
▪   Most data access is for recent data
      ▪   Optimize table layout for recency
      ▪   Archive older data


▪   Don’t ever store non-static data in a central db
      ▪   CDB makes it easier to perform certain aggregated queries
      ▪   Will not scale


▪   Use services or memcache for global queries
      ▪   E.g.: What are the most popular groups in my network
MySQL: Customizations
▪   No extensive native MySQL modifications


▪   Custom partitioning scheme
     ▪   Global id assigned to all data


▪   Custom archiving scheme
     ▪   Based on frequency and recency of data on a per-user basis


▪   Extended Query Engine for cross-data center replication, cache
    consistency
MySQL: Customizations
▪   Graph based data-access libraries
     ▪   Loosely typed objects (nodes) with limited datatypes (int, varchar, text)
     ▪   Replicated connections (edges)
     ▪   Analogous to distributed foreign keys


▪   Some data collocated
     ▪   Example: User profile data and all of user’s connections


▪   Most data distributed randomly
Memcache
▪   High-Performance, distributed in-memory hash table
▪   Used to alleviate database load
▪   Primary form of caching
▪   Over 25TB of in-memory cache
▪   Average latency < 200 micro-seconds
▪   Cache serialized PHP data structures
▪   Lots and lots of multi-gets to retrieve data spanning across graph edges
Memache: Customizations
▪   Memache over UDP
     ▪   Reduce memory overhead of thousands of TCP connection buffers
     ▪   Application-level flow control (optimization for multi-gets)


▪   On demand aggregation of per-thread stats
     ▪   Reduces global lock contention


▪   Multiple Kernel changes to optimize for Memcache usage
     ▪   Distributing network interrupt handling over multiple cores
     ▪   Opportunistic polling of network interface
Let’s put this into action
Under the Covers
▪   Get my profile data
      ▪   Fetch from cache, potentially go to my DB (based on user-id)

▪   Get friend connections
      ▪   Cache, if not DB (based on user-id)

▪   In parallel, fetch last 10 photo album ids for each of my friends
      ▪   Multi-get; individual cache misses fetches data from db (based on photo-
          album id)

▪   Fetch data for most recent photo albums in parallel
▪   Execute page-specific rendering logic in PHP
▪   Return data, make user happy
LAMP is not Perfect
LAMP is not Perfect
▪   PHP+MySQL+Memcache works for a large class of problems but not for
    everything
     ▪   PHP is stateless
     ▪   PHP not the fastest executing language
     ▪   All data is remote

▪   Reasons why services are written
     ▪   Store code closer to data
     ▪   Compiled environment is more efficient
     ▪   Certain functionality only present in other languages
Services Philosophy
▪   Create a service iff required
      ▪   Real overhead for deployment, maintenance, separate code-base
      ▪   Another failure point

▪   Create a common framework and toolset that will allow for easier
    creation of services
      ▪   Thrift
      ▪   Scribe
      ▪   ODS, Alerting service, Monitoring service

▪   Use the right language, library and tool for the task
Thrift




High-Level Goal: Enable transparent interaction between these.
                                                                 …and some others too.
Thrift
▪   Lightweight software framework for cross-language development
▪   Provide IDL, statically generate code
▪   Supported bindings: C++, PHP, Python, Java, Ruby, Erlang, Perl, Haskell
    etc.
▪   Transports: Simple Interface to I/O
     ▪   Tsocket, TFileTransport, TMemoryBuffer

▪   Protocols: Serialization Format
     ▪   TBinaryProtocol, TJSONProtocol

▪   Servers
     ▪   Non-Blocking, Async, Single Threaded, Multi-threaded
Hasn’t this been done before?                      (yes.)


▪   SOAP
       ▪   XML, XML, and more XML

▪   CORBA
       ▪   Bloated? Remote bindings?

▪   COM
       ▪   Face-Win32ClientSoftware.dll-Book

▪   Pillar
       ▪   Slick! But no versioning/abstraction.

▪   Protocol Buffers
Thrift: Why?
•   It’s quick. Really quick.

•   Less time wasted by individual developers
     •   No duplicated networking and protocol code
     •   Less time dealing with boilerplate stuff
     •   Write your client and server in about 5 minutes


•   Division of labor
     •   Work on high-performance servers separate from applications

•   Common toolkit
     •   Fosters code reuse and shared tools
Scribe
▪   Scalable distributed logging framework
▪   Useful for logging a wide array of data
      ▪   Search Redologs
      ▪   Powers news feed publishing
      ▪   A/B testing data

▪   Weak Reliability
      ▪   More reliable than traditional logging but not suitable for database
          transactions.

▪   Simple data model
▪   Built on top of Thrift
Other Tools
▪   SMC (Service Management Console)
     ▪   Centralized configuration
     ▪   Used to determine logical service -> physical node mapping
Other Tools
▪   ODS
     ▪   Used to log and view historical trends for any stats published by service
     ▪   Useful for service monitoring, alerting
Open Source
▪   Thrift
      ▪   http://developers.facebook.com/thrift/



▪   Scribe
      ▪   http://developers.facebook.com/scribe/



▪   PHPEmbed
      ▪   http://developers.facebook.com/phpembed/



▪   More good stuff
      ▪   http://developers.facebook.com/opensource.php
NewsFeed – The Goodz
NewsFeed – The Work
                                                                                       friends’
                                                                                       actions
                                      web tier                           Leaf Server
                        Html

                                        PHP          Actions (Scribe)    Leaf Server
                     home.php                                            Leaf Server
     user

                                          return                         Leaf Server
                                        view state



                                       view                             aggregators
                                       state
                                      storage                                             friends’
                                                                                          actions?
                                                                         aggregating...
- Most arrows indicate thrift calls                                      ranking...
Search – The Goodz
Search – The Work
                    Thrift


                                        search tier
                                         slave             slave   master     slave
                                        index             index    index    index
user
         web tier
                      Scribe     live              db
        PHP                    change            index
                                logs              files




                                           Indexing service




                                           DB Tier
               Updates
Questions?

More info at www.facebook.com/eblog


Aditya Agarwal
aditya@facebook.com

More Related Content

What's hot

Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 

What's hot (20)

Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Facebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeFacebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challenge
 
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...
 
Apache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at XiaomiApache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at Xiaomi
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Data and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineageData and AI summit: data pipelines observability with open lineage
Data and AI summit: data pipelines observability with open lineage
 
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
 
Facebook Messenger/Whatsapp System Design
Facebook Messenger/Whatsapp System DesignFacebook Messenger/Whatsapp System Design
Facebook Messenger/Whatsapp System Design
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size Matters
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
HBase internals
HBase internalsHBase internals
HBase internals
 
Microsoft Power BI Overview
Microsoft Power BI OverviewMicrosoft Power BI Overview
Microsoft Power BI Overview
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 

Similar to Facebook architecture

Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon Valley
Membase
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
baggioss
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
Don Demcsak
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
harendra_pathak
 
Rubyonrails 090715105949-phpapp01
Rubyonrails 090715105949-phpapp01Rubyonrails 090715105949-phpapp01
Rubyonrails 090715105949-phpapp01
sagaroceanic11
 
6 3 tier architecture php
6 3 tier architecture php6 3 tier architecture php
6 3 tier architecture php
cefour
 

Similar to Facebook architecture (20)

Qcon
QconQcon
Qcon
 
Top ten-list
Top ten-listTop ten-list
Top ten-list
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon Valley
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
Ruby On Rails
Ruby On RailsRuby On Rails
Ruby On Rails
 
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...
IBM Connect 2017: Your Data In the Major Leagues: A Practical Guide to REST S...
 
20120306 dublin js
20120306 dublin js20120306 dublin js
20120306 dublin js
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
 
Rubyonrails 090715105949-phpapp01
Rubyonrails 090715105949-phpapp01Rubyonrails 090715105949-phpapp01
Rubyonrails 090715105949-phpapp01
 
6 3 tier architecture php
6 3 tier architecture php6 3 tier architecture php
6 3 tier architecture php
 
In-memory Databases
In-memory DatabasesIn-memory Databases
In-memory Databases
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
 
Turbocharging php applications with zend server (workshop)
Turbocharging php applications with zend server (workshop)Turbocharging php applications with zend server (workshop)
Turbocharging php applications with zend server (workshop)
 

More from mysqlops

More from mysqlops (20)

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautiful
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-management
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replication
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimization
 
Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internals
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDL
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护
 
Memcached
MemcachedMemcached
Memcached
 
DevOPS
DevOPSDevOPS
DevOPS
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Facebook architecture

  • 1.
  • 3. Agenda 1 Architecture Overview 2 PHP, MySQL, Memcache 3 Thrift, Scribe, Tools 4 News Feed Architecture
  • 4. At a Glance The Social Graph 120M+ active users 50B+ PVs per month 10B+ Photos 1B+ connections 50K+ Platform Apps 400K+ App Developers
  • 5. General Design Principles ▪ Use open source where possible ▪ Explore making optimizations where needed ▪ Unix Philosophy ▪ Keep individual components simple yet performant ▪ Combine as necessary ▪ Concentrate on clean interface points ▪ Build everything for scale ▪ Try to minimize failure points ▪ Simplicity, Simplicity, Simplicity!
  • 6. Architecture Overview LAMP + Services PHP AdServer Search Memcache Network Selector News Feed MySQL Blogfeeds CSSParser php! Mobile ShareScraper !php Thrift Scribe ODS Tools
  • 7. PHP ▪ Good web programming language ▪ Extensive library support for web development ▪ Active developer community ▪ Good for rapid iteration ▪ Dynamically typed, interpreted scripting language
  • 8. PHP: What we Learnt ▪ Tough to scale for large code bases ▪ Weak typing ▪ Limited opportunities for static analysis, code optimizations ▪ Not necessarily optimized for large website use case ▪ E.g. No dynamic reloading of files on web server ▪ Linearly increasing cost per included file ▪ Extension framework is difficult to use
  • 9. PHP: Customizations ▪ Op-code optimization ▪ APC improvements ▪ Lazy loading ▪ Cache priming ▪ More efficient locking semantics for variable cache data ▪ Custom extensions ▪ Memcache client extension ▪ Serialization format ▪ Logging, Stats collection, Monitoring ▪ Asynchronous event-handling mechanism
  • 10. MySQL ▪ Fast, reliable ▪ Used primarily as <key,value> store ▪ Data randomly distributed amongst large set of logical instances ▪ Most data access based on global id ▪ Large number of logical instances spread out across physical nodes ▪ Load balancing at physical node level ▪ No read replication
  • 11. MySQL: What We Learnt (ing) ▪ Logical migration of data is very difficult ▪ Create a large number of logical dbs, load balance them over varying number of physical nodes ▪ No joins in production ▪ Logically difficult (because data is distributed randomly) ▪ Easier to scale CPU on web tier
  • 12. MySQL: What we Learnt (ing) ▪ Most data access is for recent data ▪ Optimize table layout for recency ▪ Archive older data ▪ Don’t ever store non-static data in a central db ▪ CDB makes it easier to perform certain aggregated queries ▪ Will not scale ▪ Use services or memcache for global queries ▪ E.g.: What are the most popular groups in my network
  • 13. MySQL: Customizations ▪ No extensive native MySQL modifications ▪ Custom partitioning scheme ▪ Global id assigned to all data ▪ Custom archiving scheme ▪ Based on frequency and recency of data on a per-user basis ▪ Extended Query Engine for cross-data center replication, cache consistency
  • 14. MySQL: Customizations ▪ Graph based data-access libraries ▪ Loosely typed objects (nodes) with limited datatypes (int, varchar, text) ▪ Replicated connections (edges) ▪ Analogous to distributed foreign keys ▪ Some data collocated ▪ Example: User profile data and all of user’s connections ▪ Most data distributed randomly
  • 15. Memcache ▪ High-Performance, distributed in-memory hash table ▪ Used to alleviate database load ▪ Primary form of caching ▪ Over 25TB of in-memory cache ▪ Average latency < 200 micro-seconds ▪ Cache serialized PHP data structures ▪ Lots and lots of multi-gets to retrieve data spanning across graph edges
  • 16. Memache: Customizations ▪ Memache over UDP ▪ Reduce memory overhead of thousands of TCP connection buffers ▪ Application-level flow control (optimization for multi-gets) ▪ On demand aggregation of per-thread stats ▪ Reduces global lock contention ▪ Multiple Kernel changes to optimize for Memcache usage ▪ Distributing network interrupt handling over multiple cores ▪ Opportunistic polling of network interface
  • 17. Let’s put this into action
  • 18. Under the Covers ▪ Get my profile data ▪ Fetch from cache, potentially go to my DB (based on user-id) ▪ Get friend connections ▪ Cache, if not DB (based on user-id) ▪ In parallel, fetch last 10 photo album ids for each of my friends ▪ Multi-get; individual cache misses fetches data from db (based on photo- album id) ▪ Fetch data for most recent photo albums in parallel ▪ Execute page-specific rendering logic in PHP ▪ Return data, make user happy
  • 19. LAMP is not Perfect
  • 20. LAMP is not Perfect ▪ PHP+MySQL+Memcache works for a large class of problems but not for everything ▪ PHP is stateless ▪ PHP not the fastest executing language ▪ All data is remote ▪ Reasons why services are written ▪ Store code closer to data ▪ Compiled environment is more efficient ▪ Certain functionality only present in other languages
  • 21. Services Philosophy ▪ Create a service iff required ▪ Real overhead for deployment, maintenance, separate code-base ▪ Another failure point ▪ Create a common framework and toolset that will allow for easier creation of services ▪ Thrift ▪ Scribe ▪ ODS, Alerting service, Monitoring service ▪ Use the right language, library and tool for the task
  • 22. Thrift High-Level Goal: Enable transparent interaction between these. …and some others too.
  • 23. Thrift ▪ Lightweight software framework for cross-language development ▪ Provide IDL, statically generate code ▪ Supported bindings: C++, PHP, Python, Java, Ruby, Erlang, Perl, Haskell etc. ▪ Transports: Simple Interface to I/O ▪ Tsocket, TFileTransport, TMemoryBuffer ▪ Protocols: Serialization Format ▪ TBinaryProtocol, TJSONProtocol ▪ Servers ▪ Non-Blocking, Async, Single Threaded, Multi-threaded
  • 24. Hasn’t this been done before? (yes.) ▪ SOAP ▪ XML, XML, and more XML ▪ CORBA ▪ Bloated? Remote bindings? ▪ COM ▪ Face-Win32ClientSoftware.dll-Book ▪ Pillar ▪ Slick! But no versioning/abstraction. ▪ Protocol Buffers
  • 25. Thrift: Why? • It’s quick. Really quick. • Less time wasted by individual developers • No duplicated networking and protocol code • Less time dealing with boilerplate stuff • Write your client and server in about 5 minutes • Division of labor • Work on high-performance servers separate from applications • Common toolkit • Fosters code reuse and shared tools
  • 26. Scribe ▪ Scalable distributed logging framework ▪ Useful for logging a wide array of data ▪ Search Redologs ▪ Powers news feed publishing ▪ A/B testing data ▪ Weak Reliability ▪ More reliable than traditional logging but not suitable for database transactions. ▪ Simple data model ▪ Built on top of Thrift
  • 27. Other Tools ▪ SMC (Service Management Console) ▪ Centralized configuration ▪ Used to determine logical service -> physical node mapping
  • 28. Other Tools ▪ ODS ▪ Used to log and view historical trends for any stats published by service ▪ Useful for service monitoring, alerting
  • 29. Open Source ▪ Thrift ▪ http://developers.facebook.com/thrift/ ▪ Scribe ▪ http://developers.facebook.com/scribe/ ▪ PHPEmbed ▪ http://developers.facebook.com/phpembed/ ▪ More good stuff ▪ http://developers.facebook.com/opensource.php
  • 31. NewsFeed – The Work friends’ actions web tier Leaf Server Html PHP Actions (Scribe) Leaf Server home.php Leaf Server user return Leaf Server view state view aggregators state storage friends’ actions? aggregating... - Most arrows indicate thrift calls ranking...
  • 32. Search – The Goodz
  • 33. Search – The Work Thrift search tier slave slave master slave index index index index user web tier Scribe live db PHP change index logs files Indexing service DB Tier Updates
  • 34. Questions? More info at www.facebook.com/eblog Aditya Agarwal aditya@facebook.com