SlideShare a Scribd company logo
1 of 39
Facebook Messages & HBase
                Nicolas Spiegelberg
                Software Engineer, Facebook




April 8, 2011
Talk Outline

▪   About Facebook Messages
▪   Intro to HBase
▪   Why HBase
▪   HBase Contributions
▪   MySQL -> HBase Migration
▪   Future Plans
▪   Q&A
Facebook Messages
The New Facebook Messages

 Messages   Chats    Emails   SMS
Monthly data volume prior to launch


           15B x 1,024 bytes = 14TB



           120B x 100 bytes = 11TB
Messaging Data
▪   Small/medium sized data              HBase
    ▪   Message metadata & indices
    ▪   Search index
    ▪   Small message bodies


▪   Attachments and large messages                Haystack
    ▪   Used for our existing photo/video store
Open Source Stack
▪   Memcached   -->   App Server Cache
▪   ZooKeeper   -->   Small Data Coordination Service
▪   HBase             -->   Database Storage Engine
▪   HDFS        -->   Distributed FileSystem
▪   Hadoop      -->   Asynchronous Map-Reduce Jobs
Our architecture
                                                           User Directory Service
           Clients
   (Front End, MTA, etc.)
                            What’s the cell for
                               this user?
                                                  Cell 2
        Cell 1                                                   Cell 3

                                  Cell 1   Application Server
 Application Server                                  Application Server

                                                  Attachments
                                             HBase/HDFS/Z
  HBase/HDFS/Z                               KMessage, Metadata,
                                                       HBase/HDFS/Z
  K                                             Search Index
                                                       K


                            Haystack
About HBase
HBase in a nutshell

• distributed, large-scale data store

• efficient at random reads/writes

• initially modeled after Google’s BigTable

• open source project (Apache)
When to use HBase?

▪ storing large amounts of data

▪ need high write throughput

▪ need efficient random access within large data sets

▪ need to scale gracefully with data

▪ for structured and semi-structured data

▪ don’t need full RDMS capabilities (cross table transactions, joins, etc.)
HBase Data Model
• An HBase table is:
 •   a sparse , three-dimensional array of cells, indexed by:
      RowKey, ColumnKey, Timestamp/Version
 •   sharded into regions along an ordered RowKey space

• Within each region:
 •   Data is grouped into column families
 ▪   Sort order within each column family:

      Row Key (asc), Column Key (asc), Timestamp (desc)
Example: Inbox Search
• Schema
 •   Key: RowKey: userid, Column: word, Version: MessageID
 •   Value: Auxillary info (like offset of word in message)

• Data is stored sorted by <userid, word, messageID>:

      User1:hi:17->offset1
                                 Can efficiently handle queries like:
      User1:hi:16->offset2
      User1:hello:16->offset3    - Get top N messageIDs for a
      User1:hello:2->offset4       specific user & word
      ...
      User2:....                 - Typeahead query: for a given user,
      User2:...                    get words that match a prefix
      ...
HBase System Overview
                             Database Layer

   HBASE
       Master      Backup
                   Master
   Region           Region                    Region    ...
   Server           Server                    Server
                                                              Coordination Service
            Storage Layer

HDFS                                                   Zookeeper Quorum
   Namenode          Secondary Namenode                ZK      ZK           ...
                                                       Peer    Peer
Datanode Datanode           Datanode          ...
HBase Overview
HBASE Region Server
                ....
            Region #2
       Region #1
                        ....
                  ColumnFamily #2

            ColumnFamily #1                Memstore
                                    (in memory data structure)


                HFiles (in HDFS)                             flush




  Write Ahead Log ( in HDFS)
HBase Overview
•       Very good at random reads/writes

•       Write path
    •    Sequential write/sync to commit log
    •    update memstore

•       Read path
    •    Lookup memstore & persistent HFiles
    •    HFile data is sorted and has a block index for efficient retrieval

•       Background chores
    •    Flushes (memstore -> HFile)
    •    Compactions (group of HFiles merged into one)
Why HBase?
Performance is great, but what else…
Horizontal scalability
▪   HBase & HDFS are elastic by design
▪   Multiple table shards (regions) per physical server
▪   On node additions
    ▪   Load balancer automatically reassigns shards from overloaded
        nodes to new nodes
    ▪   Because filesystem underneath is itself distributed, data for
        reassigned regions is instantly servable from the new nodes.
▪   Regions can be dynamically split into smaller regions.
    ▪   Pre-sharding is not necessary
    ▪   Splits are near instantaneous!
Automatic Failover
▪   Node failures automatically detected by HBase Master
▪   Regions on failed node are distributed evenly among surviving nodes.
        ▪   Multiple regions/server model avoids need for substantial
            overprovisioning
▪   HBase Master failover
    ▪   1 active, rest standby
    ▪   When active master fails, a standby automatically takes over
HBase uses HDFS
We get the benefits of HDFS as a storage system for free
▪   Fault tolerance (block level replication for redundancy)
▪   Scalability
▪   End-to-end checksums to detect and recover from corruptions
▪   Map Reduce for large scale data processing
▪   HDFS already battle tested inside Facebook
    ▪   running petabyte scale clusters
    ▪   lot of in-house development and operational experience
Simpler Consistency Model
▪   HBase’s strong consistency model
     ▪   simpler for a wide variety of applications to deal with
     ▪   client gets same answer no matter which replica data is read from


▪   Eventual consistency: tricky for applications fronted by a cache
     ▪   replicas may heal eventually during failures
     ▪   but stale data could remain stuck in cache
Typical Cluster Layout
 ▪   Multiple clusters/cells for messaging
     ▪   20 servers/rack; 5 or more racks per cluster
 ▪   Controllers (master/Zookeeper) spread across racks
 ZooKeeper Peer       ZooKeeper Peer     ZooKeeper Peer    ZooKeeper Peer    ZooKeeper Peer
 HDFS Namenode        Backup Namenode    Job Tracker       Hbase Master      Backup Master

Region Server        Region Server      Region Server     Region Server     Region Server
Data Node            Data Node          Data Node         Data Node         Data Node
Task Tracker         Task Tracker       Task Tracker      Task Tracker      Task Tracker



19x...                19x...            19x...            19x...            19x...


Region Server        Region Server      Region Server     Region Server     Region Server
Data Node            Data Node          Data Node         Data Node         Data Node
Task Tracker         Task Tracker       Task Tracker      Task Tracker      Task Tracker

         Rack #1          Rack #2            Rack #3           Rack #4           Rack #5
HBase Enhancements
Goal: Zero Data Loss
Goal of Zero Data Loss/Correctness
▪   sync support added to hadoop-20 branch
    ▪   for keeping transaction log (WAL) in HDFS
    ▪   to guarantee durability of transactions
▪   Row-level ACID compliance
▪   Enhanced HDFS’s Block Placement Policy:
    ▪   Original: rack aware, but minimally constrained
    ▪   Now: Placement of replicas constrained to configurable node groups
    ▪   Result: Data loss probability reduced by orders of magnitude
Availability/Stability improvements
▪   HBase master rewrite- region assignments using ZK
▪   Rolling Restarts – doing software upgrades without a downtime
▪   Interrupt Compactions – prioritize availability over minor perf gains
▪   Timeouts on client-server RPCs
▪   Staggered major compaction to avoid compaction storms
Performance Improvements
▪   Compactions
    ▪   critical for read performance
    ▪   Improved compaction algo
    ▪   delete/TTL/overwrite processing in minor compactions
▪   Read optimizations:
    ▪   Seek optimizations for rows with large number of cells
    ▪   Bloom filters to minimize HFile lookups
    ▪   Timerange hints on HFiles (great for temporal data)
    ▪   Improved handling of compressed HFiles
Operational Experiences
▪   Darklaunch:
    ▪   shadow traffic on test clusters for continuous, at scale testing
    ▪   experiment/tweak knobs
    ▪   simulate failures, test rolling upgrades
▪   Constant (pre-sharding) region count & controlled rolling splits
▪   Administrative tools and monitoring
    ▪   Alerts (HBCK, memory alerts, perf alerts, health alerts)
    ▪   auto detecting/decommissioning misbehaving machines
    ▪   Dashboards
▪   Application level backup/recovery pipeline
Working within the Apache community
▪   Growing with the community
    ▪   Started with a stable, healthy project
    ▪   In house expertise in both HDFS and HBase
    ▪   Increasing community involvement
▪   Undertook massive feature improvements with community help
    ▪   HDFS 0.20-append branch
    ▪   HBase Master rewrite
▪   Continually interacting with the community to identify and fix issues
    ▪   e.g., large responses (2GB RPC)
Data migration
Another place we used HBase heavily…
Move messaging data from MySQL to
HBase
Move messaging data from MySQL to HBase
▪   In MySQL, inbox data was kept normalized
    ▪   user’s messages are stored across many different machines
▪   Migrating a user is basically one big join across tables spread over
    many different machines
▪   Multiple terabytes of data (for over 500M users)
▪   Cannot pound 1000s of production UDBs to migrate users
How we migrated
▪   Periodically, get a full export of all the users’ inbox data in MySQL
▪   And, use bulk loader to import the above into a migration HBase
    cluster
▪   To migrate users:
    ▪   Since users may continue to receive messages during migration:
        ▪   double-write (to old and new system) during the migration period
    ▪   Get a list of all recent messages (since last MySQL export) for the
        user
        ▪   Load new messages into the migration HBase cluster
        ▪   Perform the join operations to generate the new data
        ▪   Export it and upload into the final cluster
Future Plans
HBase Expands
Facebook Insights Goes Real-Time
▪   Recently launched real-time analytics for social plugins on top of
    HBase
▪   Publishers get real-time distribution/engagement metrics:
        ▪   # of impressions, likes
        ▪   analytics by
            ▪   Domain, URL, demographics
            ▪   Over various time periods (the last hour, day, all-time)
▪   Makes use of HBase capabilities like:
    ▪   Efficient counters (read-modify-write increment operations)
    ▪   TTL for purging old data
Future Work
It is still early days…!
▪   Namenode HA (AvatarNode)
▪   Fast hot-backups (Export/Import)
▪   Online schema & config changes
▪   Running HBase as a service (multi-tenancy)
▪   Features (like secondary indices, batching hybrid mutations)
▪   Cross-DC replication
▪   Lot more performance/availability improvements
Thanks! Questions?
facebook.com/engineering
杭州站 · 2011年10月20日~22日
 www.qconhangzhou.com(6月启动)




QCon北京站官方网站和资料下载
     www.qconbeijing.com

More Related Content

What's hot

Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAltinity Ltd
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Databricks
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Storesconfluent
 
[234]멀티테넌트 하둡 클러스터 운영 경험기
[234]멀티테넌트 하둡 클러스터 운영 경험기[234]멀티테넌트 하둡 클러스터 운영 경험기
[234]멀티테넌트 하둡 클러스터 운영 경험기NAVER D2
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 

What's hot (20)

Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
[234]멀티테넌트 하둡 클러스터 운영 경험기
[234]멀티테넌트 하둡 클러스터 운영 경험기[234]멀티테넌트 하둡 클러스터 운영 경험기
[234]멀티테넌트 하둡 클러스터 운영 경험기
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 

Similar to Facebook Messages & HBase

Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesyarapavan
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base InstallCloudera, Inc.
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012StampedeCon
 
Apache Hadoop India Summit 2011 talk "Searching Information Inside Hadoop Pla...
Apache Hadoop India Summit 2011 talk "Searching Information Inside Hadoop Pla...Apache Hadoop India Summit 2011 talk "Searching Information Inside Hadoop Pla...
Apache Hadoop India Summit 2011 talk "Searching Information Inside Hadoop Pla...Yahoo Developer Network
 
Data Storage and Management project Report
Data Storage and Management project ReportData Storage and Management project Report
Data Storage and Management project ReportTushar Dalvi
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
CCS334 BIG DATA ANALYTICS UNIT 5 PPT ELECTIVE PAPER
CCS334 BIG DATA ANALYTICS UNIT 5 PPT  ELECTIVE PAPERCCS334 BIG DATA ANALYTICS UNIT 5 PPT  ELECTIVE PAPER
CCS334 BIG DATA ANALYTICS UNIT 5 PPT ELECTIVE PAPERKrishnaVeni451953
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Cloudera, Inc.
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
Hbase status quo apache-con europe - nov 2012
Hbase status quo   apache-con europe - nov 2012Hbase status quo   apache-con europe - nov 2012
Hbase status quo apache-con europe - nov 2012Chris Huang
 

Similar to Facebook Messages & HBase (20)

Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
 
Hbase
HbaseHbase
Hbase
 
Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012
 
Apache Hadoop India Summit 2011 talk "Searching Information Inside Hadoop Pla...
Apache Hadoop India Summit 2011 talk "Searching Information Inside Hadoop Pla...Apache Hadoop India Summit 2011 talk "Searching Information Inside Hadoop Pla...
Apache Hadoop India Summit 2011 talk "Searching Information Inside Hadoop Pla...
 
Data Storage and Management project Report
Data Storage and Management project ReportData Storage and Management project Report
Data Storage and Management project Report
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
CCS334 BIG DATA ANALYTICS UNIT 5 PPT ELECTIVE PAPER
CCS334 BIG DATA ANALYTICS UNIT 5 PPT  ELECTIVE PAPERCCS334 BIG DATA ANALYTICS UNIT 5 PPT  ELECTIVE PAPER
CCS334 BIG DATA ANALYTICS UNIT 5 PPT ELECTIVE PAPER
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
Hbase status quo apache-con europe - nov 2012
Hbase status quo   apache-con europe - nov 2012Hbase status quo   apache-con europe - nov 2012
Hbase status quo apache-con europe - nov 2012
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 

Recently uploaded

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Recently uploaded (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Facebook Messages & HBase

  • 1. Facebook Messages & HBase Nicolas Spiegelberg Software Engineer, Facebook April 8, 2011
  • 2. Talk Outline ▪ About Facebook Messages ▪ Intro to HBase ▪ Why HBase ▪ HBase Contributions ▪ MySQL -> HBase Migration ▪ Future Plans ▪ Q&A
  • 4. The New Facebook Messages Messages Chats Emails SMS
  • 5.
  • 6. Monthly data volume prior to launch 15B x 1,024 bytes = 14TB 120B x 100 bytes = 11TB
  • 7. Messaging Data ▪ Small/medium sized data HBase ▪ Message metadata & indices ▪ Search index ▪ Small message bodies ▪ Attachments and large messages Haystack ▪ Used for our existing photo/video store
  • 8. Open Source Stack ▪ Memcached --> App Server Cache ▪ ZooKeeper --> Small Data Coordination Service ▪ HBase --> Database Storage Engine ▪ HDFS --> Distributed FileSystem ▪ Hadoop --> Asynchronous Map-Reduce Jobs
  • 9. Our architecture User Directory Service Clients (Front End, MTA, etc.) What’s the cell for this user? Cell 2 Cell 1 Cell 3 Cell 1 Application Server Application Server Application Server Attachments HBase/HDFS/Z HBase/HDFS/Z KMessage, Metadata, HBase/HDFS/Z K Search Index K Haystack
  • 11. HBase in a nutshell • distributed, large-scale data store • efficient at random reads/writes • initially modeled after Google’s BigTable • open source project (Apache)
  • 12. When to use HBase? ▪ storing large amounts of data ▪ need high write throughput ▪ need efficient random access within large data sets ▪ need to scale gracefully with data ▪ for structured and semi-structured data ▪ don’t need full RDMS capabilities (cross table transactions, joins, etc.)
  • 13. HBase Data Model • An HBase table is: • a sparse , three-dimensional array of cells, indexed by: RowKey, ColumnKey, Timestamp/Version • sharded into regions along an ordered RowKey space • Within each region: • Data is grouped into column families ▪ Sort order within each column family: Row Key (asc), Column Key (asc), Timestamp (desc)
  • 14. Example: Inbox Search • Schema • Key: RowKey: userid, Column: word, Version: MessageID • Value: Auxillary info (like offset of word in message) • Data is stored sorted by <userid, word, messageID>: User1:hi:17->offset1 Can efficiently handle queries like: User1:hi:16->offset2 User1:hello:16->offset3 - Get top N messageIDs for a User1:hello:2->offset4 specific user & word ... User2:.... - Typeahead query: for a given user, User2:... get words that match a prefix ...
  • 15. HBase System Overview Database Layer HBASE Master Backup Master Region Region Region ... Server Server Server Coordination Service Storage Layer HDFS Zookeeper Quorum Namenode Secondary Namenode ZK ZK ... Peer Peer Datanode Datanode Datanode ...
  • 16. HBase Overview HBASE Region Server .... Region #2 Region #1 .... ColumnFamily #2 ColumnFamily #1 Memstore (in memory data structure) HFiles (in HDFS) flush Write Ahead Log ( in HDFS)
  • 17. HBase Overview • Very good at random reads/writes • Write path • Sequential write/sync to commit log • update memstore • Read path • Lookup memstore & persistent HFiles • HFile data is sorted and has a block index for efficient retrieval • Background chores • Flushes (memstore -> HFile) • Compactions (group of HFiles merged into one)
  • 18. Why HBase? Performance is great, but what else…
  • 19. Horizontal scalability ▪ HBase & HDFS are elastic by design ▪ Multiple table shards (regions) per physical server ▪ On node additions ▪ Load balancer automatically reassigns shards from overloaded nodes to new nodes ▪ Because filesystem underneath is itself distributed, data for reassigned regions is instantly servable from the new nodes. ▪ Regions can be dynamically split into smaller regions. ▪ Pre-sharding is not necessary ▪ Splits are near instantaneous!
  • 20. Automatic Failover ▪ Node failures automatically detected by HBase Master ▪ Regions on failed node are distributed evenly among surviving nodes. ▪ Multiple regions/server model avoids need for substantial overprovisioning ▪ HBase Master failover ▪ 1 active, rest standby ▪ When active master fails, a standby automatically takes over
  • 21. HBase uses HDFS We get the benefits of HDFS as a storage system for free ▪ Fault tolerance (block level replication for redundancy) ▪ Scalability ▪ End-to-end checksums to detect and recover from corruptions ▪ Map Reduce for large scale data processing ▪ HDFS already battle tested inside Facebook ▪ running petabyte scale clusters ▪ lot of in-house development and operational experience
  • 22. Simpler Consistency Model ▪ HBase’s strong consistency model ▪ simpler for a wide variety of applications to deal with ▪ client gets same answer no matter which replica data is read from ▪ Eventual consistency: tricky for applications fronted by a cache ▪ replicas may heal eventually during failures ▪ but stale data could remain stuck in cache
  • 23. Typical Cluster Layout ▪ Multiple clusters/cells for messaging ▪ 20 servers/rack; 5 or more racks per cluster ▪ Controllers (master/Zookeeper) spread across racks ZooKeeper Peer ZooKeeper Peer ZooKeeper Peer ZooKeeper Peer ZooKeeper Peer HDFS Namenode Backup Namenode Job Tracker Hbase Master Backup Master Region Server Region Server Region Server Region Server Region Server Data Node Data Node Data Node Data Node Data Node Task Tracker Task Tracker Task Tracker Task Tracker Task Tracker 19x... 19x... 19x... 19x... 19x... Region Server Region Server Region Server Region Server Region Server Data Node Data Node Data Node Data Node Data Node Task Tracker Task Tracker Task Tracker Task Tracker Task Tracker Rack #1 Rack #2 Rack #3 Rack #4 Rack #5
  • 26. Goal of Zero Data Loss/Correctness ▪ sync support added to hadoop-20 branch ▪ for keeping transaction log (WAL) in HDFS ▪ to guarantee durability of transactions ▪ Row-level ACID compliance ▪ Enhanced HDFS’s Block Placement Policy: ▪ Original: rack aware, but minimally constrained ▪ Now: Placement of replicas constrained to configurable node groups ▪ Result: Data loss probability reduced by orders of magnitude
  • 27. Availability/Stability improvements ▪ HBase master rewrite- region assignments using ZK ▪ Rolling Restarts – doing software upgrades without a downtime ▪ Interrupt Compactions – prioritize availability over minor perf gains ▪ Timeouts on client-server RPCs ▪ Staggered major compaction to avoid compaction storms
  • 28. Performance Improvements ▪ Compactions ▪ critical for read performance ▪ Improved compaction algo ▪ delete/TTL/overwrite processing in minor compactions ▪ Read optimizations: ▪ Seek optimizations for rows with large number of cells ▪ Bloom filters to minimize HFile lookups ▪ Timerange hints on HFiles (great for temporal data) ▪ Improved handling of compressed HFiles
  • 29. Operational Experiences ▪ Darklaunch: ▪ shadow traffic on test clusters for continuous, at scale testing ▪ experiment/tweak knobs ▪ simulate failures, test rolling upgrades ▪ Constant (pre-sharding) region count & controlled rolling splits ▪ Administrative tools and monitoring ▪ Alerts (HBCK, memory alerts, perf alerts, health alerts) ▪ auto detecting/decommissioning misbehaving machines ▪ Dashboards ▪ Application level backup/recovery pipeline
  • 30. Working within the Apache community ▪ Growing with the community ▪ Started with a stable, healthy project ▪ In house expertise in both HDFS and HBase ▪ Increasing community involvement ▪ Undertook massive feature improvements with community help ▪ HDFS 0.20-append branch ▪ HBase Master rewrite ▪ Continually interacting with the community to identify and fix issues ▪ e.g., large responses (2GB RPC)
  • 31. Data migration Another place we used HBase heavily…
  • 32. Move messaging data from MySQL to HBase
  • 33. Move messaging data from MySQL to HBase ▪ In MySQL, inbox data was kept normalized ▪ user’s messages are stored across many different machines ▪ Migrating a user is basically one big join across tables spread over many different machines ▪ Multiple terabytes of data (for over 500M users) ▪ Cannot pound 1000s of production UDBs to migrate users
  • 34. How we migrated ▪ Periodically, get a full export of all the users’ inbox data in MySQL ▪ And, use bulk loader to import the above into a migration HBase cluster ▪ To migrate users: ▪ Since users may continue to receive messages during migration: ▪ double-write (to old and new system) during the migration period ▪ Get a list of all recent messages (since last MySQL export) for the user ▪ Load new messages into the migration HBase cluster ▪ Perform the join operations to generate the new data ▪ Export it and upload into the final cluster
  • 36. Facebook Insights Goes Real-Time ▪ Recently launched real-time analytics for social plugins on top of HBase ▪ Publishers get real-time distribution/engagement metrics: ▪ # of impressions, likes ▪ analytics by ▪ Domain, URL, demographics ▪ Over various time periods (the last hour, day, all-time) ▪ Makes use of HBase capabilities like: ▪ Efficient counters (read-modify-write increment operations) ▪ TTL for purging old data
  • 37. Future Work It is still early days…! ▪ Namenode HA (AvatarNode) ▪ Fast hot-backups (Export/Import) ▪ Online schema & config changes ▪ Running HBase as a service (multi-tenancy) ▪ Features (like secondary indices, batching hybrid mutations) ▪ Cross-DC replication ▪ Lot more performance/availability improvements
  • 39. 杭州站 · 2011年10月20日~22日 www.qconhangzhou.com(6月启动) QCon北京站官方网站和资料下载 www.qconbeijing.com