Graph Storage & Search at
Facebook
Nitish Upreti
nzu100@cse.psu.edu
Facebook's TAO & Unicorn data storage and search platforms
A Systems Perspective …
• How do you store petabytes of graph data?
• How do you efficiently serve a billion reads and millions of writes each
second?
• How do you search trillions of edges between tens of billions of users with
a search latency of at most tens of milliseconds (1 millisecond on
average)?
• TAO : A read-optimized graph data store that serves Facebook’s “Social
Graph”.
• Unicorn : Online, In-Memory social graph-aware search and indexing
system.
PART 1 : TAO at Facebook
1. Aggregating & Filtering hundreds of items.
2. Custom Tailored Page with extreme customization and privacy checks.
A walk down memory lane :
Scaling Memcache in Facebook (NSDI’ 13)
• Originally, Facebook stored the social graph in MySQL and cached it
aggressively with Memcache.
• Issues with the original architecture :
• Inefficient edge list manipulation. ( Key-value semantics require the entire
edge list to be reloaded.)
• Expensive read-after-write consistency : Asynchronous master/slave
replication poses a problem for caches in data centers using a replica.
Goals for TAO
• Providing access to nodes and edges of a
constantly changing graph in data centers
across multiple regions.
• Optimize on reads and favor availability over
consistency.
• TAO does not implement a complete set of
graph primitives but provides sufficient
expressiveness to handle most application
needs.
• Example: Rendering a check-in would query
this event’s underlying nodes and edges every
time. Different users might see different
versions of this check-in.
Data Models and APIs
• Objects and Associations :
• Objects are nodes and associations are edges.
• Objects are identified by a 64-bit integer (id); associations are identified by
their (source, destination) pair and an association type.
• At most one association of a given type exists between any two objects.
• Both associations and objects may contain key->value pairs.
• Actions may be encoded either as objects or associations ( comments are
objects).
• Although associations are directed, it is common for an association to be
tightly coupled with an inverse edge.
• Discovering the check-in object, however, requires the inbound edges or that
an id is stored in another Facebook system.
Object and Association APIs
• Object APIs :
• Allocate a new object and id.
• Retrieve, Update or Delete the object.
• There is no compare-and-set (due to TAO's eventual consistency semantics).
• Association APIs :
• Edges could be bidirectional, either symmetrically like the example’s FRIEND
relationship or asymmetrically like AUTHORED and AUTHORED BY.
• Bidirectional edges are modeled as two separate associations. TAO provides
support for keeping associations in sync with their inverses, by allowing
association types to be configured with an inverse type.
• For such associations, creations, updates, and deletions are automatically
coupled with an operation on the inverse association.
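The object and association model above can be sketched in a few lines. This is a minimal in-memory illustration, not TAO's real API: the class `TinyGraph`, its method names, and the `INVERSE_TYPE` table are assumptions made for the example; only the type names FRIEND, AUTHORED and AUTHORED BY come from the slides.

```python
import itertools
import time

# Hypothetical inverse-type configuration, using the slides' example types.
INVERSE_TYPE = {"FRIEND": "FRIEND", "AUTHORED": "AUTHORED_BY", "AUTHORED_BY": "AUTHORED"}


class TinyGraph:
    """In-memory stand-in for the object/association model (not TAO's API)."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.objects = {}   # id -> (object type, {key: value})
        self.assocs = {}    # (id1, atype, id2) -> (time, {key: value})

    def obj_add(self, otype, data):
        oid = next(self._next_id)
        self.objects[oid] = (otype, dict(data))
        return oid

    def assoc_add(self, id1, atype, id2, data=None, when=None):
        when = time.time() if when is None else when
        # Keyed by (id1, atype, id2): at most one association of a given
        # type exists between two objects, so adding again overwrites.
        self.assocs[(id1, atype, id2)] = (when, dict(data or {}))
        inverse = INVERSE_TYPE.get(atype)
        if inverse is not None:
            # The creation is coupled with a write of the inverse edge.
            self.assocs[(id2, inverse, id1)] = (when, dict(data or {}))

    def assoc_delete(self, id1, atype, id2):
        self.assocs.pop((id1, atype, id2), None)
        inverse = INVERSE_TYPE.get(atype)
        if inverse is not None:
            self.assocs.pop((id2, inverse, id1), None)


g = TinyGraph()
alice = g.obj_add("USER", {"name": "Alice"})
comment = g.obj_add("COMMENT", {"text": "hello"})
g.assoc_add(alice, "AUTHORED", comment, when=100)
```

After the last call, both `(alice, "AUTHORED", comment)` and `(comment, "AUTHORED_BY", alice)` are present, and deleting either direction through `assoc_delete` removes both.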
Association Lists
• A characteristic of the social graph is that most of the data is old, but
many of the queries are for the newest subset. This creation-time
locality arises whenever an application focuses on recent items.
• For a famous celebrity such as ‘Justin’, there might be thousands of
comments attached to his check-in, but only the most recent ones
will be rendered by default.
• TAO’s association queries are organized around association lists.
An association list holds the associations of one (id1, type) pair, arranged
in descending order by the time field : (id1, type) → [a_new ... a_old]
• TAO enforces a per Association type upper bound (typically 6,000) on
the actual limit used for an association query. To enumerate the
elements of a longer association list the client must issue multiple
queries.
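A range query over an association list, with the server-side cap, might look like the sketch below. The function names `assoc_range` and `enumerate_all` and the (time, id2) tuple layout are assumptions for illustration; the cap of 6,000 is the typical value quoted above.

```python
MAX_LIMIT = 6000  # per-type upper bound; 6,000 is the typical value in the slides


def assoc_range(assoc_list, pos, limit):
    """Return up to `limit` entries starting at offset `pos`.

    `assoc_list` is assumed to be sorted newest first, as (time, id2) pairs.
    """
    limit = min(limit, MAX_LIMIT)  # the server caps the requested limit
    return assoc_list[pos:pos + limit]


def enumerate_all(assoc_list, page_size):
    """Walk a long association list by issuing repeated range queries."""
    pos = 0
    while True:
        page = assoc_range(assoc_list, pos, page_size)
        if not page:
            return
        yield from page
        pos += len(page)


# 15,000 comments on one check-in, newest first.
comments = [(t, 1_000_000 + t) for t in range(15_000, 0, -1)]
first_page = assoc_range(comments, 0, 10_000)   # capped at MAX_LIMIT entries
everything = list(enumerate_all(comments, 10_000))
```

Because the list is stored newest first, the default "most recent comments" view is simply the first small range, and only clients that need the full history pay for multiple queries.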
TAO Architecture
Key Ideas behind TAO’s architecture
Storage :
The data is persisted using MySQL.
The API is mapped to a small number of SQL queries.
Data is divided into logical shards. By default all object types
are stored in one table and association in others.
Every “object_id” has a corresponding “shard_id”. Objects are
bound to a single shard throughout their lifetime.
An association is stored on the shard of its id1, so that every
association query can be served from a single server.
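The placement rule can be illustrated with a toy id layout. The bit layout below (shard id in the low 16 bits) and all function names are assumptions made for the example; the slides only state that every object id has a corresponding shard id and that an association lives on the shard of its id1.

```python
SHARD_BITS = 16  # hypothetical: the low 16 bits of an id carry its shard


def make_object_id(sequence, shard_id):
    """Build an id whose shard never changes for the object's lifetime."""
    return (sequence << SHARD_BITS) | shard_id


def shard_of_object(object_id):
    return object_id & ((1 << SHARD_BITS) - 1)


def shard_of_assoc(id1, atype, id2):
    # Stored with the source object, so an (id1, atype) list query
    # touches exactly one shard.
    return shard_of_object(id1)


user = make_object_id(sequence=7, shard_id=3)
page = make_object_id(sequence=9, shard_id=12)
forward_shard = shard_of_assoc(user, "LIKES", page)      # shard of the user
inverse_shard = shard_of_assoc(page, "LIKED_BY", user)   # shard of the page
```

Note that the forward and inverse edges of the same relationship can land on different shards, which is what makes inverse writes a two-shard operation.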
TAO Architecture ( Continued … )
Caching :
• A region is made up of multiple closely located data centers.
• Multiple cache servers make up a tier (the set of databases in a region is also called a
tier) that is collectively capable of answering any TAO request.
• Each cache request maps to a server based on the sharding scheme discussed above.
• The cache is filled on demand and evicts items using an LRU policy.
• Write operations on an association with an inverse may involve two shards, since the
forward edge is stored on the shard for id1 and the inverse edge is on the shard for
id2.
• A write that spans two shards is handled as follows : the caching server issues an RPC
to the member hosting id2, which contacts the database to create the inverse association.
Once the inverse write is complete, the caching server issues a write to the database for
id1.
• TAO does not provide atomicity between the two updates. If a failure occurs, the
forward association may exist without an inverse; these hanging associations are
scheduled for repair by an asynchronous job.
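A rough sketch of the two-shard write and its repair path follows. The `Shard` class, the repair queue, and the toy placement rule are hypothetical; the sketch writes the inverse first and the forward edge second, as described above, and queues the pair for repair whenever either half fails.

```python
class Shard:
    """Toy database shard that can be switched off to simulate a failure."""

    def __init__(self):
        self.assocs = set()
        self.available = True

    def write(self, assoc):
        if not self.available:
            raise ConnectionError("shard unavailable")
        self.assocs.add(assoc)   # set semantics make the write idempotent


def add_with_inverse(shards, shard_of, id1, atype, id2, inverse_type, repair_queue):
    forward, inverse = (id1, atype, id2), (id2, inverse_type, id1)
    try:
        shards[shard_of(id2)].write(inverse)   # inverse edge first
        shards[shard_of(id1)].write(forward)   # then the forward edge
    except ConnectionError:
        # No atomicity across shards: remember the pair for later repair.
        repair_queue.append((forward, inverse))
        return False
    return True


def repair(shards, shard_of, repair_queue):
    """Asynchronous job: re-apply both halves of every hanging pair."""
    remaining = []
    for forward, inverse in repair_queue:
        try:
            shards[shard_of(inverse[0])].write(inverse)
            shards[shard_of(forward[0])].write(forward)
        except ConnectionError:
            remaining.append((forward, inverse))
    repair_queue[:] = remaining


shards = {0: Shard(), 1: Shard()}
shard_of = lambda object_id: object_id % 2   # toy placement rule
queue = []
shards[0].available = False                  # the shard holding id1=10 is down
ok = add_with_inverse(shards, shard_of, 10, "AUTHORED", 11, "AUTHORED_BY", queue)
hanging_before_repair = len(queue)
shards[0].available = True
repair(shards, shard_of, queue)
```

Making each write idempotent is what lets the repair job blindly re-apply both halves without first checking which one is missing.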
Leaders and Followers
• Builds a two level cache hierarchy (L1->L2). (All to All connections in
Single layer cache is susceptible to Hot Spots)
• Clients communicate with the closest followers directly.
• Each shard is hosted by one leader, and all writes to the shard go
through that leader, so it is naturally consistent. Followers, on the
other hand, must be explicitly notified of updates made via other
follower tiers.
• An object update in the leader enqueues invalidation messages to
each corresponding follower.
• Leaders serialize concurrent writes that arrive from followers. Leader
protects databases from “Thundering herds” by not issuing
concurrent writes and limiting maximum number of queries.
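The leader/follower interaction can be sketched with two small classes. This is a simplified model under stated assumptions: class and method names are invented for the example, a dict stands in for the database, and invalidations are delivered synchronously, whereas the slides describe them as enqueued messages.

```python
class Leader:
    def __init__(self, database):
        self.database = database      # dict standing in for MySQL
        self.cache = {}
        self.followers = []

    def read(self, key):
        if key not in self.cache:     # leader miss goes to the database
            self.cache[key] = self.database.get(key)
        return self.cache[key]

    def write(self, key, value, origin):
        # All writes for the shard pass through here, one at a time.
        self.database[key] = value
        self.cache[key] = value
        for follower in self.followers:
            if follower is not origin:
                follower.invalidate(key)   # notify the other followers


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.cache = {}
        leader.followers.append(self)

    def read(self, key):
        if key not in self.cache:          # read miss is forwarded to the leader
            self.cache[key] = self.leader.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.leader.write(key, value, origin=self)
        self.cache[key] = value            # the writing follower updates itself

    def invalidate(self, key):
        self.cache.pop(key, None)


db = {"obj:1": "v1"}
leader = Leader(db)
f1, f2 = Follower(leader), Follower(leader)
stale_candidate = f2.read("obj:1")   # f2 now caches "v1"
f1.write("obj:1", "v2")              # write arrives through f1
fresh = f2.read("obj:1")             # f2 was invalidated, so it refetches
```

Until the invalidation reaches `f2`, it would keep serving "v1"; that window is where the staleness discussed later in the deck comes from.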
TAO’s Stack
Scaling Geographically
• High read workload scales with total number of follower servers.
• The assumption is that latency between followers and leaders is low.
• Followers behave identically in all regions, forwarding read misses and
writes to the local region’s leader tier. Leaders query the local region’s
database regardless of whether it is the master or slave. This means
that read latency is independent of inter-region latency.
• Writes are forwarded by the local leader to the leader in the region
that holds the master database. Read misses by followers are 25X as
frequent as writes in the workload, so read misses are served locally.
• Facebook chooses data center locations that are clustered into only a
few regions, where the intra-region latency is small (typically less than
1 millisecond). It is then sufficient to store one complete copy of the
social graph per region.
Scaling Geographically …
• Since each cache hosts multiple shards, a server may be both a master and a
slave at the same time. It is preferred to locate all of the master databases in a
single region.
• When an inverse association is mastered in a different region, TAO must traverse
an extra inter-region link to forward the inverse write.
• TAO embeds invalidation and refill messages in the database replication stream.
These messages are delivered in a region immediately after a transaction has
been replicated to a slave database. Delivering such messages earlier would
create cache inconsistencies, as reading from the local database would provide
stale data.
• If a forwarded write is successful then the local leader will update its cache with
the fresh value, even though the local slave database probably has not yet been
updated by the asynchronous replication stream. In this case followers will
receive two invalidates or refills from the write, one that is sent when the write
succeeds and one that is sent when the write’s transaction is replicated to the
local slave database.
TAO’s STACK (Multiple Region)
Consistency Matters
• In the end consistency is The KEY !
• Imagine a scenario : Likes on your Facebook post magically increasing
or decreasing ?
• TAO’s master/slave design ensures that all reads can be satisfied
within a single region, at the expense of potentially returning stale
data to clients. As long as a user consistently queries the same
follower tier, the user will typically have a consistent view of TAO
state.
Implementation
• All the data related to objects are serialized into a single ‘data’ column
( supporting flexible schema).
• Shards are mapped to cache servers using Consistent Hashing. TAO
rebalances load among followers with shard cloning, in which reads
to a shard are served by multiple followers in a tier.
• Versioning is used to omit replies if data has not changed.
• The master database is a consistent source of truth. We can mark
certain requests as critical and proxy them to master (authentication)
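For readers unfamiliar with consistent hashing, here is a generic sketch of the technique for mapping shards to cache servers. It is not TAO's implementation: the `HashRing` class, the server names, the choice of SHA-256, and the 100 points per server are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with several points per server."""

    def __init__(self, servers, points_per_server=100):
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

    def server_for(self, shard_id):
        # First ring point clockwise from the shard's hash, wrapping around.
        index = bisect.bisect(self._keys, self._hash(f"shard:{shard_id}"))
        return self._ring[index % len(self._ring)][1]


before = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
after = HashRing(["cache-a", "cache-b", "cache-c", "cache-d", "cache-e"])
moved = [s for s in range(1000) if before.server_for(s) != after.server_for(s)]
```

The property that matters is visible in `moved`: when a server is added, every shard that changes owner moves to the new server, and the rest stay where they were.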
Failure Detection and Handling
• TAO servers employ aggressive network timeouts so as not to continue
waiting on responses that may never arrive.
• Databases are marked down in a global configuration if they crash, are taken
offline for maintenance, or fall too far behind. When a master
database is down, one of its slaves is automatically promoted to be the
new master.
• Followers Failure
• Followers in another (backup) tier share responsibility for the shard.
• Leader Failure
• Followers route read requests around it directly to database.
• Write requests are rerouted to a random member of leader’s tier.
• Invalidation Message Failure
• Leaders queue message to disk if followers are unreachable.
• If a leader fails and is replaced : All shards that map to it must be
invalidated in the followers, to restore consistency.
Some Performance Metrics
Replication: TAO’s slave storage servers lag their master by
less than 1 second during 85% of the tracing window, by
less than 3 seconds 99% of the time, and by less than 10
seconds 99.8% of the time.
PART 2 : Unicorn at Facebook
What is Graph Search ?
What is Unicorn?
• Online, in-memory “social graph aware” indexing system serving
billions of queries a day.
• The idea is to promote social proximity.
• Serves as the backend infrastructure for graph search.
• Searches all basic structured information on the social graph and
performs complex set operations on the results.
• Why a big deal ?
Facebook engineers joked that, much like the mythical quadruped, this system would
solve all of our problems and heal our woes if only it existed.
Core Technical Ideas
• Applying common information retrieval architectural concepts in the
domain of social graph search.
• How do you promote socially relevant search results ?
• Building rich operators ( apply & extract ) that support semantic
graph queries and let multiple-round-trip algorithms serve
complicated queries.
Data Model for Graph Search
• There are billions of users in the social graph. An average user
has approximately 130 friends.
• Best way to implement social graph (sparse) : Adjacency Lists.
Hits = Results
Posting List = Adjacency Lists
Hit Data is extra meta data.
Sort key helps us find globally important
ids.
Unicorn API & Popular Edge Types
• Clients send ‘Thrift’ requests to the server. (Thrift is Facebook’s counterpart
to Protocol Buffers.)
• Request is routed to closest Unicorn server.
• Several operators supported : Or, And, Difference.
• Meta-Data : ‘graduation year’ and ‘major’ for attended.
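The set operators can be illustrated on small posting lists. This is a simplified sketch: the function names are invented, the lists are assumed to be sorted by id, and the sample terms and ids are made up. Real posting lists also carry a sort key and hit data, which are omitted here.

```python
def op_and(a, b):
    """Intersect two id-sorted posting lists with a two-pointer merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def op_or(a, b):
    """Union of two posting lists, returned in id order."""
    return sorted(set(a) | set(b))


def op_difference(a, b):
    """Ids in `a` that do not appear in `b`, keeping the order of `a`."""
    excluded = set(b)
    return [x for x in a if x not in excluded]


# term -> posting list (adjacency list of ids), sample data
index = {
    "friend:7": [3, 5, 8, 13],
    "likers:42": [5, 8, 21],
    "friend:9": [8, 34],
}
both = op_and(index["friend:7"], index["likers:42"])
either = op_or(index["friend:7"], index["friend:9"])
only_7 = op_difference(index["friend:7"], index["friend:9"])
```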
Unicorn’s Architecture
• In Distributed Systems: Never ever forget to
shard !
• All Posting lists are sharded by ‘result_id’.
• Index servers store adjacency lists and perform
set operations on those lists.
• Each index server is responsible for a particular
shard.
• Rack Aggregator benefits from the fact that
bandwidth to servers within a rack is higher.
Search across Verticals
Building and Updating Index
• Raw data is scraped from MySQL and indexes are built with Hadoop.
• The data is accessible via Hive.
• To avoid the lag that is common in batch processing, Facebook uses Scribe
to push the most recent updates to the index.
• Each index server keeps track of the timestamp of its latest update.
TYPEAHEAD Search
• It all started with a Type Ahead Search.
• Users are shown a list of possible matches for the query as they are
typing.
• Index servers for Type Ahead contain posting lists for every name
prefix up to a predefined character limit.
• These posting lists contain the ids of users whose first or last name
matches the prefix.
• A simple Type ahead implementation would merely map input
prefixes to the posting lists for those prefixes and return the resultant
ids.
• How do you make this socially relevant ?
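The simple, non-social typeahead described above can be sketched as a prefix-to-posting-list map. The names `build_prefix_index` and `typeahead`, the character limit of 4, and the sample users are assumptions made for the example.

```python
from collections import defaultdict

MAX_PREFIX = 4  # hypothetical predefined character limit


def build_prefix_index(users):
    """Map every name prefix (up to MAX_PREFIX chars) to a list of user ids."""
    index = defaultdict(list)
    for user_id, full_name in sorted(users.items()):
        seen = set()
        for part in full_name.lower().split():     # first and last name
            for n in range(1, min(len(part), MAX_PREFIX) + 1):
                prefix = part[:n]
                if prefix not in seen:              # one entry per user per prefix
                    seen.add(prefix)
                    index[prefix].append(user_id)
    return index


def typeahead(index, typed):
    # Longer input is truncated to the indexed prefix length.
    return index.get(typed.lower()[:MAX_PREFIX], [])


users = {1: "Melanie Marshall", 2: "Mel Gibson", 3: "Bill Marsh"}
index = build_prefix_index(users)
mel = typeahead(index, "mel")
mars = typeahead(index, "Mars")
```

Here every match is treated equally; nothing yet favors users who are close to the searcher in the social graph.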
Serving Socially Relevant Results
• How do you ensure that search results are
socially relevant ?
• Can we “AND” the solution with the friend list
of user ?
( Ignores results for users who might be relevant but
are not friends with the user performing the search).
• We actually want a way to force some fraction
of the final results to possess a trait, while not
requiring this trait from all results.
• The answer is WeakAnd operator.
• The WeakAnd operator is a modification of
And that allows operands to be missing from
some fraction of the results within an index
shard.
• Implementation : Allow only finite number of
hits to be non-friends.
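A minimal sketch of the WeakAnd idea, assuming the "finite number of non-friend hits" formulation above: results must match the required list, and only a bounded number of them may be missing from the optional list. The function name, parameter names, and sample ids are hypothetical.

```python
def weak_and(required, optional, optional_hits):
    """Like And, but let up to `optional_hits` results through even though
    they are missing from the `optional` posting list."""
    optional_set = set(optional)
    results, misses_allowed = [], optional_hits
    for doc_id in required:            # assumed ordered by importance
        if doc_id in optional_set:
            results.append(doc_id)     # matches both operands
        elif misses_allowed > 0:
            results.append(doc_id)     # tolerated non-friend hit
            misses_allowed -= 1
    return results


name_matches = [20, 7, 11, 1, 2]       # users whose name matches the prefix
friends_of_searcher = [7, 1, 15]

mixed = weak_and(name_matches, friends_of_searcher, optional_hits=2)
strict = weak_and(name_matches, friends_of_searcher, optional_hits=0)
```

With `optional_hits=0` the operator degenerates to a plain And; raising it lets globally important non-friends appear next to friends.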
Priscilla Chan (3), looking for : “Melanie Mars” ….
Strong OR
• Requires certain operands to be present
in some fraction of the matches.
• Enforces diversity in the set.
• Example : Enforcing geographic
diversity in the result set.
• At least 20% from San Francisco.
• An optional weight parameter as well.
Scoring Search Results
• We might want to prioritize results for individuals who are close in age to
the user typing the query.
• This requires that we store the age (or birth date) of users with the index.
• For storing per-entity metadata, Unicorn provides a forward index, which is
simply a map of id to a blob that contains metadata for the id. The forward
index for an index shard only contains entries for the ids that reside on that
shard.
• Based on Thrift parameters included with the client’s request, the client
can select a scoring function in the index server that scores each result.
• Aggregators give priority to documents with higher score.
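The forward index and client-selected scoring can be sketched as follows. The scoring function, the `SCORERS` registry, and the sample ages are invented for illustration; the slides only say that the forward index maps an id to a metadata blob and that the request selects a scoring function.

```python
# Forward index for one shard: id -> metadata blob (sample data).
forward_index = {101: {"age": 29}, 102: {"age": 54}, 103: {"age": 31}}


def age_proximity_score(searcher_age, doc_id):
    """Higher score for users closer in age to the searcher."""
    return -abs(forward_index[doc_id]["age"] - searcher_age)


# Hypothetical registry of scoring functions available on the index server.
SCORERS = {"age_proximity": age_proximity_score}


def search_shard(hits, scorer_name, searcher_age, limit):
    """Score each hit with the function named in the request, best first."""
    score = SCORERS[scorer_name]
    ranked = sorted(hits, key=lambda d: score(searcher_age, d), reverse=True)
    return ranked[:limit]


top = search_shard([101, 102, 103], "age_proximity", searcher_age=30, limit=2)
```

Because the forward index of a shard only covers ids that live on that shard, the scoring function never needs a cross-shard lookup.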
Graph Search
• Our discussion of graph search spans : users, pages, apps, events etc.
• Imagine a scenario : We might want to know the pages liked by friends of
Bill (id 7) who like Trekking (id 42) :
1. First execute the query (and friend:7 likers:42)
2. Collect the results and create a new query that produces the union of the
pages liked by any of these individuals.
• Inefficient due to multiple round trips involved between index servers and
top aggregator.
• The ability to use the results of previous executions as seeds for future
executions creates new applications for a search system, and was the
inspiration for Facebook’s Graph Search consumer product. The idea was to
build a general-purpose, online system for users to find entities in the
social graph that matched a set of user-defined constraints.
Apply Operator
• A graph traversal operator that allows
client to query a set of ids and then use the
resultant ids to construct and execute a
new query.
• Apply is ‘syntactic sugar’ that lets the
system perform expensive operations
lower in the hardware stack. By
allowing clients to show semantic intent,
it also makes optimizations possible that
save search time.
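A toy version of Apply: run an inner query, then turn each resulting id into a term of a second query and union the results. The helper names and the `liked-by:` term prefix are hypothetical, chosen only to make the pages example concrete; the ids are sample data.

```python
# term -> posting list (sample data; "liked-by:" is a made-up term prefix)
index = {
    "friend:7": [3, 5, 8],
    "likers:42": [5, 8, 21],
    "liked-by:5": [900, 901],
    "liked-by:8": [901, 902],
}


def term(name):
    return index.get(name, [])


def op_and(*lists):
    common = set(lists[0]).intersection(*lists[1:])
    return [x for x in lists[0] if x in common]


def op_or(*lists):
    return sorted(set().union(*lists))


def apply(prefix, inner_results):
    """Use the ids of an inner query as seeds for a second query."""
    return op_or(*(term(f"{prefix}{doc_id}") for doc_id in inner_results))


friends_who_like_42 = op_and(term("friend:7"), term("likers:42"))
pages = apply("liked-by:", friends_who_like_42)
```

Expressed as one operator, the two steps can run inside the search tier instead of bouncing intermediate ids back to the client.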
Extract Operator
• Say you want to look up people tagged in
photos of “Jon Jones”.
• Solution: ‘Apply’ operator to look up photos
of Jon Jones in the photos vertical and then
to query the users vertical for people
tagged in these photos.
• Now you need hundreds of billions of new
terms in the users vertical.
• Billions of “one to few” mapping.
• Better way: Store the ids of people tagged
in a photo in the forward index data for that
photo in the photos vertical. This is a case
of partially denormalizing the data. We thus
store the result ids in the forward index of
the secondary vertical and do the lookup
inline.
• This is exactly what Extract operator
accomplishes.
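The Extract idea can be sketched as a lookup in the photos vertical followed by an inline read of each photo's forward-index entry. The term name `photos-of:55`, the `tagged` field, and the sample ids are assumptions made for the example.

```python
# Photos vertical (sample data): term -> photo ids, and photo id -> metadata.
photos_index = {"photos-of:55": [7001, 7002]}
photos_forward = {
    7001: {"tagged": [55, 60]},
    7002: {"tagged": [55, 61, 60]},
}


def extract(term_name, field):
    """Return ids stored in the forward index of each matching photo."""
    seen, out = set(), []
    for photo_id in photos_index.get(term_name, []):
        for user_id in photos_forward[photo_id][field]:
            if user_id not in seen:       # de-duplicate across photos
                seen.add(user_id)
                out.append(user_id)
    return out


tagged_people = extract("photos-of:55", "tagged")
```

No query against the users vertical is needed, so the users vertical does not have to carry one posting list per photo.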
Preserving Privacy
• Privacy is crucial !
• Certain graph edges cannot be shown to all users but rather only to
users who are friends with or in the same network as a particular
person.
• Unicorn itself does not have privacy information incorporated into its
index : Strict consistency and durability guarantees are absent that
are needed for a full privacy solution.
• Facebook’s PHP frontend performs the privacy check on each result.
This design decision imposes a modest efficiency penalty on the
overall system.
• However, it also keeps privacy logic separate from Unicorn, in keeping with the
DRY (“Don’t Repeat Yourself”) principle of software development.
Lineage : Preserving Privacy
To enable clients to make privacy decisions, a string of
metadata is attached to each search result to describe
its lineage.
Lineage is a structured representation of the edges
that were traversed in order to yield a result.
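As an illustration of how a frontend might use lineage, the sketch below drops any result whose lineage contains an edge the viewer may not see. The lineage encoding (a list of edge tuples), the function names, and the sample data are hypothetical; the slides describe lineage only as a structured string of metadata attached to each result.

```python
def visible_results(results, viewer_id, can_see_edge):
    """Keep a result only if the viewer may see every edge that produced it."""
    return [
        r["id"]
        for r in results
        if all(can_see_edge(viewer_id, edge) for edge in r["lineage"])
    ]


# Each result carries the edges traversed to reach it (sample data).
results = [
    {"id": 900, "lineage": [("friend", 7, 5), ("liked-by", 5, 900)]},
    {"id": 902, "lineage": [("friend", 7, 8), ("liked-by", 8, 902)]},
]
hidden_edges = {("liked-by", 8, 902)}   # user 8 keeps this like private


def can_see_edge(viewer_id, edge):
    # Stand-in for the frontend's real privacy check.
    return edge not in hidden_edges


visible = visible_results(results, viewer_id=7, can_see_edge=can_see_edge)
```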
Questions?

Dev.bg DevOps March 2024 Monitoring & LoggingMarian Marinov
 
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS Bahzad5
 
Gender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 ProjectGender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 Projectreemakb03
 
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...Amil baba
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Sean Meyn
 
Power System electrical and electronics .pptx
Power System electrical and electronics .pptxPower System electrical and electronics .pptx
Power System electrical and electronics .pptxMUKULKUMAR210
 

Recently uploaded (20)

Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptx
 
EPE3163_Hydro power stations_Unit2_Lect2.pptx
EPE3163_Hydro power stations_Unit2_Lect2.pptxEPE3163_Hydro power stations_Unit2_Lect2.pptx
EPE3163_Hydro power stations_Unit2_Lect2.pptx
 
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
 
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecGuardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
 
The relationship between iot and communication technology
The relationship between iot and communication technologyThe relationship between iot and communication technology
The relationship between iot and communication technology
 
Test of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxTest of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptx
 
Landsman converter for power factor improvement
Landsman converter for power factor improvementLandsman converter for power factor improvement
Landsman converter for power factor improvement
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabus
 
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
Engineering Mechanics  Chapter 5  Equilibrium of a Rigid BodyEngineering Mechanics  Chapter 5  Equilibrium of a Rigid Body
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
 
A Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationA Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software Simulation
 
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...
 
Dev.bg DevOps March 2024 Monitoring & Logging
Dev.bg DevOps March 2024 Monitoring & LoggingDev.bg DevOps March 2024 Monitoring & Logging
Dev.bg DevOps March 2024 Monitoring & Logging
 
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
 
Présentation IIRB 2024 Chloe Dufrane.pdf
Présentation IIRB 2024 Chloe Dufrane.pdfPrésentation IIRB 2024 Chloe Dufrane.pdf
Présentation IIRB 2024 Chloe Dufrane.pdf
 
Gender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 ProjectGender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 Project
 
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
 
Power System electrical and electronics .pptx
Power System electrical and electronics .pptxPower System electrical and electronics .pptx
Power System electrical and electronics .pptx
 

Facebook's TAO & Unicorn data storage and search platforms

  • 1. Graph Storage & Search at Facebook Nitish Upreti nzu100@cse.psu.edu
  • 3. A Systems Perspective … • How do you store petabytes of graph data? • How do you efficiently serve a billion reads and millions of writes each second? • How do you search trillions of edges between tens of billions of users and entities, with a search latency of at most tens of milliseconds (1 millisecond on average)? • TAO : A read-optimized graph data store that serves Facebook’s “Social Graph”. • Unicorn : An online, in-memory, social-graph-aware search and indexing system.
  • 4. PART 1 : TAO at Facebook
  • 5. 1. Aggregating & Filtering hundreds of items. 2. Custom Tailored Page with extreme customization and privacy checks.
  • 6. A walk down memory lane : Scaling Memcache at Facebook (NSDI ’13) • Originally Facebook was built by storing the social graph in MySQL and aggressively caching it with Memcache. • Issues with the original architecture : • Inefficient edge-list manipulation (key-value semantics require the entire edge list to be reloaded). • Expensive read-after-write consistency : Asynchronous master/slave replication poses a problem for caches in data centers using a replica.
  • 7. Goals for TAO • Provide access to the nodes and edges of a constantly changing graph in data centers across multiple regions. • Optimize for reads and favor availability over consistency. • TAO does not implement a complete set of graph primitives but provides sufficient expressiveness to handle most applications’ needs. • Example: Rendering a check-in queries this event’s underlying nodes and edges every time. Different users might see different versions of this check-in.
  • 8. Data Models and APIs • Objects and Associations : • Objects are nodes and associations are edges. • Objects are identified by a 64-bit integer (id), and associations by (source, destination) and an association type. • At most one association of a given type exists between any two objects. • Both associations and objects may contain key->value pairs. • Actions may be encoded either as objects or as associations (comments are objects). • Although associations are directed, it is common for an association to be tightly coupled with an inverse edge. • Discovering the check-in object, however, requires the inbound edges or that an id is stored in another Facebook system.
  • 9. Object and Association APIs • Object APIs : • Allocate a new object and id. • Retrieve, update or delete the object. • There is no compare-and-set (due to eventual consistency semantics). • Association APIs : • Edges can be bidirectional, either symmetrically like the example’s FRIEND relationship or asymmetrically like AUTHORED and AUTHORED BY. • Bidirectional edges are modeled as two separate associations. TAO provides support for keeping associations in sync with their inverses, by allowing association types to be configured with an inverse type. • For such associations, creations, updates, and deletions are automatically coupled with an operation on the inverse association.
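The coupling between an association and its inverse can be sketched in a few lines. This is a toy in-memory model, not TAO's real API: the class `TinyGraph`, its method names, and the `INVERSE_TYPE` table are assumptions made for illustration, and only the type names FRIEND, AUTHORED and AUTHORED_BY come from the slide.

```python
from collections import defaultdict

# Hypothetical inverse-type configuration; type names follow the slide's examples.
INVERSE_TYPE = {
    "FRIEND": "FRIEND",            # symmetric: the inverse has the same type
    "AUTHORED": "AUTHORED_BY",     # asymmetric pair
    "AUTHORED_BY": "AUTHORED",
}


class TinyGraph:
    """Toy in-memory stand-in for an object/association store (not TAO itself)."""

    def __init__(self):
        self._next_id = 1
        self.objects = {}                # id -> {"otype": ..., "data": {...}}
        self.assocs = defaultdict(dict)  # (id1, atype) -> {id2: (time, data)}

    def obj_add(self, otype, data):
        oid = self._next_id
        self._next_id += 1
        self.objects[oid] = {"otype": otype, "data": dict(data)}
        return oid

    def assoc_add(self, id1, atype, id2, time, data=None):
        # At most one association of a given type between two objects,
        # so adding again simply overwrites the existing entry.
        self.assocs[(id1, atype)][id2] = (time, dict(data or {}))
        inverse = INVERSE_TYPE.get(atype)
        if inverse is not None:
            # Keep the inverse edge in sync with the forward edge.
            self.assocs[(id2, inverse)][id1] = (time, dict(data or {}))

    def assoc_delete(self, id1, atype, id2):
        self.assocs[(id1, atype)].pop(id2, None)
        inverse = INVERSE_TYPE.get(atype)
        if inverse is not None:
            self.assocs[(id2, inverse)].pop(id1, None)


g = TinyGraph()
alice = g.obj_add("user", {"name": "Alice"})
post = g.obj_add("post", {"text": "hello"})
g.assoc_add(alice, "AUTHORED", post, time=100)
```

After the last call, the store holds both the AUTHORED edge from the user to the post and the AUTHORED_BY edge back, without the caller writing the second one explicitly.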
  • 10. Association Lists • A characteristic of the social graph is that most of the data is old, but many of the queries are for the newest subset. This creation-time locality arises whenever an application focuses on recent items. • For a famous celebrity such as ‘Justin’, there might be thousands of comments attached to his check-in, but only the most recent ones will be rendered by default. • TAO’s association queries are organized around association lists. An association list holds associations arranged in descending order by the time field : (id1, type) → [a_new ... a_old] • TAO enforces a per-association-type upper bound (typically 6,000) on the actual limit used for an association query. To enumerate the elements of a longer association list, the client must issue multiple queries.
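A minimal sketch of a capped range query over a newest-first association list, and of how a client might page through a longer list. The function names `assoc_range` and `enumerate_all` and the single global `MAX_LIMIT` are illustrative assumptions; the slide only states that the cap is per association type and typically 6,000.

```python
MAX_LIMIT = 6000  # cap from the slide; modeled here as one constant for all types


def assoc_range(assoc_list, pos, limit):
    """Return up to `limit` entries starting at offset `pos`.

    `assoc_list` is assumed to be sorted newest-first, so offset 0 is the
    most recent association. The requested limit is clamped to MAX_LIMIT.
    """
    limit = min(limit, MAX_LIMIT)
    return assoc_list[pos:pos + limit]


def enumerate_all(assoc_list, page_size=MAX_LIMIT):
    """Read a long association list by issuing several capped queries."""
    pos = 0
    out = []
    while True:
        page = assoc_range(assoc_list, pos, page_size)
        if not page:
            return out
        out.extend(page)
        pos += len(page)


# 15,000 fake comments as (time, payload) pairs, sorted newest-first.
comments = sorted(((t, f"comment-{t}") for t in range(15000)), reverse=True)
newest = assoc_range(comments, 0, 3)
```

Rendering the default view only needs `newest`; a full enumeration of the 15,000 entries takes three queries under the 6,000 cap.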
  • 11. TAO Architecture Key Ideas behind TAO’s architecture Storage : The data is persisted using MySQL. The API is mapped to a small number of SQL queries. Data is divided into logical shards. By default all object types are stored in one table and all association types in another. Every “object_id” has a corresponding “shard_id”. Objects are bound to a single shard throughout their lifetime. An association is stored on the shard of its id1, so that every association query can be served from a single server.
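One way to give every object id a fixed shard is to reserve some bits of the id for the shard number. The sketch below assumes that layout (shard in the low 16 bits); the bit width and the function names are hypothetical and are not taken from the slides.

```python
SHARD_BITS = 16              # hypothetical width of the shard field
NUM_SHARDS = 1 << SHARD_BITS


def make_object_id(sequence, shard_id):
    """Build an id whose low bits carry the shard (assumed layout)."""
    if not 0 <= shard_id < NUM_SHARDS:
        raise ValueError("shard_id out of range")
    return (sequence << SHARD_BITS) | shard_id


def shard_of_object(object_id):
    # The shard never changes because it is part of the id itself.
    return object_id & (NUM_SHARDS - 1)


def shard_of_assoc(id1, atype, id2):
    # An association lives on the shard of its source object (id1),
    # so a query on (id1, atype) touches a single shard.
    return shard_of_object(id1)


user = make_object_id(sequence=12345, shard_id=7)
page = make_object_id(sequence=67890, shard_id=42)
```

Here `shard_of_assoc(user, "LIKES", page)` is 7 and the inverse edge `shard_of_assoc(page, "LIKED_BY", user)` is 42, which is why a write to an association with an inverse can involve two shards.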
  • 12. TAO Architecture ( Continued … ) Caching : • A region is made up of multiple closely located data centers. • Multiple cache servers make up a tier (the set of databases in a region is also called a tier) that is collectively capable of answering any TAO request. • Each cache request maps to a server based on the sharding scheme discussed. • The cache is filled on demand and entries are evicted using an LRU policy. • Write operations on an association with an inverse may involve two shards, since the forward edge is stored on the shard for id1 and the inverse edge is on the shard for id2. • Handling writes with multiple shards involves : Issuing an RPC call to the member hosting id2, which will contact the database to create the inverse association. Once the inverse write is complete, the caching server issues a write to the database for id1. • TAO does not provide atomicity between the two updates. If a failure occurs, the forward association may exist without an inverse; these hanging associations are scheduled for repair by an asynchronous job.
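The two-shard write and its repair path can be sketched as follows. This is a simplified model under stated assumptions: `Shard`, `add_with_inverse`, `run_repair` and the in-process `repair_queue` are invented names, the failure is simulated with a flag, and whichever half of the pair is missing after a failure is simply queued for a later retry.

```python
class Shard:
    """Stand-in for the database behind one shard."""

    def __init__(self, name):
        self.name = name
        self.rows = set()
        self.fail_next_write = False  # hook used to simulate a failure

    def write(self, row):
        if self.fail_next_write:
            self.fail_next_write = False
            raise IOError(f"write to {self.name} failed")
        self.rows.add(row)


repair_queue = []  # (shard, row) pairs awaiting the asynchronous repair job


def add_with_inverse(shard1, shard2, id1, atype, inv_type, id2):
    """Write the inverse on id2's shard first, then the forward edge on id1's.

    The two writes are not atomic. If the second one fails, the pair is left
    hanging and the missing row is queued for repair.
    """
    shard2.write((id2, inv_type, id1))
    forward = (id1, atype, id2)
    try:
        shard1.write(forward)
        return True
    except IOError:
        repair_queue.append((shard1, forward))
        return False


def run_repair():
    """Asynchronous repair job, modeled here as a plain function call."""
    while repair_queue:
        shard, row = repair_queue.pop()
        shard.write(row)


s1, s2 = Shard("shard-of-id1"), Shard("shard-of-id2")
s1.fail_next_write = True
ok = add_with_inverse(s1, s2, 1, "AUTHORED", "AUTHORED_BY", 2)
```

After this call `ok` is false and only the inverse row exists; calling `run_repair()` restores the missing half.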
  • 13. Leaders and Followers • Builds a two-level cache hierarchy (L1->L2). (All-to-all connections in a single-layer cache are susceptible to hot spots.) • Clients communicate with the closest followers directly. • Each shard is hosted by one leader, and all writes to the shard go through that leader, so it is naturally consistent. Followers, on the other hand, must be explicitly notified of updates made via other follower tiers. • An object update in the leader enqueues invalidation messages to each corresponding follower. • Leaders serialize concurrent writes that arrive from followers. The leader protects databases from “thundering herds” by not issuing concurrent writes and by limiting the maximum number of queries.
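A small model of the leader and follower roles described above. The classes and method names are assumptions for illustration, a plain dict stands in for the database, and invalidations are delivered synchronously here, whereas the slides describe them as enqueued messages.

```python
class Leader:
    """Owns a shard: all writes go through it, so its cache stays consistent."""

    def __init__(self, database):
        self.db = database      # dict standing in for the MySQL shard
        self.cache = {}
        self.followers = []

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def write(self, key, value, origin=None):
        self.db[key] = value
        self.cache[key] = value
        # Notify every other follower that its cached copy is stale.
        for follower in self.followers:
            if follower is not origin:
                follower.invalidate(key)


class Follower:
    """First-level cache that clients talk to directly."""

    def __init__(self, leader):
        self.leader = leader
        self.cache = {}
        leader.followers.append(self)

    def read(self, key):
        if key not in self.cache:          # read miss goes to the leader
            self.cache[key] = self.leader.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.leader.write(key, value, origin=self)
        self.cache[key] = value            # the writer sees its own write

    def invalidate(self, key):
        self.cache.pop(key, None)


db = {"x": 1}
leader = Leader(db)
f1, f2 = Follower(leader), Follower(leader)
first_read = f2.read("x")   # f2 now caches the old value
f1.write("x", 2)            # write arrives through a different follower
```

After the write, `f2` has dropped its cached copy of `x`, so its next read misses and fetches the new value from the leader.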
  • 15. Scaling Geographically • High read workload scales with the total number of follower servers. • The assumption is that latency between followers and leaders is low. • Followers behave identically in all regions, forwarding read misses and writes to the local region’s leader tier. Leaders query the local region’s database regardless of whether it is the master or slave. This means that read latency is independent of inter-region latency. • Writes are forwarded by the local leader to the leader that is in the region with the master database. Read misses by followers are 25X as frequent as writes in the workload, so read misses are served locally. • Facebook chooses data center locations that are clustered into only a few regions, where the intra-region latency is small (typically less than 1 millisecond). It is then sufficient to store one complete copy of the social graph per region.
  • 16. Scaling Geographically … • Since each cache hosts multiple shards, a server may be both a master and a slave at the same time. It is preferred to locate all of the master databases in a single region. • When an inverse association is mastered in a different region, TAO must traverse an extra inter-region link to forward the inverse write. • TAO embeds invalidation and refill messages in the database replication stream. These messages are delivered in a region immediately after a transaction has been replicated to a slave database. Delivering such messages earlier would create cache inconsistencies, as reading from the local database would provide stale data. • If a forwarded write is successful then the local leader will update its cache with the fresh value, even though the local slave database probably has not yet been updated by the asynchronous replication stream. In this case followers will receive two invalidates or refills from the write, one that is sent when the write succeeds and one that is sent when the write’s transaction is replicated to the local slave database.
  • 18. Consistency Matters • In the end consistency is The KEY ! • Imagine a scenario : Likes on your Facebook post magically increasing or decreasing ? • TAO’s master/slave design ensures that all reads can be satisfied within a single region, at the expense of potentially returning stale data to clients. As long as a user consistently queries the same follower tier, the user will typically have a consistent view of TAO state.
  • 19. Implementation • All the data related to an object is serialized into a single ‘data’ column (supporting a flexible schema). • Shards are mapped to cache servers using consistent hashing. TAO rebalances load among followers with shard cloning, in which reads to a shard are served by multiple followers in a tier. • Versioning is used to omit replies if data has not changed. • The master database is a consistent source of truth. Certain requests can be marked as critical and proxied to the master (for example, authentication).
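Consistent hashing itself is a standard technique, and a generic version is sketched below. The `HashRing` class, the use of MD5, the 100 virtual nodes per server and the server names are all assumptions for the example; the slides do not say how TAO implements its ring.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring (MD5 chosen only for the example)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server owns many points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, shard_id):
        # Walk clockwise to the first server point at or after the shard's hash.
        idx = bisect.bisect(self._points, _hash(f"shard-{shard_id}"))
        return self._ring[idx % len(self._ring)][1]


servers = ["cache-a", "cache-b", "cache-c"]
ring = HashRing(servers)
before = {shard: ring.server_for(shard) for shard in range(1000)}

# Adding a server only moves the shards that the new server takes over.
bigger = HashRing(servers + ["cache-d"])
after = {shard: bigger.server_for(shard) for shard in range(1000)}
moved = [shard for shard in before if before[shard] != after[shard]]
```

The useful property is visible in `moved`: every shard that changed owner now belongs to the newly added server, and all other shards stay where they were.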
  • 20. Failure Detection and Handling • TAO servers employ aggressive network timeouts so as not to continue waiting on responses that may never arrive. • Databases are marked down in a global configuration if they crash, are taken offline for maintenance, or get too far behind. When a master database is down, one of its slaves is automatically promoted to be the new master. • Follower Failure • Followers in another tier (backup) share the responsibility for the shard. • Leader Failure • Followers route read requests around it directly to the database. • Write requests are rerouted to a random member of the leader’s tier. • Invalidation Message Failure • Leaders queue messages to disk if followers are unreachable. • If a leader fails and is replaced : All shards that map to it must be invalidated in the followers, to restore consistency.
  • 21. Some Performance Metrics Replication: TAO’s slave storage servers lag their master by less than 1 second during 85% of the tracing window, by less than 3 seconds 99% of the time, and by less than 10 seconds 99.8% of the time.
  • 22. PART 2 : Unicorn at Facebook
  • 23. What is Graph Search ?
  • 24. What is Unicorn? • Online, in-memory “social graph aware” indexing system serving billions of queries a day. • The idea is to promote social proximity. • Serves as the backend infrastructure for graph search. • Searches all basic structured information on the social graph and performs complex set operations on the results. • Why a big deal ? Facebook engineers joked that, much like the mythical quadruped, this system would solve all of our problems and heal our woes if only it existed.
  • 25. Core Technical Ideas • Applying common information retrieval architectural concepts in the domain of social graph search. • How do you promote socially relevant search results ? • Building rich operators ( apply & extract ) that support semantic graph queries and multi-round-trip algorithms for serving complicated queries.
  • 26. Data Model for Graph Search • There are billions of users in the social graph. An average user has approximately 130 friends. • Best way to implement a (sparse) social graph : Adjacency lists. • Hits = results. • Posting lists = adjacency lists. • Hit data is extra metadata. • The sort key helps us find globally important ids.
  • 27. Unicorn API & Popular Edge Types • The client sends ‘Thrift’ requests to the server. (Thrift is Facebook’s own serialization and RPC framework, comparable to Protocol Buffers.) • The request is routed to the closest Unicorn server. • Several operators are supported : Or, And, Difference. • Metadata : ‘graduation year’ and ‘major’ for attended.
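The three set operators can be illustrated on small posting lists. This sketch assumes posting lists are plain Python lists of ids in ascending order; the function names and the sample index contents are made up, and Unicorn's actual ordering by sort key is not modeled.

```python
def op_and(a, b):
    """Intersect two sorted posting lists with a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def op_or(a, b):
    """Union of two posting lists, returned in ascending id order."""
    return sorted(set(a) | set(b))


def op_difference(a, b):
    """Ids in `a` that do not appear in `b`, keeping the order of `a`."""
    excluded = set(b)
    return [x for x in a if x not in excluded]


# Hypothetical index: term -> sorted list of result ids.
index = {
    "friend:7": [2, 5, 9, 11],
    "likers:42": [5, 8, 11, 20],
}

both = op_and(index["friend:7"], index["likers:42"])
either = op_or(index["friend:7"], index["likers:42"])
only_friends = op_difference(index["friend:7"], index["likers:42"])
```

With the sample data, `both` is `[5, 11]`, `either` is `[2, 5, 8, 9, 11, 20]` and `only_friends` is `[2, 9]`.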
  • 28. Unicorn’s Architecture • In Distributed Systems: Never ever forget to shard ! • All Posting lists are sharded by ‘result_id’. • Index servers store adjacency lists and perform set operations on those lists. • Each index server is responsible for a particular shard. • Rack Aggregator benefits from the fact that bandwidth to servers within a rack is higher.
  • 30. Building and Updating Index • Raw data is scraped from MySQL and indexes are built with Hadoop. • The data is accessible via Hive. • To avoid the lag that is common in batch processing, Facebook uses Scribe to push up-to-the-minute updates. • Each index server keeps track of the last updated timestamp.
  • 31. TYPEAHEAD Search • It all started with a typeahead search. • Users are shown a list of possible matches for the query as they are typing. • Index servers for typeahead contain posting lists for every name prefix up to a predefined character limit. • These posting lists contain the ids of users whose first or last name matches the prefix. • A simple typeahead implementation would merely map input prefixes to the posting lists for those prefixes and return the resultant ids. • How do you make this socially relevant ?
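The simple typeahead described above amounts to a map from name prefixes to posting lists. In the sketch below, `MAX_PREFIX_LEN`, the function names and the sample users are assumptions for illustration only.

```python
from collections import defaultdict

MAX_PREFIX_LEN = 4  # hypothetical "predefined character limit"


def build_prefix_index(users):
    """Map every name prefix (up to the limit) to a sorted list of user ids."""
    index = defaultdict(set)
    for user_id, full_name in users.items():
        for token in full_name.lower().split():      # first and last names
            for n in range(1, min(len(token), MAX_PREFIX_LEN) + 1):
                index[token[:n]].add(user_id)
    return {prefix: sorted(ids) for prefix, ids in index.items()}


def typeahead(index, typed):
    """Look up the posting list for what the user has typed so far.

    Input longer than the limit is truncated, so the result may be a
    superset of the true matches and would need further filtering.
    """
    return index.get(typed.lower()[:MAX_PREFIX_LEN], [])


# Made-up sample users.
users = {1: "Melanie Marshall", 2: "Mark Melton", 3: "Priya Rao"}
prefix_index = build_prefix_index(users)
```

For example, `typeahead(prefix_index, "mel")` returns users 1 and 2, one matching on first name and one on last name. Nothing in this lookup knows who is searching, which is the gap the next slide addresses.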
  • 32. Serving Socially Relevant Results • How do you ensure that search results are socially relevant ? • Can we “AND” the result with the friend list of the user ? (This ignores results for users who might be relevant but are not friends with the user performing the search.) • We actually want a way to force some fraction of the final results to possess a trait, while not requiring this trait from all results. • The answer is the WeakAnd operator. • The WeakAnd operator is a modification of And that allows operands to be missing from some fraction of the results within an index shard. • Implementation : Allow only a finite number of hits to be non-friends. Priscilla Chan (3), looking for : “Melanie Mars” ….
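The "finite number of non-friend hits" idea can be sketched for the two-operand case. This is a simplification under stated assumptions: the function name, the `optional_hits` parameter name and the sample ids are illustrative, and the real operator works per index shard with possibly several operands.

```python
def weak_and(required, weak, optional_hits):
    """Keep results from `required`, tolerating a bounded number of misses.

    A result that also appears in `weak` (for example the searcher's friend
    list) is always kept. A result missing from `weak` is kept only while
    the budget of `optional_hits` has not been used up.
    """
    weak_set = set(weak)
    remaining = optional_hits
    out = []
    for doc_id in required:
        if doc_id in weak_set:
            out.append(doc_id)
        elif remaining > 0:
            remaining -= 1
            out.append(doc_id)
    return out


name_matches = [10, 11, 12, 13, 14]   # ids whose name matches the typed prefix
friends = [11, 14, 99]                # ids of the searcher's friends

mixed = weak_and(name_matches, friends, optional_hits=2)
strict = weak_and(name_matches, friends, optional_hits=0)
```

With a budget of two, `mixed` is `[10, 11, 12, 14]`: both friends plus the first two non-friends. With a budget of zero the operator degenerates to a plain And and `strict` is `[11, 14]`.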
  • 33. Strong OR • Requires certain operands to be present in some fraction of the matches. • Enforces diversity in the set. • Example : Fetching geographic diversity in the result set. • At least 20% from San Francisco. • An optional weight parameter as well.
  • 34. Scoring Search Results • We might want to prioritize results for individuals who are close in age to the user typing the query. • This requires that we store the age (or birth date) of users with the index. • For storing per-entity metadata, Unicorn provides a forward index, which is simply a map of id to a blob that contains metadata for the id. The forward index for an index shard only contains entries for the ids that reside on that shard. • Based on Thrift parameters included with the client’s request, the client can select a scoring function in the index server that scores each result. • Aggregators give priority to documents with a higher score.
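A forward index and an age-proximity scoring function might look like the sketch below. The metadata layout, the scoring formula and the function names are assumptions chosen for illustration; the slides only say that the forward index maps an id to a metadata blob and that the client selects a scoring function.

```python
# Hypothetical forward index for one shard: id -> metadata blob.
forward_index = {
    101: {"age": 31},
    102: {"age": 58},
    103: {"age": 27},
}


def age_proximity_score(searcher_age, doc_id):
    """Higher score for users closer in age to the searcher."""
    meta = forward_index.get(doc_id)
    if meta is None:
        return float("-inf")   # no metadata on this shard: rank last
    return -abs(meta["age"] - searcher_age)


def rank(doc_ids, searcher_age):
    """Order results by descending score, as an aggregator would prefer."""
    return sorted(
        doc_ids,
        key=lambda doc_id: age_proximity_score(searcher_age, doc_id),
        reverse=True,
    )


ranked = rank([101, 102, 103], searcher_age=30)
```

For a 30-year-old searcher the scores are -1, -28 and -3, so `ranked` is `[101, 103, 102]`.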
  • 35. Graph Search • Our discussion of graph search spans : users, pages, apps, events etc. • Imagine a scenario : We might want to know the pages liked by friends of Bill who like Trekking : 1. First execute the query (and friend:7 likers:42). 2. Collect the results and create a new query that produces the union of the pages liked by any of these individuals. • This is inefficient due to the multiple round trips involved between index servers and the top aggregator. • The ability to use the results of previous executions as seeds for future executions creates new applications for a search system, and was the inspiration for Facebook’s Graph Search consumer product. The idea was to build a general-purpose, online system for users to find entities in the social graph that matched a set of user-defined constraints.
  • 36. Apply Operator • A graph traversal operator that allows the client to query a set of ids and then use the resultant ids to construct and execute a new query. • Apply is ‘syntactic sugar’ that allows the system to perform expensive operations lower in the hardware stack. However, by allowing clients to show semantic intent, optimizations are possible that save search time.
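The two-step pattern that Apply wraps (run an inner query, then use each resulting id to build the terms of a second query) can be sketched on a toy index. The term prefix `likes:`, the function names and the index contents are hypothetical; only `friend:7` and `likers:42` come from the slides.

```python
# Hypothetical toy index: term -> list of result ids.
index = {
    "friend:7": [3, 5],
    "likers:42": [3, 5, 8],
    "likes:3": [200, 201],   # pages liked by user 3 (assumed term naming)
    "likes:5": [201, 202],   # pages liked by user 5
}


def term(name):
    return index.get(name, [])


def op_and(a, b):
    b_set = set(b)
    return [x for x in a if x in b_set]


def apply_op(prefix, inner_ids):
    """Prepend `prefix` to each inner result id and union the posting lists."""
    out = set()
    for inner_id in inner_ids:
        out.update(term(f"{prefix}{inner_id}"))
    return sorted(out)


# Step 1: friends of user 7 who like page 42.
inner = op_and(term("friend:7"), term("likers:42"))
# Step 2: pages liked by any of those users, built from the step 1 results.
pages = apply_op("likes:", inner)
```

Here `inner` is `[3, 5]` and `pages` is `[200, 201, 202]`. Expressing both steps as one operator lets the server side run the second step without returning the intermediate ids to the client.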
  • 37. Extract Operator • Say you want to look up people tagged in photos of “Jon Jones”. • Solution: Use the ‘Apply’ operator to look up photos of Jon Jones in the photos vertical and then query the users vertical for people tagged in these photos. • This would require hundreds of billions of new terms in the users vertical. • Billions of “one to few” mappings. • Better way: Store the ids of people tagged in a photo in the forward index data for that photo in the photos vertical. This is a case where partially de-normalizing the data helps. We thus store the result ids in the forward index of the secondary vertical and do the lookup inline. • This is exactly what the Extract operator accomplishes.
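A minimal sketch of the inline lookup that Extract performs, assuming the tagged user ids are stored in each photo's forward-index entry. The term name `photos_of:jon_jones`, the field name `tagged_ids`, the function name and all ids are invented for the example.

```python
# Hypothetical photos vertical: an inverted index plus a forward index.
photos_index = {"photos_of:jon_jones": [900, 901]}
photos_forward = {
    900: {"tagged_ids": [1, 2]},
    901: {"tagged_ids": [2, 3]},
}


def extract(photo_ids, field):
    """Read result ids straight out of the forward index of each photo.

    No second query against the users vertical is needed, because the ids
    of tagged people are stored (de-normalized) with the photo itself.
    """
    seen = []
    for photo_id in photo_ids:
        for user_id in photos_forward.get(photo_id, {}).get(field, []):
            if user_id not in seen:
                seen.append(user_id)
    return seen


photos = photos_index["photos_of:jon_jones"]
tagged_people = extract(photos, "tagged_ids")
```

With the sample data, `tagged_people` is `[1, 2, 3]`, obtained without adding any per-photo terms to the users vertical.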
  • 38. Preserving Privacy • Privacy is crucial ! • Certain graph edges cannot be shown to all users but rather only to users who are friends with or in the same network as a particular person. • Unicorn itself does not have privacy information incorporated into its index : It lacks the strict consistency and durability guarantees that are needed for a full privacy solution. • Facebook’s PHP frontend makes a proper privacy check on the result. This design decision imposes a modest efficiency penalty on the overall system. • However, it also keeps privacy logic separate from Unicorn, in keeping with the DRY (“Don’t Repeat Yourself”) principle of software development.
  • 39. Lineage : Preserving Privacy To enable clients to make privacy decisions, a string of metadata is attached to each search result to describe its lineage. Lineage is a structured representation of the edges that were traversed in order to yield a result.