LinkedIn Graph Presentation

The Evolution of the Professional
Graph at LinkedIn

Chris Conrad, Senior Engineering Manager, Social Graph
Igor Perisic, Sr. Director of Engineering, SNA

LinkedIn
•  The site officially launched on May 5, 2003. At the end of the first
month in operation, LinkedIn had a total of 4,500 members in the
network.
•  As of January 9, 2013, LinkedIn operates the world’s largest
professional network on the Internet with more than 200 million
members in over 200 countries and territories.
•  As of September 30, 2012, LinkedIn counts executives from all
2012 Fortune 500 companies as members; its corporate talent
solutions are used by 85 of the Fortune 100 companies.
•  As of the school year ending May 2012, there are over 20 million
students and recent college graduates on LinkedIn. They are
LinkedIn's fastest-growing demographic.

The Cloud
•  Cloud is the original name of our graph engine
•  Responsible for read-scaling graph queries (it used to handle
search, too)
•  Stored 4 primary sets of data:
–  Member Data
–  Network Cache
–  Connections
–  Group Membership

What was wrong?
•  Large memory footprint
–  Network cache used simple but inefficient data structures
–  The size and density of the graph were increasing

•  Garbage Collector woes
–  Large JVM heap caused long GC pauses
–  Long GC pauses reduce availability, resulting in site outages

C++ Graph
•  First project: migrate the network cache to a new data structure to
reduce memory usage
•  Second project: implement a C++ JNI library to move the graph
data off heap
•  Result: Drastic reduction in JVM heap utilization
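
The slides do not say which data structure replaced the original network cache. As one illustration of how a denser layout cuts memory, here is a minimal sketch of a compressed adjacency (CSR-style) layout, written in Python for brevity even though the system described is JVM and C++ based. The class name `CompactGraph` and the choice of CSR are assumptions, not the authors' design.

```python
from array import array


class CompactGraph:
    """Hypothetical compact graph: all adjacency lists live in two flat arrays."""

    def __init__(self, num_members, edges):
        # Count each member's degree; connections are treated as undirected.
        degree = [0] * num_members
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        # offsets[m] .. offsets[m + 1] delimits member m's slice of `targets`.
        self.offsets = array("q", [0] * (num_members + 1))
        for m in range(num_members):
            self.offsets[m + 1] = self.offsets[m] + degree[m]
        self.targets = array("i", [0] * self.offsets[num_members])
        # cursor[m] is the next free slot in member m's slice.
        cursor = list(self.offsets[:num_members])
        for a, b in edges:
            self.targets[cursor[a]] = b
            cursor[a] += 1
            self.targets[cursor[b]] = a
            cursor[b] += 1

    def connections(self, member):
        start, end = self.offsets[member], self.offsets[member + 1]
        return sorted(self.targets[start:end])


graph = CompactGraph(4, [(0, 1), (0, 2), (2, 3)])
print(graph.connections(0))  # [1, 2]
```

Two flat primitive arrays avoid per-edge object overhead, which is the kind of saving the first project was after; moving such arrays off the managed heap is what the second project addresses.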

[Diagram: Cloud, divided into the Java heap and libGraphJNI.so, holding Member Data, Network Cache, Connections and Group Membership]

New Problems
•  Growth
–  The size and density of the graph were increasing
–  We were running out of memory
–  We were running out of CPU cycles
–  Proliferation of services increased the overhead of maintaining the
client-side software load balancer
–  As of September 30, 2012, LinkedIn has 3,177 full-time employees located
around the world. LinkedIn started off 2012 with about 2,100 full-time
employees worldwide, up from around 1,000 at the beginning of 2011 and
about 500 at the beginning of 2010.

•  C++ code had a much higher maintenance cost
–  Coredumps are much less friendly than a NullPointerException
–  LinkedIn didn’t have the expertise or infrastructure to support C++
development

Split cloud
•  cloud-session: Move the load balancing logic into a service we
control
•  rgraph: Extract the C++ graph into its own service

[Diagram: cloud-session, Cloud and rgraph; Java heap and libGraphJNI.so; data sets Member Data, Network Cache, Connections and Group Membership]

New problems, same as the old
•  rgraph instances still had a large memory footprint
–  The density of the graph was increasing
–  We were running out of memory
–  We were running out of CPU cycles

•  cloud-session’s software load balancer implementation was
essentially a single point of failure

Distribute the Graph
•  Introduce Norbert, a new cluster management system
•  Partition the graph data
•  Partition the network cache service
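
The slides do not describe the partitioning scheme. The sketch below assumes the simplest option, assigning each member to a partition by member ID modulo the partition count, and shows how a second-degree lookup then fans out across partitions. The function names and `NUM_PARTITIONS` are hypothetical, and nothing here uses Norbert's actual API.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(member_id, num_partitions=NUM_PARTITIONS):
    # Assumed scheme: member ID modulo the number of partitions.
    return member_id % num_partitions


def build_partitions(edges, num_partitions=NUM_PARTITIONS):
    # Each partition holds the adjacency sets of the members it owns.
    partitions = [defaultdict(set) for _ in range(num_partitions)]
    for a, b in edges:
        partitions[partition_for(a, num_partitions)][a].add(b)
        partitions[partition_for(b, num_partitions)][b].add(a)
    return partitions


def connections(partitions, member_id):
    shard = partitions[partition_for(member_id, len(partitions))]
    return set(shard.get(member_id, set()))


def second_degree(partitions, member_id):
    first = connections(partitions, member_id)
    # Group first-degree members by owning partition, as a router would
    # in order to send one request per partition.
    by_partition = defaultdict(list)
    for friend in first:
        by_partition[partition_for(friend, len(partitions))].append(friend)
    result = set()
    for pid, members in by_partition.items():
        for m in members:
            result |= partitions[pid].get(m, set())
    return result - first - {member_id}


parts = build_partitions([(1, 2), (2, 3), (3, 4), (1, 5)])
print(second_degree(parts, 1))  # {3}
```

A single member's connections come from one partition, but any multi-hop query touches several, which is why a cluster manager that knows which node serves which partition becomes necessary.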

[Diagram: cloud-session, dgraph (Connections), Cloud (Java heap: Group Membership, Member Data) and the Network Cache Service]

What is the professional graph?
•  LinkedIn connections
•  Current and past co-workers
•  University colleagues and alumni
•  Group members
•  And what about geography, industry and skill overlap?

New requirements
•  Members aren’t the only type of node in the professional graph
•  LinkedIn connections aren’t the only type of edge in the
professional graph
•  We already supported groups and group membership

Making changes was hard
•  Code was rigid
–  Data was stored using class hierarchies; introducing new data types was
prohibitively slow
–  Queries were built by combining object instances

•  BDBJE (Berkeley DB Java Edition)
•  Everything was back in the heap
–  Garbage collection time was starting to go up
–  GC pauses no longer caused outages, but flapping introduced high developer
and operational overhead

Graph as a Service
•  Custom persistence engine
–  Log structured
–  Memory-mapped files keep data out of the Java heap
–  Data described using a DDL-like schema

•  Custom SQL-like query language
–  Query language understands the DDL
–  Text-based language reduces code changes
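
To make "log structured" and "memory mapped" concrete, here is a minimal sketch of an append-only edge log that is read back through a memory map. It is written in Python rather than Java, and the record layout, the `EdgeLog` class and its methods are all assumptions for illustration; the slides do not describe the real engine's file format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: source id, target id, created-at timestamp.
RECORD = struct.Struct("<qqq")


class EdgeLog:
    def __init__(self, path):
        self.path = path
        self.count = 0
        open(path, "wb").close()  # start with an empty log file

    def append(self, source, target, created_at):
        # Log structured: writes only ever go to the end of the file.
        with open(self.path, "ab") as f:
            f.write(RECORD.pack(source, target, created_at))
        self.count += 1

    def scan(self, source):
        """Yield (target, created_at) for every record written for `source`."""
        if self.count == 0:
            return  # an empty file cannot be memory mapped
        with open(self.path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
                for i in range(self.count):
                    s, t, ts = RECORD.unpack_from(view, i * RECORD.size)
                    if s == source:
                        yield t, ts


with tempfile.TemporaryDirectory() as d:
    log = EdgeLog(os.path.join(d, "edges.log"))
    log.append(1, 2, 100)
    log.append(1, 3, 200)
    log.append(2, 3, 300)
    print(list(log.scan(1)))  # [(2, 100), (3, 200)]
```

Reads go through pages managed by the operating system instead of objects on a managed heap. On the JVM the equivalent mechanism is a `MappedByteBuffer` obtained from `FileChannel.map`, which is how data can stay outside the garbage collector's reach.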

Graph Queries
•  Company(:id)[CompanyFollowers]

•  Member(:id)[MemberToMember{CreatedAt > :t}]

•  Member(:id)[topN(MemberToMember, Score, 10)]
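
The slides show the query syntax but not its semantics. Reading the three examples as "start at a node, follow edges of a given type, optionally filter or rank them", a plausible interpretation can be sketched in Python as follows. The `Edge` type, the helper functions and the assumption that `topN` ranks by descending `Score` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    edge_type: str
    source: int
    target: int
    created_at: int = 0
    score: float = 0.0


def traverse(edges, edge_type, source, predicate=None):
    # Node(:id)[EdgeType{predicate}]: follow matching edges from one node.
    return [
        e for e in edges
        if e.edge_type == edge_type
        and e.source == source
        and (predicate is None or predicate(e))
    ]


def top_n(edges, edge_type, source, n):
    # Node(:id)[topN(EdgeType, Score, n)]: assumed to keep the n highest scores.
    matches = traverse(edges, edge_type, source)
    return sorted(matches, key=lambda e: e.score, reverse=True)[:n]


edges = [
    Edge("CompanyFollowers", 7, 1),
    Edge("MemberToMember", 1, 2, created_at=50, score=0.9),
    Edge("MemberToMember", 1, 3, created_at=150, score=0.4),
    Edge("MemberToMember", 1, 4, created_at=250, score=0.7),
]

# Company(:id)[CompanyFollowers] with :id = 7
print([e.target for e in traverse(edges, "CompanyFollowers", 7)])  # [1]
# Member(:id)[MemberToMember{CreatedAt > :t}] with :id = 1, :t = 100
recent = traverse(edges, "MemberToMember", 1, lambda e: e.created_at > 100)
print([e.target for e in recent])  # [3, 4]
# Member(:id)[topN(MemberToMember, Score, 10)] with n reduced to 2
print([e.target for e in top_n(edges, "MemberToMember", 1, 2)])  # [2, 4]
```

Because the query is text with bound parameters (`:id`, `:t`), a new traversal needs a new query string rather than new classes, which is the "reduces code changes" point made above.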

What’s next?
•  Online schema migration
•  Automated repartitioning and data migration
•  Automated provisioning
•  Hierarchical data partitioning
•  Monitoring and statistics
•  Query optimization
•  Query fragment caching
•  Result set caching
•  Query parallelization
•  Very large data set handling
•  …

And we’re still growing

•  200M+ members
•  2/sec
•  63% non-U.S.
•  25th most visited website worldwide (Comscore 6-12)
•  >2.6M company pages
•  85% of Fortune 100 companies use LinkedIn to hire
•  [Chart: LinkedIn Members (Millions), 2004 to 2011; labeled values: 2, 4, 8, 17, 32, 55, 90]

We’re Hiring
•  http://studentcareers.linkedin.com
•  Or email me at cconrad@linkedin.com
