1. The Evolution of the Professional Graph at LinkedIn
Chris Conrad, Senior Engineering Manager, Social Graph
Igor Perisic, Sr. Director of Engineering, SNA
2. LinkedIn
• The site officially launched on May 5, 2003. At the end of the first
month in operation, LinkedIn had a total of 4,500 members in the
network.
• As of January 9, 2013, LinkedIn operates the world’s largest
professional network on the Internet with more than 200 million
members in over 200 countries and territories.
• As of September 30, 2012, LinkedIn counts executives from all
2012 Fortune 500 companies as members; its corporate talent
solutions are used by 85 of the Fortune 100 companies.
• As of the school year ending May 2012, there are over 20 million
students and recent college graduates on LinkedIn. They are
LinkedIn's fastest-growing demographic.
4. The Cloud
• Cloud is the original name of our graph engine
• Responsible for scaling graph read queries (it also used to handle search)
• Stored 4 primary sets of data:
– Member data
– Network cache
– Connections
– Group membership
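A typical read query against the connection data asks how far apart two members are. The slides do not describe Cloud's query interface, so the following is only a minimal sketch, assuming connections are held in memory as a map from member id to a set of connected member ids; the function name `distance` and the three-degree cutoff are illustrative assumptions.

```python
from collections import deque


def distance(adjacency, source, target, max_degree=3):
    """Degree of separation between two members via breadth-first search.

    Hypothetical helper: returns None when the members are more than
    max_degree hops apart.
    """
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        member, degree = frontier.popleft()
        if degree == max_degree:
            continue  # never expand past the cutoff
        for neighbor in adjacency.get(member, ()):
            if neighbor == target:
                return degree + 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, degree + 1))
    return None


# Hypothetical toy network: 1-2, 2-3, 3-4, 4-5 (connections are symmetric)
connections = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(distance(connections, 1, 3))  # 2
print(distance(connections, 1, 5))  # None: members 1 and 5 are 4 degrees apart
```

Bounding the search depth keeps the cost of each query predictable, which matters when the goal is to scale reads.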
5. What was wrong?
• Large memory footprint
– Network cache used simple but inefficient data structures
– The size and density of the graph were increasing
• Garbage Collector woes
– Large JVM heap caused long GC pauses
– Long GC pauses reduced availability, resulting in site outages
6. C++ Graph
• First project: migrate the network cache to a new data structure to
reduce memory usage
• Second project: implement a C++ JNI library to move the graph
data off heap
• Result: drastic reduction in JVM heap utilization
[Diagram: Cloud, with Member Data and Network Cache in the Java heap and Connections and Group Membership in libGraphJNI.so]
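The slides do not say which data structure replaced the original network cache. One common memory-lean layout for adjacency data is compressed sparse row (CSR): every neighbor id lives in one flat array of machine integers, indexed by a second array of per-member offsets, instead of one object per edge. A minimal sketch follows; the class name `CompactGraph` and the assumption of dense integer member ids are hypothetical. In a JVM the same two-array layout replaces many small objects with a few large primitive arrays, which also leaves the garbage collector far fewer objects to trace.

```python
from array import array
from bisect import bisect_left


class CompactGraph:
    """Read-only adjacency in compressed sparse row (CSR) form.

    Neighbors of member m live in targets[offsets[m]:offsets[m + 1]].
    Hypothetical class; assumes member ids are dense integers 0..n-1.
    """

    def __init__(self, num_members, edges):
        # Count out-degree per member.
        counts = [0] * num_members
        for src, _ in edges:
            counts[src] += 1
        # Prefix sums give the start offset of each member's neighbor slice.
        self.offsets = array("q", [0] * (num_members + 1))
        for m in range(num_members):
            self.offsets[m + 1] = self.offsets[m] + counts[m]
        # Fill neighbor slots; cursor tracks the next free slot per member.
        self.targets = array("i", [0] * len(edges))
        cursor = list(self.offsets[:-1])
        for src, dst in edges:
            self.targets[cursor[src]] = dst
            cursor[src] += 1
        # Sort each slice so membership tests can use binary search.
        for m in range(num_members):
            lo, hi = self.offsets[m], self.offsets[m + 1]
            self.targets[lo:hi] = array("i", sorted(self.targets[lo:hi]))

    def neighbors(self, member):
        return self.targets[self.offsets[member]:self.offsets[member + 1]]

    def is_connected(self, a, b):
        # Binary search inside member a's sorted neighbor slice.
        lo, hi = self.offsets[a], self.offsets[a + 1]
        i = bisect_left(self.targets, b, lo, hi)
        return i < hi and self.targets[i] == b


# Hypothetical network of 4 members; each connection is stored in both directions.
edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]
graph = CompactGraph(4, edges)
print(list(graph.neighbors(0)))  # [1, 2]
print(graph.is_connected(1, 2))  # True
print(graph.is_connected(0, 3))  # False
```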
8. New Problems
• Growth
– The size and density of the graph were increasing
– We were running out of memory
– We were running out of CPU cycles
– Proliferation of services increased the overhead of maintaining the client-side software load balancer
– As of September 30, 2012, LinkedIn has 3,177 full-time employees located
around the world. LinkedIn started off 2012 with about 2,100 full-time
employees worldwide, up from around 1,000 at the beginning of 2011 and
about 500 at the beginning of 2010.
• C++ code had a much higher maintenance cost
– Core dumps are much less friendly than a NullPointerException
– LinkedIn didn’t have the expertise or infrastructure to support C++ development
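A client-side software load balancer lives inside every calling service: each client keeps its own host list and its own view of which hosts are healthy, and picks a target per request. The sketch below shows, under that assumption, the kind of logic each client has to carry; the class name `RoundRobinBalancer` and the host names are hypothetical.

```python
from itertools import count


class RoundRobinBalancer:
    """Hypothetical client-side balancer: round-robin over hosts believed healthy."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.down = set()        # this client's private view of failed hosts
        self._ticks = count()    # monotonically increasing request counter

    def mark_down(self, host):
        self.down.add(host)

    def mark_up(self, host):
        self.down.discard(host)

    def pick(self):
        healthy = [h for h in self.hosts if h not in self.down]
        if not healthy:
            raise RuntimeError("no healthy hosts")
        return healthy[next(self._ticks) % len(healthy)]


lb = RoundRobinBalancer(["graph-1", "graph-2", "graph-3"])
print([lb.pick() for _ in range(3)])  # ['graph-1', 'graph-2', 'graph-3']
lb.mark_down("graph-2")
print([lb.pick() for _ in range(2)])  # ['graph-3', 'graph-1']
```

Every service that embeds a copy of this logic has to be updated whenever the balancing or health-checking policy changes, which is one plausible source of the overhead described above.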
9. Split cloud
• cloud-session: Move the load balancing logic into a service we
control
• rgraph: Extract the C++ graph into its own service
[Diagram: cloud-session; Cloud, holding Member Data and Network Cache in the Java heap; rgraph, holding Connections and Group Membership in libGraphJNI.so]
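Once load balancing moves into a service such as cloud-session, that service decides which backend instance handles each request. The slides do not describe its routing policy; as one possible approach, the sketch below uses rendezvous (highest random weight) hashing so that a given member's requests keep landing on the same instance, and only the members owned by a removed instance move elsewhere. The function names and instance names are hypothetical.

```python
import hashlib


def _weight(member_id, instance):
    # Stable 64-bit score for a (member, instance) pair.
    digest = hashlib.sha256(f"{member_id}:{instance}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def route(member_id, instances):
    """Rendezvous hashing: pick the instance with the highest score for this member."""
    if not instances:
        raise ValueError("no instances available")
    return max(instances, key=lambda inst: _weight(member_id, inst))


instances = ["cloud-1", "cloud-2", "cloud-3", "cloud-4"]
owner = route(42, instances)
survivors = [i for i in instances if i != owner]
print(owner, route(42, survivors))  # the owner, then its failover instance
```

Because each member's ranking of instances is fixed, removing an instance reassigns only the members it owned, which keeps any per-instance caches warm for everyone else.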
10. New problems, same as the old
• rgraph instances still had a large memory footprint
– The density of the graph was increasing
– We were running out of memory
– We were running out of CPU cycles
• cloud-session’s software load balancer implementation was
essentially a single point of failure
11. Distribute the Graph
• Introduce Norbert, a new cluster management system
• Partition the graph data
• Partition the network cache service
[Diagram: cloud-session; Cloud, holding Member Data in the Java heap; dgraph, holding Connections and Group Membership; a separate Network Cache Service]
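Partitioning means each dgraph node owns only a slice of the graph, so a query that touches several members has to be split up and the partial results merged. Norbert's actual API is not shown in the slides; the sketch below uses plain in-process objects as stand-ins for partitions, and the modulo-by-member-id partitioning scheme and all class and method names are illustrative assumptions.

```python
from collections import defaultdict


class Partition:
    """Stand-in for one dgraph node holding the adjacency of the members it owns."""

    def __init__(self):
        self.adjacency = defaultdict(set)

    def connections(self, member_ids):
        # .get() avoids creating empty entries for unknown members.
        return {m: set(self.adjacency.get(m, ())) for m in member_ids}


class PartitionedGraph:
    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def _index(self, member_id):
        # Simple modulo partitioning by member id (illustrative choice).
        return member_id % len(self.partitions)

    def add_connection(self, a, b):
        # Store both directions, each on the partition that owns the source member.
        self.partitions[self._index(a)].adjacency[a].add(b)
        self.partitions[self._index(b)].adjacency[b].add(a)

    def connections(self, member_ids):
        # Scatter: group the requested ids by owning partition.
        by_partition = defaultdict(list)
        for m in member_ids:
            by_partition[self._index(m)].append(m)
        # Gather: query each involved partition once and merge the results.
        result = {}
        for index, ids in by_partition.items():
            result.update(self.partitions[index].connections(ids))
        return result

    def shared_connections(self, a, b):
        lists = self.connections([a, b])
        return lists[a] & lists[b]


graph = PartitionedGraph(num_partitions=3)
for a, b in [(1, 2), (1, 3), (2, 3), (3, 4)]:
    graph.add_connection(a, b)
print(graph.shared_connections(1, 2))  # {3}
```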
18. What is the professional graph?
• LinkedIn connections
• Current and past co-workers
• University colleagues and alumni
• Group members
• And what about geography, industry and skill overlap?
19. New requirements
• Members aren’t the only type of node in the professional graph
• LinkedIn connections aren’t the only type of edge in the professional graph
• We already supported groups and group membership
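One way to admit new node and edge types without new classes is to treat the type as data. The sketch below is hypothetical (the type names such as `works_at` and `employs`, and the classes `Node` and `TypedGraph`, are illustrative, not LinkedIn's schema); it stores adjacency keyed by (node, edge type) and answers a co-worker query, one of the professional-graph relationships listed above.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    kind: str      # e.g. "member", "company", "school", "group" (illustrative)
    node_id: int


class TypedGraph:
    """Adjacency keyed by (node, edge type); a new edge type is just a new string."""

    def __init__(self):
        self._out = defaultdict(set)

    def add_edge(self, src, edge_type, dst):
        self._out[(src, edge_type)].add(dst)

    def neighbors(self, src, edge_type):
        # Return an immutable copy so callers cannot corrupt the store.
        return frozenset(self._out.get((src, edge_type), ()))

    def coworkers(self, member):
        """Members who share at least one employer with `member` (hypothetical query)."""
        result = set()
        for company in self.neighbors(member, "works_at"):
            result |= self.neighbors(company, "employs")
        result.discard(member)
        return result


def add_employment(graph, member, company):
    # Store the relationship in both directions under two edge types.
    graph.add_edge(member, "works_at", company)
    graph.add_edge(company, "employs", member)


g = TypedGraph()
alice, bob, carol = Node("member", 1), Node("member", 2), Node("member", 3)
acme, initech = Node("company", 100), Node("company", 200)
add_employment(g, alice, acme)
add_employment(g, bob, acme)
add_employment(g, carol, initech)
print(g.coworkers(alice))  # {Node(kind='member', node_id=2)}
```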
20. Making changes was hard
• Code was rigid
– Data was stored using class hierarchies, so introducing new data types was prohibitively slow
– Queries were built by combining object instances
• BDB JE (Berkeley DB Java Edition)
• Everything was back in the heap
– Garbage collection time was starting to go up
– GC pauses no longer caused outages, but flapping introduced high developer
and operational overhead
21. Graph as a Service
• Custom persistence engine
– Log structured
– Memory-mapped files keep data out of the Java heap
– Data described using a DDL-like schema
• Custom SQL-like query language
– Query language understands DDL
– Text-based language reduces code changes
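A log-structured engine appends every write to the end of a file and reads records back by offset; memory-mapping the file lets reads go through the operating system's page cache rather than long-lived objects on a managed heap. The slides give no file format, so the sketch below assumes a hypothetical record layout (a 4-byte big-endian length followed by the payload) and a hypothetical class name `AppendLog`; Python's `mmap` module stands in for the JVM's memory-mapped buffers.

```python
import mmap
import os
import struct
import tempfile


class AppendLog:
    """Hypothetical append-only record log; reads go through a memory map."""

    _HEADER = struct.Struct(">I")  # assumed layout: 4-byte big-endian payload length

    def __init__(self, path):
        self._file = open(path, "a+b")
        self._map = None

    def append(self, payload: bytes) -> int:
        """Write one record at the end of the log and return its offset."""
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(self._HEADER.pack(len(payload)) + payload)
        self._file.flush()
        self._invalidate()
        return offset

    def read(self, offset: int) -> bytes:
        m = self._mapped()
        (length,) = self._HEADER.unpack_from(m, offset)
        start = offset + self._HEADER.size
        return m[start:start + length]  # copies only this record's bytes

    def _mapped(self):
        if self._map is None:
            # Length 0 maps the whole file as it currently exists.
            self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        return self._map

    def _invalidate(self):
        # The file grew, so the old mapping is stale; remap lazily on next read.
        if self._map is not None:
            self._map.close()
            self._map = None

    def close(self):
        self._invalidate()
        self._file.close()


with tempfile.TemporaryDirectory() as workdir:
    log = AppendLog(os.path.join(workdir, "graph.log"))
    first = log.append(b"1->2")
    second = log.append(b"1->3")
    print(first, second)     # 0 8
    print(log.read(second))  # b'1->3'
    log.close()
```

Remapping after every append keeps the sketch short; a production engine would more likely map larger, preallocated regions and keep an index from keys to offsets.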
25. What’s next?
• Online schema migration
• Automated repartitioning and data migration
• Automated provisioning
• Hierarchical data partitioning
• Monitoring and statistics
• Query optimization
• Query fragment caching
• Result set caching
• Query parallelization
• Very large data set handling
• …
26. And we’re still growing
• 200M+ members
• 2/sec
• 63% non-U.S.
• 25th most visited website worldwide (Comscore 6-12)
• >2.6M company pages
• 85% of Fortune 100 companies use LinkedIn to hire
[Chart: LinkedIn Members (Millions), 2004 to 2011, with labeled values of 2, 4, 8, 17, 32, 55 and 90]