Social Network Analysis at
         LinkedIn
                 Mitul Tiwari
     Search, Network, and Analytics (SNA)
                  LinkedIn
                      1
Who am I?




    2
Outline


About Me

About LinkedIn

Analytics products at LinkedIn

Fun Facts: Sequencing the Startup DNA


                    3
LinkedIn by the numbers

120,000,000+ users (August 4, 2011)

2+ new user registrations per second

81+ Million monthly unique users (comScore)

2 Billion People Searches in 2010

7.1 Billion page views in Q2 2010

2+ Million companies with LinkedIn Company Pages

2+ Million LinkedIn Groups

                            4
Broad Range of Products




           5
User Profile




     6
Connections




     7
Communications



                 8
Hiring Solutions




       9
Job Search




    10
Company Pages




      11
What do I mean by Analytics Products?




                 13
People You May Know




         14
Profile Stats: WVMP




        15
Viewers of this profile also ...




               16
Skills




  17
InMaps




  18
Related Searches


Millions of searches every day

Help users to explore and refine their queries




                     19
Analytics Products: Key Ideas

Recommendations
 People You May Know, Related Searches, Viewers of this profile ...

Insight
 Profile Stats: Who Viewed My Profile, Skills

Visualization
 InMaps
                      20
Analytics Products: Challenges


 LinkedIn: largest professional network

 120+ million members on LinkedIn

 Billions of pageviews

 Terabytes of data to process


                      21
Outline


Analytics products at LinkedIn

Deep Dive - Connection Strength

Deep Dive - Related Searches

Fun Facts: Sequencing the Startup DNA


                    22
Connection Strength


Let’s build “Connection Strength”

Systems and Tools we use

Hadoop MapReduce

Managing workflow

Serving data in production

                     23
Connection Strength


How well do people know each other?

Connection Strength

Application: reorder the update stream so that updates
from close connections are shown first



                      24
Connection Strength

How well do people know each other?

[Figure: Alice is connected to Bob and to Carol; Bob and Carol are not yet connected]

Triangle closing
Prob(Bob knows Carol) ~ the # of common connections

                         28
Triangle Closing in Pig
-- connections in (source_id, dest_id) format in both directions
connections = LOAD 'connections' USING PigStorage()
              AS (source_id, dest_id);
-- gather each member's connections into one bag
group_conn = GROUP connections BY source_id;
-- generatePair is a UDF (not shown on the slide) that pairs up a member's connections
pairs = FOREACH group_conn GENERATE
        generatePair(connections.dest_id) as (id1, id2);

-- each occurrence of a pair corresponds to one common connection
common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn GENERATE
              flatten(group) as (source_id, dest_id),
              COUNT(pairs) as common_connections;
STORE common_conn INTO 'common_conn' USING PigStorage();

                                      29
Pig Overview
Load: load data, specify format

Store: store data, specify format

Foreach, Generate: Projections, similar to select

Group by: group by column(s)

Join, Filter, Limit, Order, ...

User Defined Functions (UDFs)
                        30
Triangle Closing Example

[Figure: Alice is connected to Bob and to Carol]

1. connections = LOAD 'connections' USING PigStorage()
                 AS (source_id, dest_id);
   Output: (A,B),(B,A),(A,C),(C,A)

2. group_conn = GROUP connections BY source_id;
   Output: (A,{B,C}),(B,{A}),(C,{A})

3. pairs = FOREACH group_conn GENERATE
           generatePair(connections.dest_id) as (id1, id2);
   Output: (A,{B,C}),(A,{C,B})

4. common_conn = GROUP pairs BY (id1, id2);
   common_conn = FOREACH common_conn GENERATE
                 flatten(group) as (source_id, dest_id),
                 COUNT(pairs) as common_connections;
   Output: (B,C,1), (C,B,1)
                                39
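The four steps of the example can also be traced in plain Python. This is only a sketch of the same idea, not the production job: the function name is made up for illustration, and `itertools.permutations` stands in for the `generatePair` UDF, whose actual behaviour the slides do not show.

```python
from collections import Counter, defaultdict
from itertools import permutations


def common_connection_counts(edges):
    """Count common connections for every ordered pair of members.

    `edges` holds (source_id, dest_id) tuples listed in both directions.
    """
    # Step 2: group each member's connections by source_id.
    neighbours = defaultdict(set)
    for source_id, dest_id in edges:
        neighbours[source_id].add(dest_id)

    # Steps 3 and 4: every ordered pair of one member's connections shares
    # that member, so counting the pairs counts the common connections.
    counts = Counter()
    for dests in neighbours.values():
        counts.update(permutations(sorted(dests), 2))
    return counts


# Step 1: the edge list from the example (Alice = A, Bob = B, Carol = C).
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A")]
counts = common_connection_counts(edges)
print(sorted(counts.items()))
```

Like the Pig script, the sketch counts pairs whether or not the two members are already connected; filtering those out would be a separate step.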
Connection Strength



Age

Company Overlap

School Overlap



                  40
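The slide only names the features. As one possible reading of "Company Overlap", the sketch below counts the years two members spent at the same organisation at the same time. The data layout, the function name, and the company names are assumptions made for illustration; the slides do not define how the feature is computed.

```python
def overlap_years(stints_a, stints_b):
    """Total years two members spent at the same organisation simultaneously.

    Each stint is (organisation, start_year, end_year) with end_year exclusive.
    """
    total = 0
    for org_a, start_a, end_a in stints_a:
        for org_b, start_b, end_b in stints_b:
            if org_a == org_b:
                # Length of the intersection of the two year ranges, never negative.
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


# Hypothetical work histories.
bob = [("Acme", 2005, 2009), ("Initech", 2009, 2011)]
carol = [("Acme", 2007, 2010)]
years = overlap_years(bob, carol)
print(years)
```

The same function would work for "School Overlap" if the stints held schools instead of companies.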
Related Searches


Let’s build “Related Searches”

Systems and Tools we use

Hadoop MapReduce

Managing workflow

Serving data in production

                      41
Systems and Tools


Kafka (LinkedIn)

Hadoop (Apache)

Azkaban (LinkedIn)

Voldemort (LinkedIn)


                     42
Data Cycle




    43
Systems and Tools

Kafka
 publish-subscribe messaging system

 transfer data from production to HDFS

Hadoop

Azkaban

Voldemort

                      44
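To make "publish-subscribe" concrete, here is a toy in-memory broker. It is not Kafka's API: the class, the topic names, and the list standing in for HDFS are all invented for illustration, and a real deployment adds persistence, partitioning, and consumer offsets.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy publish-subscribe broker that delivers messages synchronously."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callable to receive every message published to `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver `message` to all current subscribers of `topic`."""
        for callback in self._subscribers[topic]:
            callback(message)


broker = InMemoryBroker()
hdfs_staging = []  # stands in for files landed on HDFS
broker.subscribe("search_events", hdfs_staging.append)

broker.publish("search_events", {"member": 1, "query": "LinkedIn CEO"})
broker.publish("profile_views", {"viewer": 2, "viewee": 1})  # nobody subscribed
print(hdfs_staging)
```

Producers and consumers never refer to each other, only to the topic, which is what lets production services publish events without knowing about the Hadoop side.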
Systems and Tools

Kafka

Hadoop
 Java MapReduce and Pig

 process data

Azkaban

Voldemort

                    45
Systems and Tools

Kafka

Hadoop

Azkaban
 Hadoop workflow management tool

 to manage hundreds of Hadoop jobs

Voldemort

                     46
Systems and Tools

Kafka

Hadoop

Azkaban

Voldemort
 Key-value store

 store output of Hadoop jobs and serve in production

                      47
Related Searches


Let’s build “Related Searches”

Systems and Tools we use

Hadoop MapReduce

Managing workflow

Serving data in production

                      48
Hadoop MapReduce

Large-scale distributed data processing

Map phase: partition the problem into smaller steps

Reduce phase: aggregate the output of the smaller steps

Allows parallelism and fault-tolerance

                     49
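The two phases can be imitated on a single machine. The sketch below counts how often each search query occurs; the record layout and function names are assumptions for illustration, and real Hadoop jobs run the map and reduce calls in parallel across machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    yield record["query"].lower(), 1


def reduce_phase(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)


def run_job(records):
    # Shuffle: group the mapper output by key before reducing.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


log = [{"query": "LinkedIn CEO"}, {"query": "hadoop"}, {"query": "linkedin ceo"}]
result = run_job(log)
print(result)
```

Because each `map_phase` call looks at one record and each `reduce_phase` call at one key, failed pieces can be rerun independently, which is where the fault tolerance comes from.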
Our Workflow

[Workflow diagram: Age, Triangle Closing, and School Overlap feed into Union, which feeds push-to-prod]

                 51
Our Workflow

[Workflow diagram: Age, Triangle Closing, and School Overlap feed into Union, which feeds both push-to-qa and push-to-prod]

                        52
Sample Workflow




      53
Azkaban Workflow Management

  Dependency management
  Regular Scheduling
  Monitoring
  Diverse jobs: Java, Pig, Clojure
  Configuration/Parameters
  Resource control/locking
  Restart/Stop/Retry
  Visualization
  History
  Logs
                           54
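Dependency management boils down to running jobs in an order that respects the workflow graph. The sketch below derives such an order for the Age, Triangle Closing, School Overlap, Union, and push-to-prod workflow using Python's standard `graphlib` module. It says nothing about Azkaban's own job format or scheduler; the dictionary layout and job names are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on.
workflow = {
    "age": set(),
    "triangle_closing": set(),
    "school_overlap": set(),
    "union": {"age", "triangle_closing", "school_overlap"},
    "push_to_prod": {"union"},
}


def run_order(jobs):
    """Return one valid execution order; raises CycleError on a cyclic graph."""
    return list(TopologicalSorter(jobs).static_order())


order = run_order(workflow)
print(order)
```

The three feature jobs have no dependencies on each other, so a scheduler is free to run them in parallel before Union.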
Voldemort

Large amount of data/Scalable

Quick lookup/low latency

Versioning and Rollback

Fault tolerance through replication

Read only

Offline index building

                        58
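"Read only", "offline index building", and "versioning and rollback" fit together: a complete snapshot is built offline, swapped in, and older snapshots are kept in case the new one is bad. The class below is a toy model of that idea, not Voldemort's client or storage API; all names are invented for illustration.

```python
class ReadOnlyStore:
    """Toy read-only key-value store with versioned swaps and rollback."""

    def __init__(self):
        self._versions = []  # snapshots built offline, oldest first
        self._current = -1   # index of the snapshot being served

    def swap(self, snapshot):
        """Start serving a freshly built snapshot, keeping the older ones."""
        self._versions.append(dict(snapshot))
        self._current = len(self._versions) - 1

    def rollback(self):
        """Serve the previous snapshot again, e.g. after a bad push."""
        if self._current <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self._current -= 1

    def get(self, key, default=None):
        """Look up a key in the snapshot currently being served."""
        if self._current < 0:
            return default
        return self._versions[self._current].get(key, default)


store = ReadOnlyStore()
store.swap({"B,C": 1})       # first push of common-connection counts
store.swap({"B,C": 2})       # a later push replaces it
newest = store.get("B,C")
store.rollback()             # the later push turned out to be bad
previous = store.get("B,C")
print(newest, previous)
```

Since readers never write, a swap is just a change of pointer, which keeps lookups fast while a new data set is being loaded.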
Voldemort RO Store




        59
How to build Related Searches?

 Collaborative Filtering: searches done in the
 same session



        Q1       Q2        Q3    Q4

             Time



                      62
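A minimal sketch of the session-based signal: queries that appear in the same session are counted as co-occurring, and the most frequent co-occurring queries become the suggestions. The function name, the `top_n` parameter, and the sample sessions are assumptions for illustration; the slides do not describe the actual scoring.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_by_session(sessions, top_n=3):
    """Rank, for each query, the queries that most often share a session with it."""
    co_counts = defaultdict(Counter)
    for queries in sessions:
        # Count each unordered pair once per session, in both directions.
        for q1, q2 in combinations(sorted(set(queries)), 2):
            co_counts[q1][q2] += 1
            co_counts[q2][q1] += 1
    return {q: [r for r, _ in c.most_common(top_n)] for q, c in co_counts.items()}


# Hypothetical sessions, each a list of queries in time order.
sessions = [
    ["hadoop", "pig", "mapreduce"],
    ["hadoop", "pig"],
    ["linkedin ceo", "jeff linkedin"],
]
related = related_by_session(sessions)
print(related["hadoop"])
```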
How to build Related Searches?

 Searches correlated by results clicks


         Q1                 R1




         Qn                 Rm


                       63
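One way to read the click signal: two queries are related when searchers clicked the same result after each of them. The sketch below counts shared clicked results per query pair. The log format and names are assumptions for illustration, not the production schema.

```python
from collections import Counter, defaultdict


def related_by_clicks(click_log):
    """Count, for each pair of queries, the results clicked from both."""
    queries_by_result = defaultdict(set)
    for query, result in click_log:
        queries_by_result[result].add(query)

    shared = defaultdict(Counter)
    for queries in queries_by_result.values():
        for q1 in queries:
            for q2 in queries:
                if q1 != q2:
                    shared[q1][q2] += 1
    return shared


# Hypothetical (query, clicked result) pairs.
click_log = [
    ("jeff linkedin", "R1"),
    ("linkedin ceo", "R1"),
    ("linkedin ceo", "R2"),
    ("hadoop", "R3"),
]
shared = related_by_clicks(click_log)
print(dict(shared["jeff linkedin"]))
```

Unlike the session signal, this one can link queries issued by different people at different times, as long as they led to the same result.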
How to build Related Searches?

 Searches with overlapping terms



 Q1     Jeff LinkedIn


 Q2         LinkedIn CEO




                        64
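Term overlap can be scored in several ways; the sketch below uses Jaccard similarity over lower-cased terms, which is an assumption chosen for simplicity rather than the measure used in production.

```python
def term_overlap(q1, q2):
    """Jaccard similarity of the lower-cased term sets of two queries."""
    terms1, terms2 = set(q1.lower().split()), set(q2.lower().split())
    if not terms1 or not terms2:
        return 0.0
    return len(terms1 & terms2) / len(terms1 | terms2)


score = term_overlap("Jeff LinkedIn", "LinkedIn CEO")
print(round(score, 3))
```

The two example queries share one term ("linkedin") out of three distinct terms, so they score one third; queries with no term in common score zero.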
Sequencing the Startup DNA




            Done by Monica Rogati at LinkedIn


            66
Sequencing the Startup DNA

•Top majors: Entrepreneurship, Computer Engineering
•Bottom: Nursing, Administration




                         67
Sequencing the Startup DNA

•Most founders were between 20 and 40 years old at their first startup




                          68
Sequencing the Startup DNA

•Top regions: San Francisco, NYC




                         69
Sequencing the Startup DNA


•Top Business Schools: Stanford, Harvard, MIT Sloan




                         70
Things Covered


Analytics products at LinkedIn

Deep Dive - Connection Strength

Deep Dive - Related Searches

Fun Facts: Sequencing the Startup DNA


                    71
SNA Team



Thanks to SNA Team at LinkedIn

http://sna-projects.com

We are hiring!



                    72
Questions?




    73

More Related Content

What's hot

Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network AnalysisPremsankar Chakkingal
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systemsAravindharamanan S
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 
Social media mining PPT
Social media mining PPTSocial media mining PPT
Social media mining PPTChhavi Mathur
 
Intro to modelling-supervised learning
Intro to modelling-supervised learningIntro to modelling-supervised learning
Intro to modelling-supervised learningJustin Sebok
 
Statistics and Data Mining
Statistics and  Data MiningStatistics and  Data Mining
Statistics and Data MiningR A Akerkar
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architectureLiang Xiang
 
Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandasPiyush rai
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data ScienceEdureka!
 
Social Network Visualization 101
Social Network Visualization 101Social Network Visualization 101
Social Network Visualization 101librarianrafia
 
Null values, insert, delete and update in database
Null values, insert, delete and update in databaseNull values, insert, delete and update in database
Null values, insert, delete and update in databaseHemant Suthar
 
Recommendation Systems: Applying Amazon's Collaborative Filtering Methods to ...
Recommendation Systems: Applying Amazon's Collaborative Filtering Methods to ...Recommendation Systems: Applying Amazon's Collaborative Filtering Methods to ...
Recommendation Systems: Applying Amazon's Collaborative Filtering Methods to ...Nguyen Cao
 
CS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit VCS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit Vpkaviya
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERKnoldus Inc.
 

What's hot (20)

Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systems
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
 
web mining
web miningweb mining
web mining
 
Social media mining PPT
Social media mining PPTSocial media mining PPT
Social media mining PPT
 
Intro to modelling-supervised learning
Intro to modelling-supervised learningIntro to modelling-supervised learning
Intro to modelling-supervised learning
 
Statistics and Data Mining
Statistics and  Data MiningStatistics and  Data Mining
Statistics and Data Mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Introduction to pandas
Introduction to pandasIntroduction to pandas
Introduction to pandas
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data Science
 
Social Network Visualization 101
Social Network Visualization 101Social Network Visualization 101
Social Network Visualization 101
 
Null values, insert, delete and update in database
Null values, insert, delete and update in databaseNull values, insert, delete and update in database
Null values, insert, delete and update in database
 
Recommendation Systems: Applying Amazon's Collaborative Filtering Methods to ...
Recommendation Systems: Applying Amazon's Collaborative Filtering Methods to ...Recommendation Systems: Applying Amazon's Collaborative Filtering Methods to ...
Recommendation Systems: Applying Amazon's Collaborative Filtering Methods to ...
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
CS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit VCS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit V
 
Text MIning
Text MIningText MIning
Text MIning
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIER
 

Viewers also liked

Smm8: Social Media Management_Gestao dos Social Media 8
Smm8: Social Media Management_Gestao dos Social Media 8Smm8: Social Media Management_Gestao dos Social Media 8
Smm8: Social Media Management_Gestao dos Social Media 8Manuela Aparicio
 
TAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphTAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphAdrian-Tudor Panescu
 
Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Key-Key-Value Stores for Efficiently Processing Graph Data in the CloudKey-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Key-Key-Value Stores for Efficiently Processing Graph Data in the CloudUniversity of New South Wales
 
Visualizing My Facebook Networks
Visualizing My Facebook NetworksVisualizing My Facebook Networks
Visualizing My Facebook NetworksAndy Carvin
 
LinkedIn Graph Presentation
LinkedIn Graph PresentationLinkedIn Graph Presentation
LinkedIn Graph PresentationAmy W. Tang
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsNitish Upreti
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use CasesMax De Marzi
 

Viewers also liked (8)

Smm8: Social Media Management_Gestao dos Social Media 8
Smm8: Social Media Management_Gestao dos Social Media 8Smm8: Social Media Management_Gestao dos Social Media 8
Smm8: Social Media Management_Gestao dos Social Media 8
 
TAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphTAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social Graph
 
Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Key-Key-Value Stores for Efficiently Processing Graph Data in the CloudKey-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
 
Visualizing My Facebook Networks
Visualizing My Facebook NetworksVisualizing My Facebook Networks
Visualizing My Facebook Networks
 
LinkedIn Graph Presentation
LinkedIn Graph PresentationLinkedIn Graph Presentation
LinkedIn Graph Presentation
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
 
Dex
DexDex
Dex
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 

Similar to Social Network Analysis at LinkedIn

Building Data Driven Products at Linkedin
Building Data Driven Products at LinkedinBuilding Data Driven Products at Linkedin
Building Data Driven Products at LinkedinMitul Tiwari
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api TrainingSpark Summit
 
Building a Cross Channel Content Delivery Platform with MongoDB
Building a Cross Channel Content Delivery Platform with MongoDBBuilding a Cross Channel Content Delivery Platform with MongoDB
Building a Cross Channel Content Delivery Platform with MongoDBMongoDB
 
What to do when one size does not fit all?!
What to do when one size does not fit all?!What to do when one size does not fit all?!
What to do when one size does not fit all?!Arjen de Vries
 
Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19ngamou
 
Clustering
ClusteringClustering
Clusteringbutest
 
Waterloo Data Science and Data Engineering Meetup - 2018-08-29
Waterloo Data Science and Data Engineering Meetup - 2018-08-29Waterloo Data Science and Data Engineering Meetup - 2018-08-29
Waterloo Data Science and Data Engineering Meetup - 2018-08-29Zia Babar
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL DatabasesReal-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL DatabasesEugene Dvorkin
 
Developer Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesDeveloper Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesMax De Marzi
 
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevImage Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevDatabricks
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases MongoDB
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 
UNIT V PYTHON.pptx python basics ppt python
UNIT V PYTHON.pptx python basics ppt pythonUNIT V PYTHON.pptx python basics ppt python
UNIT V PYTHON.pptx python basics ppt pythonSuganthiDPSGRKCW
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Spencer Fox
 
Druid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidDruid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidYousun Jeong
 
MongoDB Analytics
MongoDB AnalyticsMongoDB Analytics
MongoDB Analyticsdatablend
 
Session 1.5 supporting virtual integration of linked data with just-in-time...
Session 1.5   supporting virtual integration of linked data with just-in-time...Session 1.5   supporting virtual integration of linked data with just-in-time...
Session 1.5 supporting virtual integration of linked data with just-in-time...semanticsconference
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...MongoDB
 
RDF Analytics... SPARQL and Beyond
RDF Analytics... SPARQL and BeyondRDF Analytics... SPARQL and Beyond
RDF Analytics... SPARQL and BeyondFadi Maali
 

Similar to Social Network Analysis at LinkedIn (20)

Building Data Driven Products at Linkedin
Building Data Driven Products at LinkedinBuilding Data Driven Products at Linkedin
Building Data Driven Products at Linkedin
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api Training
 
Building a Cross Channel Content Delivery Platform with MongoDB
Building a Cross Channel Content Delivery Platform with MongoDBBuilding a Cross Channel Content Delivery Platform with MongoDB
Building a Cross Channel Content Delivery Platform with MongoDB
 
What to do when one size does not fit all?!
What to do when one size does not fit all?!What to do when one size does not fit all?!
What to do when one size does not fit all?!
 
Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19
 
Clustering
ClusteringClustering
Clustering
 
Bill howe 2_databases
Bill howe 2_databasesBill howe 2_databases
Bill howe 2_databases
 
Waterloo Data Science and Data Engineering Meetup - 2018-08-29
Waterloo Data Science and Data Engineering Meetup - 2018-08-29Waterloo Data Science and Data Engineering Meetup - 2018-08-29
Waterloo Data Science and Data Engineering Meetup - 2018-08-29
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL DatabasesReal-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases
 
Developer Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesDeveloper Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker Notes
 
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevImage Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 
UNIT V PYTHON.pptx python basics ppt python
UNIT V PYTHON.pptx python basics ppt pythonUNIT V PYTHON.pptx python basics ppt python
UNIT V PYTHON.pptx python basics ppt python
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016
 
Druid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidDruid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druid
 
MongoDB Analytics
MongoDB AnalyticsMongoDB Analytics
MongoDB Analytics
 
Session 1.5 supporting virtual integration of linked data with just-in-time...
Session 1.5   supporting virtual integration of linked data with just-in-time...Session 1.5   supporting virtual integration of linked data with just-in-time...
Session 1.5 supporting virtual integration of linked data with just-in-time...
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
 
RDF Analytics... SPARQL and Beyond
RDF Analytics... SPARQL and BeyondRDF Analytics... SPARQL and Beyond
RDF Analytics... SPARQL and Beyond
 

More from Mitul Tiwari

Large scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedInLarge scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedInMitul Tiwari
 
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Mitul Tiwari
 
Modeling Impression discounting in large-scale recommender systems
Modeling Impression discounting in large-scale recommender systemsModeling Impression discounting in large-scale recommender systems
Modeling Impression discounting in large-scale recommender systemsMitul Tiwari
 
Large scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluationLarge scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluationMitul Tiwari
 
Metaphor: A system for related searches recommendations
Metaphor: A system for related searches recommendationsMetaphor: A system for related searches recommendations
Metaphor: A system for related searches recommendationsMitul Tiwari
 
Related searches at LinkedIn
Related searches at LinkedInRelated searches at LinkedIn
Related searches at LinkedInMitul Tiwari
 
Structural Diversity in Social Recommender Systems
Structural Diversity in Social Recommender SystemsStructural Diversity in Social Recommender Systems
Structural Diversity in Social Recommender SystemsMitul Tiwari
 
Organizational Overlap on Social Networks and its Applications
Organizational Overlap on Social Networks and its ApplicationsOrganizational Overlap on Social Networks and its Applications
Organizational Overlap on Social Networks and its ApplicationsMitul Tiwari
 
Large-scale Social Recommendation Systems: Challenges and Opportunity
Large-scale Social Recommendation Systems: Challenges and OpportunityLarge-scale Social Recommendation Systems: Challenges and Opportunity
Large-scale Social Recommendation Systems: Challenges and OpportunityMitul Tiwari
 

Social Network Analysis at LinkedIn

  • 1. Social Network Analysis at LinkedIn. Mitul Tiwari, Search, Network, and Analytics (SNA), LinkedIn
  • 3. Outline: About Me; About LinkedIn; Analytics products at LinkedIn; Fun Facts: Sequencing the Startup DNA
  • 4. LinkedIn by the numbers: 120,000,000+ users (August 4, 2011); 2+ new user registrations per second; 81+ Million monthly unique users (Comscore); 2 Billion People Searches in 2010; 7.1 Billion page views in Q2 2010; 2+ Million companies with LinkedIn Company Pages; 2+ Million LinkedIn Groups
  • 5. Broad Range of Products
  • 12. Outline: About Me; About LinkedIn; Analytics products at LinkedIn; Fun Facts: Sequencing the Startup DNA
  • 13. What do I mean by Analytics Products?
  • 14. People You May Know
  • 16. Viewers of this profile also ...
  • 19. Related Searches: millions of searches every day; help users explore and refine their queries
  • 20. Analytics Products: Key Ideas. Recommendations: People You May Know, Related Searches, Viewers of this profile ... Insight: Profile Stats (Who Viewed My Profile), Skills. Visualization: InMaps
  • 21. Analytics Products: Challenges. LinkedIn: largest professional network; 120+ million members on LinkedIn; billions of pageviews; terabytes of data to process
  • 22. Outline: Analytics products at LinkedIn; Deep Dive - Connection Strength; Deep Dive - Related Searches; Fun Facts: Sequencing the Startup DNA
  • 23. Connection Strength: Let’s build “Connection Strength”; Systems and Tools we use; Hadoop MapReduce; Managing workflow; Serving data in production
  • 24. Connection Strength: How well do people know each other? Application of connection strength: reorder updates to show updates from close connections
  • 25. Connection Strength: How well do people know each other? (Diagram: Alice, Bob, Carol)
  • 27. Connection Strength: How well do people know each other? (Diagram: Alice, Bob, Carol) Triangle closing
  • 28. Connection Strength: How well do people know each other? (Diagram: Alice, Bob, Carol) Triangle closing: Prob(Bob knows Carol) ~ the # of common connections
  • 29. Triangle Closing in Pig
        -- connections in (source_id, dest_id) format in both directions
        connections = LOAD 'connections' USING PigStorage();
        -- collect every member's connections into one bag
        group_conn = GROUP connections BY source_id;
        -- generatePair is a UDF that emits each pair of a member's connections
        pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) AS (id1, id2);
        -- count how many members have both id1 and id2 as connections
        common_conn = GROUP pairs BY (id1, id2);
        common_conn = FOREACH common_conn GENERATE FLATTEN(group) AS (source_id, dest_id), COUNT(pairs) AS common_connections;
        STORE common_conn INTO 'common_conn' USING PigStorage();
  • 30. Pig Overview: Load: load data, specify format; Store: store data, specify format; Foreach, Generate: projections, similar to select; Group by: group by column(s); Join, Filter, Limit, Order, ...; User Defined Functions (UDFs)
  • 36-39. Triangle Closing Example (Alice, Bob, Carol): the data after each step
        1. connections = LOAD 'connections' USING PigStorage();
           (A,B),(B,A),(A,C),(C,A)
        2. group_conn = GROUP connections BY source_id;
           (A,{B,C}),(B,{A}),(C,{A})
        3. pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);
           (A,{B,C}),(A,{C,B})
        4. common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;
           (B,C,1), (C,B,1)
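The four steps of the triangle closing example can also be traced in a few lines of plain Python. This is only an illustrative sketch, not the production Pig job: the function name `common_connection_counts` is hypothetical, and `itertools.permutations` stands in for the `generatePair` UDF, whose actual implementation is not shown in the slides.

```python
from collections import Counter, defaultdict
from itertools import permutations


def common_connection_counts(connections):
    """Count common connections for every ordered pair of members.

    `connections` is an iterable of (source_id, dest_id) tuples that
    contains each connection in both directions, as on the slide.
    """
    # Step 2: GROUP connections BY source_id
    neighbors = defaultdict(set)
    for source_id, dest_id in connections:
        neighbors[source_id].add(dest_id)

    # Steps 3 and 4: emit every ordered pair of a member's connections
    # (a stand-in for generatePair), then count identical pairs.
    counts = Counter()
    for dests in neighbors.values():
        for id1, id2 in permutations(sorted(dests), 2):
            counts[(id1, id2)] += 1
    return counts


# Step 1: Alice (A) is connected to Bob (B) and Carol (C).
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A")]
counts = common_connection_counts(edges)
print(dict(counts))
```

For the Alice, Bob, Carol input this yields one common connection for (B, C) and one for (C, B), matching row 4 of the example.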
  • 41. Related Searches: Let’s build “Related Searches”; Systems and Tools we use; Hadoop MapReduce; Managing workflow; Serving data in production
  • 42. Systems and Tools: Kafka (LinkedIn); Hadoop (Apache); Azkaban (LinkedIn); Voldemort (LinkedIn)
  • 44. Systems and Tools: Kafka, a publish-subscribe messaging system that transfers data from production to HDFS
  • 45. Systems and Tools: Hadoop, where Java MapReduce and Pig jobs process the data
  • 46. Systems and Tools: Azkaban, a Hadoop workflow management tool used to manage hundreds of Hadoop jobs
  • 47. Systems and Tools: Voldemort, a key-value store that stores the output of Hadoop jobs and serves it in production
  • 49. Hadoop MapReduce: large-scale distributed data processing; Map phase: partition the problem into smaller steps; Reduce phase: aggregate the output of the smaller steps; allows parallelism and fault tolerance
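To make the map and reduce phases concrete, here is a minimal single-process sketch of the idea. It does not use Hadoop at all: the helper names (`map_phase`, `shuffle`, `reduce_phase`) and the sample search log are made up for illustration, and the in-memory `shuffle` stands in for the grouping that Hadoop performs between the two phases.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Aggregate the values collected for each key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def query_mapper(log_line):
    # Smaller step: one normalized query contributes a count of 1.
    yield log_line.strip().lower(), 1


def sum_reducer(key, values):
    # Aggregation: total number of times the query was issued.
    return sum(values)


search_log = ["LinkedIn CEO", "Jeff LinkedIn", "linkedin ceo"]  # made-up sample
query_counts = reduce_phase(shuffle(map_phase(search_log, query_mapper)), sum_reducer)
print(query_counts)
```

Because each mapper call sees only one record and each reducer call sees only one key, both phases can be spread across machines, which is where the parallelism and fault tolerance mentioned on the slide come from.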
  • 51. Our Workflow: Triangle Closing, Age, School Overlap; Union; push-to-prod
  • 52. Our Workflow: Triangle Closing, Age, School Overlap; Union; push-to-qa; push-to-prod
  • 54. Azkaban Workflow Management: dependency management; regular scheduling; monitoring; diverse jobs (Java, Pig, Clojure); configuration/parameters; resource control/locking; restart/stop/retry; visualization; history; logs
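Dependency management boils down to running each job only after the jobs it depends on have finished. The sketch below shows that idea with Python's standard `graphlib` module; it is not Azkaban's configuration format or API. The job names come from the workflow slides, but the dependency edges (three feature jobs feeding Union, then push-to-qa, then push-to-prod) are an assumption read off the flattened diagram.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on (assumed structure).
workflow = {
    "triangle-closing": set(),
    "age": set(),
    "school-overlap": set(),
    "union": {"triangle-closing", "age", "school-overlap"},
    "push-to-qa": {"union"},
    "push-to-prod": {"push-to-qa"},
}


def run_order(dependencies):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(workflow)
print(order)
```

The three feature jobs have no dependencies on each other, so a scheduler is free to run them in parallel; only their relative order with respect to `union` is fixed.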
  • 58. Voldemort: large amount of data/scalable; quick lookup/low latency; versioning and rollback; fault tolerance through replication; read only; offline index building
  • 60. Outline: Analytics products at LinkedIn; Deep Dive - Connection Strength; Deep Dive - Related Searches; Fun Facts: Sequencing the Startup DNA
  • 61. Related Searches: millions of searches every day; help users explore and refine their queries
  • 62. How to build Related Searches? Collaborative filtering: searches done in the same session (Diagram: Q1, Q2, Q3, Q4 along a time axis)
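The session-based signal can be sketched as simple co-occurrence counting: two queries are related if they tend to show up in the same session. This is a toy illustration under stated assumptions, not LinkedIn's system. The function name `related_by_session`, the `top_n` parameter, and the sample sessions are all hypothetical, and a real system would need to normalize for query popularity.

```python
from collections import Counter, defaultdict
from itertools import permutations


def related_by_session(sessions, top_n=3):
    """Rank related queries by how often they share a session.

    `sessions` is an iterable of query lists, one list per user session.
    """
    cooccur = defaultdict(Counter)
    for queries in sessions:
        # Count each query once per session, ignoring case and padding.
        unique = sorted({q.strip().lower() for q in queries})
        for q1, q2 in permutations(unique, 2):
            cooccur[q1][q2] += 1

    related = {}
    for query, counts in cooccur.items():
        # Highest count first; ties broken alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        related[query] = [other for other, _ in ranked[:top_n]]
    return related


sessions = [  # made-up sample sessions
    ["Jeff LinkedIn", "LinkedIn CEO"],
    ["LinkedIn CEO", "LinkedIn jobs", "Jeff LinkedIn"],
    ["LinkedIn jobs", "Hadoop"],
]
related = related_by_session(sessions)
print(related["jeff linkedin"])
```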
  • 63. How to build Related Searches? Searches correlated by result clicks (Diagram: queries Q1 ... Qn linked to results R1 ... Rm)
  • 64. How to build Related Searches? Searches with overlapping terms (Q1: "Jeff LinkedIn"; Q2: "LinkedIn CEO")
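The overlapping-terms signal can be illustrated by comparing the word sets of two queries. The slides do not say how overlap is scored, so the sketch below uses Jaccard similarity as one plausible choice; the function names and the `min_score` threshold are hypothetical.

```python
def terms(query):
    """Split a query into a set of lowercase terms."""
    return set(query.lower().split())


def term_overlap(q1, q2):
    """Jaccard similarity of the two term sets (an assumed scoring choice)."""
    t1, t2 = terms(q1), terms(q2)
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)


def related_by_terms(query, candidates, min_score=0.2):
    """Return candidates sharing terms with `query`, best match first."""
    scored = [
        (term_overlap(query, candidate), candidate)
        for candidate in candidates
        if candidate.lower() != query.lower()
    ]
    ranked = sorted(scored, key=lambda sc: (-sc[0], sc[1]))
    return [candidate for score, candidate in ranked if score >= min_score]


# The two queries from the slide share the single term "linkedin".
print(term_overlap("Jeff LinkedIn", "LinkedIn CEO"))
print(related_by_terms("Jeff LinkedIn", ["LinkedIn CEO", "Hadoop jobs"]))
```

For the slide's pair the score is one shared term out of three distinct terms, so "LinkedIn CEO" passes the threshold while the unrelated candidate is dropped.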
  • 65. Outline: Analytics products at LinkedIn; Deep Dive - Connection Strength; Deep Dive - Related Searches; Fun Facts: Sequencing the Startup DNA
  • 66. Sequencing the Startup DNA: done by Monica Rogati at LinkedIn
  • 67. Sequencing the Startup DNA: Top majors: Entrepreneurship, Computer Engineering; Bottom: Nursing, Administration
  • 68. Sequencing the Startup DNA: most founders are between 20 and 40 years old at their first startup
  • 69. Sequencing the Startup DNA: Top regions: San Francisco, NYC
  • 70. Sequencing the Startup DNA: Top business schools: Stanford, Harvard, MIT Sloan
  • 71. Things Covered: Analytics products at LinkedIn; Deep Dive - Connection Strength; Deep Dive - Related Searches; Fun Facts: Sequencing the Startup DNA
  • 72. SNA Team: Thanks to the SNA Team at LinkedIn; http://sna-projects.com; We are hiring!