LinkedIn Segmentation & Targeting Platform: A Big Data Application
Hadoop Summit, June 2013
Hien Luu, Sid Anand
©2013 LinkedIn Corporation. All Rights Reserved.
About Us
Hien Luu Sid Anand
Our mission
Connect the world’s professionals to make them more productive and successful
Over 200M members and counting
LinkedIn Members (Millions) by year: 2004: 2, 2005: 4, 2006: 8, 2007: 17, 2008: 32, 2009: 55, 2010: 90, 2011: 145, 2012: 200+
The world’s largest professional network
Growing at more than 2 members/sec
Source: http://press.linkedin.com/about
The world’s largest professional network
• >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire
• Company Pages: >2.9M
• Professional searches in 2012: >5.7B
• Languages: 19
• Fastest growing demographic: students and NCGs (>30M)
• Over 64% of members are now international
Source: http://press.linkedin.com/about
Other Company Facts
• Headquartered in Mountain View, Calif., with offices around the world!
• As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world
Source: http://press.linkedin.com/about
Agenda
• Company Overview
• Big Data @ LinkedIn
• The Segmentation & Targeting Problem
• Solution: LinkedIn Segmentation & Targeting Platform
• Q & A
Big Data @ LinkedIn
LinkedIn : Big Data Story
Our Big Data Story depends on Infrastructure!
• On-line Data Infrastructure
• Near-line Data Infrastructure
• Off-line Data Infrastructure
[Diagram: On-line, Near-line, and Off-line tiers; labels: Oracle or Espresso, Updates, Web Serving, Data Streams, Teradata]
Big Data Story : On-line Data
On-line Data Infrastructure
• Supports typical OLTP requirements
– Highly concurrent R/W access
– Transactional guarantees
– Back-up & Recovery
• Supports a central LinkedIn Data Principle!
– “All data everywhere”
– All OLTP databases need to provide a time-line consistent change stream
– For this, we developed and open-sourced Databus!
[Diagram: On-line tier; labels: Oracle or Espresso, Updates, Web Serving]
Big Data Story : On-line Data
[Diagram: Updates go to Oracle or Espresso; Data Change Events reach the Search Index, Graph Index, Read Replicas, and Standardization]
A user updates the company, title, & school on his profile. He also accepts a connection.
The write is made to an Oracle or Espresso master, and Databus replicates it:
• the profile change is applied to the Standardization service
– E.g. the many forms of IBM are canonicalized for search-friendliness
• … and to the Search Index
– Recruiters can find you immediately by new keywords
• the connection change is applied to the Graph Index service
– The user can now start receiving feed updates from his new connections
Big Data Story : On-line Data
Databus streams also update Hadoop!
[Diagram: the same Data Change Events flow from Oracle or Espresso to the Search Index, Graph Index, Read Replica, and Standardization, with Hadoop as an additional consumer]
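The replication flow described on these slides (one write, several downstream consumers, events applied in commit order) can be sketched roughly as follows. This is a toy in-memory model, not the Databus API: the `ChangeEvent` and `ChangeStream` names, the `scn` field, and the source names "profile" and "connection" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    scn: int        # commit sequence number; the field name is an assumption
    source: str     # hypothetical source name, e.g. "profile" or "connection"
    member_id: int
    change: dict


class ChangeStream:
    """Toy relay that delivers events to subscribers in commit order."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[ChangeEvent], None]]] = {}

    def subscribe(self, source: str, handler: Callable[[ChangeEvent], None]) -> None:
        self._handlers.setdefault(source, []).append(handler)

    def publish(self, events: List[ChangeEvent]) -> None:
        # Sorting by sequence number keeps every consumer on the same timeline.
        for event in sorted(events, key=lambda e: e.scn):
            for handler in self._handlers.get(event.source, []):
                handler(event)


# Record which (hypothetical) downstream service saw which event, in order.
applied = []
stream = ChangeStream()
stream.subscribe("profile", lambda e: applied.append(("standardization", e.scn)))
stream.subscribe("profile", lambda e: applied.append(("search_index", e.scn)))
stream.subscribe("connection", lambda e: applied.append(("graph_index", e.scn)))

# Events arrive out of order here on purpose; the stream reorders them.
stream.publish([
    ChangeEvent(2, "connection", 42, {"connected_to": 7}),
    ChangeEvent(1, "profile", 42, {"company": "IBM", "title": "Engineer"}),
])
```

In this sketch the profile change reaches both profile consumers before the connection change reaches the graph consumer, which is the ordering property a time-line consistent stream is meant to give.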
Big Data Story : Near-line & Off-line Data
2 Main Sources of Data @ LinkedIn
• User-provided data
– e.g. Member Profile data (employment, education history, endorsements)
• Tracking data via web site instrumentation
– e.g. pages viewed, emails opened/sent, social gestures: posts/likes/shares
[Diagram labels: Oracle or Espresso, Updates, Databus, Web Servers, Teradata]
The Segmentation & Targeting Problem
Segmentation & Targeting
Segmentation & Targeting Attribute types
Bhaskar Ghosh
Segmentation & Targeting
Step 1 : Take some information about users
Member ID   Join Date    Country   Responded to Promotion X1
1           01/01/2013   FR        F
2           01/02/2013   BE        F
3           01/03/2013   FR        F
4           02/01/2013   FR        T
Step 2 : Provide some targeting criteria for a new promotion
Pick members where
• Join Date between("01/01/2013", "01/31/2013") and
• Country="FR" and
• Responded to Promotion X1="F"
→ Members 1 & 3
Step 3 : Target them for a different email campaign (promotion_X2)
In this example, the columns of the table in Step 1 are the attributes, the targeting criteria in Step 2 are the segment definition, and the resulting members (1 & 3) are the segment.
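The three steps above can be sketched in a few lines. This is only an illustration of the example on the slide, assuming the dates are written as MM/DD/YYYY and that "F"/"T" mean false/true; the dictionary keys are hypothetical names, not the platform's attribute names.

```python
from datetime import date

# Step 1: the attribute table from the slide (keys are illustrative names).
members = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


def in_segment(member: dict) -> bool:
    """Step 2: the segment definition, i.e. the three targeting criteria."""
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and member["responded_x1"] is False
    )


# Step 3: the segment is the list of members to target for promotion_X2.
segment = [m["member_id"] for m in members if in_segment(m)]
print(segment)  # [1, 3]
```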
Segmentation & Targeting
Problem Definition
• The business wants to launch new campaigns often
• The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes
• The attributes often need to be computed to fulfill the targeting criteria
• This data resides on Hadoop or Teradata (TD)
• The business is most comfortable with SQL-like languages
Segmentation & Targeting Solution
Segmentation & Targeting
• Attribute Computation Engine
• Attribute Serving Engine
Segmentation & Targeting
Attribute Computation Engine
• Self-service
• Support various data sources
• Attribute consolidation
• Attribute availability
Segmentation & Targeting
Attribute computation
[Diagram labels: ~225M, PB, TB, TB, ~240]
LinkedIn Segmentation & Targeting Platform
[Diagram: Attribute Portal Web Application; Attribute & Definition Metadata]
LinkedIn Segmentation & Targeting Platform
[Diagram: Attribute & Definition Metadata; TD Executor, Hive Executor, and Pig Executor, each labeled REST]
LinkedIn Segmentation & Targeting Platform
Attribute consolidation & availability
[Diagram: M/R Stitcher; /path/dataset1, /path/dataset2, /path/dataset3, /path/dataset4; /path/lnkd_big_table; Data Loader]
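One way to picture the stitching step is as a map/reduce-style join keyed by member ID, producing one wide row per member. The sketch below is an assumption about the general shape of such a job, not the platform's actual M/R Stitcher; the dataset contents and attribute names are invented for illustration, and only the path names come from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(rows: Iterable[dict]) -> Iterator[Tuple[int, dict]]:
    """Emit (member_id, attributes) pairs so rows can be grouped by member."""
    for row in rows:
        attrs = {k: v for k, v in row.items() if k != "member_id"}
        yield row["member_id"], attrs


def reduce_phase(pairs: Iterable[Tuple[int, dict]]) -> Dict[int, dict]:
    """Merge all attribute fragments for the same member into one wide row."""
    big_table: Dict[int, dict] = defaultdict(dict)
    for member_id, attrs in pairs:
        big_table[member_id].update(attrs)
    return dict(big_table)


def stitch(datasets: Dict[str, List[dict]]) -> Dict[int, dict]:
    pairs: List[Tuple[int, dict]] = []
    for rows in datasets.values():
        pairs.extend(map_phase(rows))
    return reduce_phase(pairs)


# Hypothetical contents for two of the input datasets.
datasets = {
    "/path/dataset1": [
        {"member_id": 1, "country": "FR"},
        {"member_id": 2, "country": "BE"},
    ],
    "/path/dataset2": [
        {"member_id": 1, "responded_x1": False},
    ],
}

big_table = stitch(datasets)
```

A member that appears in only some datasets simply ends up with a partial row, which is one plausible way to handle sparse attributes.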
LinkedIn Segmentation & Targeting Platform
LinkedIn big table, the most sought-after data
[Diagram: LinkedIn big table; Segmentation, Propensity Model, Ad hoc analysis]
Segmentation & Targeting
Attribute Serving Engine
• Self-service
• Attribute predicate expression
• Build segments
• Build lists
Segmentation & Targeting
Serving Engine
[Diagram: count, filter, sum, and complex expressions over the LinkedIn big table; labels: ~225M, ~240]
LinkedIn Segmentation & Targeting Platform
[Diagram: LinkedIn big table; M/R Indexer; three Inverted Index boxes; Attribute & Definition Metadata]
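The idea of indexing the big table can be illustrated with a plain-Python inverted index: each (attribute, value) pair maps to the set of member IDs that carry it. This is a simplified stand-in and not how the M/R Indexer or Lucene stores data; the table contents and attribute names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

# A tiny stand-in for the big table: one wide row per member.
big_table = {
    1: {"country": "FR", "responded_x1": False},
    2: {"country": "BE", "responded_x1": False},
    3: {"country": "FR", "responded_x1": False},
    4: {"country": "FR", "responded_x1": True},
}


def build_inverted_index(table: Dict[int, dict]) -> Dict[Tuple[str, Hashable], Set[int]]:
    """Map every (attribute, value) pair to the set of matching member IDs."""
    index: Dict[Tuple[str, Hashable], Set[int]] = defaultdict(set)
    for member_id, attrs in table.items():
        for name, value in attrs.items():
            index[(name, value)].add(member_id)
    return dict(index)


index = build_inverted_index(big_table)
french_members = index[("country", "FR")]
```

With the index in place, counting or filtering on a single attribute value is a lookup rather than a scan over every row, which is the property that makes interactive segment building feasible.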
LinkedIn Segmentation & Targeting Platform
• Who are the North American recruiters that don’t work for a competitor?
• Who are the LinkedIn Talent Solution prospects in Europe?
• Who are the job seekers?
LinkedIn Segmentation & Targeting Platform
[Diagram: JSON Predicate Expression; JSON Lucene Query Parser; three Inverted Index boxes; Segment & List]
LinkedIn Segmentation & Targeting Platform
Complex tree-like attribute predicate expressions
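A tree-like predicate expression can be evaluated recursively against an inverted index using set operations. The sketch below is a guess at the general technique only: the JSON shape (`and`, `or`, `not`, `term`) is a made-up schema, and plain Python sets stand in for the Lucene queries that the platform's JSON Lucene Query Parser would produce.

```python
import json
from typing import Dict, Hashable, Set, Tuple

# Hypothetical inverted index: (attribute, value) -> member IDs.
index: Dict[Tuple[str, Hashable], Set[int]] = {
    ("country", "FR"): {1, 3, 4},
    ("country", "BE"): {2},
    ("responded_x1", True): {4},
    ("responded_x1", False): {1, 2, 3},
}
universe = {1, 2, 3, 4}  # all member IDs, needed to evaluate "not"

# Hypothetical JSON predicate: country is FR AND NOT responded to X1.
expression = json.loads("""
{"and": [
    {"term": {"country": "FR"}},
    {"not": {"term": {"responded_x1": true}}}
]}
""")


def evaluate(node: dict, index: dict, universe: Set[int]) -> Set[int]:
    """Walk the predicate tree and combine posting sets bottom-up."""
    (op, arg), = node.items()  # each node holds exactly one operator
    if op == "term":
        (field, value), = arg.items()
        return set(index.get((field, value), set()))
    if op == "and":
        result = set(universe)
        for child in arg:
            result &= evaluate(child, index, universe)
        return result
    if op == "or":
        result = set()
        for child in arg:
            result |= evaluate(child, index, universe)
        return result
    if op == "not":
        return universe - evaluate(arg, index, universe)
    raise ValueError(f"unknown operator: {op}")


segment = evaluate(expression, index, universe)
segment_size = len(segment)
```

Because every node reduces to a set, arbitrarily nested expressions compose naturally, and the size of the resulting set gives the segment count.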
LinkedIn Segmentation & Targeting Platform
A marketing campaign is represented by a list
Conclusion
Move at business speed and scale at LinkedIn scale
• Segmentation & Targeting Platform
– Self-service
– Multiple data sources & massive data volume
– Support complex expression evaluation in seconds
– Attribute availability at business speed
Engineering Team
• Jessica Ho
• Swetha Karthik
• Raj Rangaswamy
• Tony Tong
• Ajinkya Harkare
• Hien Luu
• Sid Anand
Questions?
More info: data.linkedin.com
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

LinkedIn Member Segmentation Platform: A Big Data Application

  • 1. LinkedIn Segmentation & Targeting Platform: A Big Data Application Hadoop Summit, June 2013 Hien Luu, Sid Anand ©2013 LinkedIn Corporation. All Rights Reserved.
  • 3. Our mission: Connect the world’s professionals to make them more productive and successful
  • 4. Over 200M members and counting. The world’s largest professional network. Growing at more than 2 members/sec. Chart: LinkedIn Members (Millions) by year: 2004: 2, 2005: 4, 2006: 8, 2007: 17, 2008: 32, 2009: 55, 2010: 90, 2011: 145, 2012: 200+. Source : http://press.linkedin.com/about
  • 5. The world’s largest professional network. Over 64% of members are now international • >88% of Fortune 100 Companies use LinkedIn Talent Soln to hire • Company Pages: >2.9M • Professional searches in 2012: >5.7B • Languages: 19 • Fastest growing demographic, Students and NCGs: >30M. Source : http://press.linkedin.com/about
  • 6. Other Company Facts • Headquartered in Mountain View, Calif., with offices around the world! • As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world. Source : http://press.linkedin.com/about
  • 7. Agenda ✓ Company Overview • Big Data @ LinkedIn • The Segmentation & Targeting Problem • Solution : LinkedIn Segmentation & Targeting Platform • Q & A
  • 8. Big Data @ LinkedIn
  • 9. LinkedIn : Big Data Story. Our Big Data Story depends on Infrastructure! • On-line Data Infrastructure • Near-line Data Infrastructure • Offline Data Infrastructure. Diagram labels: Oracle or Espresso, Updates, Web Serving, Teradata, Data Streams; On-line, Near-line, Off-line
  • 10. Big Data Story : On-line Data. On-line Data Infrastructure • Supports typical OLTP requirements • Highly concurrent R/W access • Transactional guarantees • Back-up & Recovery • Supports a central LinkedIn Data Principle! • “All data everywhere” • All OLTP databases need to provide a time-line consistent change stream • For this, we developed and open-sourced Databus! Diagram labels: Oracle or Espresso, Updates, Web Serving, On-line
  • 11. Big Data Story : On-line Data. Diagram labels: Oracle or Espresso, Updates, Data Change Events, Search Index, Graph Index, Read Replicas, Standardization. A user updates the company, title, & school on his profile. He also accepts a connection. The write is made to an Oracle or Espresso Master and DataBus replicates it: • the profile change is applied to the Standardization service → E.g. the many forms of IBM were canonicalized for search-friendliness • …. and to the Search Index → Recruiters can find you immediately by new keywords • the connection change is applied to the Graph Index service → The user can now start receiving feed updates from his new connections
  • 12. Big Data Story : On-line Data. Databus streams also update Hadoop! Diagram labels: Oracle or Espresso, Updates, Data Change Events, Search Index, Graph Index, Read Replica, Standardization
  • 13. Big Data Story : Near-line & Off-line Data. 2 Main Sources of Data @ LinkedIn • User-provided data • e.g. Member Profile data (e.g. employment, education history, endorsements) • Tracking data via web site instrumentation • e.g. pages viewed, email opened/sent, social gestures : posts/likes/shares. Diagram labels: Oracle or Espresso, Updates, Databus, Web Servers, Teradata
  • 14. The Segmentation & Targeting Problem
  • 16. Segmentation & Targeting Attribute types Bhaskar Ghosh
  • 17. Segmentation & Targeting
       Step 1 : Take some information about users
         Member ID | Join Date  | Country | Responded to Promotion X1
         1         | 01/01/2013 | FR      | F
         2         | 01/02/2013 | BE      | F
         3         | 01/03/2013 | FR      | F
         4         | 02/01/2013 | FR      | T
       Step 2 : Provide some targeting criteria for a new promotion
         Pick members where
         • Join Date between("01/01/2013", "01/31/2013") and
         • Country="FR" and
         • Responded to Promotion X1="F"
         → Members 1 & 3
       Step 3 : Target them for a different email campaign (promotion_X2)
  • 18. Segmentation & Targeting. Same example as slide 17, with each part labeled: the member table in Step 1 holds the Attributes, the criteria in Step 2 form the Segment Definition, and the resulting members (1 & 3) are the Segment.
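The three steps on slides 17 and 18 can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's implementation: the field names (`join_date`, `responded_x1`) and the function name `segment_definition` are made up here, and the dates are read as MM/DD/YYYY, which is what the 01/31/2013 upper bound implies.

```python
from datetime import date

# Step 1: attributes, copied from the example table on the slide.
members = [
    {"member_id": 1, "join_date": date(2013, 1, 1), "country": "FR", "responded_x1": False},
    {"member_id": 2, "join_date": date(2013, 1, 2), "country": "BE", "responded_x1": False},
    {"member_id": 3, "join_date": date(2013, 1, 3), "country": "FR", "responded_x1": False},
    {"member_id": 4, "join_date": date(2013, 2, 1), "country": "FR", "responded_x1": True},
]


def segment_definition(member):
    """Step 2: the targeting criteria, written as a boolean predicate."""
    return (
        date(2013, 1, 1) <= member["join_date"] <= date(2013, 1, 31)
        and member["country"] == "FR"
        and not member["responded_x1"]
    )


# Step 3: the segment is the set of members that satisfy the definition.
segment = [m["member_id"] for m in members if segment_definition(m)]
print(segment)  # [1, 3]
```

Member 2 is dropped because of the country and member 4 because of both the join date and the earlier response, which leaves members 1 and 3 as on the slide.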
  • 19. Segmentation & Targeting: Problem Definition • The business wants to launch new campaigns often • The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes • The attributes often need to be computed to fulfill the targeting criteria • This data resides on Hadoop or TD • The business is most comfortable with SQL-like languages
  • 20. Segmentation & Targeting Solution
  • 21. Segmentation & Targeting • Attribute Computation Engine • Attribute Serving Engine
  • 22. Segmentation & Targeting: Attribute Computation Engine • Self-service • Support various data sources • Attribute consolidation • Attribute availability
  • 23. Segmentation & Targeting: Attribute computation. Diagram labels: ~225M, PB, TB, TB, ~240
  • 24. LinkedIn Segmentation & Targeting Platform. Diagram labels: Attribute Portal Web Application, Attribute & Definition Metadata
  • 25. LinkedIn Segmentation & Targeting Platform. Diagram labels: Attribute & Definition Metadata, TD Executor, Hive Executor, Pig Executor, REST (×3)
  • 26. LinkedIn Segmentation & Targeting Platform: Attribute consolidation & availability. Diagram labels: M/R Stitcher, /path/dataset1, /path/dataset2, /path/dataset3, /path/dataset4, /path/lnkd_big_table, Data Loader
  • 27. LinkedIn Segmentation & Targeting Platform. LinkedIn big table, the most sought after data • Segmentation • Propensity Model • Ad hoc analysis
  • 28. Segmentation & Targeting: Attribute Serving Engine • Self-service • Attribute predicate expression • Build segments • Build lists
  • 29. Segmentation & Targeting: Serving Engine • count • filter • sum • complex expressions. Diagram labels: LinkedIn big table, ~225M, ~240
  • 30. LinkedIn Segmentation & Targeting Platform. Diagram labels: LinkedIn big table, M/R Indexer, Inverted Index (×3), Attribute & Definition Metadata
  • 31. LinkedIn Segmentation & Targeting Platform • Who are North American recruiters that don’t work for a competitor? • Who are the LinkedIn Talent Solution prospects in Europe? • Who are the job seekers?
  • 32. LinkedIn Segmentation & Targeting Platform. Diagram labels: JSON Predicate Expression, JSON Lucene Query Parser, Inverted Index (×3), Segment & List
  • 33. LinkedIn Segmentation & Targeting Platform. Complex tree-like attribute predicate expressions
  • 34. LinkedIn Segmentation & Targeting Platform. A marketing campaign is represented by a list
  • 35. Conclusion. Move at business speed and scale at LinkedIn scale • Segmentation & Targeting Platform – Self-service – Multiple data sources & massive data volume – Support complex expression evaluation in seconds – Attribute availability at business speed
  • 36. Engineering Team • Jessica Ho • Swetha Karthik • Raj Rangaswamy • Tony Tong • Ajinkya Harkare • Hien Luu • Sid Anand
  • 37. Questions? More info: data.linkedin.com

Editor's Notes

  1. We’re making great strides toward our mission: LinkedIn has over 225 million members, and we’re now adding more than two members per second. This is the fastest rate of absolute member growth in the company’s history. Sixty-four percent of LinkedIn members are currently located outside of the United States. LinkedIn counts executives from all 2012 Fortune 500 companies as members; its corporate talent solutions are used by 88 of the Fortune 100 companies. More than 2.9 million companies have LinkedIn Company Pages. LinkedIn members did over 5.7 billion professionally-oriented searches on the platform in 2012. [See http://press.linkedin.com/about for a complete list of LinkedIn facts and stats]
  2. Email campaign & ad targeting
     • Acquire new paid customers
     • Retain and engage existing customers
     • Promote new products
     • Training and other important announcements
     * Talk about the speed of changing segmentation and targeting criteria
  3. Attribute types: professional identity, social data, behavioral
  4. Given the business problem that Sid outlined, the solution we came up with has two parts. The first part is about computing attributes based on the attribute definitions. The second part is about serving the attribute values to define segments, effectively performing user segmentation.
  5. The attribute computation engine needs to support these four high-level requirements:
     • Self-service: there needs to be an easy way for someone on the business team to express the computational logic that computes a set of attributes for their marketing campaigns. The engine takes care of the complexity of executing that logic: when and how to run it, and where to store the result.
     • Support for various data sources: data lives in multiple places, Teradata (TD) and Hadoop, and we need to support both. Fortunately, SQL and Hive QL are very similar.
     • Attribute consolidation: once all the attributes are computed, they need to be consolidated into a single dataset to make it easy for everyone to consume and analyze.
     • Data availability: the data is registered with Hive and copied onto the TD system for business folks to consume.
  6. At a high level, the attribute computation engine needs to be able to compute attributes that come from different data sets, and some of these data sets are huge. The output of the computation engine is one big table: ~225M rows, one for each member, and ~240 columns, one for each attribute.
     • Behavioral data: site engagement, OL transactions, searches, comments, discussions, …
     • Social data: connections, follows, endorsements
     • Demographic data (from the member profile): location, gender, title, function, seniority, education
  7. Self-service way to manage attributes
     • A web application that a member of the marketing operations or business analyst team can use to express the computation logic in the form of a SQL SELECT statement. We call that an attribute definition.
     • The SQL statement is either a Teradata SQL statement or a Hive QL statement.
     • The web application validates the SQL statements to make sure they are valid. We also need to extract the attribute names and their types, which are useful for various purposes.
     • The metadata about the attribute definitions and attributes is captured in a MySQL database. For Hive QL queries, we support Hive hints as well as general tuning parameters like split size.
     • Once an attribute definition passes the validation step, it goes through an approval process, which is designed to make sure that there are no duplicate attributes and that the query is properly tuned.
     • One of the benefits of this attribute portal is that it centralizes attribute definitions and makes it easy to discover attributes, the logic behind them, and their data sources.
     • When someone starts working on a marketing campaign, they first identify the targeting criteria based on the goals of the campaign. From the set of targeting criteria, they identify which member attributes are needed.
  8. Attribute computing workhorse
     • These executors are scheduled to run on a regular basis. They contact the attribute definition metadata repository to retrieve the attribute definitions to execute, and they execute the queries in parallel using APIs.
     • TD executor: executes using JDBC and stores results in temporary tables. We are using an in-house library called LASSEN, an M/R library that leverages the power of the MapReduce framework to quickly and efficiently download the data to HDFS.
     • Hive executor: programmatically executes the Hive queries. One of the classes in Hive is not thread safe, so we can’t execute Hive QL statements in parallel using multiple threads; we use multiple Hive executors instead.
     • Pig executor: executes Pig script files, and has the ability to rerun only the failed scripts.
     • Interesting runtime details:
       • Long-running queries: we have all kinds of queries, simple ones and complex ones. The complex ones may take hours to complete. However, we don’t want a query that takes 5 or 6 hours, because that would delay the attribute computing phase for all the queries. Our system has a built-in mechanism to kill a long-running query that exceeds a certain amount of time.
       • Failed queries: even though we validate queries at attribute definition submission time, some of them will fail at runtime for various reasons. Our system is built to be resilient against these failed queries: only the attributes of the failed queries will be unavailable.
       • Accounting: our system collects accounting information about each of the queries, so we know how many queries completed successfully, how many failed, and how long each took.
     • The output of each attribute definition is stored in a separate folder. So if we have 50 attribute definitions, the attribute values are scattered across 50 folders.
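The failure isolation and accounting described in note 8 might look roughly like the sketch below. Everything here is hypothetical: `run_definitions`, `fake_execute`, and the sample queries are illustrative names, not the platform's API. For brevity the sketch runs definitions one after another and does not model the parallel execution or the kill-after-timeout mechanism the note mentions.

```python
import time


def run_definitions(definitions, execute):
    """Run every attribute definition; a failure only affects that definition."""
    report = {"succeeded": [], "failed": [], "seconds": {}}
    for name, query in definitions.items():
        started = time.monotonic()
        try:
            execute(query)
            report["succeeded"].append(name)
        except Exception:
            # Only this definition's attributes become unavailable;
            # the remaining definitions still run.
            report["failed"].append(name)
        report["seconds"][name] = time.monotonic() - started
    return report


def fake_execute(query):
    """Stand-in for a real TD/Hive/Pig call (assumption for this sketch)."""
    if "missing_table" in query:
        raise RuntimeError("table not found")


definitions = {
    "country": "SELECT member_id, country FROM profiles",
    "broken": "SELECT member_id, x FROM missing_table",
    "connections": "SELECT member_id, COUNT(*) FROM connections GROUP BY member_id",
}

report = run_definitions(definitions, fake_execute)
print(report["succeeded"], report["failed"])
```

The report gives the three numbers the note lists: how many definitions succeeded, how many failed, and how long each one took.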
  9. Once the executors have finished executing the queries and materializing the attributes, the job of the stitcher is to combine all these attributes into a single data set, which I call the LinkedIn big table.
     • It is a MapReduce job, and it acts as a gateway that performs some validations, e.g. a member id must not be less than 0, and certain values can’t be longer than a certain length.
     • The output of the stitcher is a single data set in Avro format that contains one record for every single LinkedIn member. This output is also registered in Hive for data scientists to consume.
     • To make the LinkedIn big table available for business analysts to generate more insights and do further analysis, this same data set is copied onto TD via the Data Loader component.
     • The whole process (executing the attribute definitions or SELECT statements, stitching the attributes together into a single dataset, and loading the data onto TD) takes about 5 to 6 hours.
     • Not all attributes need to be refreshed daily, so we have the concepts of partial refresh and full refresh. In a partial refresh, only the needed subset of attribute definitions is executed, and it takes much less time: 2 to 3 hours vs. 5 to 6 hours.
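A toy, in-memory version of the stitching step in note 9 is sketched below. The real stitcher is a MapReduce job over Avro data; this sketch only shows the join-by-member-id and the two validations the note names. The function name `stitch`, the sample datasets, and the 64-character limit are assumptions made for the illustration.

```python
MAX_VALUE_LENGTH = 64  # hypothetical limit; the notes do not give the real one


def stitch(datasets):
    """Combine per-attribute outputs into one record per member."""
    big_table = {}
    for attribute, rows in datasets.items():
        for member_id, value in rows:
            if member_id < 0:
                continue  # validation: member id must not be less than 0
            if isinstance(value, str) and len(value) > MAX_VALUE_LENGTH:
                continue  # validation: value must not be too long
            big_table.setdefault(member_id, {})[attribute] = value
    return big_table


# One entry per attribute definition output (one "folder" each in note 8).
datasets = {
    "country": [(1, "FR"), (2, "BE"), (-5, "US")],
    "connections": [(1, 42), (2, 7)],
    "title": [(1, "Engineer"), (2, "x" * 100)],
}

big_table = stitch(datasets)
print(big_table)
```

The row with member id -5 and the 100-character title are rejected, so member 2 ends up without a `title` attribute while the rest of the record is kept.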
  10. LinkedIn big table: 200 GB
     • The LinkedIn big table is used for multiple purposes:
       • Propensity model: a ranking model, where each member is assigned a score that indicates how likely the member is to belong to a certain class of members or to take an action, e.g. to be a job seeker, or to upgrade to a paid subscription.
       • Business analysts and data scientists use it for their own analysis.
     • The most sought-after data:
       • A very rich data set that contains all kinds of interesting attributes about our members.
       • The heavy lifting has been done and the data is available in a single place, so others don’t have to hunt down the underlying data sets.
  11. Attribute serving engine:
     • Self-service: a web application for business analysts and the marketing team to use, including someone who is not familiar with SQL. The UI supports drag and drop.
     • Attribute predicate expression: basically a boolean expression that evaluates to true or false by comparing an attribute value to an expected value, for example whether the country is the United States, or whether a member has more than 30 connections. In order to build segments, we need a way to express attribute predicates (e.g. country is Canada or the United States), save the expression, and evaluate it at a later point.
     • Building segments: combining various attribute predicates into a segment.
     • Building lists: combining segments to target a certain member population for a marketing campaign.
  12. Based on the requirements I talked about in the previous slide, the serving engine needs to support the following features/operations:
     • Count: how many members meet certain criteria.
     • Filter: the members that meet certain criteria.
     • Sum: each member is assigned a lifetime value for a particular product, so we need the ability to compute the total dollar amount of a segment based on the members that meet the defined criteria.
     • Complex nested expressions, with support for conjunction (and) and disjunction (or).
     The core problem that the serving engine needs to solve is to support arbitrary predicate expressions against any of the attributes and return the result in a reasonable amount of time. We basically think of this as an information retrieval problem, and we found Lucene to be pretty good at this kind of problem, so we leverage Lucene.
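To see why an inverted index suits the count, filter, and sum operations in note 12, here is a minimal sketch that uses plain Python sets in place of Lucene. It is not how Lucene is called; the sample members, the `lifetime_value` field, and `build_index` are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical slice of the big table: member id -> attributes.
members = {
    1: {"country": "US", "job_seeker": True, "lifetime_value": 120.0},
    2: {"country": "CA", "job_seeker": False, "lifetime_value": 80.0},
    3: {"country": "US", "job_seeker": False, "lifetime_value": 50.0},
    4: {"country": "FR", "job_seeker": True, "lifetime_value": 30.0},
}


def build_index(members, attributes):
    """Map each (attribute, value) pair to the set of member ids that have it."""
    index = defaultdict(set)
    for member_id, row in members.items():
        for attribute in attributes:
            index[(attribute, row[attribute])].add(member_id)
    return index


index = build_index(members, ["country", "job_seeker"])

# (country = US OR country = CA) AND job_seeker = False
# Disjunction is a set union, conjunction is a set intersection.
matching = (index[("country", "US")] | index[("country", "CA")]) & index[("job_seeker", False)]

count = len(matching)                                               # count
filtered = sorted(matching)                                         # filter
total_value = sum(members[m]["lifetime_value"] for m in matching)   # sum
print(count, filtered, total_value)  # 2 [2, 3] 130.0
```

Because every predicate resolves to a precomputed set of member ids, a nested expression reduces to a handful of set operations instead of a scan over every row.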
  13. MapReduce application
     • Consumes data in Avro format and creates Lucene indexes.
     • Uses a custom Writable to wrap a Lucene document. Each Lucene document contains all the 240+ attributes for one member.
     • Uses a custom OutputFormat to build a Lucene index segment, which is stored on the local disk of the reducer task and copied onto HDFS at the end of the reduce task.
     • Sizes: LinkedIn big table 200 GB; index 175 GB.
     * # of map and reduce tasks
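  14. The three questions on the slide need different numbers of attributes:
     • “Who are the job seekers?” requires only one attribute: job seeker status.
     • “Who are the LinkedIn Talent Solution prospects in Europe?” requires two attributes: whether the member is a Talent Solution prospect, and the country they work in.
     • “Who are North American recruiters that don’t work for a competitor?” needs three attributes: whether the member is a recruiter, the country the member works in, and whether the company they work for is considered a competitor of LinkedIn.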
  15. JSON predicate expression
     • We use JSON to define the format of the predicate expression. JSON is well suited for this purpose: it supports nested data structures, is fairly flexible, and is easy to parse.
     • It supports different data types, and for each data type certain operators are supported.
     • A JSON predicate expression consists of an attribute name, a data type, an operator, and one or more values.
     • The JSON predicate expression is the contract between the browser and the server.
     • The predicate expression is stored in MySQL and evaluated at run time.
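The notes do not show the actual JSON format, so the sketch below invents one purely for illustration: each leaf carries the four parts note 15 lists (attribute name, data type, operator, values), and `and` / `or` nodes provide the nesting from slide 33. The key names, the operator names (`eq`, `in`, `gt`), and the direct in-memory evaluation are all assumptions; the platform itself hands the expression to a Lucene query parser.

```python
import json

# Hypothetical operator table; the real platform restricts operators per data type.
OPERATORS = {
    "eq": lambda actual, values: actual == values[0],
    "in": lambda actual, values: actual in values,
    "gt": lambda actual, values: actual > values[0],
}


def evaluate(node, member):
    """Recursively evaluate a predicate tree against one member record."""
    if "and" in node:
        return all(evaluate(child, member) for child in node["and"])
    if "or" in node:
        return any(evaluate(child, member) for child in node["or"])
    actual = member[node["attribute"]]
    return OPERATORS[node["operator"]](actual, node["values"])


# "country in Canada or United States" AND (more than 30 connections OR job seeker)
expression = json.loads("""
{"and": [
  {"attribute": "country", "type": "string", "operator": "in", "values": ["CA", "US"]},
  {"or": [
    {"attribute": "connections", "type": "int", "operator": "gt", "values": [30]},
    {"attribute": "job_seeker", "type": "boolean", "operator": "eq", "values": [true]}
  ]}
]}
""")

members = [
    {"member_id": 1, "country": "US", "connections": 45, "job_seeker": False},
    {"member_id": 2, "country": "CA", "connections": 10, "job_seeker": True},
    {"member_id": 3, "country": "FR", "connections": 99, "job_seeker": True},
    {"member_id": 4, "country": "US", "connections": 5, "job_seeker": False},
]

segment = [m["member_id"] for m in members if evaluate(expression, m)]
print(segment)  # [1, 2]
```

Since the expression is plain JSON, the same text can be built in the browser, stored as a string, and evaluated later, which matches the "contract between the browser and the server" role described above.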
  16. Web application
     • Has a UI for defining segments and lists.
     • Segment builder: drag arbitrary attributes and build predicate expressions.
     • With the click of a button, the marketing team can get a sense of how many members meet the criteria defined in the segment. This gives them a chance to change the criteria to increase or decrease the count.
     • Segments are meant as building blocks.
  17. Segments are building blocks and certainly reusable. Each marketing campaign is represented by a list, which is a collection of segments. Each segment in a list can be one of two types:
     • Inclusions: include members that meet the defined criteria of each of the selected segments.
     • Net count and raw count.
     • Exclusions: exclude those members.
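One way to picture a list built from inclusion and exclusion segments is with set operations, as in the sketch below. The notes do not define "raw count" and "net count", so the reading used here (raw count = members matched by the inclusion segments, net count = what remains after exclusions) is an assumption, as is treating multiple inclusion segments as a union. The segment names and member ids are made up.

```python
# Hypothetical segments, each already resolved to a set of member ids.
segments = {
    "us_recruiters": {1, 2, 3},
    "job_seekers": {3, 4},
    "competitor_employees": {2},
}

# A marketing campaign is represented by a list of included and excluded segments.
campaign_list = {
    "include": ["us_recruiters", "job_seekers"],
    "exclude": ["competitor_employees"],
}


def resolve(campaign_list, segments):
    """Return (members matched by inclusions, members left after exclusions)."""
    included = set().union(*(segments[name] for name in campaign_list["include"]))
    excluded = set().union(*(segments[name] for name in campaign_list["exclude"]))
    return included, included - excluded


included, targeted = resolve(campaign_list, segments)
raw_count = len(included)   # assumed meaning: before exclusions
net_count = len(targeted)   # assumed meaning: after exclusions
print(raw_count, net_count, sorted(targeted))  # 4 3 [1, 3, 4]
```

Member 3 appears in both inclusion segments but is counted once, and member 2 is removed by the exclusion segment.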
  18. One of the things we are working on is improving the turnaround time for attributes: the time from when an attribute is defined to when it is available for building segments.
  19. * Give a shout-out to the engineering team that works on this platform