SlideShare a Scribd company logo
1 of 37
Download to read offline
The Evolution of Metadata:
LinkedIn’s Journey
Sept 25, 2019
Shirshanka Das, Principal Staff Engineer, LinkedIn
Mars Lan, Staff Engineer, LinkedIn
@shirshanka
LinkedIn’s Data Ecosystem
What is Metadata?
What datasets do we have in our data warehouse (Hadoop, Teradata)
How do we easily find them
What does their schema look like
What datasets derive from these datasets
"Confused" by Sara Beyer (CC BY-NC-ND 2.0)
? ?
Attempt 1: Crawl Phase
Attempt 1: Crawl Phase
Crawl all catalogs you can
Parse all the logs you can
ETL into an opinionated data model
Build a search index + a lookup store
Build an app to serve this info
This is a useful product!
We gave it a clever name: WhereHows
And we open sourced it in 2016
Attempt 1: A few things we observed
Pull-based integrations
Central team’s burden
Pipeline fragility, freshness
Rigidity of model
Hard to iterate quickly
"Grimaces" by Fouquier (CC BY-NC-ND 2.0)
What is Metadata?
What datasets do we have in our entire data
ecosystem?
Espresso, Kafka, Hadoop, Teradata, Search, Pinot …
20+ data systems
What does their schema look like?
What datasets derive from these datasets?
What business types are contained in these schemas?
Who owns these datasets?
Where are datasets being copied?
…
"[Katsiaryna Lenets] © 123RF.com"
Attempt 2: Walk Phase
Attempt 2: Walk Phase
Separate REST-ful service to support this diversity
Dataset Naming
Generic Schema model for all kinds of data stores
and formats
Scalable integration patterns
Pub-Sub using Kafka
Support REST + Kafka ingest route
We called it TMS: THE Metadata Store :)
We couldn’t open source it, too coupled with our internal business concepts :(
New extensions to the model
Business metadata
Ownership
Attempt 2: A few things we observed
The Good
Teams were now accountable for the quality of their metadata and their custom ETL
Aggregate base metadata from all data platforms, overlay new metadata “aspects” on top easily
The Not So Good
New kind of metadata needed —> get in line behind central TMS team
Source of truth versus “Reflection of truth” debates —> no one wants to take a dependency
Standardized Event as interface: requires data model adapters everywhere, low incentive for producer to
excel at their job
What is Metadata?
"Thinking" by Elvin (CC BY-NC 2.0)
What is Metadata?
"Thinking" by Elvin (CC BY-NC 2.0)
What is Metadata?
"Thinking" by Elvin (CC BY-NC 2.0)
Models,
Features, …
Some observations about the problem
There is value in local sub-graphs, but
the real value lies in the global model
Cannot execute this with a single monolith
data model + service
Different teams care about different sub-
graphs
Central team bottleneck
Micro-services can help?
Back to silo-ed metadata problem
Attempt 3: Run Together
Distributed but collaborative authorship of model
Single ORM-like layer to auto-generate integrations for
CRUD operations
Search queries
Graph traversal
Distributed deployment and ownership of services possible
We’re calling this, the Generalized Metadata Architecture (GMA)
An Example Metadata Graph
DATASET
urn
platform
name
fabric
USER
Urn
firstName
lastName
PROFILE
{
firstName: John
lastName: Doe
Ldap: jdoe
}
GROUP
urn
name
size
OWNERSHIP
{
owners: [{
type: SRE,
user: jdoe
}, …]
}
MEMBERSHIP
{
admin: jdoe,
members: [{
jdoe,
…]
}
Owned By
Has Admin
Has Member
1N
1
1
1
NAspects
Relationships
Entity Anchors
…
SCHEMA
{
fields: [{
type: integer,
name: id
}, …]
}
Metadata Serving
API Endpoints
CRUD DAO
Search DAO
Graph DAO
Graph DB
Search Index
Document Store
Metadata Service
Metadata Indexing
Metadata Service
API Endpoints
CRUD DAO
Search Index
Graph DB
Search
processor
Graph
processor
Metadata
Audit
Event
Putting it all together
GIT Web App Service
CRUD Event
Event Processor
User, Groups
Service Dataset Service
SoT DB SoT DB
Change Event
Event Processor
Change
Event
Hadoop
Data Lake
Analytics
+
Relevance
Search
Design Decision Cheat-Sheet
Generic versus Specific Types:
Support strong-types layered over generic storage, model-first development
Expose strongly-typed REST API + Graph APIs for metadata traversal
Integration Strategy: Crawling versus Pub-Sub versus REST-ful API:
Prefer unified “Pub-Sub + REST-ful” API, build crawlers that publish
Single Source of Truth versus Replicated versus Federated:
For new metadata systems, integrate as ORM layer
For existing metadata systems, use Kafka to adapt
Federation strategy only supports lookup cases, hard to support search and graph well
What about X?
A very brief survey of other similar systems in this space
Hive Metastore: limited to Hadoop Dataset, focused on query planning
Atlas: generic model, missing strongly-typed API, dynamic types supported
Marquez: strongly opinionated model, focused on data pipeline metadata
Ground: generic model, missing strongly-typed API, research prototype
Amundsen: Web-app backed by Hive Metastore, strongly opinionated model
M
onolithic
Services
Metadata Platform by the numbers
Datasets : ~ 5M
Dashboards: ~2K
Features: ~3K
Metrics: ~30K
Schemas: ~100K
People: 10K+
CRUD Events: ~2M / day
Relationship Events: ~3M / day
Powered by Metadata
Metadata Platform
???
A Web App: Data Hub
Search and Discover Data Constructs
Find relationships, lineage, data quality, …
Enrich metadata
Some interesting parallels to the metadata platform in terms of extensibility
How do we add new pages to this app in a multi-team environment?
Data Hub: Search
Data Hub: Browse
Data Hub: a Dataset’s page
Data Hub: Lineage
Data Hub: a Metric’s page
Data Management with Compliance
Metadata Platform
Data Access Layer
Data
Management
(Purge, Export,
…)
Std Frameworks
Operations
Application Code
Physical Data
Data Pipeline Operations
Powered by Metadata
Metadata Platform
Search and
Discovery
beyond just
Datasets
Data Management,
Access
with Compliance
Operational
Monitoring,
Incremental Compute
AI : Model, Feature
reproducibility,
explainability
… and we’re just
getting started
It’s open source!
Open Source : the details!
Alpha release out currently at https://lnkd.in/datahub-alpha
Check it out at: Github project wherehows, branch: datahub (https://lnkd.in/datahub-github)
Capabilities:
Generic Modeling Layer with CRUD on MySQL
Search on Elastic
Graph on Neo4j*
Interfaces:
The DataHub Web App
REST-ful service from model files
Event Processors for metadata events
Metadata Model:
Datasets
People
Integrations
Data Catalogs (Hive, Kafka, JDBC*)
People (LDAP)
* coming real soon
Open sourcing the Data Model
LinkedIn Internal Model Open Source Model
Open Source : what’s coming next!
More Entities: [Jobs, Flows]
More Aspects within existing entities: [e.g. ReplicationPolicy, HiveSpecification]
More Integrations: [Calcite-compatible systems for fine-grain lineage]
App Features
User pages
Social interactions
Get engaged and
contribute to the global model
Build integrations with systems
Thank You!

More Related Content

What's hot

LinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting PlatformLinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting Platform
Hien Luu
 
Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...
DataWorks Summit
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»
Anna Shymchenko
 

What's hot (20)

Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in Azure
 
LinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting PlatformLinkedIn Segmentation & Targeting Platform
LinkedIn Segmentation & Targeting Platform
 
Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, StealthLessons from building a stream-first metadata platform | Shirshanka Das, Stealth
Lessons from building a stream-first metadata platform | Shirshanka Das, Stealth
 
Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...
 
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
 
AI meets Big Data
AI meets Big DataAI meets Big Data
AI meets Big Data
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An Overview
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
Smart data for a predictive bank
Smart data for a predictive bankSmart data for a predictive bank
Smart data for a predictive bank
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | Whitepaper
 
LinkedIn Graph Presentation
LinkedIn Graph PresentationLinkedIn Graph Presentation
LinkedIn Graph Presentation
 
DataHub
DataHubDataHub
DataHub
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture
 

Similar to The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]

Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
kalai75
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Yael Garten
 

Similar to The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019] (20)

Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
Big Data
Big DataBig Data
Big Data
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in Action
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big Data
 
The future of Big Data tooling
The future of Big Data toolingThe future of Big Data tooling
The future of Big Data tooling
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachCoping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach
 
Sycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxSycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptx
 
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Big data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBig data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edge
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 

More from Shirshanka Das

Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Shirshanka Das
 

More from Shirshanka Das (6)

Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
 
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
 
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
 
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
 

Recently uploaded

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
gajnagarg
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 

Recently uploaded (20)

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 

The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]

  • 1. The Evolution of Metadata: LinkedIn’s Journey Sept 25, 2019 Shirshanka Das, Principal Staff Engineer, LinkedIn Mars Lan, Staff Engineer, LinkedIn @shirshanka
  • 3. What is Metadata? What datasets do we have in our data warehouse (Hadoop, Teradata) How do we easily find them What does their schema look like What datasets derive from these datasets "Confused" by Sara Beyer (CC BY-NC-ND 2.0) ? ?
  • 5. Attempt 1: Crawl Phase Crawl all catalogs you can Parse all the logs you can ETL into an opinionated data model Build a search index + a lookup store Build an app to serve this info This is a useful product! We gave it a clever name: WhereHows And we open sourced it in 2016
  • 6. Attempt 1: A few things we observed Pull-based integrations Central team’s burden Pipeline fragility, freshness Rigidity of model Hard to iterate quickly "Grimaces" by Fouquier (CC BY-NC-ND 2.0)
  • 7. What is Metadata? What datasets do we have in our entire data ecosystem? Espresso, Kafka, Hadoop, Teradata, Search, Pinot … 20+ data systems What does their schema look like? What datasets derive from these datasets? What business types are contained in these schemas? Who owns these datasets? Where are datasets being copied? … "[Katsiaryna Lenets] © 123RF.com"
  • 9. Attempt 2: Walk Phase Separate REST-ful service to support this diversity Dataset Naming Generic Schema model for all kinds of data stores and formats Scalable integration patterns Pub-Sub using Kafka Support REST + Kafka ingest route We called it TMS: THE Metadata Store :) We couldn’t open source it, too coupled with our internal business concepts :( New extensions to the model Business metadata Ownership
  • 10. Attempt 2: A few things we observed The Good Teams were now accountable for the quality of their metadata and their custom ETL Aggregate base metadata from all data platforms, overlay new metadata “aspects” on top easily The Not So Good New kind of metadata needed —> get in line behind central TMS team Source of truth versus “Reflection of truth” debates —> no one wants to take a dependency Standardized Event as interface: requires data model adapters everywhere, low incentive for producer to excel at their job
  • 11. What is Metadata? "Thinking" by Elvin (CC BY-NC 2.0)
  • 12. What is Metadata? "Thinking" by Elvin (CC BY-NC 2.0)
  • 13. What is Metadata? "Thinking" by Elvin (CC BY-NC 2.0) Models, Features, …
  • 14. Some observations about the problem There is value in local sub-graphs, but the real value lies in the global model Cannot execute this with a single monolith data model + service Different teams care about different sub- graphs Central team bottleneck Micro-services can help? Back to silo-ed metadata problem
  • 15. Attempt 3: Run Together Distributed but collaborative authorship of model Single ORM-like layer to auto-generate integrations for CRUD operations Search queries Graph traversal Distributed deployment and ownership of services possible We’re calling this, the Generalized Metadata Architecture (GMA)
  • 16. An Example Metadata Graph DATASET urn platform name fabric USER Urn firstName lastName PROFILE { firstName: John lastName: Doe Ldap: jdoe } GROUP urn name size OWNERSHIP { owners: [{ type: SRE, user: jdoe }, …] } MEMBERSHIP { admin: jdoe, members: [{ jdoe, …] } Owned By Has Admin Has Member 1N 1 1 1 NAspects Relationships Entity Anchors … SCHEMA { fields: [{ type: integer, name: id }, …] }
  • 17. Metadata Serving API Endpoints CRUD DAO Search DAO Graph DAO Graph DB Search Index Document Store Metadata Service
  • 18. Metadata Indexing Metadata Service API Endpoints CRUD DAO Search Index Graph DB Search processor Graph processor Metadata Audit Event
  • 19. Putting it all together GIT Web App Service CRUD Event Event Processor User, Groups Service Dataset Service SoT DB SoT DB Change Event Event Processor Change Event Hadoop Data Lake Analytics + Relevance Search
  • 20. Design Decision Cheat-Sheet Generic versus Specific Types: Support strong-types layered over generic storage, model-first development Expose strongly-typed REST API + Graph APIs for metadata traversal Integration Strategy: Crawling versus Pub-Sub versus REST-ful API: Prefer unified “Pub-Sub + REST-ful” API, build crawlers that publish Single Source of Truth versus Replicated versus Federated: For new metadata systems, integrate as ORM layer For existing metadata systems, use Kafka to adapt Federation strategy only supports lookup cases, hard to support search and graph well
  • 21. What about X? A very brief survey of other similar systems in this space Hive Metastore: limited to Hadoop Dataset, focused on query planning Atlas: generic model, missing strongly-typed API, dynamic types supported Marquez: strongly opinionated model, focused on data pipeline metadata Ground: generic model, missing strongly-typed API, research prototype Amundsen: Web-app backed by Hive Metastore, strongly opinionated model M onolithic Services
  • 22. Metadata Platform by the numbers Datasets : ~ 5M Dashboards: ~2K Features: ~3K Metrics: ~30K Schemas: ~100K People: 10K+ CRUD Events: ~2M / day Relationship Events: ~3M / day
  • 24. A Web App: Data Hub Search and Discover Data Constructs Find relationships, lineage, data quality, … Enrich metadata Some interesting parallels to the metadata platform in terms of extensibility How do we add new pages to this app in a multi-team environment?
  • 27. Data Hub: a Dataset’s page
  • 29. Data Hub: a Metric’s page
  • 30. Data Management with Compliance Metadata Platform Data Access Layer Data Management (Purge, Export, …) Std Frameworks Operations Application Code Physical Data
  • 32. Powered by Metadata Metadata Platform Search and Discovery beyond just Datasets Data Management, Access with Compliance Operational Monitoring, Incremental Compute AI : Model, Feature reproducibility, explainability … and we’re just getting started
  • 34. Open Source : the details! Alpha release out currently at https://lnkd.in/datahub-alpha Check it out at: Github project wherehows, branch: datahub (https://lnkd.in/datahub-github) Capabilities: Generic Modeling Layer with CRUD on MySQL Search on Elastic Graph on Neo4j* Interfaces: The DataHub Web App REST-ful service from model files Event Processors for metadata events Metadata Model: Datasets People Integrations Data Catalogs (Hive, Kafka, JDBC*) People (LDAP) * coming real soon
  • 35. Open sourcing the Data Model LinkedIn Internal Model Open Source Model
  • 36. Open Source : what’s coming next! More Entities: [Jobs, Flows] More Aspects within existing entities: [e.g. ReplicationPolicy, HiveSpecification] More Integrations: [Calcite-compatible systems for fine-grain lineage] App Features User pages Social interactions Get engaged and contribute to the global model Build integrations with systems