Data Mesh @ Yelp - 2019

Yelp operates a connector ecosystem that feeds vital data to domain-specific teams and data stores. We share what we have learned from operating such a system and touch on the next phase of its evolution.

Engineering

Yelp’s Mission
Connecting people with great local businesses

Who am I?
My name is Steven; my preferred pronoun is “he”
I graduated from UC Berkeley EECS in 2005
This is my second term at Yelp (2017 - now)
My first term was 2011 - 2015
I consider myself a generalist in the field

Who am I?
I work on the metrics-data team within metrics-platform

Data powers decision making
OnLine Transaction Processing (OLTP)
We use MySQL to power yelp.com
Each transaction interacts with a small amount of data
Display reviews, photos, and tips of a business
OLTP queries are expected to return results quickly
No one wants to wait more than 2 seconds for a business page to load

OLTP example: find the titles an author has written. Take advantage of an index.
https://en.wikipedia.org/wiki/Library_catalog#/media/File:Schlagwortkatalog.jpg

Data powers decision making
Developers want to find out which local business has the most reviews
Table scan on the review table?
OnLine Analytical Processing (OLAP)
Queries that scan a majority of the stored data
Need a specialized system to support such queries
Yelp uses AWS Redshift as a data warehouse to support OLAP queries.

OLAP example: the average number of pages in a book stored inside the main stacks. Need to scan all the titles.
https://www.dailycal.org/2013/12/08/best-worst-foods-sneak-main-stacks/
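
The library analogy from the two example slides can be sketched in a few lines of Python. This is only an illustration under assumed data: the `BOOKS` rows and every function name are hypothetical, not part of Yelp's systems. The author lookup touches only the rows for one author through an index, while the average page count has to read every row.

```python
from collections import defaultdict

# Hypothetical catalog rows: (title, author, page_count).
BOOKS = [
    ("Book A", "Author X", 200),
    ("Book B", "Author Y", 300),
    ("Book C", "Author X", 400),
]


def build_author_index(books):
    """Map each author to their titles, like a card catalog."""
    index = defaultdict(list)
    for title, author, _pages in books:
        index[author].append(title)
    return index


def titles_by_author(index, author):
    """OLTP-style lookup: reads only the entries for one author."""
    return index.get(author, [])


def average_pages(books):
    """OLAP-style aggregate: has to read every row."""
    return sum(pages for _title, _author, pages in books) / len(books)


AUTHOR_INDEX = build_author_index(BOOKS)
```

Calling `titles_by_author(AUTHOR_INDEX, "Author X")` uses the index built once up front; `average_pages(BOOKS)` cannot benefit from it, which is why scan-heavy queries are usually sent to a separate system.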

Data Fabric
We want to avoid n * m programs to transport data
n is the number of sources, and m is the number of sinks
Domain-specific data stores are here to stay
Stonebraker, “‘One Size Fits All’: An Idea Whose Time Has Come and Gone”
Stream-Table Duality
We can formulate the transport of data as streams
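
The n * m concern can be made concrete with a little arithmetic. The sketch below assumes a common stream layer where each source needs one source connector and each sink needs one sink connector; that one-connector-per-store assumption and the function names are illustrative, not a description of Yelp's actual deployment.

```python
def point_to_point_programs(n_sources: int, m_sinks: int) -> int:
    """Every source is wired directly to every sink."""
    return n_sources * m_sinks


def connector_programs(n_sources: int, m_sinks: int) -> int:
    """Each store gets one connector to a shared stream layer (assumed)."""
    return n_sources + m_sinks
```

With 5 sources and 4 sinks, direct wiring needs 20 programs while the connector approach needs 9, and adding one more sink costs 1 new connector instead of 5 new programs.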

https://docs.confluent.io/current/streams/concepts.html
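
Stream-table duality can be illustrated without any streaming library. In this minimal sketch a stream is a plain list of `(key, value)` change events and a table is a dict; treating a value of `None` as a delete is an assumption made for the example, and both function names are hypothetical.

```python
def table_from_stream(changelog):
    """Replay (key, value) change events into a table; None means delete."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def stream_from_table_update(old, new):
    """Emit the change events that turn table `old` into table `new`."""
    events = []
    for key in old:
        if key not in new:
            events.append((key, None))
    for key, value in new.items():
        if old.get(key) != value:
            events.append((key, value))
    return events
```

Replaying a table's rows followed by the events from `stream_from_table_update(old, new)` yields `new` again, which is the sense in which moving data between stores can be expressed as a stream of changes.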

Image source: https://images-na.ssl-images-amazon.com/images/I/71UfEHhZ2uL._SL1000_.jpg

Benefits: Connector Ecosystem
Lower the barrier to entry
It’s easy to move data between data stores
High-performance implementation
Each data store has its own performance characteristics.
Stream processing over batch processing
Near real-time data availability

Image source: https://images-na.ssl-images-amazon.com/images/I/71GmEqny4NL._SL1000_.jpg

Lessons Learned: Connector Ecosystem
Schematized data is good
Lessens the likelihood of malformed data
Schema evolution can be difficult
Incompatible schema changes can break many things. Discourage them in the registration phase.
Decouple data producers and data consumers
We need automation to inform data producers how to manage the data life cycle, because producers do not think about who uses the data.
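
One way to discourage incompatible changes at registration time is to compare a proposed schema against the registered one and reject it when the check fails. The sketch below is a simplified assumption, not Yelp's registry and not the full rule set of any schema format: a schema is a dict of field name to `{"type": ..., "default": ...}`, and a change counts as incompatible if it removes a field, changes a field's type, or adds a field without a default.

```python
def find_incompatibilities(old, new):
    """Return human-readable reasons why `new` is incompatible with `old`."""
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"field removed: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(f"type changed: {name}")
    for name, spec in new.items():
        if name not in old and "default" not in spec:
            problems.append(f"new field without default: {name}")
    return problems


def register_schema(registry, stream, schema):
    """Store `schema` for `stream`, refusing incompatible changes."""
    current = registry.get(stream)
    if current is not None:
        problems = find_incompatibilities(current, schema)
        if problems:
            raise ValueError("; ".join(problems))
    registry[stream] = schema
```

Rejecting the change with a list of reasons gives the producer immediate feedback, instead of letting consumers discover the breakage after rollout.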

Image source: https://i.ytimg.com/vi/03y8DJrzzjA/maxresdefault.jpg

Desirable Improvements
Data producers should own their data life cycle
A specific connector’s owner does not have visibility into data semantics.
Data consumers are stakeholders
Consumers don’t want to find out about incompatible changes after they have been rolled out.
Self-serve mechanisms accelerate changes
The only way to evolve rapidly is to self-serve

Data Mesh
Data specifications are like microservice APIs
They are contracts between producers and consumers
Each team owns its data specifications
To avoid accidental abstraction leakage
Decentralization allows rapid experiments
Common conventions are promoted to minimize friction among different domain systems
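
The idea of a data specification as a contract can be sketched as a small value object that names its owning team and checks records against the declared fields. Everything here is hypothetical: the `DataSpec` class, its attributes, and the use of Python types as field types are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSpec:
    """A hypothetical data contract owned by a single team."""

    name: str
    owner_team: str
    fields: dict  # field name -> expected Python type

    def violations(self, record):
        """List the ways `record` breaks this contract (empty if none)."""
        problems = []
        for field_name, expected in self.fields.items():
            if field_name not in record:
                problems.append(f"missing field: {field_name}")
            elif not isinstance(record[field_name], expected):
                problems.append(f"wrong type: {field_name}")
        return problems
```

A producer could run `violations` before publishing and a consumer could run it on arrival, so both sides check against the same specification rather than against each other's internals.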

https://martinfowler.com/articles/data-monolith-to-mesh.html

yelp.com/dataset_challenge
Academic dataset from 10 cities across the globe!
Your academic project, research or visualizations submitted by December 31, 2019 = a $5,000 prize*!
*See full terms on website
6M reviews
1M business attributes
190K businesses
200K photos

1. Data Mesh @ Yelp Sep 12, 2018

10. More throughput Lower Latency

25. Questions/Suggestions? smoy@yelp.com

26. Thank you.
