Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake

The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.

  1. Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake. Max Schultze (max.schultze@zalando.de, @mcs1408), Arif Wider (awider@thoughtworks.com, @arifwider). 17-11-2020
  2. Who are we? Max Schultze: Lead Data Engineer; MSc in Computer Science; took part in early development of Apache Flink; retired semi-professional Magic: the Gathering player. Arif Wider: Software engineering professor (full) at HTW Berlin, Germany; Fellow technology consultant with ThoughtWorks Germany (part-time); Former Head of AI at ThoughtWorks; Coffee geek
  3. Table of Contents: Zalando’s Data Platform; What’s this Data Mesh?; Data Mesh in Practice
  4. Zalando’s Data Platform
  5. Zalando’s Data Platform: Ingestion, Storage, Serving
  6. Zalando’s Data Platform (Ingestion, Storage, Serving): Web Tracking, Event Bus, DWH
  7. Zalando’s Data Platform (Ingestion, Storage, Serving): Web Tracking, Event Bus, DWH, Metastore
  8. Zalando’s Data Platform (Ingestion, Storage, Serving): Web Tracking, Event Bus, Metastore, Processing Platform, Fast Query Layer, DWH, Data Catalog
  9. Centralization Challenges. Datasets provided by central data infrastructure team: lack of ownership
  10. Centralization Challenges. Datasets provided by central data infrastructure team: lack of ownership. Data pipelines operated by central data infrastructure team: lack of quality. (Illustration: a table with columns Field_A, Field_B and rows Record_1, Record_2, Record_3)
  11. Centralization Challenges. Datasets provided by central data infrastructure team: lack of ownership. Data pipelines operated by central data infrastructure team: lack of quality. Organizational scaling: central team becomes the bottleneck
  12. A Recurring Pattern
  13. A Recurring Pattern
  14. A Recurring Pattern
  15. A Recurring Pattern
  16. Why is that? (Diagram: central data platform)
  17. Why is that? (Diagram: checkout service, checkout events)
  18. What is Data Mesh? Old wine applied to new bottles… → Product Thinking → Domain-Driven Distributed Architecture → Infrastructure as a Platform … creates value from Data. Source: https://martinfowler.com/articles/data-monolith-to-mesh.html by Zhamak Dehghani
  19. Data as a Product. Data Product: What is my market? What are the desires of my customers? What “price” is justified? How to do marketing? What’s the USP? Are my customers happy?
  20. Domain-Driven Distributed Architecture… applied to Data. Domain
  21. Domain-Driven Distributed Architecture… applied to Data. Domain → Aggregated Domain
  22. Domain-Driven Distributed Architecture… applied to Data. Domain → Aggregated Domain. Discoverable, Addressable, Self-describing, Trustworthy, Interoperable, Secure
  23. ...backed by domain-agnostic self-service data infrastructure. Data Infra as a Platform. Domain → Aggregated Domain. Discoverable, Addressable, Self-describing, Trustworthy, Interoperable, Secure
  24. It’s a mindset shift. FROM → TO: Centralized ownership → Decentralized ownership; Pipelines as first class concern → Domain Data as first class concern; Data as a by-product → Data as a Product; Siloed Data Engineering Team → Cross-functional Domain-Data Teams; Centralized Data Lake / Warehouse → Ecosystem of Data Products
  25. Data Mesh in Practice
  26. Data Mesh in Practice. Recap: From Bottleneck to Infra Platform. Data Infra as a Platform
  27. Data Mesh in Practice. Recap: From Bottleneck to Infra Platform; From Data Monolith to Interoperable Services. Data Infra as a Platform. (Diagram: central data platform)
  28. Central Services with Global Interoperability: Data Lake Storage, Governance Layer
  29. Bring Your Own Bucket (BYOB): Data Lake Storage, Governance Layer
  30. Simplify Data Processing: Processing Platform, Data Lake Storage, Governance Layer
  31. Simplify Data Sharing: Processing Platform, Data Lake Storage, Governance Layer
  32. Central Services with Global Interoperability. Decentralized ownership does not imply decentralized infrastructure! Interoperability is created through convenient solutions of a self service platform. Decentral Storage, Central Infrastructure; Decentral Ownership, Central Governance
  33. Data Mesh in Practice. Recap: Datasets provided through pipelines of central data infrastructure teams
  34. How to Ensure Data Quality? Make conscious decisions: Opt-in instead of default storage
  35. How to Ensure Data Quality? Make conscious decisions: Opt-in instead of default storage; Behavioral changes - data is a product
  36. Care About Your User! Classification of Usage
  37. Care About Your User! Classification of Usage. Dedicate resources to: Understand usage; Ensure quality
  38. Some Numbers
  39. Some Numbers: 40 teams using BYOB
  40. Some Numbers: 40 teams using BYOB; 100 teams using the processing platform. (Diagram: Processing Platform)
  41. Some Numbers: 40 teams using BYOB; 100 teams using the processing platform; First curated data teams. (Diagram: Data Products On Data Products On Data Products, Processing Platform)
  42. Some Numbers: 40 teams using BYOB; 100 teams using the processing platform; First curated data teams; 0 operational effort for the central team. (Diagram: Data Products On Data Products On Data Products, Processing Platform)
  43. Some Numbers: 40 teams using BYOB; 100 teams using the processing platform; First curated data teams; 0 operational effort for the central team. (Diagram: Data Products On Data Products On Data Products, Processing Platform) It’s a journey ;)
  44. It’s a Journey
  45. “Off the shelf” data tooling
  46. “Off the shelf” data tooling: De-centralized archiving
  47. “Off the shelf” data tooling: De-centralized archiving; De-centralized GDPR deletion tooling
  48. “Off the shelf” data tooling: Template driven data preparation; De-centralized archiving; De-centralized GDPR deletion tooling
  49. Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake. Max Schultze, max.schultze@zalando.de, @mcs1408. Arif Wider, awider@thoughtworks.com, @arifwider
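
The six data product qualities named on slides 22 and 23 (discoverable, addressable, self-describing, trustworthy, interoperable, secure) can be made concrete with a small descriptor. The following is only a minimal sketch and not Zalando's implementation: the `DataProduct` class, its field names, the `missing_qualities` helper, and the example bucket path are all hypothetical, and the mapping of each quality to a single field is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str
    owner_team: str                                   # decentralized ownership: a domain team
    address: str = ""                                 # addressable: a stable location
    schema: dict = field(default_factory=dict)        # self-describing: field name -> type
    catalog_registered: bool = False                  # discoverable: listed in a data catalog
    quality_checks: list = field(default_factory=list)  # trustworthy: declared quality checks
    data_format: str = ""                             # interoperable: an agreed shared format
    access_policy: str = ""                           # secure: a declared access policy


def missing_qualities(product: DataProduct) -> list:
    """Return the slide-22 qualities this descriptor does not yet declare."""
    checks = {
        "discoverable": product.catalog_registered,
        "addressable": bool(product.address),
        "self-describing": bool(product.schema),
        "trustworthy": bool(product.quality_checks),
        "interoperable": bool(product.data_format),
        "secure": bool(product.access_policy),
    }
    return [quality for quality, ok in checks.items() if not ok]


# Example: checkout events owned by the checkout team (names are illustrative;
# the bucket path is an assumed placeholder, not a real location).
checkout = DataProduct(
    name="checkout_events",
    owner_team="checkout",
    address="s3://example-checkout-bucket/events/",
    schema={"order_id": "string"},
    catalog_registered=True,
)
print(missing_qualities(checkout))  # ['trustworthy', 'interoperable', 'secure']
```

A check like this could run when a team opts in to sharing a dataset, which is one way to read the "make conscious decisions" point on slides 34 and 35; the talk itself does not prescribe a mechanism.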