Synchronized video and slides, plus mp3 and slide downloads, are available at https://bit.ly/2QEP1Ny.
Zhamak Dehghani introduces Data Mesh, a next-generation data platform paradigm that draws on modern distributed architecture: it treats domains as the first-class concern, applies platform thinking to create self-serve data infrastructure, and treats data as a product. Filmed at qconsf.com.
Zhamak Dehghani is a principal technology consultant at ThoughtWorks, focusing on distributed systems and modern data platform architecture in the enterprise. She is a member of the ThoughtWorks global Technology Advisory Board and contributes to the creation of the ThoughtWorks Technology Radar.
Watch the video with slide synchronization on InfoQ.com: https://www.infoq.com/presentations/data-mesh-paradigm/
Presented at QCon San Francisco
www.qconsf.com
8. The Inconvenient Truth
Source: NewVantage Partners Releases 2019 Big Data and AI Executive Survey
[Chart comparing 2018 and 2019 survey results: 66% ⬆︎, 19% ⬇︎]
Accelerated investment in Big Data/AI
Unproven business results
On every measure, FAILED:
- Creating a data-driven culture
- Treating data as a business asset
- Competing on data and analytics
10. THE GREAT DIVIDE
OPERATIONAL SYSTEMS ARCHITECTURE: Running the Business
Capabilities & Services, Operational Data, Products & Applications, Communications, Infrastructure
Data Transfer & Copying
BIG DATA ANALYTICAL ARCHITECTURE: Optimizing the Business
Insights, Reports | ML Models, Analytical Data, Data Pipelines & Storage Infra
32. DOMAIN ORIENTED DATA DECOMPOSITION & OWNERSHIP
Domains aligned with the source: Claims, Sensors, Bio markers
Domains aligned with the consumption: Patients' critical moments of intervention, Patients' longitudinal health records
Newly reified shared domains
33. SOURCE ORIENTED (NATIVE) DOMAIN DATA
Claims
Sensors bio markers
Facts & reality of business
Immutable timed events /
Historical snapshots
Change less frequently
Permanently captured
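A minimal Python sketch of what "immutable timed events / historical snapshots" can mean in practice: source-oriented data kept as an append-only log of frozen events, with a historical snapshot derived for any given day. `ClaimEvent`, `ClaimsEventLog`, `snapshot_as_of` and the event fields are hypothetical names chosen for illustration, not part of the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: an event cannot be modified once recorded
class ClaimEvent:
    """A hypothetical immutable fact emitted by the claims domain."""
    claim_id: str
    status: str
    occurred_on: date


class ClaimsEventLog:
    """Hypothetical append-only log of immutable claim events."""

    def __init__(self) -> None:
        self._events: list[ClaimEvent] = []

    def append(self, event: ClaimEvent) -> None:
        # Events are only ever appended; nothing is updated or deleted.
        self._events.append(event)

    def snapshot_as_of(self, day: date) -> dict[str, str]:
        """Derive the latest known status of each claim on the given day."""
        snapshot: dict[str, str] = {}
        for event in sorted(self._events, key=lambda ev: ev.occurred_on):
            if event.occurred_on <= day:
                snapshot[event.claim_id] = event.status  # later events win
        return snapshot
```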
34. CONSUMER ORIENTED DOMAIN DATA
Fit for consumer purpose
Aggregation / Projections / Transformed
Change more often
Can be recreated
Patients' critical moments of intervention
Patients' longitudinal health records
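Consumer-oriented data "can be recreated" because it is a projection of source events. The Python sketch below, which assumes a hypothetical `SourceEvent` shape and a hypothetical `longitudinal_records` function, builds a per-patient history that can be discarded and rebuilt at any time by replaying the events.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceEvent:
    """An immutable event published by a source-aligned domain (hypothetical)."""
    patient_id: str
    kind: str          # e.g. "claim" or "biomarker"
    detail: str
    occurred_on: date


HistoryEntry = tuple[date, str, str]  # (day, kind, detail)


def longitudinal_records(events: list[SourceEvent]) -> dict[str, list[HistoryEntry]]:
    """Project source events into a per-patient, time-ordered history.

    The projection keeps no state of its own, so it can be thrown away
    and recreated by replaying the source events.
    """
    records: defaultdict[str, list[HistoryEntry]] = defaultdict(list)
    for event in sorted(events, key=lambda ev: ev.occurred_on):
        records[event.patient_id].append((event.occurred_on, event.kind, event.detail))
    return dict(records)
```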
35. DISTRIBUTED PIPELINES IN DOMAINS
Ingestion → Transformation → Serving
- More cleansing, integrity checks
- More aggregations, ML modelling
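One way to picture ingestion, transformation and serving inside a single domain is as three composable Python functions. This is a sketch under assumptions: the `member_id,amount` record format, the validation rules and the function names are hypothetical.

```python
from collections.abc import Iterable, Iterator


def ingest(raw_lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Ingestion: split raw 'member_id,amount' lines from a source system."""
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) == 2:  # skip lines that are not two-field records
            yield parts[0], parts[1]


def transform(records: Iterable[tuple[str, str]]) -> Iterator[tuple[str, float]]:
    """Transformation: cleansing and integrity checks drop invalid records."""
    for member_id, amount in records:
        try:
            value = float(amount)
        except ValueError:
            continue  # integrity check: the amount must be numeric
        if member_id and value >= 0:  # require an id and a non-negative amount
            yield member_id, value


def serve(records: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Serving: publish an aggregate, here the total amount per member."""
    totals: dict[str, float] = {}
    for member_id, value in records:
        totals[member_id] = totals.get(member_id, 0.0) + value
    return totals
```

Because each stage is a generator or plain function, the pipeline stays an implementation detail of the domain: `serve(transform(ingest(lines)))`.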
36. Domain Driven Distributed Architecture
Domains are the first-class concern: top-level partitions
Data pipelines are a second-class concern: implementation details
The architectural quantum shifts from a pipeline to a domain (datasets)
Domain datasets are immutable (time series)
39. DOMAIN DATA AS A PRODUCT
Aka Data Products
SHARED | DISCOVERABLE
ADDRESSABLE
TRUSTWORTHY (DEFINED & MONITORED SLOs)
SELF-DESCRIBING
INTEROPERABLE (GOVERNED BY GLOBAL STANDARDS)
SECURE (GOVERNED BY GLOBAL ACCESS CONTROL)
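Several of these properties can be captured as metadata that travels with the data product. The Python dataclass below is a hypothetical descriptor, not a standard format: `address` makes the product addressable, `schema` makes it self-describing, `tags` aid discovery, and `max_staleness_hours` is an assumed freshness SLO that a monitor can check.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical self-describing metadata for one domain data product."""
    name: str
    domain: str                    # the owning domain team
    address: str                   # addressable: where the output port lives
    schema: dict[str, str]         # self-describing: field name -> type
    max_staleness_hours: float     # trustworthy: declared freshness SLO
    tags: list[str] = field(default_factory=list)  # discoverable: search terms

    def meets_freshness_slo(self, observed_staleness_hours: float) -> bool:
        """Compare a monitored staleness measurement against the declared SLO."""
        return observed_staleness_hours <= self.max_staleness_hours
```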
41. Domain Datasets as a product
Discoverable
Inter-operable
Explicit Quality Objectives
Secure
Shared
Data consumers as customers
Data Product Owner role
Cross-functional team ownership
Success criteria: decreased lead time to discover and consume a data product
Product Thinking
43. DATA INFRA AS A PLATFORM
Domain-agnostic, shared/centralized self-serve tooling/infrastructure … secure data products quickly, d… them, execute their pipelines, …
Data Infra as a Platform | Data infra engineers
44. Scalable polyglot storage on demand
Encryption for data at rest and in motion
Unified data access control
Data product discoverability
Data product SLO / metrics collection & sharing
Data pipeline orchestration / templates
Data Product CI/CD pipeline
Automate ecosystem governance
Guidelines
Data product scaffolding
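Data product discoverability is one of the platform capabilities listed above. As a minimal sketch, assuming an in-memory catalog (a real platform would persist and secure it), a domain-agnostic registry lets domain teams register their products and lets consumers search by tag and resolve an address. `Listing` and `DataProductRegistry` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Catalog entry a domain team publishes for its data product (hypothetical)."""
    name: str
    domain: str
    address: str
    tags: frozenset[str] = frozenset()


class DataProductRegistry:
    """Domain-agnostic discovery service run by the platform team (hypothetical)."""

    def __init__(self) -> None:
        self._listings: dict[str, Listing] = {}

    def register(self, listing: Listing) -> None:
        # Names act as global identifiers, so duplicates are rejected.
        if listing.name in self._listings:
            raise ValueError(f"data product {listing.name!r} is already registered")
        self._listings[listing.name] = listing

    def search(self, tag: str) -> list[str]:
        """Names of data products carrying the tag, sorted for stable output."""
        return sorted(item.name for item in self._listings.values() if tag in item.tags)

    def resolve(self, name: str) -> str:
        """Return the address a consumer should read the data product from."""
        return self._listings[name].address
```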
45. Data Infrastructure & Tooling (DataOps)
Shared, self-serve, as a platform
Domain agnostic
Owned by the data infra and tooling team
Opinionated at incubation
Ideally built on cloud data services (despite their lack of maturity)
Success criteria: reduced lead time to create new secure & discoverable data products
Platform Thinking
49. DATA MESH ARCHITECTURE
Decentralized Data Product ownership
Domain Data Products as Architecture Quanta
Data Infra as a Platform (storage, pipeline, discoverability, access control, etc.)
Data infra engineers: abstracting tech complexity
Federated Ecosystem Governance (enable interoperability)
51. Native | Aggregate Data Products
[ Call Centre Claims Domain ] Legacy Call Centre → CDC → Call Centre Claims DP: Call Centre Claims daily snapshots
[ Online Claims Domain ] Online Claims → Online Claims DP: Online Claims Events, Online Claims daily snapshots
[ Claims Domain ] Claims Data Product: Claims Events, Claims Snapshots
*DP: Data Product
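The claims domain's aggregate data product combines the two native claims data products. The Python sketch below assumes, hypothetically, that both native products expose `(claim_id, status, day)` records; it tags each record with its source channel and derives a latest-status snapshot. It is an illustration, not the design presented in the talk.

```python
from collections.abc import Iterable
from datetime import date

Record = tuple[str, str, date]  # (claim_id, status, day)


def aggregate_claims(
    call_centre_snapshots: Iterable[Record],
    online_events: Iterable[Record],
) -> tuple[list[dict], dict[str, str]]:
    """Merge two native claims data products into one aggregate data product.

    Returns (claims_events, claims_snapshot): every record tagged with its
    source channel in day order, plus the latest status of each claim.
    """
    claims_events = [
        {"claim_id": cid, "status": status, "day": day, "channel": "call_centre"}
        for cid, status, day in call_centre_snapshots
    ] + [
        {"claim_id": cid, "status": status, "day": day, "channel": "online"}
        for cid, status, day in online_events
    ]
    claims_events.sort(key=lambda event: event["day"])  # stable for equal days
    claims_snapshot: dict[str, str] = {}
    for event in claims_events:
        claims_snapshot[event["claim_id"]] = event["status"]  # later days win
    return claims_events, claims_snapshot
```

When a call-centre record and an online event fall on the same day, the stable sort keeps call-centre records first, so the online status wins in the snapshot; a real data product would need an explicit conflict rule.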
52. Aggregate | Fit-for-purpose Data Products
[ Members ] Members Data Product: Members Snapshots
[ Claims Domain ] Claims Data Product: Claims Events, Claims Snapshots
[ Members intervention ] Members daily interventions
*DP: Data Product
54. Polyglot Input Data Ports [1..n] (on-prem to cloud)
Polyglot Output Data Ports (on-prem to cloud), e.g. https://claims-dp./daily-snapshots/12092019
Control Ports, e.g. https://members-dp/controls/describe, https://members-dp/controls/audit
Sidecars: policy enforcement, discovery, audit, etc.
Data Pipeline
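The ports above can be illustrated with a small Python class that routes request paths to an output port (daily snapshots) and to control ports (`describe`, `audit`). This is a hypothetical sketch: `DataProduct`, `ingest` and `handle` are invented names, and the in-process audit list only stands in for what a sidecar would do.

```python
from urllib.parse import urlparse


class DataProduct:
    """Hypothetical data product exposing output and control ports by path."""

    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description
        self.snapshots: dict[str, list[dict]] = {}  # written by the internal pipeline
        self.audit_log: list[str] = []

    def ingest(self, day: str, rows: list[dict]) -> None:
        """Input port: the internal data pipeline writes one daily snapshot."""
        self.snapshots[day] = rows

    def handle(self, url: str) -> object:
        """Route a request URL to an output port or a control port."""
        path = urlparse(url).path
        self.audit_log.append(path)  # stands in for a sidecar's audit function
        if path == "/controls/describe":
            return {"name": self.name, "description": self.description,
                    "snapshots": sorted(self.snapshots)}
        if path == "/controls/audit":
            return list(self.audit_log)
        prefix = "/daily-snapshots/"
        if path.startswith(prefix):  # output port: one snapshot per day
            return self.snapshots.get(path[len(prefix):], [])
        raise KeyError(f"no port at {path}")
```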
58. DATA MESH PARADIGM SHIFT
FROM → TO
Centralized ownership → Decentralized ownership
Monolithic → Distributed
Pipeline as first-class concern → Domain data as first-class concern
Data as a by-product → Data as a product
Siloed data engineering team → Cross-functional data domain teams
59. ADOPT A NEW LANGUAGE
FROM → TO
Ingesting → Serving
Extracting & Loading → Discovering & Consuming
Flowing data through centralized pipelines → Publishing output data ports
Centralized Data Lake | Warehouse | Platform → Ecosystem of Data Products