www.scling.com
Data democratised
Next data analytics & protection, 2019-12-11
Lars Albertsson (@lalleal)
Scling
Big data adoption
● 2003-2007: Only Google
● 2007-2014: Hadoop era (Europe). Highly technical companies succeed and disrupt.
● 2015-2019: Enterprise adoption (Europe). Big data has dropped off the Gartner hype cycle and become the “new normal”.
● 2019: Many enterprises in production, but big data and machine learning ROI is still confined to high-tech companies.
Data value efficiency gap, aka disrupted or disruptor
Early Spotify recommendations
Creator of Luigi, Annoy
Efficiency gap, latency
We just took a machine learning pipeline in production after 8 months. Great success!
Scandinavian retail (pycon.se, 2019)
Document similarity pipeline finally in production. Estimated 3 months, took 8 months.
Scandinavian telecom (NDSML Summit 2019)
2016: Data platform approval. 2018: Pipeline in production.
Dutch bank (Dataworks Summit 2018)
Bonnier News (Riga DevOpsDays 2018)
Platform + 1st pipeline in production. Seven weeks, 1 person.
Scandinavian retail, 2018
New pipeline: < 1 day. Mend pipeline: < 1 hour.
Spotify DataOps transform, 2013
Platform + 1st pipeline in production. Three weeks, 4 persons. 20 pipelines in 8 months.
Efficiency gap, data cost & value
● Data processing produces datasets
● Each dataset has business value
○ Financial, sales, forecasting reports
○ A/B test, auto completion, insights
○ Recommendations, fraud
● Proxy metric: datasets / day
○ S-M traditional: < 10
○ Bank, telecom, media: 10-1000
Spotify, 2016: 20 000 datasets / day
Spotify, 2017: 100B events collected / day
Google, 2016: 1 600 000 000 datasets / day
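One way the datasets / day proxy metric could be computed is sketched below. It assumes a catalogue of produced dataset instances is available as (name, date) pairs; the `produced` list and both function names are made up for illustration and do not come from the talk.

```python
from collections import Counter
from datetime import date

# Hypothetical catalogue of dataset instances: (dataset name, production date).
# In practice this might come from listing date-partitioned paths in the lake.
produced = [
    ("sales_report", date(2019, 12, 10)),
    ("ab_test_results", date(2019, 12, 10)),
    ("recommendations", date(2019, 12, 10)),
    ("sales_report", date(2019, 12, 11)),
    ("fraud_scores", date(2019, 12, 11)),
]


def datasets_per_day(instances):
    """Count the dataset instances produced on each day."""
    return Counter(day for _name, day in instances)


def mean_datasets_per_day(instances):
    """Average datasets per day, over the days that produced any output."""
    per_day = datasets_per_day(instances)
    return sum(per_day.values()) / len(per_day) if per_day else 0.0
```

Counting instances rather than distinct dataset names means that a daily report counts once per day, which seems closest to the figures quoted above.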
Data efficiency key factors
Data democratisation
● Making data available, usable, accessible
DataOps
● Short path from idea to production
● Cross-functional teams
○ Data engineering, domain experts, product, (data science)
○ Aligned with value, not function
● Low cost of failure
○ Machine and human failure
○ Risks ok → move fast
● Engineered operations
Service-oriented organisations
● Teams own services
● Teams own data
Data-centric innovation
● Need data from teams
○ willing?
○ backlog?
○ collected?
○ useful?
○ quality?
○ extraction?
○ data governance?
○ history?
● Innovation friction
Diagram labels: value adding, waste
Centralising data
Data lake
More data - decreased friction
Data lake
Stream storage
Hadoop is dead?
Traditional systems
Mutation
Data pipelines at a glance
Diagram labels: data lake, transformation, cold store, mutation, immutable and shareable
Early Hadoop:
● Weak indexing
● No transactions
● Weak security
● Batch transformations
DataOps workflows:
● Immutable, shared data
● Resilient to failure
● Quick error recovery
● Low-risk experiments
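The DataOps workflow properties above can be illustrated with a minimal sketch of one batch task that writes an immutable dataset. It is an assumption-laden illustration, not the speaker's implementation: `run_task`, the JSON-lines file layout and the `transform` callback are all hypothetical names. The task skips work when its output already exists and publishes the output with an atomic rename, so a failed run can simply be retried.

```python
import json
import os
import tempfile
from pathlib import Path


def run_task(output, inputs, transform):
    """Produce the dataset `output` from `inputs` unless it already exists.

    Returns True if the task ran, False if the dataset was already complete.
    """
    output = Path(output)
    if output.exists():
        # Completed datasets are immutable: never rewritten in place.
        return False
    records = [
        json.loads(line)
        for path in inputs
        for line in Path(path).read_text().splitlines()
    ]
    result = transform(records)
    output.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename, so a
    # crash never leaves a half-written dataset visible to consumers.
    fd, tmp_name = tempfile.mkstemp(dir=output.parent)
    with os.fdopen(fd, "w") as tmp_file:
        for record in result:
            tmp_file.write(json.dumps(record) + "\n")
    os.replace(tmp_name, output)
    return True
```

To recover from a bug under this scheme, one would delete the faulty output (or write to a new path) and rerun; upstream datasets are untouched, which is what keeps experiments low-risk.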
Late Hadoop adoption
Mutation
“Can you please implement mutability, transactions, SQL, etc? We would like to keep our workflows.”
“Anything, as long as you are buying.”
DataOps workflows:
● Immutable, shared data
● Resilient to failure
● Quick error recovery
● Low-risk experiments
Complex business logic - MDM @ Spotify ~2014
● 10 pipelines like this
● Pipeline dev environment
● Pipeline continuous deployment infrastructure
One team of five engineers
Data value = data + domain expertise + data practices
Disrupt? https://xkcd.com/1831/
Adapt?
Collaborate? Data-value-as-a-service
Client data + domain expertise
Practices from data leaders
Data lake
Stream storage
+ 1000s of failures...
Factors of democratisation
Siloed → Shared
● Distributed storage → Homogeneous storage
● Need-to-know basis → Documentation read+write access
● Closed code ownership → Code read+write access
● Local rituals → Coordinated data governance
● Tribal knowledge → Common glossary, semantics
● Lay-on-hands deployment → Common DataOps procedures
● Unclear data origin → Common data provenance
Organic → Coordinated
An e-shopping tale
1. Log in, search for product X
○ X + 100s of accessories, random order
2. Find X in product catalog
○ No link to web shop
3. Put in cart, delivery?
○ Ask for address, customer club number
4. …
Full story: “Avoid artificial stupidity” blog post
1. Log in, search for product X
○ Popular items first
2. Find X in product catalog
○ Take me to shop
3. Put in cart, delivery?
○ I am logged in
4. ...
Document a clean architecture
● Include minimal governance, security, privacy
Diagram labels: data lake, transformation, cold store, mutation, immutable and shareable
A lean start
● Align team with use case
○ Zero budget
● Ingest only necessary data
● Key technical component: Workflow orchestrator (Luigi / Airflow)
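To make the role of a workflow orchestrator concrete, here is a toy sketch of its core idea: run each task after its upstream dependencies and skip whatever is already complete. This is not the Luigi or Airflow API. `Task`, `build`, `completed` and `log` are hypothetical names, and the sketch leaves out scheduling, retries and cycle detection.

```python
class Task:
    """A unit of work that produces one named dataset."""

    def __init__(self, name, requires=(), run=None):
        self.name = name
        self.requires = list(requires)    # upstream tasks
        self.run = run or (lambda: None)  # the actual work, a no-op by default


def build(task, completed, log):
    """Run `task` after its upstream tasks, skipping completed ones.

    `completed` is a set of finished task names; `log` records run order.
    """
    if task.name in completed:
        return
    for upstream in task.requires:
        build(upstream, completed, log)
    task.run()
    completed.add(task.name)
    log.append(task.name)
```

A real orchestrator typically decides completeness by checking whether the task's output dataset exists instead of keeping an in-memory set, which is what lets it resume after a failure.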
An MVP is minimal
In scope
● One use case
● One DB source
● Security
● Minimal privacy (limiting access)
Out of scope
● Data scalability
● High availability
● Durability
● Most privacy
● Self service
● Data quality
● Automation
● Clusters
● Auditability
● Scalable BI
● Fill lake
● Real-time
● Lineage
Journey towards data value
● Remove complexity wherever possible
○ Unfamiliar tools may be less complex
● Pay attention to human and social factors
“Five dysfunctions of a data engineering team” - Jesse Anderson
● Only database admins
● Set up for failure
● No one understands schema
● No veterans
● Too ambitious
“Avoiding big data antipatterns” - Alex Holmes
● Big data tech for small data
● Point-to-point data integration
● Single tool for the job
● Excess volume or precision
● Lack of security

More Related Content

What's hot

Engineering data quality
Engineering data qualityEngineering data quality
Engineering data qualityLars Albertsson
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift leftLars Albertsson
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesLars Albertsson
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practiceLars Albertsson
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...Big Data Spain
 
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015Institute e-Austria Timisoara
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solidLars Albertsson
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
Enabling the Bank of the Future by Ignacio Bernal
Enabling the Bank of the Future by Ignacio BernalEnabling the Bank of the Future by Ignacio Bernal
Enabling the Bank of the Future by Ignacio BernalBig Data Spain
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineTrieu Nguyen
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!DataWorks Summit
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applicationsLars Albertsson
 
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)Mark Rittman
 
Big Data with Apache Hadoop
Big Data with Apache HadoopBig Data with Apache Hadoop
Big Data with Apache HadoopInfoFarm
 
Neo4j-Databridge: Enterprise-scale ETL for Neo4j
Neo4j-Databridge: Enterprise-scale ETL for Neo4jNeo4j-Databridge: Enterprise-scale ETL for Neo4j
Neo4j-Databridge: Enterprise-scale ETL for Neo4jNeo4j
 
Building a Self-Service Big Data Pipeline
Building a Self-Service Big Data PipelineBuilding a Self-Service Big Data Pipeline
Building a Self-Service Big Data PipelineDataWorks Summit
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Jaroslav Gergic
 
Stored Procedure Superpowers: A Developer’s Guide
Stored Procedure Superpowers: A Developer’s GuideStored Procedure Superpowers: A Developer’s Guide
Stored Procedure Superpowers: A Developer’s GuideVoltDB
 
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Dataconomy Media
 

What's hot (20)

Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
Data ops in practice
Data ops in practiceData ops in practice
Data ops in practice
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift left
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practices
 
Protecting privacy in practice
Protecting privacy in practiceProtecting privacy in practice
Protecting privacy in practice
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
 
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Enabling the Bank of the Future by Ignacio Bernal
Enabling the Bank of the Future by Ignacio BernalEnabling the Bank of the Future by Ignacio Bernal
Enabling the Bank of the Future by Ignacio Bernal
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data Pipeline
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!
 
Testing data streaming applications
Testing data streaming applicationsTesting data streaming applications
Testing data streaming applications
 
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
 
Big Data with Apache Hadoop
Big Data with Apache HadoopBig Data with Apache Hadoop
Big Data with Apache Hadoop
 
Neo4j-Databridge: Enterprise-scale ETL for Neo4j
Neo4j-Databridge: Enterprise-scale ETL for Neo4jNeo4j-Databridge: Enterprise-scale ETL for Neo4j
Neo4j-Databridge: Enterprise-scale ETL for Neo4j
 
Building a Self-Service Big Data Pipeline
Building a Self-Service Big Data PipelineBuilding a Self-Service Big Data Pipeline
Building a Self-Service Big Data Pipeline
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
 
Stored Procedure Superpowers: A Developer’s Guide
Stored Procedure Superpowers: A Developer’s GuideStored Procedure Superpowers: A Developer’s Guide
Stored Procedure Superpowers: A Developer’s Guide
 
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
 

Similar to Data democratised

Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
 
What is the future of data strategy?
What is the future of data strategy?What is the future of data strategy?
What is the future of data strategy?Denodo
 
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...Jochem van Grondelle
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data opsLars Albertsson
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Denodo
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
 
Building & Scaling Data Teams
Building & Scaling Data TeamsBuilding & Scaling Data Teams
Building & Scaling Data TeamsOutreach Digital
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderDataconomy Media
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Matt Stubbs
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationDenodo
 
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)Denodo
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccionFran Navarro
 
Introduction to Harnessing Big Data
Introduction to Harnessing Big DataIntroduction to Harnessing Big Data
Introduction to Harnessing Big DataPaul Barsch
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataMatt Stubbs
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataMatt Stubbs
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationDatabricks
 

Similar to Data democratised (20)

Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
What is the future of data strategy?
What is the future of data strategy?What is the future of data strategy?
What is the future of data strategy?
 
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
To mesh or mess up your data organisation - Jochem van Grondelle (Prosus/OLX ...
 
Big Data Pitfalls
Big Data PitfallsBig Data Pitfalls
Big Data Pitfalls
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data ops
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
Building & Scaling Data Teams
Building & Scaling Data TeamsBuilding & Scaling Data Teams
Building & Scaling Data Teams
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccion
 
Introduction to Harnessing Big Data
Introduction to Harnessing Big DataIntroduction to Harnessing Big Data
Introduction to Harnessing Big Data
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
 

More from Lars Albertsson

Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divideLars Albertsson
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with ScalametaLars Albertsson
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfLars Albertsson
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdfLars Albertsson
 
The 7 habits of data effective companies.pdf
The 7 habits of data effective companies.pdfThe 7 habits of data effective companies.pdf
The 7 habits of data effective companies.pdfLars Albertsson
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application qualityLars Albertsson
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetLars Albertsson
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipelineLars Albertsson
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Lars Albertsson
 
A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven productsLars Albertsson
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelinesLars Albertsson
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven productsLars Albertsson
 

More from Lars Albertsson (15)

Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divide
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with Scalameta
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdf
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdf
 
The 7 habits of data effective companies.pdf
The 7 habits of data effective companies.pdfThe 7 habits of data effective companies.pdf
The 7 habits of data effective companies.pdf
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application quality
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budget
 
Ai legal and ethics
Ai   legal and ethicsAi   legal and ethics
Ai legal and ethics
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipeline
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
 
Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0Test strategies for data processing pipelines, v2.0
Test strategies for data processing pipelines, v2.0
 
A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven products
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelines
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 

Recently uploaded

NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
Data democratised

  • 1. www.scling.com Data democratised. Next data analytics & protection, 2019-12-11. Lars Albertsson (@lalleal), Scling
  • 2. Big data adoption
      ● 2003-2007: Only Google
      ● 2007-2014: Hadoop era (Europe). Highly technical companies succeed and disrupt.
      ● 2015-2019: Enterprise adoption (Europe). Big data gone from Gartner hype cycle. “New normal”
      ● 2019: Many enterprises in production, but big data and machine learning ROI still confined to high-tech.
  • 3. Data value efficiency gap, aka disrupted or disruptor
      ● Early Spotify recommendations
      ● Creator of Luigi, Annoy
  • 4. Efficiency gap, latency
      ● We just took a machine learning pipeline in production after 8 months. Great success! Scandinavian retail (pycon.se, 2019)
      ● Document similarity pipeline finally in production. Estimated 3 months, took 8 months. Scandinavian telecom (NDSML Summit 2019)
      ● 2016: Data platform approval. 2018: Pipeline in production. Dutch bank (Dataworks Summit 2018)
      ● Bonnier News (Riga DevOpsDays 2018)
      ● Platform + 1st pipeline in production. Seven weeks, 1 person.
      ● Scandinavian retail 2018
      ● New pipeline: < 1 day. Mend pipeline: < 1 hour.
      ● Spotify DataOps transform, 2013
      ● Platform + 1st pipeline in production. Three weeks, 4 persons. 20 pipelines in 8 months.
  • 5. Efficiency gap, data cost & value
      ● Data processing produces datasets
      ● Each dataset has business value
          ○ Financial, sales, forecasting reports
          ○ A/B test, auto completion, insights
          ○ Recommendations, fraud
      ● Proxy metric: datasets / day
          ○ S-M traditional: < 10
          ○ Bank, telecom, media: 10-1000
          ○ Spotify: 20 000 datasets / day (2016); 100B events collected / day (2017)
          ○ Google: 1 600 000 000 datasets / day (2016)
  • 6. Data efficiency key factors
      ● Data democratisation
          ○ Making data available, usable, accessible
      ● DataOps
          ○ Short path from idea to production
          ○ Cross-functional teams: data engineering, domain experts, product, (data science); aligned with value, not function
          ○ Low cost of failure: machine and human failure; risks ok → move fast
          ○ Engineered operations
  • 9. Data-centric innovation
      ● Need data from teams
          ○ willing?
          ○ backlog?
          ○ collected?
          ○ useful?
          ○ quality?
          ○ extraction?
          ○ data governance?
          ○ history?
      ● Innovation friction
      ● Figure labels: Value adding, Waste
  • 11. More data - decreased friction
      ● Figure labels: Data lake, Stream storage
  • 15. Data pipelines at a glance
      ● Diagram labels: Data lake, Transformation, Cold store, Mutation, Immutable, shareable
      ● Early Hadoop:
          ○ Weak indexing
          ○ No transactions
          ○ Weak security
          ○ Batch transformations
      ● DataOps workflows:
          ○ Immutable, shared data
          ○ Resilient to failure
          ○ Quick error recovery
          ○ Low-risk experiments
  • 16. Late Hadoop adoption
      ● Can you please implement mutability, transactions, SQL, etc? We would like to keep our workflows.
      ● Anything, as long as you are buying.
      ● Diagram label: Mutation
      ● DataOps workflows:
          ○ Immutable, shared data
          ○ Resilient to failure
          ○ Quick error recovery
          ○ Low-risk experiments
  • 17. Complex business logic - MDM @ Spotify ~2014
      ● 10 pipelines like this
      ● Pipeline dev environment
      ● Pipeline continuous deployment infrastructure
      ● One team of five engineers
  • 20. Data value = data + domain expertise + data practices
      ● Diagram labels: Data lake, Stream storage
      ● Client data + domain expertise
      ● Practices from data leaders
      ● Disrupt? https://xkcd.com/1831/ + 1000s of failures...
      ● Adapt?
      ● Collaborate? Data-value-as-a-service
  • 27. Factors of democratisation
      ● Axis labels: Siloed, Shared; Organic, Coordinated
      ● Distributed storage ↔ Homogeneous storage
      ● Need-to-know basis ↔ Documentation read+write access
      ● Closed code ownership ↔ Code read+write access
      ● Local rituals ↔ Coordinated data governance
      ● Tribal knowledge ↔ Common glossary, semantics
      ● Lay-on-hands deployment ↔ Common DataOps procedures
      ● Unclear data origin ↔ Common data provenance
  • 28. An e-shopping tale
      1. Log in, search for product X
          ○ X + 100s of accessories, random order
      2. Find X in product catalog
          ○ No link to web shop
      3. Put in cart, delivery?
          ○ Ask for address, customer club number
      4. …
      Full story: “Avoid artificial stupidity” blog post
      1. Log in, search for product X
          ○ Popular items first
      2. Find X in product catalog
          ○ Take me to shop
      3. Put in cart, delivery?
          ○ I am logged in
      4. ...
  • 29. Document a clean architecture
      ● Include minimal governance, security, privacy
      ● Diagram labels: Data lake, Transformation, Cold store, Mutation, Immutable, shareable
  • 30. A lean start
      ● Align team with use case
          ○ Zero budget
      ● Ingest only necessary data
      ● Key technical component: Workflow orchestrator (Luigi / Airflow)
  • 31. An MVP is minimal
      ● In scope:
          ○ Minimal privacy - limiting access
          ○ Security
          ○ One DB source
          ○ One use case
      ● Out of scope:
          ○ Data scalability
          ○ High availability
          ○ Durability
          ○ Most privacy
          ○ Self service
          ○ Data quality
          ○ Automation
          ○ Clusters
          ○ Auditability
          ○ Scalable BI
          ○ Fill lake
          ○ Real-time
          ○ Lineage
  • 32. Journey towards data value
      ● Remove complexity wherever possible
          ○ Unfamiliar tools may be less complex
      ● Pay attention to human and social factors
      ● “Five dysfunctions of a data engineering team” - Jesse Anderson
          ○ Only database admins
          ○ Set up for failure
          ○ No one understands schema
          ○ No veterans
          ○ Too ambitious
      ● “Avoiding big data antipatterns” - Alex Holmes
          ○ Big data tech for small data
          ○ Point-to-point data integration
          ○ Single tool for the job
          ○ Excess volume or precision
          ○ Lack of security
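
The slides name a workflow orchestrator (Luigi / Airflow) as the key technical component of a lean start, and describe DataOps workflows as built on immutable, shared datasets that are resilient to failure and quick to recover. The following is a minimal sketch of that idea in plain Python. It is not Luigi's or Airflow's API: the `Task` class, the `build` function, the task names, and the sample data are all hypothetical and chosen only for illustration. Each task produces one dataset file, a task counts as done when its output exists, and outputs are written to a temporary file and renamed so that a crashed run never leaves a half-written dataset behind.

```python
import os
import tempfile
from pathlib import Path


class Task:
    """Hypothetical task that produces exactly one immutable dataset file."""

    def __init__(self, root, date):
        self.root = Path(root)
        self.date = date

    def requires(self):
        # Upstream tasks whose outputs this task reads.
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        # Return the dataset content as a string.
        raise NotImplementedError

    def complete(self):
        # A dataset that exists is never rewritten.
        return self.output().exists()


def build(task):
    """Run missing upstream tasks, then the task; return names of tasks that ran."""
    if task.complete():
        return []
    ran = []
    for dep in task.requires():
        ran += build(dep)
    content = task.run()
    out = task.output()
    out.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename into place,
    # so readers only ever see complete datasets.
    fd, tmp = tempfile.mkstemp(dir=out.parent)
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.replace(tmp, out)
    ran.append(type(task).__name__)
    return ran


class RawOrders(Task):
    """Stands in for ingestion from one DB source (sample data is made up)."""

    def output(self):
        return self.root / "raw_orders" / f"{self.date}.csv"

    def run(self):
        return "order_id,amount\n1,10\n2,32\n"


class DailyRevenue(Task):
    """Derived dataset: sum of order amounts for one day."""

    def requires(self):
        return [RawOrders(self.root, self.date)]

    def output(self):
        return self.root / "daily_revenue" / f"{self.date}.txt"

    def run(self):
        rows = self.requires()[0].output().read_text().splitlines()[1:]
        return str(sum(int(row.split(",")[1]) for row in rows))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(build(DailyRevenue(d, "2019-12-11")))  # both tasks run
        print(build(DailyRevenue(d, "2019-12-11")))  # nothing to do
```

Under these assumptions, recovering from a bad derived dataset amounts to deleting that one file and building again: the upstream dataset is reused as-is and only the missing output is recomputed.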