www.scling.com
The lean principles of DataOps
Berlin Buzzwords, 2020-06-08
Lars Albertsson, Founder, Scling
Christopher Bergh, CEO & Head Chef, DataKitchen
Scling - data-value-as-a-service
(Diagram: data lake, stream storage)
● Extract value from your data
● Data platform + custom data pipelines
● Imitate data leaders:
○ Quick idea-to-production
○ Operational efficiency
Our marketing strategy:
● Promiscuously share knowledge
○ On slides devoid of glossy polish
1994: OS/2 Warp CID installation
"Grmbl, who reinstalled my machine?"
IT craft to factory
Craft:
● Security
● Waterfall
● Application delivery
● Traditional operations
● Traditional QA
● Infrastructure
Factory:
● DevSecOps
● Agile
● Containers
● DevOps
● CI/CD
● Infrastructure as code
Data factories
Craft:
● Security
● Waterfall
● Application delivery
● Traditional operations
● Traditional QA
● Infrastructure
● DB-oriented architecture
Factory:
● DevSecOps
● Agile
● Containers
● DevOps
● CI/CD
● Infrastructure as code
● Data factories, data pipelines, DataOps
The Toyota Way
Selected lean principles:
● Long-term over short-term
● The right process will produce the right results
● Eliminate waste (muda)
● Continuous improvement (kaizen)
● Use pull systems to avoid unnecessary production
● Quality takes precedence (jidoka)
○ Stop to fix problems
● Standardised tasks and processes
● Reliable technology that serves people and process
● Develop your people
● Decisions slowly by consensus
● Relentless reflection (hansei), organisational learning
Common waste species
● Cognitive waste
● Delivery waste
● Operational waste
● Product waste
Cognitive waste
● Why do we have 25 time formats?
○ ISO 8601, UTC assumed
○ ISO 8601 + timezone
○ Millis since epoch, UTC
○ Nanos since epoch, UTC
○ Millis since epoch, user local time
○ …
○ Float of seconds since epoch, as string.
WTF?!?
● Inconsistent naming: my-kafka-topic-name, your_topic_name
● Definition of an order:
○ Abandoned cart?
○ Payment refused?
○ Returned goods?
○ Free promotion?
● Data entity source of truth
○ MySQL, Kafka, data lake?
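One way to shrink the time format zoo above is a single shared parser that converts every variant into one canonical form, here timezone-aware UTC datetimes. The sketch below is an illustration only: the function name `to_utc` and the `kind` labels are hypothetical, and "UTC assumed" for offset-less ISO 8601 follows the slide.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(value, kind):
    """Convert one of several timestamp encodings to an aware UTC datetime.

    The `kind` labels are hypothetical names for the formats listed on the slide.
    """
    if kind == "iso8601":
        parsed = datetime.fromisoformat(value)
        # ISO 8601 without an offset: UTC assumed.
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    if kind == "epoch_millis":
        return EPOCH + timedelta(milliseconds=int(value))
    if kind == "epoch_nanos":
        # datetime has microsecond resolution, so sub-microsecond digits are dropped.
        return EPOCH + timedelta(microseconds=int(value) // 1000)
    if kind == "epoch_seconds_float_string":
        return EPOCH + timedelta(seconds=float(value))
    # "Millis since epoch, user local time" cannot be converted without
    # knowing the user's time zone, so it is rejected rather than guessed.
    raise ValueError(f"Unknown or ambiguous timestamp format: {kind}")


if __name__ == "__main__":
    print(to_utc("2020-06-08T10:00:00+02:00", "iso8601").isoformat())
    # → 2020-06-08T08:00:00+00:00
```

Rejecting the ambiguous format instead of guessing makes the missing information visible to whoever produces the data.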
What causes cognitive waste?
● We are autonomous!
○ Teams can choose technology, format, process, ...
● Cognitive debt
○ Short-term over long-term
○ Decisions without consensus
● Recognition and rewards
○ "You have made a similar independent pipeline, great work!"
Avoiding cognitive waste
● Reusing semantic definitions
● Reusing code & technical definitions
○ Code transparency & sharing
○ Standardised technology
○ Document decisions & consensus process
● Read-only sharing not enough
○ Must be empowered to change for reuse and to improve quality
○ Standardised processes
Eliminating cognitive waste
● Refactoring code, semantics, docs
● Low risk - what will I break downstream?
○ Standardised, automated, trusted QA process
○ End-to-end pipeline testing
● "Creating a pipeline - one day! Replace old pipeline - 18 months."
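End-to-end pipeline testing can be sketched as feeding a small, hand-written input through the whole pipeline and asserting only on the final output, so that internal steps can be refactored freely. The pipeline below is a toy; all function and field names are hypothetical and not taken from the talk.

```python
def clean_orders(raw_orders):
    """Step 1: keep only orders with a positive amount."""
    return [order for order in raw_orders if order.get("amount", 0) > 0]


def revenue_per_country(orders):
    """Step 2: sum order amounts per country."""
    totals = {}
    for order in orders:
        country = order["country"]
        totals[country] = totals.get(country, 0) + order["amount"]
    return totals


def run_pipeline(raw_orders):
    """The whole pipeline: raw input in, final dataset out."""
    return revenue_per_country(clean_orders(raw_orders))


def test_pipeline_end_to_end():
    raw = [
        {"country": "SE", "amount": 10},
        {"country": "SE", "amount": 20},
        {"country": "DE", "amount": 5},
        {"country": "DE", "amount": 0},  # dropped by the cleaning step
    ]
    # Only the pipeline's input and output are pinned down, not its steps.
    assert run_pipeline(raw) == {"SE": 30, "DE": 5}
```

Because the test knows nothing about the intermediate steps, merging or replacing them does not require touching the test.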
Delivery waste
● Friction from code to production
○ Ideal: Idea, research, write code+tests, done. Everything else is friction.
● Code inventory
○ Code not yet fully utilised
● Data inventory
○ Data not yet fully processed
Data product quality assurance
● Product quality = f(code, data)
○ Cannot do full QA on code only
○ Only real data is production data
● Test in production
○ Quick QA cycle = quick production deployment
○ Measure, monitor, validate
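"Measure, monitor, validate" could look roughly like the sketch below: compute a few quality metrics on the dataset a job actually produced, then compare them with thresholds. The metric names, thresholds, and function names are assumptions made for illustration.

```python
def measure(records, required_fields):
    """Compute simple quality metrics for a produced dataset."""
    total = len(records)
    incomplete = sum(
        1 for record in records
        if any(record.get(field) is None for field in required_fields)
    )
    return {
        "row_count": total,
        # An empty dataset counts as fully incomplete.
        "incomplete_rate": incomplete / total if total else 1.0,
    }


def validate(metrics, min_rows, max_incomplete_rate):
    """Return a list of human-readable problems; empty means the data passed."""
    problems = []
    if metrics["row_count"] < min_rows:
        problems.append(f"too few rows: {metrics['row_count']} < {min_rows}")
    if metrics["incomplete_rate"] > max_incomplete_rate:
        problems.append(
            f"too many incomplete rows: {metrics['incomplete_rate']:.0%}"
        )
    return problems
```

The same metrics can be emitted to a monitoring system on every run, so that the check doubles as a measurement over time.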
Eliminating delivery friction
● In theory simple - scrutinise everything
○ Positive engineering: writing code, tests, docs, refactor, improve
○ All else is negative
● You are limited by your assumptions
○ State of practice far from state of art
"But the test suite takes 3 hours."
"We have this checklist."
"Security must approve."
"X must be released before Y."
"That is another team's job."
"We don't have access."
"We must test in staging first."
"We haven't performance tested yet."
So get rid of the waste. Resources:
No tradeoff between speed and quality!
Code inventory
● Code not yet fully utilised
● Code on its way to production
○ In a notebook
○ Waiting for approval
○ Waiting for release
○ Internally released, waiting for dependants to upgrade
● Tests not fully used
○ Cover code (shared component), but not yet executed
Data inventory
● Data collected, but not yet fully processed
○ Traditional lazy joins & SQL processing at runtime
● Eliminate with eager processing = pipeline
○ Process, join, denormalise
● Fatal problems → offline crash
○ "Andon" cord - stop and fix before significant harm is done
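The "Andon" cord idea can be sketched as a publish step that crashes the offline job when the data looks fatally wrong, before anything downstream can read it. This is a minimal illustration under stated assumptions: the validity rule (`user_id` present), the threshold, and all names are hypothetical.

```python
import json
import os
import tempfile


class FatalDataProblem(Exception):
    """Raised to stop the pipeline before a bad dataset is published."""


def publish_dataset(records, target_path, max_invalid_rate=0.01):
    """Validate records, then publish them atomically as JSON lines."""
    # Hypothetical validity rule: every record needs a user_id.
    invalid = sum(1 for record in records if record.get("user_id") is None)
    invalid_rate = invalid / len(records) if records else 1.0
    if invalid_rate > max_invalid_rate:
        # Pull the cord: crash offline, nothing has been written yet.
        raise FatalDataProblem(f"{invalid_rate:.0%} of records are invalid")

    # Write to a temporary file first, then rename, so that readers
    # never see a half-written dataset.
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp_file:
        for record in records:
            tmp_file.write(json.dumps(record) + "\n")
    os.replace(tmp_path, target_path)
```

A crashed offline job leaves the previous output in place, which is usually less harmful than publishing a corrupt one.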
Operational waste
● Friction in operational manoeuvres
○ Fear of mistakes
● Cost of incidents
○ Time to recovery
○ Impact of incident
○ Frequency of incidents
Separating offline and online
(Diagram: Orders, Replication / Backup, Raw, Fraud model, Fraud service)
Standard procedures
Lightweight procedures
● QA driven by internal efficiency
● Continuous deployment
● New pipeline < 1 day
● Upgrade < 1 hour
● Bug recovery < 1 hour
Careful handover
Cost of a software error
Online
● User impact
● Data corruption
● Cascading corruption
● Unbounded recovery
Nearline
● Data corruption
● Downstream impact
● Bounded recovery
Offline
● Temporary data corruption
● Downstream impact
● Easy recovery
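The slides do not spell out why offline recovery is easy. One common pattern, assumed here, is that an offline job is a deterministic function of immutable raw input, so recovery means fixing the bug and running the job again for the affected dates. The in-memory dictionaries and job names below are hypothetical stand-ins for a data lake and batch jobs.

```python
def buggy_job(orders):
    """Counts all orders, mistakenly including refunded ones."""
    return {"order_count": len(orders)}


def fixed_job(orders):
    """Counts only orders that were not refunded."""
    return {"order_count": sum(1 for order in orders if not order["refunded"])}


def run(job, raw_store, output_store, date):
    """Run a job for one date, replacing any earlier output for that date."""
    # Raw input is only read, never modified, so a rerun sees the same data.
    output_store[date] = job(raw_store[date])


raw_store = {
    "2020-06-08": [
        {"id": 1, "refunded": False},
        {"id": 2, "refunded": True},
        {"id": 3, "refunded": False},
    ]
}
output_store = {}

run(buggy_job, raw_store, output_store, "2020-06-08")  # corrupt output: 3
run(fixed_job, raw_store, output_store, "2020-06-08")  # rerun after fix: 2
```

The corruption is temporary because the faulty output is simply overwritten. Downstream datasets derived from it need the same rerun.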
Data processing tradeoff
(Diagram: online, nearline, and offline processing placed along an axis between data speed and innovation speed)
Product waste
● Work not driven by use case
● Unrealised data potential due to friction
○ Unawareness of data
○ Difficulty to use data
● Hidden quality problems
● Collaboration and communication overhead
Data democratisation - making data accessible and usable
Copyright 2020 by DataKitchen, Inc. All Rights Reserved.
Waste: Your Team’s Time Not Well Spent
(Chart: percentage of time the team spends per week, current state, split into Errors & Operational Tasks, New Features & Data For Customers, and Improvements & Debt)
Challenges:
• Complex roles
• Complex organizations
• Complex toolchains
• Complex data
• Complex collaboration
Waste: Data Analytics is like the US Auto Industry in the 1970s
Current state of the data analytics team:
• Production errors: high
• Deployment latency (Dev to Prod): weeks, months
Challenges:
• Slow to add new features, to address consumer requests rapidly, and to handle changing data sets
• Lack of trust by data consumers
• Slow model deployment, slow to move to cloud
• Team morale
Waste: Conway’s Law and Data Pipelines
Data Analytics Follows Conway's Law
The structure of how teams are organized to do Data Science, Data
Engineering, Analytics, and Production is reflected in their data
pipelines.
Waste: A cornucopia of collaboration complexity
(Legend: D = Development, data analytics team; P = Production, data analytics team. Roles in the diagram: DE, DS, BI)
• Centralized Dev: How do we create together without conflicts? (Data Engineer & Data Scientist)
• Centralized Dev & Prod: How do we deploy safely and rapidly? (Data Team and Production Team)
• Decentralized Dev: How to balance centralized control vs self-service freedom? (Home Office Data Team and Line of Business Analysts)
• Decentralized Dev & Prod: How to reuse/incorporate what another team deployed? (Multiple Data & Production Teams in Many Orgs)
Why? Data Teams Are Suffering
Data teams are caught between three competing forces:
• Unaware Data Providers – unaware that they send crappy, late, and error-prone data sets
• Demanding Data Consumers – demand trusted, original insight at the speed of Amazon delivery
• Critical Supporting Teams – need flawless ongoing production and collaboration with other teams/people
Make for:
• A beaten down, distraught, disempowered work
environment
• Teams that cannot create and innovate
• Lack of trust all around
DataOps – Solution To That Suffering
DataOps – The technical practices, cultural norms, and architecture that enable:
• Rapid cycles of experimentation and innovation to deliver new insights to our customers
• Low error rates
• Collaboration across complex sets of people, technology, and environments
• Clear measurement and monitoring of results
“Organizations that adopt a DevOps- and DataOps-based approach are more successful in implementing end-to-end, reliable, robust, scalable and repeatable solutions.”
Sumit Pal, Gartner, November 2018
(Diagram: People, Process, Organization; Technical Environment)
DataOps Benefit: Lower Cost, More Insight
(Chart: percentage of time the team spends per week, before and after DataOps, split into New Features & Data For Customers, Errors & Operational Tasks, and Process Improvements & Tech Debt Reduction)
DataOps Benefit: Faster, Better & Happier
Before DataOps:
• Production errors: high
• Deployment latency (Dev to Prod): weeks, months
After DataOps:
• Production errors: low
• Deployment latency (Dev to Prod): hours and minutes
DevOps vs DataOps (and all those *Opses)
• Business management concept: Lean, Learning Organization, and W Edwards Deming principles, with a focus on low errors, cycle time, collaboration, and measurement
• Organization: industrial manufacturing teams; data science, engineering and analytics teams; IT and software teams
• Organizational management method (team management): Six Sigma, Total Quality Management; Agile, Kanban, Scrum, DA, etc.
• Technical environment and process: DevOps, AIOps, DevSecOps, DataOps, ModelOps, MLOps, …, GitOps
What You Do Is Much Less Important Than How You Do It
“We realized that the true problem, the true difficulty, and where
the greatest potential is – is building the machine that makes
the machine. It’s building the factory.” – Elon Musk
94% of causes were common cause. We often attribute problems
to a specific case, and look for a person to blame, rather than
focusing on the underlying process – Dr Deming
Questions?
38

More Related Content

What's hot

Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
Dsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovicDsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovicRadovan Baćović
 
Testing the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTesting the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTechWell
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Measuring Data Quality with DataOps
Measuring Data Quality with DataOpsMeasuring Data Quality with DataOps
Measuring Data Quality with DataOpsSteven Ensslen
 
Agile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessAgile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessInside Analysis
 
Architecting for analytics
Architecting for analyticsArchitecting for analytics
Architecting for analyticsRob Winters
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...TamrMarketing
 
Open Data Science Conference Agile Data
Open Data Science Conference Agile DataOpen Data Science Conference Agile Data
Open Data Science Conference Agile DataDataKitchen
 
Webinar: Attaining Excellence in Big Data Integration
Webinar: Attaining Excellence in Big Data IntegrationWebinar: Attaining Excellence in Big Data Integration
Webinar: Attaining Excellence in Big Data IntegrationSnapLogic
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products Dataiku
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
Data Ops at TripActions
Data Ops at TripActionsData Ops at TripActions
Data Ops at TripActionsRob Winters
 
MLUC 2011 XQuery Enigma
MLUC 2011 XQuery EnigmaMLUC 2011 XQuery Enigma
MLUC 2011 XQuery EnigmaPeter O'Kelly
 
An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIInside Analysis
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleDatabricks
 
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...Seeling Cheung
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 

What's hot (20)

Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Dsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovicDsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovic
 
Testing the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big ProblemsTesting the Data Warehouse—Big Data, Big Problems
Testing the Data Warehouse—Big Data, Big Problems
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Measuring Data Quality with DataOps
Measuring Data Quality with DataOpsMeasuring Data Quality with DataOps
Measuring Data Quality with DataOps
 
Agile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for SuccessAgile, Automated, Aware: How to Model for Success
Agile, Automated, Aware: How to Model for Success
 
Architecting for analytics
Architecting for analyticsArchitecting for analytics
Architecting for analytics
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
 
Open Data Science Conference Agile Data
Open Data Science Conference Agile DataOpen Data Science Conference Agile Data
Open Data Science Conference Agile Data
 
Webinar: Attaining Excellence in Big Data Integration
Webinar: Attaining Excellence in Big Data IntegrationWebinar: Attaining Excellence in Big Data Integration
Webinar: Attaining Excellence in Big Data Integration
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
Data Ops at TripActions
Data Ops at TripActionsData Ops at TripActions
Data Ops at TripActions
 
MLUC 2011 XQuery Enigma
MLUC 2011 XQuery EnigmaMLUC 2011 XQuery Enigma
MLUC 2011 XQuery Enigma
 
An Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BIAn Ounce of Prevention: Forging Healthy BI
An Ounce of Prevention: Forging Healthy BI
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 

Similar to The lean principles of data ops

DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesLars Albertsson
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish styleLars Albertsson
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divideLars Albertsson
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application qualityLars Albertsson
 
Overcoming Digital Transformation Pain Points
Overcoming Digital Transformation Pain PointsOvercoming Digital Transformation Pain Points
Overcoming Digital Transformation Pain PointsInductive Automation
 
Developing and Implementing a QA Plan During Your Legacy Data to S1000D
Developing and Implementing a QA Plan During Your Legacy Data to S1000DDeveloping and Implementing a QA Plan During Your Legacy Data to S1000D
Developing and Implementing a QA Plan During Your Legacy Data to S1000Ddclsocialmedia
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdfLars Albertsson
 
Data Con LA 2022 - Practical Solutions to Complex Supply Chain Problems
Data Con LA 2022 - Practical Solutions to Complex Supply Chain ProblemsData Con LA 2022 - Practical Solutions to Complex Supply Chain Problems
Data Con LA 2022 - Practical Solutions to Complex Supply Chain ProblemsData Con LA
 
Introduction for Embedding Infobright for OEMs
Introduction for Embedding Infobright for OEMsIntroduction for Embedding Infobright for OEMs
Introduction for Embedding Infobright for OEMsInfobright
 
451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL Database451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL DatabaseNuoDB
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetLars Albertsson
 
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...Codemotion
 
Essential Prerequisites for Maximizing Success from Big Data
Essential Prerequisites for Maximizing Success from Big DataEssential Prerequisites for Maximizing Success from Big Data
Essential Prerequisites for Maximizing Success from Big DataSociety of Petroleum Engineers
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Cubodrom profile
Cubodrom profileCubodrom profile
Cubodrom profilecubodrom
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Precisely
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsLooker
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseJesus Rodriguez
 

Similar to The lean principles of data ops (20)

DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practices
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish style
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divide
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application quality
 
Overcoming Digital Transformation Pain Points
Overcoming Digital Transformation Pain PointsOvercoming Digital Transformation Pain Points
Overcoming Digital Transformation Pain Points
 
Developing and Implementing a QA Plan During Your Legacy Data to S1000D
Developing and Implementing a QA Plan During Your Legacy Data to S1000DDeveloping and Implementing a QA Plan During Your Legacy Data to S1000D
Developing and Implementing a QA Plan During Your Legacy Data to S1000D
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdf
 
Data Con LA 2022 - Practical Solutions to Complex Supply Chain Problems
Data Con LA 2022 - Practical Solutions to Complex Supply Chain ProblemsData Con LA 2022 - Practical Solutions to Complex Supply Chain Problems
Data Con LA 2022 - Practical Solutions to Complex Supply Chain Problems
 
Introduction for Embedding Infobright for OEMs
Introduction for Embedding Infobright for OEMsIntroduction for Embedding Infobright for OEMs
Introduction for Embedding Infobright for OEMs
 
451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL Database451 Research + NuoDB: What It Means to be a Container-Native SQL Database
451 Research + NuoDB: What It Means to be a Container-Native SQL Database
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budget
 
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
 
Essential Prerequisites for Maximizing Success from Big Data
Essential Prerequisites for Maximizing Success from Big DataEssential Prerequisites for Maximizing Success from Big Data
Essential Prerequisites for Maximizing Success from Big Data
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Cubodrom profile
Cubodrom profileCubodrom profile
Cubodrom profile
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 


The lean principles of DataOps

  • 1. www.scling.com The lean principles of DataOps Berlin Buzzwords, 2020-06-08 Lars Albertsson, Founder, Scling Christopher Bergh, CEO & Head Chef, DataKitchen 1
  • 2. Scling - data-value-as-a-service [Diagram labels: Data lake, Stream storage] ● Extract value from your data ● Data platform + custom data pipelines ● Imitate data leaders: ○ Quick idea-to-production ○ Operational efficiency Our marketing strategy: ● Promiscuously share knowledge ○ On slides devoid of glossy polish
  • 3. 1994: OS/2 Warp CID installation. "Grmbl, who reinstalled my machine?"
  • 4. IT craft to factory [Diagram labels: Security, Waterfall, Application delivery, Traditional operations, Traditional QA, Infrastructure; DevSecOps, Agile, Containers, DevOps, CI/CD, Infrastructure as code]
  • 6. The Toyota Way. Selected lean principles: ● Long-term over short-term ● The right process will produce the right results ● Eliminate waste (muda) ● Continuous improvement (kaizen) ● Use pull systems to avoid unnecessary production ● Quality takes precedence (jidoka) ○ Stop to fix problems ● Standardised tasks and processes ● Reliable technology that serves people and process ● Develop your people ● Decisions slowly by consensus ● Relentless reflection (hansei), organisational learning
  • 7. Common waste species ● Cognitive waste ● Delivery waste ● Operational waste ● Product waste
  • 8. Cognitive waste ● Why do we have 25 time formats? ○ ISO 8601, UTC assumed ○ ISO 8601 + timezone ○ Millis since epoch, UTC ○ Nanos since epoch, UTC ○ Millis since epoch, user local time ○ … ○ Float of seconds since epoch, as string. WTF?!? ● my-kafka-topic-name, your_topic_name ● Definition of an order: ○ Abandoned cart? ○ Payment refused? ○ Returned goods? ○ Free promotion? ● Data entity source of truth ○ MySQL, Kafka, data lake?
  • 9. What causes cognitive waste? ● We are autonomous! ○ Teams can choose technology, format, process, ... ● Cognitive debt ○ Short-term over long-term ○ Decisions without consensus ● Recognition and rewards ○ "You have made a similar independent pipeline, great work!"
  • 10. Avoiding cognitive waste ● Reusing semantic definitions ● Reusing code & technical definitions ○ Code transparency & sharing ○ Standardised technology ○ Document decisions & consensus process ● Read-only sharing not enough ○ Must be empowered to change for reuse and to improve quality ○ Standardised processes
  • 11. Eliminating cognitive waste ● Refactoring code, semantics, docs ● Low risk - what will I break downstream? ○ Standardised, automated, trusted QA process ○ End-to-end pipeline testing ● "Creating a pipeline - one day! Replace old pipeline - 18 months."
  • 12. Delivery waste ● Friction from code to production ○ Ideal: Idea, research, write code+tests, done. Everything else is friction. ● Code inventory ○ Code not yet fully utilised ● Data inventory ○ Data not yet fully processed
  • 13. Data product quality assurance ● Product quality = f(code, data) ○ Cannot do full QA on code only ○ Only real data is production data ● Test in production ○ Quick QA cycle = quick production deployment ○ Measure, monitor, validate
  • 14. Eliminating delivery friction ● In theory simple - scrutinise everything ○ Positive engineering: writing code, tests, docs, refactor, improve ○ All else is negative ● You are limited by your assumptions ○ State of practice far from state of art "But the test suite takes 3 hours." "We have this checklist." "Security must approve." "X must be released before Y." "That is another team's job." "We don't have access." "We must test in staging first." "We haven't performance tested yet."
  • 15. So get rid of the waste. Resources: No tradeoff between speed and quality!
  • 16. Code inventory ● Code not yet fully utilised ● Code on its way to production ○ In a notebook ○ Waiting for approval ○ Waiting for release ○ Internally released, waiting for dependants to upgrade ● Tests not fully used ○ Cover code (shared component), but not yet executed
  • 17. Data inventory ● Data collected, but not yet fully processed ○ Traditional lazy joins & SQL processing at runtime ● Eliminate with eager processing = pipeline ○ Process, join, denormalise ● Fatal problems → offline crash ○ "Andon" cord - stop and fix before significant harm is done
  • 18. Operational waste ● Friction in operational manoeuvres ○ Fear of mistakes ● Cost of incidents ○ Time to recovery ○ Impact of incident ○ Frequency of incidents
  • 19. Separating offline and online [Diagram labels: Raw, Fraud service, Fraud model, Orders, Replication / Backup, Standard procedures, Lightweight procedures, Careful handover] ● QA driven by internal efficiency ● Continuous deployment ● New pipeline < 1 day ● Upgrade < 1 hour ● Bug recovery < 1 hour
  • 20. Cost of a software error. Online: ● User impact ● Data corruption ● Cascading corruption ● Unbounded recovery
  • 21. Cost of a software error. Nearline: ● Data corruption ● Downstream impact ● Bounded recovery. Online: ● User impact ● Data corruption ● Cascading corruption ● Unbounded recovery [Diagram labels: Job, Stream]
  • 22. Cost of a software error. Nearline: ● Data corruption ● Downstream impact ● Bounded recovery. Offline: ● Temporary data corruption ● Downstream impact ● Easy recovery. Online: ● User impact ● Data corruption ● Cascading corruption ● Unbounded recovery [Diagram labels: Job, Stream]
  • 23. Data processing tradeoff [Diagram labels: Data speed, Innovation speed; Online, Nearline, Offline; Job, Stream]
  • 24. Product waste ● Work not driven by use case ● Unrealised data potential due to friction ○ Unawareness of data ○ Difficulty to use data ● Hidden quality problems ● Collaboration and communication overhead. Data democratisation - making data accessible and usable
  • 25. Copyright 2020 by DataKitchen, Inc. All Rights Reserved. Waste: Your Team’s Time Not Well Spent [Chart: Percentage Time Team Spends Per Week, Current; categories: Errors & Operational Tasks, New Features & Data For Customers, Improvements & Debt] Challenges: • Complex roles • Complex organizations • Complex toolchains • Complex data • Complex collaboration
  • 26. Waste: Data Analytics is like the US Auto Industry in the 1970s [Chart: Current; Production Errors: High Errors; Data Analytics Team Deployment Latency, Dev to Prod: Weeks, Months] Challenges: • Slow to add new features, rapidly address consumer requests, changing data sets • Lack of trust by data consumers • Slow model deployment, slow to move to cloud • Team morale
  • 27. Waste: Conway’s Law and Data Pipelines. Data Analytics Follows Conway's Law: The structure of how teams are organized to do Data Science, Data Engineering, Analytics, and Production is reflected in their data pipelines.
  • 28. Waste: A cornucopia of collaboration complexity [Diagram legend: D = Development - Data Analytic Team, P = Production - Data Analytic Team; DE, DS, BI; layouts: Centralized Dev, Centralized Dev & Prod, Decentralized Dev, Decentralized Dev & Prod] How do we create together without conflicts? (Data Engineer & Data Scientist) How do we deploy safely and rapidly? (Data Team and Production Team) How to balance centralized control vs self service freedom? (Home Office Data Team and Line of Business Analysts) How to reuse/incorporate what another team deployed? (Multiple Data & Production Teams in Many Orgs)
  • 29. Why? Data Teams Are Suffering. Data teams are caught between three competing forces: • Unaware Data Providers – unaware that they send crappy, late, and error-prone data sets • Demanding Data Consumers – demand trusted, original insight at the speed of Amazon delivery • Critical Supporting Teams – need flawless ongoing production and collaboration with other teams/people Make for: • A beaten down, distraught, disempowered work environment • Teams that cannot create and innovate • Lack of trust all around
  • 30. DataOps – Solution To That Suffering. DataOps – The technical practices, cultural norms, and architecture that enable: • Rapid cycles of experimentation and innovation to deliver new insights to our customers • Low error rates • Collaboration across complex sets of people, technology, and environments • Clear measurement and monitoring of results [Diagram labels: People, Process, Organization; Technical Environment] “Organizations that adopt a DevOps- and DataOps-based approach are more successful in implementing end-to-end, reliable, robust, scalable and repeatable solutions.” Sumit Pal, Gartner, November 2018. Source: Gartner
  • 31. DataOps Benefit: Lower Cost, More Insight [Chart: Percentage Time Team Spends Per Week, Before DataOps vs After DataOps; categories: New Features & Data For Customers, Errors & Operational Tasks, Improvements & Debt (Process Improvements & Tech Debt Reduction)]
  • 32. DataOps Benefit: Faster, Better & Happier [Chart: Before DataOps: Production Errors: High Errors; Data Analytics Team Deployment Latency, Dev to Prod: Weeks, Months. After DataOps: Low Errors; Dev to Prod: Hours & Mins]
  • 33. DevOps vs DataOps (and all those *Opses). Lean, Learning Organization, and W Edwards Deming Principles: Focus on Low Errors, Cycle Time, Collaboration, and Measurement [Diagram labels: Business Management Concept; Organization: Industrial Manufacturing Teams, IT and Software Teams, Data Science, Engineering and Analytics Teams; Organizational Management Method: Team Management (Six Sigma, Total Quality Management), Team Management (Agile, Kanban, Scrum, DA, etc.); Technical Environment and Process: DevOps, AIOps, DevSecOps, DataOps, ModelOps, MLOps, …, GitOps]
  • 37. What You Do Is Much Less Important Than How You Do It. “We realized that the true problem, the true difficulty, and where the greatest potential is – is building the machine that makes the machine. It’s building the factory.” – Elon Musk. “94% of causes were common cause. We often attribute problems to a specific case, and look for a person to blame, rather than focusing on the underlying process” – Dr Deming
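
Slide 8 lists a zoo of timestamp encodings as an example of cognitive waste, and slide 17 describes an "Andon" cord that stops an offline job before harm spreads. A minimal sketch of how the two ideas might combine in a batch job is shown below. It is not taken from the talk: the function names, the magnitude thresholds used to guess the unit, and the 1% reject limit are all assumptions made for illustration. "Millis since epoch, user local time" is deliberately not handled, because the offset cannot be recovered from the value alone.

```python
from datetime import datetime, timezone


class AndonStop(Exception):
    """Raised to halt a batch job before bad data spreads downstream."""


def to_utc(value):
    """Convert one timestamp encoding into a timezone-aware UTC datetime.

    Hypothetical heuristic: numbers >= 1e16 are treated as nanoseconds,
    numbers >= 1e11 as milliseconds, anything smaller as seconds.
    """
    if isinstance(value, str):
        try:
            value = float(value)  # "float of seconds since epoch, as string"
        except ValueError:
            parsed = datetime.fromisoformat(value)  # raises ValueError if invalid
            if parsed.tzinfo is None:
                # "ISO 8601, UTC assumed"
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed.astimezone(timezone.utc)
    if value >= 1e16:
        seconds = value / 1e9  # nanos since epoch
    elif value >= 1e11:
        seconds = value / 1e3  # millis since epoch
    else:
        seconds = value        # seconds since epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def normalise_batch(raw_values, max_bad_ratio=0.01):
    """Normalise a list of timestamps; pull the cord if too many are broken."""
    good, bad = [], []
    for raw in raw_values:
        try:
            good.append(to_utc(raw).isoformat())
        except (ValueError, TypeError, OverflowError, OSError):
            bad.append(raw)
    if raw_values and len(bad) / len(raw_values) > max_bad_ratio:
        # Stop and fix, rather than publish a dataset that consumers will trust.
        raise AndonStop(f"{len(bad)} of {len(raw_values)} timestamps unparseable")
    return good, bad


if __name__ == "__main__":
    samples = ["2020-06-08T12:00:00", "2020-06-08T14:00:00+02:00", 1591617600000]
    print(normalise_batch(samples))
```

Failing the whole batch is a design choice that only makes sense offline, where a crashed job can be rerun after the fix. An online service would more likely route rejects to a side channel and keep serving.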