5. Big data:
● Food
● Activity
● Exercises
● Challenges
● Social network
● Workshops
● Personal Coaches
● CRM
● Fulfillment
● Meal kits
● Supermarket foods
● E-commerce
● Cruises
...for 56 years
6. 2017: fill lake with data; provide analysts access
2019: upstream control and governance
7. In General, What Can Go Wrong?
Pipeline stages: Data Entry, Transformation 1, Transformation 2
● Inaccurate (GIGO)
● Missing
● Defaults
● Dropped records
● Truncation
● Encoding changes
● Data type change
● Shape change
● Stale
● 3rd party
● Disagree
● Dupes
9. Data Quality Scorecard
● Accurate: % records quarantined; % records in range; % records matching
● Coherent: % records missing entity ID; % records missing foreign key
● Complete: % records dupes; % records missing; % records complete; % fields complete
● Consistent: % records consistent
● Defined: % tables defined; % fields defined; % dimensions defined; % measures defined
● Timely: mean time to arrival; 95th percentile time to arrival
● Volume: number of records
● Trust: NPS
“If you can't measure it, you can't improve it” - Peter Drucker
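A minimal sketch of how a few of the scorecard percentages could be computed. The sample records, field names, and the valid weight range are hypothetical and only illustrate the arithmetic; they are not taken from the talk.

```python
# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"member_id": 1, "weight_kg": 82.5, "country": "US"},
    {"member_id": 2, "weight_kg": None, "country": "CA"},
    {"member_id": 2, "weight_kg": None, "country": "CA"},   # exact duplicate
    {"member_id": 3, "weight_kg": 900.0, "country": None},  # out of range
]


def pct(part, whole):
    """Percentage helper that tolerates an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def pct_records_in_range(rows, field, lo, hi):
    """Share of non-null values of `field` that lie within [lo, hi]."""
    present = [r[field] for r in rows if r.get(field) is not None]
    return pct(sum(lo <= v <= hi for v in present), len(present))


def pct_records_dupes(rows):
    """Share of records that repeat an earlier, identical record."""
    seen, dupes = set(), 0
    for r in rows:
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return pct(dupes, len(rows))


def pct_records_complete(rows):
    """Share of records with no null field."""
    return pct(sum(all(v is not None for v in r.values()) for r in rows), len(rows))


def pct_fields_complete(rows):
    """Share of individual field values that are non-null."""
    values = [v for r in rows for v in r.values()]
    return pct(sum(v is not None for v in values), len(values))


scorecard = {
    "% records in range": pct_records_in_range(records, "weight_kg", 20.0, 400.0),
    "% records dupes": pct_records_dupes(records),
    "% records complete": pct_records_complete(records),
    "% fields complete": pct_fields_complete(records),
}
```

Each metric is a plain ratio, so the same functions can be run per source and per day and the results plotted on a dashboard.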
10. Facet: Accuracy
Source teams then:
● Publish Schema
● Data did not always match schema
● Hard to trust
● Hard to automate
● No accountability
Source teams now (WIP):
● Publish Schema
● Adhere to Schema
● Field Ranges
Data team superpowers:
1. Auto consumption
2. Auto checks
3. Quarantine
4. Reporting
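The combination of a published schema, field ranges, automated checks, and quarantine can be sketched as follows. This is an illustration under assumptions, not the team's implementation: the schema format, the field names, and the `violations` and `consume` helpers are all hypothetical.

```python
# Hypothetical published schema: expected type plus optional value range per field.
SCHEMA = {
    "member_id": {"type": int},
    "steps": {"type": int, "min": 0, "max": 100_000},
}


def violations(record, schema):
    """Return a list of human-readable schema problems for one record."""
    problems = []
    for field, rule in schema.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            problems.append(f"{field}: below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{field}: above {rule['max']}")
    return problems


def consume(records, schema):
    """Split records into accepted rows and quarantined (row, problems) pairs."""
    accepted, quarantined = [], []
    for record in records:
        problems = violations(record, schema)
        if problems:
            quarantined.append((record, problems))
        else:
            accepted.append(record)
    return accepted, quarantined


incoming = [
    {"member_id": 1, "steps": 8000},
    {"member_id": 2, "steps": -5},      # out of range
    {"member_id": "3", "steps": 1200},  # wrong type
    {"member_id": 4},                   # missing field
]
accepted, quarantined = consume(incoming, SCHEMA)
pct_quarantined = 100.0 * len(quarantined) / len(incoming)
```

Keeping the reasons alongside each quarantined record makes the reporting step straightforward: the source team can see what failed and why.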
11. Facet: Accuracy
Accurate: % records quarantined; % records in range; % records matching
12. Facet: Defined
● Table-level data dictionaries
● Business-level data dictionary (Business Glossary)
https://medium.com/@leapingllamas
13. Facet: Defined. Flow from master
1. Data catalog is master for table-level definitions and business glossary
2. Mapping table from master to BI tool: here, Looker dimensions and measures
3. Tool compares master to BI tool, updates/injects, and creates pull request
4. Manually reviewed and merged
5. Master definitions appear to users
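The compare-and-update step in this flow can be sketched with plain dictionaries. This does not use the lookml-tools API. The glossary terms, the field names, and the `plan_updates` helper are hypothetical stand-ins for the data catalog, the mapping table, and the BI tool's descriptions.

```python
# Hypothetical master definitions, as they might be exported from a data catalog.
master = {
    "active_member": "A member with a paid subscription on the given date.",
    "signup_date": "Date the member first subscribed.",
}

# Hypothetical mapping table from master term to BI-tool field name.
mapping = {
    "active_member": "members.is_active",
    "signup_date": "members.signup_date",
}

# Hypothetical descriptions currently present in the BI tool.
bi_descriptions = {
    "members.is_active": "",
    "members.signup_date": "Date the member first subscribed.",
    "members.region": "Sales region.",
}


def plan_updates(master_defs, term_to_field, current):
    """List the BI fields whose description differs from the master definition."""
    updates = []
    for term, field in sorted(term_to_field.items()):
        wanted = master_defs[term]
        existing = current.get(field, "")
        if existing != wanted:
            updates.append({"field": field, "old": existing, "new": wanted})
    return updates


updates = plan_updates(master, mapping, bi_descriptions)

# Apply to a copy, so the proposed change can be reviewed before it is merged.
proposed = dict(bi_descriptions)
for update in updates:
    proposed[update["field"]] = update["new"]
```

Producing a list of proposed changes, rather than writing them directly, mirrors the pull-request step: a person reviews the diff before the master definitions reach users.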
14. Facet: Defined. Flow from master
Open sourcing: https://github.com/ww-tech/lookml-tools
15. Facet: Defined. Style Guide
Open sourcing: https://github.com/ww-tech/lookml-tools
LookML linter
16. Facet: Defined
Defined: % tables defined; % fields defined
+ LookML updater, LookML linter
Defined: % dimensions defined; % measures defined
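A small sketch of how the "% defined" figures could be derived, assuming a definition counts as present when its description is a non-empty string. The dimension and measure names are hypothetical.

```python
def pct_defined(descriptions):
    """Percentage of entries whose description is a non-blank string."""
    if not descriptions:
        return 0.0
    defined = sum(1 for text in descriptions.values() if text and text.strip())
    return 100.0 * defined / len(descriptions)


# Hypothetical dimension and measure descriptions pulled from a BI model.
dimensions = {
    "members.region": "Sales region.",
    "members.is_active": "",
    "members.signup_date": "Date the member first subscribed.",
    "members.plan": None,
}
measures = {
    "members.count": "Number of members.",
    "members.avg_weight": "   ",
}

pct_dimensions_defined = pct_defined(dimensions)
pct_measures_defined = pct_defined(measures)
```

The same function works for tables and fields if their descriptions are exported from the data catalog in the same shape.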
17. Facet: Trust
Trust NPS
Easy to lose trust. Hard to regain!
In April 2019, we surveyed data-related NPS with analysts, data scientists, and some decision makers and execs.
We asked:
● NPS data: would you recommend our data to a friend?
● NPS infrastructure: would you recommend our infrastructure (Looker, BigQuery etc) to a friend?
● NPS support: would you recommend CIE’s support to a friend?
We will resurvey at end of 2019.
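For reference, a sketch of the usual Net Promoter Score calculation. It assumes the standard 0 to 10 scale, with 9 and 10 counted as promoters and 0 to 6 as detractors; the slides do not state which scale the survey used, and the responses below are made up.

```python
def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 scale."""
    if not scores:
        raise ValueError("at least one response is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical survey responses, not results from the actual survey.
responses = [10, 9, 8, 7, 6, 3, 9, 10, 5, 8]
nps_data = net_promoter_score(responses)
```

The score ranges from -100 (all detractors) to +100 (all promoters), so the same calculation can be repeated at the resurvey and compared directly.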
18. Data Quality Scorecard
1. Accurate: % records quarantined; % records in range; % records matching
2. Coherent: % records missing entity ID; % records missing foreign key
3. Complete: % records dupes; % records missing; % records complete; % fields complete
4. Consistent: % records consistent
5. Defined: % tables defined; % fields defined; % dimensions defined; % measures defined
6. Timely: mean time to arrival; 95th percentile time to arrival
7. Volume: number of records
8. Trust: NPS
Sources: Reference Data; Server logs; Metadata; Schema; Data catalog + lookml-tools; Survey
“If you can't measure it, you can't improve it” - Peter Drucker
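The two Timely metrics can be sketched as below, assuming each record carries an event timestamp and an arrival timestamp (for example, taken from server logs or metadata). The timestamps are synthetic, and the nearest-rank percentile is one of several common percentile definitions, chosen here for simplicity.

```python
import math
from datetime import datetime, timedelta


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic (event_time, arrival_time) pairs: record i arrives i minutes late.
base = datetime(2019, 4, 1, 0, 0, 0)
pairs = [
    (base + timedelta(hours=i), base + timedelta(hours=i, minutes=i))
    for i in range(1, 21)
]

# Time to arrival in minutes for each record.
delays = [(arrival - event).total_seconds() / 60 for event, arrival in pairs]

mean_minutes = sum(delays) / len(delays)
p95_minutes = percentile_nearest_rank(delays, 95)
```

Reporting the 95th percentile alongside the mean helps surface a slow tail that an average alone would hide.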
19. Integrate into normal workflows
Our engineers work in Slack, so let them do data quality work there too
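One way this could look, as a sketch only: format a data quality result as a short message and post it to a Slack incoming webhook, which accepts a JSON body with a `text` field. The webhook URL, source name, threshold, and both helper functions are hypothetical; the slides do not describe how the integration was built.

```python
import json
import urllib.request


def build_alert(source, metric, value, threshold):
    """Build a Slack message payload for one data quality metric."""
    status = "OK" if value <= threshold else "ALERT"
    text = f"[{status}] {source}: {metric} = {value:.1f}% (threshold {threshold:.1f}%)"
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical metric value; posting is left to the caller because it needs
# a real webhook URL, e.g. post_to_slack(webhook_url, payload).
payload = build_alert("activity_events", "% records quarantined", 7.3, 5.0)
```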
20. Integrate into team culture
Agile BI engineering team
● BI engineering teams set aside 10% of time for explicit data quality work
● Expect DQ dashboards for all new sources
● Weekly data quality meetings
● Now proactive, rather than reactive or retrospective
21. Data Quality is a Shared Responsibility
● Adhere to Schema
● Value Ranges
● Automated consumption
● Automated checks
● DQ Dashboards
● Subscribe / Report
● Data dictionaries
● Data dictionaries + glossary
● Data Catalog
● Single Source of Truth
● Docs, schema
● Monitor / investigate
22. What Questions Do You Have For Me?
Carl Anderson
carl.anderson@weighwatchers.com
@leapingllamas
https://medium.com/ww-tech-blog
We are hiring:
BI engineers, engineers, and data scientists for our Toronto office (a few blocks away).
Find our booth in the recruiting hall.