Organizations must understand what it means to use data quality management in support of business strategy. This webinar demonstrates how chronic business challenges can often be traced to the root problem of poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. Establishing this framework allows organizations to more efficiently identify business and data problems caused by structural issues versus practice-oriented defects, giving them the skills to prevent these problems from recurring.
Learning Objectives:
Understanding foundational data quality concepts based on the DAMA DMBOK
Utilizing data quality engineering in support of business strategy
Case Studies illustrating data quality success
Data quality guiding principles & best practices
Steps for improving data quality at your organization
1. Dr. Peter Aiken, Founder, paiken@datablueprint.com
Karen Akens, Data Consultant, kakens@datablueprint.com
Data Quality Success Stories
Dataversity Webinar 7-12-2016
3. Copyright 2016 by Data Blueprint
Karen Akens, CDMP
• Data management and solution development experience for numerous government and commercial clients
• Connector between Business & IT based on practical experience in both arenas
• Focus on Data Quality, Data Governance & Stewardship, and Business Intelligence
• Speaker at EDW, DGIQ, various DAMA chapters
• Board member of DAMA-Central Virginia
High Quality Data is Critical
• Information transparency
• Analytics
• Business Intelligence
• Increasing efficiencies
• Decreasing costs
• Driving holistic decision-making across the organization
Getting Started with Data Quality
Our approach begins with discovering…
• The data that is most impactful to your business needs
• Your organizational capabilities to manage data as an asset (foundational practices)
• The state of your technical environment (technical practices)
… and then laying out the path forward in a roadmap…
• That is achievable and matches your organization’s abilities to deliver
• That builds momentum with specific, short-term win projects
• That outlines a long-term vision and implementation milestones
Client’s Data Landscape
Challenge
• Growth through acquisition
• No data accountability
• Fractured Technology Landscape
• Need to Align with Global Education Strategy
Business Impact
• No Comprehensive BI Capability
• Lack of Unified Product and Portfolio Management
• Poor Data Quality & Unreliable Reporting
• Increasing Costs Due to Poor Data Mgmt.
Opportunity
• Centralized Data Governance Program
• Formalize Data Stewardship
• Become Proactive vs. Reactive
• Increase Transparency and Decrease Cost
Case Study - Supplier Master
Business Value Achievements:
1) Consolidated the number of suppliers, obtaining better terms and conditions
2) Reduced the number of suppliers with immediate payment terms, increasing cash flow
3) Removed duplicate suppliers, increasing the ability to track spending and reducing the risk of duplicate payments
4) Increased the number of email addresses on file (order email/remittance email), enabling faster communication with vendors and reducing the cost of remittances via post
5) Improved the quality of contact information, reducing the risk of missed payments and of harm to supplier relationships
[Chart: vertical axis from 0 to 1600, comparing 20-Oct-14 with 2-Dec-14]
Data Governance Board 12/14/2014
Challenges from a Lack of Data Quality
It’s a ticking time bomb waiting to explode
• No Account Creation Controls
• No Standard Product Hierarchy
• No Universal Product Model
• Visible Inactive Records
• Labor Intensive Manual Data Clean-up
• Inconsistent Use of Business Terms
• Duplicate Accounts
• Missing Remittance Info
• Inaccurate Reports
• Missing Data
• Who owns the data?
• Who fixes?
Selling the Message
• Share 60-second Elevator Speech
• Use Current Inconsistencies that Impact Reporting
• Obtain Senior Level Sponsorship
• Quantify Value of Data
• Perform Data Quality Pilot
• Demonstrate Stewardship Success Story
And ask for help before you think you need it….
If you want to avoid situations like this…
• One US system had 11,500 active cost centers, increasing the risk of mis-posting and the mapping effort
• One S. African system was missing electronic remittance info in 88% of cases; payments were sent by post, increasing cost and lag time
• No standard product hierarchy: can’t determine product profitability
• 90% of suppliers in one US system were on immediate payment terms, impacting cash flow
• Lost revenue of $2 million annually from not utilizing rights previously granted
• No processes for deactivating vendors: 222,000 obsolete vendors were removed from one of two US systems, and many systems still contain ROT
• Supply Chain: 320 hours every year end spent tracking missing 1099 data to avoid tax penalties
• Data issues not fixed at the source, a never-ending battle: financial resources spend 35% of their time reconciling data
Data that Matters…
…you need to have this…
• Enterprise Data Strategy
• Data Governance & Stewardship Framework which articulates roles of data owners and data stewards
• Senior level sponsorship & organizational culture that treats data as a strategic asset
• Data Governance Board with a mandate to drive data quality enterprise wide
• Master Data Management solution
• Data quality principles that are embedded in process & system design across the enterprise
• Standard Business Glossary with an authoring and publishing process
but not all at once!
Where to Start When Developing a Data Quality Framework
No Accountability or Responsibility for Data
• Many resources create, review or manage data
• No formal data stewardship roles and responsibilities
• Difficult to determine who is accountable & responsible for data
Establish Data Ownership & Increase Data Accountability
• Define clear data ownership & stewardship roles, accountability & responsibility for data
• Define a vetting & onboarding process ensuring resource capacity
• Establish decision rights
• Maintain a master list of all Data Stewards and their related data domains
Inconsistent Master Data
• Fire drill to fix data issues in isolation
• Little standardization across Lines of Business and Geographies
• Difficult to report on a global level at needed level of detail
• No formal master data change control process
Consistent Master Data Management
• Develop master data standards
• Establish change control
• Define consistent data models
• Ongoing governance and stewardship of master data
Inconsistent Data Definitions
Poor Data Quality
• Business term definitions differ by group
• Data monitored in silos
• Fragmented use of a variety of tools
• Focus on find and fix instead of root cause analysis
• No standard reporting/tracking metrics
Term Authoring & Publishing
Increase Data Quality
• Establish and implement a process to define business accredited terms & publish them for consumption enterprise wide
• Stewards define business rules used to structure & profile data
• Develop and implement DQ standards
• Ongoing scorecarding & DQ metric reporting
Every work stream has a part to play if the organization is to move from a reactive to a proactive approach to improving data quality
Principle 1: Capture data right, first time
Implications:
• Wherever possible all data is captured once, at source, and validated on input
Principle 2: ‘Engineer-in’ positive impacts on data quality
Implications:
• Wherever possible data quality improvement is automated, proactive and on-going
• Systems, processes and products are inherently designed to improve data quality, e.g.
    • The possibility of errors when data is entered or changed is ‘engineered out’
    • Processes are designed to enter and maintain accurate data
    • Data entry is quick and intuitive for users
Principle 3: Integrate data quality into business processes
Implications:
• Data quality standards and rules are defined and integrated into day-to-day operations, e.g. instances of non-compliance are fixed at root cause
• There is clear accountability throughout the organization for promoting & sustaining good quality data
Repeatable Process
• Discovery - Identify potential data quality issues.
• Profile Data - Review sample data and existing data creation and usage process to provide context for business rule discussion with Data Owners and Business Data Stewards.
• Develop Business Rules - Work with Data Owners and Business Data Stewards to review documented business rules and capture undocumented rules.
• Define Metrics - Define metrics and acceptable thresholds against which to measure levels of quality.
• Evaluate Data with Metrics - Execute business rules against production data and evaluate results. Utilize acceptable thresholds set by the Data Governance Board to evaluate the data.
• Findings Review - Review the Findings with the Data Owners and Business Data Stewards.
• Remediate Anomalies - Implement and execute remediation process to fix problems with production data.
• Monitor Health - Define and implement a continuous monitoring/remediation plan to prevent and/or fix data quality problems in the future.
[Process diagram: Discovery → Profile Data → Develop Business Rules → Define Metrics → Evaluate Data with Metrics → Findings Review → Remediate Anomalies → Monitor Health]
Discovery
Identifying Business Need & Resources
• The discovery process is not solely the responsibility of business, IT, or Data Governance/Data Quality organizations. It requires collaboration.
• Business need or problem definitions can be influenced by a variety of sources such as:
    • Migrating to One ERP and One CRM
    • Master Data Management Processes
    • Suspected data quality deficiencies impacting business value & regulatory requirements
    • Data Governance Board initiatives
    • Needs of data-centric business strategies and opportunities
    • Directives from executive sponsorship team
Identifying Business Need & Resources
Identify Key Resources
• Business
• Data Quality Center of Excellence
• IT
• Data Quality Analyst
• IT Data Steward
• Business Data Steward
• Data Owner
Identifying Business Need & Resources
Refine Problem & Develop Initial Business Case
Refinement of Problem Statement
• Data quality team refines original problem statement to ensure that the defined project objectives are achievable and in alignment with enterprise strategy.
Initial Development of Business Case
• Begin a list of potential business impacts related to degraded quality of data within the project scope.
    • Human capital expense for manual correction
    • Revenue lost due to inaccurate information
    • Regulatory fines from compliance violations
    • Damage to corporate reputation
Identifying and Requesting Data
What to Include?
• Data Quality team should work to define the specific data elements and their encompassing source systems which will be included in the analysis.
Focus on Data that Answers Questions
• Confirm that the data available in the defined data sources is capable of answering the questions posed by the project problem statement.
Identifying and Requesting Data
Two Options
Build a Direct Database Connection
• Allows for a query against live data that can be re-utilized in a repeatable process.
• Preferred for access to current data.
• Provides greater flexibility of data import options.
• Requires effort from IT team members and may have an associated cost.
Extract Data into Flat Files
• Useful when direct connection is not available.
• Requires knowledgeable analyst for identifying correct format and uploading.
• Each data load requires a new data extraction effort.
Consider - Staging Area for data preparation
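The two access options can be contrasted with a small sketch. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for a source system, and the `vendor` table, its columns, and the helper names `load_from_database` and `load_from_flat_file` are hypothetical, not part of the webinar.

```python
import csv
import io
import sqlite3


def load_from_database(conn, query):
    """Option 1: run a reusable query against a live connection."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def load_from_flat_file(handle):
    """Option 2: read a delimited extract; every value arrives as text."""
    return list(csv.DictReader(handle))


# Stand-in for a source system: an in-memory SQLite database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (vendor_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO vendor VALUES (?, ?)",
    [(1, "Acme Supply"), (2, "Globex")],
)

# The same query can be re-run on every profiling cycle.
VENDOR_QUERY = "SELECT vendor_id, name FROM vendor ORDER BY vendor_id"
db_rows = load_from_database(conn, VENDOR_QUERY)

# A flat-file extract of the same data, here simulated with an in-memory file.
extract = io.StringIO("vendor_id,name\n1,Acme Supply\n2,Globex\n")
file_rows = load_from_flat_file(extract)

print(db_rows)
print(file_rows)
conn.close()
```

Note that the database rows keep their column types while the flat-file rows are all strings, which is one reason an analyst who knows the expected format is needed for the flat-file route.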
Initial Data Profiling and Discovery
An initial profile should be run against the data without any business rules to confirm a successful data import.
This profile serves two purposes:
• It is a “sense” check, allowing the analyst an overview of the data to ensure the data was loaded properly.
• It provides an overview against which initial observations can be made.
Initial Data Profile Output
Uniqueness
• Percentages
• Counts
• Key Fields
Nulls
• Percentages
• Counts
• Key Fields
Min/Max
• Unexpected Values
• Values outside domain
Data Review at a Glance
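The profile outputs listed above (uniqueness, nulls, min/max) can be approximated in a few lines. The following is a minimal sketch in plain Python, assuming the data has already been loaded as a list of dictionaries; the function name `profile_column`, the field names, and the sample rows are hypothetical.

```python
from typing import Any


def profile_column(rows: list[dict[str, Any]], column: str) -> dict[str, Any]:
    """Return null, uniqueness, and min/max statistics for one column."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as nulls (an assumption; adjust per source system).
    non_null = [v for v in values if v is not None and v != ""]
    total = len(values)
    null_count = total - len(non_null)
    distinct_count = len(set(non_null))
    return {
        "count": total,
        "null_count": null_count,
        "null_pct": round(100 * null_count / total, 2) if total else 0.0,
        "distinct_count": distinct_count,
        # Share of populated values that are distinct; 100 means a candidate key.
        "unique_pct": round(100 * distinct_count / len(non_null), 2) if non_null else 0.0,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"vendor_id": 1, "tax_id": "12-3456789"},
    {"vendor_id": 2, "tax_id": None},
    {"vendor_id": 3, "tax_id": "12-3456789"},
    {"vendor_id": 4, "tax_id": ""},
]

vendor_profile = profile_column(sample_rows, "vendor_id")
tax_profile = profile_column(sample_rows, "tax_id")
print(vendor_profile)
print(tax_profile)
```

In this sample, `vendor_id` profiles as fully unique with no nulls, while `tax_id` shows half its values missing and a repeated identifier, which are the kinds of observations worth raising with the data owners.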
Initial Data Profiling and Discovery
Report Findings to Data Owners and Stewards
Conduct a data profiling debrief session
• Data owners
• Business data stewards
• IT data stewards
Communicate to the data owners
• Purpose of the data profiling exercise
• Scope of the data included in the profile
• Expectations of them to assist in the development and application of business rules to future profiles.
Initial Data Profiling and Discovery
Collect and Report Information from Profile
• It may be advisable to extract information from reporting tool results into another format which can be shared with all members of the data quality team.
    • Excel
    • PDF
• Peculiarities of the data profile should be highlighted for review with the data owners.
• Any inferences about potential business rules, as well as questions about patterns in the data, should be noted.
Next Steps…
Initial profiling is just the beginning of
the Data Quality Process…
The real benefit is in developing business
rules that can be applied to data in order
to continue the repeatable process and
develop actionable insights.
Defining Business Rules and Metrics
Sourcing Business Rules
Possible sources of business rules
• Master Data Standards documents
• Subject matter expert interviews
• Data Stewards, Owners, and Consumers
• Desktop procedures documents
• Process and system documentation
What to look for
• Allowable values
• Required fields
• Links between fields
• Fields that link between data domains
• Potential duplicate records
• Insights into patterns that might be found in the data profile
Defining Business Rules and Metrics
Example Business Rules
Business Rule: Tax Identifier is required for all non-employee vendors.
Related Business Action: A W-9 is required before entering a new vendor into the vendor management system.
Data Quality Check: Rule is violated if VendorType <> ‘Employee’ and Tax ID is Null

Business Rule: Tax Identifiers should be entered in the valid format for type of identification number.
Related Business Action: Consistent formatting of tax identification numbers allows for higher confidence in searching and validation.
Data Quality Check: Rule is violated if tax ID is not in a valid format for the type, i.e. SSNs should be 999-99-9999; FEINs should be 99-9999999

Business Rule: Entities (companies, employees, products, etc.) should be unique and duplicates should not be entered.
Related Business Action: Entity names entered into the system should be entered in a consistent format to assist with presentation and elimination of duplicates.
Data Quality Check: Rule is violated if entity names are duplicated.

Business Rule: E-mail addresses must be entered in valid formats.
Related Business Action: Complete e-mail addresses should be entered into the system in order to ensure valid contact information.
Data Quality Check: Rule is violated if email address field is not a valid format (e.g. user@domain.com)
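The example rules above translate fairly directly into executable checks. The sketch below is one possible rendering, not the presenters' implementation: the field names (`vendor_type`, `tax_id`, `tax_id_type`, `email`, `name`), the violation labels, and the sample vendors are hypothetical, and the e-mail pattern is a deliberately simple shape check rather than full address validation.

```python
import re
from collections import Counter

TAX_ID_PATTERNS = {
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),   # 999-99-9999
    "FEIN": re.compile(r"\d{2}-\d{7}"),        # 99-9999999
}
# Simple user@domain.tld shape check (assumption; not a full RFC validation).
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_vendor(vendor: dict) -> list[str]:
    """Return the labels of the business rules this vendor record violates."""
    violations = []
    tax_id = vendor.get("tax_id")
    # Rule 1: tax identifier is required for all non-employee vendors.
    if vendor.get("vendor_type") != "Employee" and not tax_id:
        violations.append("tax_id_required")
    # Rule 2: a populated tax identifier must match the format for its type.
    pattern = TAX_ID_PATTERNS.get(vendor.get("tax_id_type"))
    if tax_id and pattern and not pattern.fullmatch(tax_id):
        violations.append("tax_id_format")
    # Rule 4: a populated e-mail address must have a valid shape.
    email = vendor.get("email")
    if email and not EMAIL_PATTERN.fullmatch(email):
        violations.append("email_format")
    return violations


def duplicate_names(vendors: list[dict]) -> set[str]:
    """Rule 3: return normalized entity names that appear more than once."""
    counts = Counter(v["name"].strip().lower() for v in vendors)
    return {name for name, n in counts.items() if n > 1}


# Hypothetical records, for illustration only.
vendors = [
    {"name": "Acme Supply", "vendor_type": "Company", "tax_id": "12-3456789",
     "tax_id_type": "FEIN", "email": "ap@acme.example"},
    {"name": "acme supply ", "vendor_type": "Company", "tax_id": None,
     "tax_id_type": None, "email": "ap-at-acme"},
    {"name": "Jane Doe", "vendor_type": "Employee", "tax_id": "123-45-678",
     "tax_id_type": "SSN", "email": None},
]

violations_by_vendor = [check_vendor(v) for v in vendors]
duplicates = duplicate_names(vendors)
print(violations_by_vendor)
print(duplicates)
```

Normalizing names by trimming and lowercasing before counting is a design choice made here for the sketch; real duplicate detection usually needs fuzzier matching agreed with the data stewards.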
Defining Business Rules and Metrics
What Makes “Good Metrics”?
Meaningful to the Business
• the score should relate to improved business performance
Measurable
• must be able to be quantified within a discrete range
Controllable
• some action can be taken to change the data and improve the score
Reportable
• should provide enough information to the data steward to take action
Traceable
• must be able to be tracked over time to show improvement efforts
Defining Business Rules and Metrics
Examples of Metrics for Various Dimensions
Accuracy
• Does each value fall within an allowed set of values?
• Does each value conform to the defined level of precision?
Completeness
• Is data present in required fields?
Consistency
• Is the data used the same way across the enterprise?
Currency
• Is the data up to date?
Integrity
• Are identifying data elements unique?
Conformity
• Are data elements stored as assigned data types, e.g. is text stored in a telephone number field?
Duplication
• Do duplicate records exist?
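A metric for any of these dimensions can be expressed as a pass rate within a discrete range and compared with an acceptable threshold. The sketch below assumes a 0 to 100 percent score; the class name `MetricResult`, the record counts, and the threshold values are hypothetical and chosen only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    dimension: str
    records_checked: int
    records_failed: int
    threshold_pct: float  # minimum acceptable pass rate (hypothetical values below)

    @property
    def score_pct(self) -> float:
        """Percentage of checked records that passed the rule (0 to 100)."""
        if self.records_checked == 0:
            return 100.0
        passed = self.records_checked - self.records_failed
        return round(100 * passed / self.records_checked, 2)

    @property
    def meets_threshold(self) -> bool:
        return self.score_pct >= self.threshold_pct


# Hypothetical results from one evaluation run.
results = [
    MetricResult("Completeness", records_checked=1000, records_failed=120, threshold_pct=95.0),
    MetricResult("Duplication", records_checked=1000, records_failed=15, threshold_pct=98.0),
]

for result in results:
    status = "OK" if result.meets_threshold else "BELOW THRESHOLD"
    print(f"{result.dimension}: {result.score_pct}% (threshold {result.threshold_pct}%) {status}")
```

Storing one such result per run and per rule is one way to make the metric traceable over time.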
Evaluating Data & Reporting Findings
Re-profile Data with Business Rules and Report Findings
• Definition, refinement, and application of business rules should be repeated iteratively and reviewed until the data owners are satisfied with the accuracy and completeness of the business rule implementation.
• Present all findings to the data owners and stewards for review.
• The goal of this step is to finalize the data quality assessment definition such that an ongoing monitoring process can be modeled from the activity.
Remediating Anomalies
Corrective Actions
Two Routes
• Find-and-Fix
• Process Change
Best Practice
• Leverage the continuous monitoring of data quality reports to confirm that the data cleansing procedures are effective
• The costs of poor data quality include:
• human capital expense for manual correction
• revenue lost due to inaccurate information
• regulatory fines from compliance violations
• damage to corporate reputation
Business Value from Data Quality
Business Value Calculations
Business Rule: Customer Address Invalid
# Errors Identified: 84,367
Potential Cost Avoidance: $92,952.42
Calculation Description: Manual effort to research and correct an invalid Customer Address
• Average salary for worker engaged in correcting address: $25,000.00
• Average salary including benefits: $34,375.00
• Salary per hour: $16.53
• Salary per minute: $0.28
• # minutes to correct an invalid address: 4
• Cost of manual effort to research and correct one address: $1.10
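The arithmetic behind these figures can be reproduced step by step. The slide does not state the number of working hours per year; the sketch below assumes 2,080 (52 weeks of 40 hours), which is the value that reproduces the slide's $16.53 per hour. The variable names are illustrative.

```python
ERRORS_IDENTIFIED = 84_367
AVERAGE_SALARY = 25_000.00
SALARY_WITH_BENEFITS = 34_375.00
WORK_HOURS_PER_YEAR = 2_080  # assumption: 52 weeks x 40 hours, not stated on the slide
MINUTES_PER_FIX = 4

# Implied benefits loading on top of base salary.
benefits_multiplier = SALARY_WITH_BENEFITS / AVERAGE_SALARY

salary_per_hour = SALARY_WITH_BENEFITS / WORK_HOURS_PER_YEAR
salary_per_minute = salary_per_hour / 60
cost_per_fix = salary_per_minute * MINUTES_PER_FIX

# Rounding is applied only for display; the total uses the unrounded unit cost.
potential_cost_avoidance = ERRORS_IDENTIFIED * cost_per_fix

print(f"Benefits multiplier: {benefits_multiplier:.3f}")
print(f"Salary per hour:     ${salary_per_hour:,.2f}")
print(f"Salary per minute:   ${salary_per_minute:,.2f}")
print(f"Cost per correction: ${cost_per_fix:,.2f}")
print(f"Potential cost avoidance: ${potential_cost_avoidance:,.2f}")
```

Multiplying the error count by the rounded $1.10 would give a slightly different total, so the slide's $92,952.42 appears to come from the unrounded per-correction cost.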
Remediating Anomalies
Five Whys for Root Cause (Danette McGilvray)
• State the issue (e.g. duplicate vendor records are causing issues with payments)
• Ask “Why?” five times
1. Why are there duplicate records?
• New master records are created instead of using existing ones.
2. Why do they create new duplicate records?
• The reps don’t want to search for existing records.
3. Why don’t they want to search for existing records?
• Search takes too long.
4. Why is the search time too long?
• Reps have not been trained in proper search techniques, and system performance is poor.
5. Why is long search time a problem?
• Reps are measured by how quickly they can create a new master record and they don’t see the implications of duplicate data downstream.
Monitoring
At the Enterprise Level
Data Quality Issues by Domain as of 1-31-2015
• Customer: 156 total issues; 97 open, of which 26 are critical (26.80% of open); the deferred and remediated counts are 11 and 48 (the extracted chart labels do not show which is which)
• Product: 225 total issues; 66 open, of which 31 are critical (46.97% of open); the deferred and remediated counts are 19 and 140 (assignment not recoverable)
• Supplier: 145 total issues; 43 open, of which 5 are critical (11.63% of open); the deferred and remediated counts are 12 and 90 (assignment not recoverable)
Total Data Quality Issues open more than 30 days: 92
Total Data Quality Issues open more than 60 days: 31
Total Data Quality Issues open more than 90 days: 17
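The dashboard figures can be derived from simple counts. In the sketch below, the critical counts come from the slide and the open counts (97, 66, 43) are the values consistent with the slide's stated percentages; the aging helper `count_open_longer_than` and its sample open dates are hypothetical.

```python
from datetime import date

# Open counts are inferred from the slide's percentages (e.g. 26 / 97 = 26.80%).
open_issues = {"Customer": 97, "Product": 66, "Supplier": 43}
critical_issues = {"Customer": 26, "Product": 31, "Supplier": 5}

critical_share_pct = {
    domain: round(100 * critical_issues[domain] / open_issues[domain], 2)
    for domain in open_issues
}


def count_open_longer_than(opened_dates: list[date], as_of: date, days: int) -> int:
    """Count issues that have been open for more than `days` days as of `as_of`."""
    return sum(1 for opened in opened_dates if (as_of - opened).days > days)


# Hypothetical open dates, for illustration only.
as_of = date(2015, 1, 31)
opened_dates = [date(2015, 1, 20), date(2014, 12, 15), date(2014, 11, 1), date(2014, 10, 1)]
aging = {days: count_open_longer_than(opened_dates, as_of, days) for days in (30, 60, 90)}

print(critical_share_pct)
print(aging)
```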
Monitoring
Monitoring by Data Stewards
• Establish Process to Consume Artifacts from Data Profiling
• Take corrective measures to improve the data quality
• Verify through monitoring that improvements were implemented by either data cleansing, controls at the root cause, or a combination of both.
• The data stewards should understand how to interpret the metrics, including what is being measured and why.
• Monitoring can be costly so it should focus primarily on those processes that are essential to the business.
Data Governance & Stewardship – Maturity Model
Define: Business Glossary & Roles
Identify & catalog data assets, map to owners & stewards
• Stewards are identifying, defining critical data, publishing business accredited terms for consumption
Control: Data Standards
Define authorities, control changes
• Data Standards enforced by Stewards & Owners
• Harmonize definitions across functions, Lines of Business, Geographies
Measure: DQ Dashboards
Measuring data quality (DQ)
• Monitor ongoing stewardship operations & data use
• Data Standards implemented for new system
Expand: Data Sprints
Repeatable data management processes in place
• Expand scope & breadth of stewardship program
• Increase volume & efficiency of data it supports
Optimize: Continuous Improvement
Iteratively enhance data quality & stewardship performance
• Continuously prioritize & act upon enhancement opportunities from monitoring & expansion activities
Risk: Awareness
• Parts of organization unaware of DG/Stewardship and do their own thing; inconsistent with DG standard
• Business units may be unaware of benefits and added value
Risk: Adoption
• Business units refuse to adopt standards put forth
• System constraints make it difficult to implement new standards
• Business units do not engage the Global Data Services team on projects
Risk: Funding
• Funding model that aligns with governance and organizational structure (i.e. building data connections to sources with DQ tool)
• Cost of building and establishing Global Data Services
Risk: Training
• Stewardship skills are hard to maintain
• Build and sustain capability across a large world-wide organization
Risk: Time to Build
• Data Governance and Stewardship is a long-term program, not a one-time project
Mitigation: Awareness
• Strong communication plan that is meshed into overall corporate communications
• Corporate governance and strong sponsorship of DG/Stewardship
Mitigation: Adoption
• Accountability and approval process by Data Owners and DG Enterprise Steering Committee
• Document exceptions and work-arounds
• Corporate governance and Architecture Review Board to align projects with DG/Stewardship
Mitigation: Funding
• DG & Stewardship funding established
• Cost allocation aligned with DG & Stewardship model
• Project specific costs
Mitigation: Training
• Partner with Data Architecture, Global Change & Process Excellence unit to provide a training curriculum
• Define staffing models and career paths that outline training and align with DG/Stewardship
Mitigation: Time to Build
• Leverage parallel opportunities to accelerate build and implementation (Master Data, Global KPI reporting, One ERP road map, One CRM)
• Pilot projects to quickly show tangible benefits