AN INTRODUCTION TO
DATA QUALITY SERVICES
koen verbeeck
BI consultant
WHO AM I
• BI consultant @ Ordina

• member of SQLUG.be

• MCTS, MCITP in SQL Server 2008

• working with Microsoft BI for over 2 years

• beer and comic books enthusiast

• married with children…
INTRODUCTION

data quality?
          Data are of high quality "if they are fit for their intended uses in
          operations, decision making and planning" (J. M. Juran).
          - Wikipedia on Data Quality


• achieved through people, technology & processes
• can be measured with various dimensions
  •   accuracy
  •   consistency
  •   completeness
  •   duplicates (uniqueness)
  •   timeliness
  •   validity
• bad data = bad business
INTRODUCTION


Data Quality   Issue                               Sample Data Problem

Standard       Are data elements consistently      Gender code = M, F, U in one system and
               defined and understood?             Gender code = 0, 1, 2 in another system

Complete       Is all necessary data present?      20% of customers’ last names are blank,
                                                   50% of zip codes are 99999

Accurate       Does the data accurately            A supplier is listed as ‘Active’ but went out of
               represent reality or a verifiable   business six years ago
               source?

Valid          Do data values fall within          Temperature recordings should be between
               acceptable ranges?                  -100°C and +100°C

Unique         Does the same entity appear         Prince, The Artist formerly known as Prince, The
               several times?                      Artist, … are they the same person?
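
The dimensions in the table above can be turned into simple ratios. The following is a minimal sketch, not part of DQS: the records, field names, placeholder values and helper functions are all hypothetical and only illustrate how completeness, validity and uniqueness might be measured on a small data set.

```python
# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"last_name": "Doe", "zip": "1000", "temp_c": 21.5},
    {"last_name": "", "zip": "99999", "temp_c": 18.0},
    {"last_name": "Smith", "zip": "99999", "temp_c": 250.0},
    {"last_name": "Doe", "zip": "1000", "temp_c": 21.5},
]


def completeness(rows, field, placeholders=("",)):
    """Share of rows whose field holds a real (non-placeholder) value."""
    filled = sum(1 for r in rows if r[field] not in placeholders)
    return filled / len(rows)


def validity(rows, field, low, high):
    """Share of rows whose numeric field lies within [low, high]."""
    ok = sum(1 for r in rows if low <= r[field] <= high)
    return ok / len(rows)


def uniqueness(rows):
    """Share of distinct rows; exact duplicates lower the score."""
    distinct = {tuple(sorted(r.items())) for r in rows}
    return len(distinct) / len(rows)


print(completeness(records, "last_name"))                        # blank names count as missing
print(completeness(records, "zip", placeholders=("", "99999")))  # 99999 treated as a dummy value
print(validity(records, "temp_c", -100, 100))
print(uniqueness(records))
```

Exact-duplicate counting is the simplest form of the uniqueness check; near-duplicates such as the Prince example need fuzzy matching instead.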
INTRODUCTION

Monitoring     Tracking and monitoring the state of quality activities
               and the quality of data.

Cleansing      Amend, remove or enrich data that is incorrect or
               incomplete. This includes correction, standardization
               and enrichment.

Profiling      Analysis of the data source to provide insight into the
               quality of the data and help to identify data quality issues.

Matching       Identifying, linking or merging related entries within
               or across sets of data.
OUTLINE
• introduction

• overview of data quality services

• building a knowledge base

• data cleansing & matching

• SSIS integration

• conclusion
OVERVIEW OF DQS




      Data Quality Services (DQS) is a
Knowledge-Driven data quality solution,
enabling IT Pros and data stewards to easily
     improve the quality of their data
OVERVIEW OF DQS

Knowledge-Driven      Based on a Data Quality Knowledge Base (DQKB)

Semantics             Data Domains capture the semantics of your data

Knowledge Discovery   Acquires additional knowledge the more you use it

Open and Extendible   Supports use of user-generated knowledge and IP
                      by 3rd party reference data providers

Easy to use           Compelling user experience designed for increased
                      productivity
OVERVIEW OF DQS
• easy installation
  • pre-installation checks
    o SQL Server 2012 database engine (server)
    o .NET 4.0 & IE 6.0 or higher (client)


  • installation of DQS using SQL Server set-up




  • post-installation tasks
    o run DQSInstaller.exe
    o grant DQS roles to users
    o enable TCP/IP
OUTLINE
• introduction

• overview of data quality services

• building a knowledge base

• data cleansing & matching

• SSIS integration

• conclusion
BUILDING A KNOWLEDGE BASE

[Diagram: Knowledge Base at the center. Build: Knowledge Management (Discover / Explore Data / Connect). Use: DQ Projects. Integrated Profiling.]
BUILDING A KNOWLEDGE BASE



[Diagram: Knowledge Base made up of Domains (which represent the data type) and a Matching Policy. Domains are shown with Values, Rules & Relations, 3rd party Reference Data and Composite Domains.]
DEMO
• our first knowledge base
BUILDING A KNOWLEDGE BASE
• iterative process
• knowledge discovery
  • gather knowledge from
    o Excel
    o SQL Server


  • profiling of data
    o not the same as SSIS profiling task!


  • automatically detects anomalies
BUILDING A KNOWLEDGE BASE
• domain management
  • knowledge about fields is kept in domains

  • data steward can
    o   create rules
    o   assign synonyms and corrections
    o   create term-based relations (str. --> street)
    o   link domains together into
        composite domains


  • import knowledge from
    o reference data (e.g. Azure Marketplace)
    o other knowledge bases
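
To make the idea of a domain more concrete, here is a minimal sketch of how the knowledge a data steward adds (rules, synonyms, term-based relations) could be modelled. This is not the DQS object model: the `Domain` class, its attributes and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical, simplified stand-in for a knowledge base domain."""
    name: str
    synonyms: dict = field(default_factory=dict)        # whole value -> leading value
    term_relations: dict = field(default_factory=dict)  # single term -> replacement term
    rules: list = field(default_factory=list)           # predicates a valid value must pass

    def standardize(self, value):
        # Synonyms replace the whole value; term-based relations replace single words.
        value = self.synonyms.get(value, value)
        words = [self.term_relations.get(w.lower(), w) for w in value.split()]
        return " ".join(words)

    def is_valid(self, value):
        return all(rule(value) for rule in self.rules)


street = Domain(
    name="Street",
    term_relations={"str.": "street", "st.": "street"},
    rules=[lambda v: len(v.strip()) > 0],
)
company = Domain(name="Company", synonyms={"MSFT": "Microsoft"})

# A composite domain is sketched here as a plain mapping of linked domains.
address = {"street": street, "company": company}

print(street.standardize("High Str."))
print(company.standardize("MSFT"))
print(street.is_valid("   "))
```

Keeping synonyms (whole values) apart from term-based relations (single words inside a value) mirrors the distinction made on the slide.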
OUTLINE
• introduction

• overview of data quality services

• building a knowledge base

• data cleansing & matching

• SSIS integration

• conclusion
DATA CLEANSING & MATCHING
• cleansing
  • why?
   o identifies incomplete or incorrect data
   o standardizes and enriches data by using
     domain values, domain rules and reference data

  • examples
   o St. --> street (corrected)
   o Microsot --> Microsoft (corrected)
   o john.doe@hotmail (invalid)
   o 0472/34672 (invalid)
   o Verbeek --> Verbeeck (suggested)

  • DQS cleansing
   o create a knowledge base or select an existing one
   o create a data quality project
   o 2-step process
     – computer assisted cleansing
     – interactive cleansing
   o export results
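
The statuses used in the cleansing examples (corrected, invalid, suggested) can be illustrated with a small sketch. It does not reproduce the DQS algorithm, which the slides do not describe: the `cleanse` function, the 0.8 threshold and the use of `difflib` string similarity are assumptions made purely for illustration.

```python
import re
from difflib import SequenceMatcher


def cleanse(value, known, corrections=None, rule=None, suggest_at=0.8):
    """Return (output value, status) for one value of a hypothetical domain."""
    corrections = corrections or {}
    if rule is not None and not rule(value):
        return value, "invalid"                 # fails a domain rule
    if value in known:
        return value, "correct"                 # already a known domain value
    if value in corrections:
        return corrections[value], "corrected"  # explicit correction in the knowledge base
    if known:
        # Assumption: propose the most similar known value above a threshold.
        best = max(sorted(known), key=lambda k: SequenceMatcher(None, value, k).ratio())
        if SequenceMatcher(None, value, best).ratio() >= suggest_at:
            return best, "suggested"
    return value, "new"                         # unknown value, left for the data steward


def email_rule(v):
    """Illustrative rule: something@domain.tld without whitespace."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None


print(cleanse("Microsot", {"Microsoft"}, corrections={"Microsot": "Microsoft"}))
print(cleanse("Verbeek", {"Verbeeck"}))
print(cleanse("john.doe@hotmail", set(), rule=email_rule))
```

In this sketch the automatic pass corresponds to the computer-assisted step; values that come back as suggested or new are the ones a data steward would review in the interactive step.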
DATA CLEANSING & MATCHING
• matching
 • why?
   o identify duplicates within the data source
   o create consolidated view of data

 • examples
   o Prince / The Artist Formerly Known As Prince / The Artist
   o Jon Doe, High Street 13, NY, doe@gmail.com /
     John Doe, High Str, NY, doe@gmail.com

 • DQS matching
   o build a matching policy in KB
   o matching training
   o create matching project
   o choose survivors

[Screenshot: DQ Client – Match Results]
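
A matching policy can be thought of as a weighted comparison of fields. The sketch below is an assumption-laden illustration, not the DQS matching engine: the field weights, the 0.8 threshold, the `difflib` similarity measure and the "longest record survives" rule are all hypothetical choices.

```python
from difflib import SequenceMatcher

# Hypothetical matching policy: per-field weights that sum to 1.0.
POLICY = {"name": 0.4, "street": 0.2, "email": 0.4}
THRESHOLD = 0.8  # assumed minimum score for two records to count as a match


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1, r2, policy=POLICY):
    """Weighted sum of the per-field similarities."""
    return sum(weight * similarity(r1[f], r2[f]) for f, weight in policy.items())


def choose_survivor(cluster):
    """Assumed survivorship rule: keep the record with the most text."""
    return max(cluster, key=lambda r: sum(len(v) for v in r.values()))


a = {"name": "Jon Doe", "street": "High Street 13", "email": "doe@gmail.com"}
b = {"name": "John Doe", "street": "High Str", "email": "doe@gmail.com"}
c = {"name": "Jane Roe", "street": "Low Road 2", "email": "roe@example.com"}

print(match_score(a, b) >= THRESHOLD)  # a and b are treated as the same person
print(match_score(a, c) >= THRESHOLD)  # a and c are not
print(choose_survivor([a, b])["name"])
```

Giving the e-mail address a high weight reflects that it is a strong identifier in this example; tuning such weights on sample data is what the matching training step is for.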
DEMO
• cleanse data
• use a matching policy to find
  duplicates
DATA CLEANSING & MATCHING
• create a cleansing project
  • uses knowledge gathered in a DQS knowledge base

  • simple user-friendly process

  • profile results
DATA CLEANSING & MATCHING
• create a matching project
  • uses a matching policy created
    in a knowledge base

  • eliminates duplicates

  • profile results

  • the more knowledge is added, the better the results will be
    o tip: clean up the data first using a cleansing project


  • choose survivors at the end

  • export results into .csv
    or SQL Server
OUTLINE
• introduction

• overview of data quality services

• building a knowledge base

• data cleansing & matching

• SSIS integration

• conclusion
SSIS INTEGRATION

[Diagram: SSIS Data Flow. SSIS Package: Source + Mapping --> Data Correction Component --> Destination. The component uses the Knowledge Base (Values/Rules, Reference Data Definition).]
DEMO
• an SSIS cleansing project
SSIS INTEGRATION
• cleansing as a batch process

• only cleansing; matching is not (yet?) possible

• composite domains are supported
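
Batch cleansing in a data flow can be pictured as rows streaming from a source, through a cleansing step, to one or more destinations. The following sketch only imitates that flow in plain Python. It does not call SSIS or DQS, and the function names, the `_Output` / `_Status` column suffixes and the status values are assumptions chosen for illustration.

```python
def cleanse_rows(rows, column, corrections):
    """Yield each row with an extra output column and status column."""
    for row in rows:
        source = row[column]
        output = corrections.get(source, source)
        status = "Corrected" if output != source else "Unchanged"
        yield {**row, f"{column}_Output": output, f"{column}_Status": status}


def split_by_status(rows, column):
    """Route rows to separate outputs by status, like a conditional split."""
    routed = {}
    for row in rows:
        routed.setdefault(row[f"{column}_Status"], []).append(row)
    return routed


# Hypothetical source rows and knowledge (a single correction).
source_rows = [{"id": 1, "company": "Microsot"}, {"id": 2, "company": "Ordina"}]
knowledge = {"Microsot": "Microsoft"}

destinations = split_by_status(cleanse_rows(source_rows, "company", knowledge), "company")
for status, rows in destinations.items():
    print(status, [r["company_Output"] for r in rows])
```

Because the cleansing step is a generator, rows are processed one at a time as they flow through, which is the batch-style behaviour the slide refers to.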
OUTLINE
• introduction

• overview of data quality services

• building a knowledge base

• data cleansing & matching

• SSIS integration

• conclusion
CONCLUSION



Knowledge-driven
  • Rich Knowledge Base
  • Continuous improvement and knowledge acquisition
  • Build once, reuse for multiple DQ improvements

Easy To Use
  • Focus on productivity and user experience
  • Designed for business users
  • Out-of-the-box knowledge

Open & Extendible
  • Focus on cloud-based Reference Data
  • User-generated knowledge
  • Integration with SSIS
RESOURCES
• DQS Team Blog @ MSDN
  http://blogs.msdn.com/b/dqs/

• DQS documentation @ MSDN
  http://msdn.microsoft.com/en-us/library/ff877917(v=sql.110).aspx

• SQL Server 2012 Resource Center (nice How-To videos)
  http://msdn.microsoft.com/en-us/sqlserver/ff898410.aspx

• DQS Forum @ MSDN
  http://social.msdn.microsoft.com/Forums/en-US/sqldataqualityservices/threads

• TechEd presentation about DQS by Elad Ziklik
  http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI207
THE END
thanks for watching!
