Presented on PHPID Online Learning 35.
Komunitas PHP Indonesia
Title: Enabling Data Governance - The Journey through Data Trust, Ethics, and Quality
Eryk B. Pratama
Global IT & Cybersecurity Advisor
Enabling Data Governance - Data Trust, Data Ethics, Data Quality
1. ENABLING DATA GOVERNANCE
Eryk B. Pratama
IT Advisory & Cyber Security Consultant at Global Consulting Firm
20 June 2020 | 19:00
PHPID-OL#35
The Journey through Data Trust, Ethics, and Quality
2. About Me
• Global IT Advisory & Cyber Security Professional
• Community Enthusiast
• Blogger / Writer
• Knowledge Hunter
• https://medium.com/@proferyk
• https://www.slideshare.net/proferyk
https://www.linkedin.com/in/erykbudipratama/
You can subscribe to my telegram channel.
§ IT Advisory & Risk (t.me/itadvindonesia)
§ Data Privacy & Protection (t.me/dataprivid)
§ Komunitas Data Privacy & Protection (t.me/dataprotectionid)
5. “Data is the new oil”
“Data is the new gold”
6. Data, Information, Knowledge and Wisdom (DIKW) Hierarchy
Introduction
Wisdom
Knowledge
Information
Data
DIKW Hierarchy
Data is the foundation of the pyramid, information is the next layer,
then knowledge, and, finally, wisdom is the apex. DIKW is a model or
construct that has been used widely within Information Science and
Knowledge Management. Some theoreticians in library and
information science have used DIKW to offer an account of logico-
conceptual constructions of interest to them, particularly concepts
relating to knowledge and epistemology.
Source: COBIT 5
7. Data/Information Lifecycle
Introduction
Source: ISACA – Getting Started with Data Governance with COBIT 5
It is important to plan the life cycle of data along with their placement within the governance structure. As practices
operate, the data supporting or underlying them reach the various levels of their natural life cycles. Data is planned,
designed, acquired, used, monitored and disposed of.
Critical information security control
Store | Data at Rest
Share | Data in Motion
Use | Data in Use
8. Data Governance DNA
Introduction
Data Security | Data Quality | Data Governance
Process
• Data Policies
• Data Standards
• Business Data Ownership
• Data Workflow
• Data Quality Rules & Policies
• Data Cleansing Standards
• Compliance Rules
• Compliance & Security Policies
• Local, National & International Laws
People
• Data Stewards
• Business Data Owners
• Data Mgmt Committee
• Data Administration
• Data Quality Services Team
• Data Governance
• DBAs
• Corporate Security
• Auditors
• Compliance Dept.
Technology
• Data Rules Library
• Automated Notifications (Workflow)
• Data Profiling, Quality & Monitoring Tools
• ETL Tools
• Audit Reports
• Security Software
• Access Rights Management
• Data Audit Trails
9. Data Governance Organization (Simple)
Introduction
Sponsor
Data Steward
Data Owner | Data Owner | Data Analyst
Data Consumer | Data Consumer | Data Consumer | Data Consumer
Technical Support | Technical Support
10. Data Governance Organization (Recommended)
Introduction
Corporate Governance Committee
Provide corporate governance for strategic data management decisions,
such as new subject areas, new data sources, new business problems
solved with data management, and new expenditures for processors
and storage.
Data Management Lead
Oversee the organization's data stewardship, data administration and data
quality programs.
Business SMEs
• Business Data Owner
• Understand data
• Define rules
• Identify errors
• Set thresholds for acceptable levels of data quality
Data Stewards
• Resolve data integration issues
• Determine data security
• Document data definitions, calculations and summarizations
• Maintain/update business rules
• Analyze and improve data quality
• Define mandatory data elements that need to be measured
• Define and review metrics for measuring data quality
Data Architects
• Translate Business Rules into Data Models
• Maintain Conceptual, Logical and Physical Data Models
• Assist in Data Integration Resolution
• Maintain Metadata Repository
Database Administrators
• Generate Physical Database Schema
• Perform Database Tuning
• Create Database Backups
• Plan for Database Capacity
• Implement Data Security Requirements
Change Management
• Provide education on the importance of data quality to the company
• Communicate data quality improvements to all employees
13. … trigger the rise of Four Anchors to make analytics more trusted
Data Trust
Quality: Are the inputs and the development process high quality?
Effectiveness: Does it perform as intended?
Integrity: Is its use considered acceptable?
Resilience: Is its long term operation optimised?
Percentage of respondents who reported being very confident in their D&A insights
14. Four Dimensions of Data Trust
Data Trust
Quality: Are the inputs and the development process high quality?
Organizations need to ensure that the input and output fit the context in which the information /
insight will be used.
Effectiveness: Does it perform as intended?
Effectiveness in this case is the extent to which the output achieves the expected results and provides value to the
decision makers who use the information.
Integrity: Is its use considered acceptable?
In this context, integrity refers to the use of data that is ethical and acceptable to related parties and complies with
existing regulations (for example data privacy).
Resilience: Is its long term operation optimised?
In this context, resilience means ensuring that the data source and output can be optimized for the long term.
15. Data sourcing is the key trust stage in the analytics lifecycle
Data Trust
18. Ethics in Data Processing
Data Ethics
Impact on People
In the context of personal data, data represents the characteristics of individuals, which can later be used
to make decisions that affect the lives of those individuals. Health data and medical records are an example.
What is the impact if a medical record is leaked? Unauthorized and irresponsible people can
exploit it for financial gain, for example by selling medical records to companies that want the data.
Abuse Potential
Misuse of data can have a negative impact on individuals. For example, after we register for a credit card at
the mall, we often receive offers from other credit card providers or other advertisers,
and we wonder where or from whom the salesperson obtained our number. Another example is the
leak of the permanent voter list (which the KPU said was indeed open to the public). What can
be done with that data? It can be sold to certain parties. For criminals, this information can be
used for fraud.
The Economic Value of Data
Proper data processing will provide economic value. The ethics of the data owner can determine how
this value is obtained and who may take economic value from the data.
19. Ethical Decision Point
Data Ethics
Source: https://www.accenture.com/us-en/blogs/blogs-new-data-ethics-guidelines-organizations-digital-trust
20. Implementation of Data Ethics
Data Ethics
Vision
Vision really determines the direction / goals of the organization. In this context, the organization
needs to determine what ethical data usage is in the organization. The vision can be adopted from
data ethics principles chosen by Management.
Strategy
Strategies are arranged to achieve the vision. In this case, organizations need to develop strategies
so that data ethics can be applied and carried out consistently as part of the organization's culture.
Governance
To "force" stakeholders to carry out data ethics practices, organizations need to develop effective
policies and procedures and ensure that each related party has clearly defined responsibilities.
Infrastructure & Architecture
Managing complex data (especially in large organizations) is easier and better integrated when
the organization has visibility of all its data, documents it in an architecture (for example Enterprise
Architecture), and supports it with capable and reliable systems and infrastructure.
Data Insight
Insight that supports clear and accurate data results is necessary. Tools
(such as dashboards) can help organizations monitor and provide early warnings of potential data
ethics violations.
Training & Development
People are the main factor in the context of data ethics. Organizations need to conduct training
related to ethics in the use (and misuse) of data. This can be done when the organization
runs awareness sessions or training related to Data Privacy and Personal Data Protection, because data
ethics is attached to both.
https://medium.com/@proferyk
Source: https://home.kpmg/pl/en/home/insights/2018/01/report-building-trust-in-analytics.html
21. RUU Perlindungan Data Pribadi (Personal Data Protection Bill)
Data Ethics
Key Highlight
§ Explicit consent is required from the data owner for
personal data processing.
§ Response timelines for data subject rights are
specified separately in the RUU PDP.
§ The data controller must notify the data owner and the Minister
within 3 days of a data breach.
§ Penalties for non-compliance may range from Rp 20 billion
to Rp 70 billion, or imprisonment ranging from 2 to 7 years.
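The 3-day notification window above can be tracked with a small helper. This is only a sketch: the function names are hypothetical, and it assumes the window is counted as 72 hours from the moment the breach is detected, which the slide does not specify.

```python
from datetime import datetime, timedelta

# Assumption: "3 days" is treated as 72 hours from detection.
NOTIFICATION_WINDOW = timedelta(days=3)

def notification_deadline(detected_at):
    """Return the latest time at which the breach notification should be sent."""
    return detected_at + NOTIFICATION_WINDOW

def is_overdue(detected_at, now):
    """Return True when the notification window has already passed."""
    return now > notification_deadline(detected_at)

detected = datetime(2020, 6, 20, 19, 0)
deadline = notification_deadline(detected)
print(deadline)
print(is_overdue(detected, datetime(2020, 6, 24, 8, 0)))
```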
Data Owner | Data Controller | Data Processor | Data Protection Officer
22. Data Privacy Framework
Data Ethics
KPMG Privacy
§ Information Lifecycle Management
§ Governance and Operating Model
§ Inventory/Data Mapping
§ Regulatory Management
§ Risk and Control
§ Policies
§ Processes, Procedures and Technology
§ Security for Privacy
§ Third Party Oversight
§ Training and Awareness
§ Monitoring
§ Incident Management
NIST Privacy
§ Inventory and Mapping
§ Data Processing Ecosystem Risk Management
§ Governance Policies, Processes, and Procedures
§ Awareness and Training
§ Monitoring and Review
§ Data Processing Management
§ Communication
§ Data Security
§ Protective Technology
§ Detection Processes
§ Respond Processes
§ Recovery Processes
ISO/IEC 27701
§ Conditions for collection and processing
§ Obligations to PII principals
§ Privacy by design and privacy by default
§ PII sharing, transfer, and disclosure
§ PIMS-specific requirements related to ISO/IEC 27001
§ PIMS-specific requirements related to ISO/IEC 27002
§ Additional ISO/IEC 27002 guidance for PII controllers
§ Additional ISO/IEC 27002 guidance for PII processors
Data Privacy Framework
Further Discussion
§ Data Privacy & Protection News (t.me/dataprivid)
§ Komunitas Data Privacy & Protection (t.me/dataprotectionid)
24. Common Questions
Data Quality
• How can poor quality data impact our decisions?
• How can we decrease costs connected with poor quality data?
• How can we improve the success of data related projects?
• Which approach should we choose to continuously improve data?
• How can we set up efficient and sustainable data governance?
Source: ISACA – Getting Started with Data Governance with COBIT 5
Does our data fit the purpose we use it for?
25. Basic Data Quality Criteria
Data Quality
Accuracy | Completeness | Consistency | Timeliness
26. Purpose of Data Quality Management
Data Quality
Develop a properly managed approach to make data "fit for
purpose" based on the needs of data consumers.
Define standards, needs, and specifications for quality control
purposes as part of the data lifecycle.
Define and implement processes to measure, monitor, and report
levels of data quality.
Identify opportunities to improve data quality through improved
processes and systems.
27. Data Quality Assessment
Data Quality
Some recommended steps are as follows.
1. Define the purpose of the assessment.
2. Identify the data to be assessed; focus on a small data set first or on specific problems.
3. Identify data usage and who will use the data.
4. Identify risks from data to be assessed, including their impact on business processes.
5. Check data in accordance with predetermined rules.
6. Document the issues found.
7. Conduct further analysis to quantify findings, prioritize issues based on business impact, and
develop hypotheses for root causes of the issues found.
8. Meet with Data Stewards / Owners, Subject Matter Experts, and data users to confirm issues and
priorities for improvement.
9. Use the assessment findings to improve data quality management processes.
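Steps 5 to 7 above (check data against predetermined rules, document the issues, quantify and prioritize them) can be sketched in Python. Everything here is illustrative: the `Rule` class, the `impact` weight, the sample records and the priority formula (impact times number of failed rows) are assumptions, not something the slides prescribe.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes
    impact: int                    # hypothetical business-impact weight

def assess(records, rules):
    """Check every record against every rule; return issues, highest priority first."""
    issues = []
    for rule in rules:
        failed = [i for i, rec in enumerate(records) if not rule.check(rec)]
        if failed:
            issues.append({
                "rule": rule.name,
                "failed_rows": failed,
                "failure_rate": len(failed) / len(records),
                "priority": rule.impact * len(failed),
            })
    return sorted(issues, key=lambda issue: issue["priority"], reverse=True)

# Hypothetical sample data and rules.
records = [
    {"name": "Ani", "email": "ani@example.com", "age": 31},
    {"name": "", "email": "budi@example.com", "age": 27},
    {"name": "Citra", "email": "citra-at-example.com", "age": -4},
    {"name": "Dedi", "email": "dedi@example.com", "age": 45},
]
rules = [
    Rule("name is filled", lambda r: bool(r["name"].strip()), impact=1),
    Rule("email contains @", lambda r: "@" in r["email"], impact=3),
    Rule("age is between 0 and 120", lambda r: 0 <= r["age"] <= 120, impact=2),
]

report = assess(records, rules)
for issue in report:
    print(issue)
```

The sorted report is the kind of list that could be taken to Data Stewards / Owners in step 8 to confirm issues and priorities.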
30. Data Quality Check (example)
Data Quality
To properly analyze a data set, always look at how it is stored and at examples of the records inside it.
This helps to understand the raw information provided and gives an idea of
the characteristics of an ideal record and its meaning.
Basic Checks
• Fill percentage
• Number of duplicates
• Distinct and unique values
31. Basic Data Quality Checks
Data Quality
As the graph above shows, the values in the data can be divided into several categories:
§ Empty records and records with a "Null" value are grouped into the "Null" category.
§ The remaining ones ("Not Null") can be divided further into duplicates and distinct content.
If the percentage of missing data is too large, we need to consider replacing the missing values with specific
numbers or excluding the whole column/attribute from the set.
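One possible way to choose between the two options for missing data is a threshold on the missing ratio. The sketch below is an assumption-laden illustration: the function name, the 40% threshold and the use of the median as the replacement number are all hypothetical choices, not taken from the slides.

```python
from statistics import median

def handle_missing(column, max_missing_ratio=0.4):
    """Return ("drop", None) or ("impute", filled_column) for a numeric column.

    max_missing_ratio is a hypothetical threshold; the slides do not give one.
    """
    if not column:
        return "drop", None
    present = [v for v in column if v is not None]
    missing_ratio = 1 - len(present) / len(column)
    if not present or missing_ratio > max_missing_ratio:
        return "drop", None
    fill_value = median(present)  # assumption: median as the replacement number
    return "impute", [fill_value if v is None else v for v in column]

ages = [31, None, 27, 45, None, 38, 29, 33, 41, 36]  # 2 of 10 values missing
mostly_empty = [None, None, None, 12, None]           # 4 of 5 values missing

ages_decision, ages_filled = handle_missing(ages)
sparse_decision, _ = handle_missing(mostly_empty)
print(ages_decision, ages_filled)
print(sparse_decision)
```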
Duplicate: values repeated more than once in the given data.
Distinct: non-null values that are different from each other.
Unique: values that have no duplicates.
Non-Unique: number of values that have at least one duplicate in the list.
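The categories defined above can be counted with a few lines of Python. This is a minimal sketch under stated assumptions: `None`, the empty string and the literal text "Null" are treated as empty, "duplicate" counts the different values that occur more than once, and "non_unique" counts the records that carry such a value. Profiling tools may count these differently.

```python
from collections import Counter

def profile(values):
    """Count the value categories for one column."""
    non_null = [v for v in values if v not in (None, "", "Null")]
    counts = Counter(non_null)
    return {
        "total": len(values),
        "null": len(values) - len(non_null),
        "not_null": len(non_null),
        "fill_percentage": 100 * len(non_null) / len(values) if values else 0.0,
        "distinct": len(counts),                                 # different non-null values
        "unique": sum(1 for c in counts.values() if c == 1),     # values occurring exactly once
        "duplicate": sum(1 for c in counts.values() if c > 1),   # values occurring more than once
        "non_unique": sum(c for c in counts.values() if c > 1),  # records whose value is repeated
    }

# Hypothetical sample column.
cities = ["Jakarta", "Bandung", "Jakarta", None, "Surabaya",
          "", "Bandung", "Jakarta", "Medan", "Null"]
result = profile(cities)
print(result)
```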
ILLUSTRATIVES
32. Case Example
Data Quality
Given the data below, a quick quality check can be done. Let's assume that it is an analysis of the phone numbers of 20 clients.
Output for the "Phone" column:
§ Out of 20 samples, all of the records are non-empty and 1 of
them is a duplicate.
§ Since phone numbers should be unique, this probably
means that two of our clients share the same number or the same client is
in the system twice (e.g. with a typo in their name).
§ The minimum value consists only of zeros, so here we can also
identify a wrongly typed phone number.
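The observations above can be reproduced on a small column of phone numbers. The sample values and the `quick_check` helper are hypothetical; the original 20-client data set is not included in the slides.

```python
from collections import Counter

def quick_check(phones):
    """Summarise a phone column: emptiness, repeated numbers, minimum, all-zero entries."""
    filled = [p for p in phones if p]
    counts = Counter(filled)
    return {
        "records": len(phones),
        "non_empty": len(filled),
        "repeated": sorted(p for p, c in counts.items() if c > 1),
        # Strings compare character by character, which matches numeric
        # order only when all numbers have the same length.
        "minimum": min(filled) if filled else None,
        "all_zero": sorted({p for p in filled if set(p) == {"0"}}),
    }

# Hypothetical sample, stored as strings so leading zeros are kept.
phones = ["812345678", "813456789", "000000000",
          "812345678", "857123456", "821987654"]
summary = quick_check(phones)
print(summary)
```

Storing phone numbers as strings rather than integers is a deliberate choice here: it preserves leading zeros and makes format problems visible.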
33. Case Example
Data Quality
The minimum value consists only of zeros, so here we can also identify a wrongly typed phone number.
34. Case Example
Data Quality
The mask analysis shown below reveals two records that are not in the correct format (a 9-digit
number). The first one has only 3 digits and the second one has a letter in the 6th position.
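A mask analysis of this kind can be sketched as follows. The mask alphabet (9 for a digit, A for a letter) and the sample values are assumptions for illustration; the slide does not show how its tool builds masks.

```python
from collections import Counter

def mask(value):
    """Replace every digit with 9 and every letter with A; keep other characters."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch
        for ch in value
    )

def mask_report(values, expected="999999999"):
    """Return mask frequencies and the values whose mask differs from the expected one."""
    frequencies = Counter(mask(v) for v in values)
    mismatches = [v for v in values if mask(v) != expected]
    return frequencies, mismatches

# Hypothetical sample: one value is too short, one has a letter in the 6th position.
phones = ["812345678", "813456789", "123", "85712X456", "821987654"]
frequencies, mismatches = mask_report(phones)
print(frequencies)
print(mismatches)
```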
The quantile analysis does not provide much useful information in this case, since phone numbers are identifiers rather than quantities.