The document discusses implementing data governance and stewardship programs at universities. It provides examples of programs at Stanford University, George Washington University, and in the Flanders region of Belgium. The key aspects covered are:
- Establishing a data governance framework with roles, processes, asset definitions, and an oversight council.
- Implementing data stewardship activities like data quality management, metadata development, and reference data management.
- Stanford's program established foundations for institutional research through data quality and context definitions.
- George Washington University runs a centralized program managed by the Data Governance Office within IT.
- The Flanders program provides research information and services across universities through consistent definitions, roles and collaborative workflows.
Automating Data Governance in Decentralized Environments
1. The Data Driven University
Automating Data Governance & Stewardship in
Autonomous & Decentralized Environments
Pieter De Leenheer, PhD
Cofounder and VP Innovation
2. What we talk about when we talk about
no Data Governance
The Problem
• Who approved this?
• I wish these guys spoke our language
• I can’t understand this report!
• I’ve never seen this funding code! Who introduced this?
• Are we sure this definition of ‘professor’ is correct?
• This rule is different on our campus!
• Are we allowed to share this student data with IR?
3. Glossary Search
• How frequently do you look up a word for your
business?
• To what purpose?
Clarification
Differentiation
• What are your main sources?
• Hierarchy-based navigation or key-word based
search?
• Authoritative Truth or trust?
4. Overview
• Data Governance Operating Framework
Data Governance
Data Stewardship
Data Management
• Implementations
Stanford University Data Stewardship (SUDS)
George Washington University
Brigham Young University
• The Bigger Picture
Inter-university Data Governance in
the Flanders Research Information Space
5. Data Governance Framework
Data Governance Council: Governance Operating Model
Roles &
Responsibilities
Processes &
Workflow
Asset Types &
Traceability
Data Governance
Organization
Data Stewardship Activities
Data Quality
Development
IT / Operational Data Management Activities
Data
Modeling
Metadata
Lineage
Establishes & drives
Aligns & Coordinates
Reports & Escalates
Monitors & Remediates
Metadata
Scanning
Reference Data
Authoring
Data
Integration
Collibra Business
Semantics Glossary (BSG)
Collibra Reference Data
Accelerator (RDA)
Hierarchy
Management
Business &
Data Definitions
Business
Traceability
Semantic
Modeling
Mapping
Specifications
Policy
Management
Business
Rules
Data Quality
Rules
Data Quality
Reporting
Issue
Management
Reference Data
Crosswalks
Master Data
Stewardship
Data Quality Profiling
DQ Defect
Resolution
Collibra Data Stewardship
Manager (DSM)
Collibra Platform
Other Data Management
Vendor products
...
https://compass.collibra.com/display/COOK/Data+Governance+Operating+Model
6. Stanford University Data Stewardship
(SUDS)
• All Materials available here
dg.stanford.edu
• Establish foundation for
Institutional Research
• Data Quality
How many faculty do we have?
• Context and Meaning
What does faculty mean in which
context?
How is faculty data structured and
where is it stored?
• Data Usage Request
Am I allowed to use faculty or student
name and age for external reporting?
7. SUDS: Approach
• Decentralized
1 DG coordinator (also show vacancy)
Project staff
cross-functional working groups: natural scope
and resources
focus on BI reporting, with input from above
projects
sign off by DG coordinator and end user through
usage (full cycle)
• Step by step; success by success
9. DG Operating Model
• What do we want to capture?
Asset Type: Business Terms, Policies, Rules, Code
Values
Attribute/Relation Type: Name, Definition, Example,
Derivations, Specializations
• Who should be involved in this process?
Communities: Finance, HR, Student, Research
Domains / subject areas: Task Management
Users and User groups
• How to execute and Monitor the process?
Key events and workflow chains
Validation rules
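The three questions above (what to capture, who is involved, how to monitor) can be sketched as a small data model. This is only an illustrative sketch, not Collibra's or Stanford's actual schema: the class name `Asset`, the placeholder attribute values, and the related term "Tenured Faculty" are hypothetical; the asset types, attribute names, and communities are taken from the slide.

```python
from dataclasses import dataclass, field

# Taken from the slide; a real program would extend these lists.
ASSET_TYPES = {"Business Term", "Policy", "Rule", "Code Value"}
COMMUNITIES = {"Finance", "HR", "Student", "Research"}


@dataclass
class Asset:
    """One governed item, owned by a community (hypothetical model)."""
    name: str
    asset_type: str
    community: str
    attributes: dict = field(default_factory=dict)  # attribute type -> value
    relations: list = field(default_factory=list)   # (relation type, target name)

    def __post_init__(self):
        # Reject anything outside the agreed operating model.
        if self.asset_type not in ASSET_TYPES:
            raise ValueError(f"unknown asset type: {self.asset_type}")
        if self.community not in COMMUNITIES:
            raise ValueError(f"unknown community: {self.community}")


# Placeholder values only; no real definition is implied.
faculty = Asset(
    name="Faculty",
    asset_type="Business Term",
    community="HR",
    attributes={"Definition": "<placeholder>", "Example": "<placeholder>"},
)
faculty.relations.append(("Specialization", "Tenured Faculty"))  # hypothetical term
```

Keeping asset types and communities as explicit sets makes the operating model itself something that can be reviewed and extended, rather than being implied by whatever data happens to be entered.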
10. SUDS Data Dictionary Example
+4000 data elements
Community context: Finance, HR,
Research and Student
Custom attribute types and relation types
11. What attribute- and relation-types do we want to capture?
Out of the box but also custom
attribute types and relation types
12. What attribute- and relation-types do we want to capture?
• https://stanford.app.box.com/CollibraQuickReference
• https://stanford.box.com/UsingCollibraFields
13. Who is involved in the
process?
• https://compass.collibra.com/display/COOK/Role+Types
Responsible Accountable Informed Consulted
16. How to execute and monitor?
From Best Practice to Auto-Validation Rules
http://web.stanford.edu/dept/pres-provost/cgi-bin/dg/wordpress/?p=577
(generic example – not from SUDS)
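Turning a best practice into an auto-validation rule could look roughly like the sketch below. The specific rules (a definition must exist, must not merely repeat the name, and an accepted term needs a steward) and the field names `Name`, `Definition`, `Status`, and `Steward` are assumptions chosen for illustration; they are not taken from SUDS or from any Collibra API.

```python
def validate_term(term):
    """Return a list of problems for a term; an empty list means it passes.

    `term` is a plain dict; all keys used here are hypothetical.
    """
    problems = []
    name = term.get("Name", "").strip()
    definition = term.get("Definition", "").strip()

    if not name:
        problems.append("Name is missing")
    if not definition:
        problems.append("Definition is missing")
    elif definition.lower() == name.lower():
        # A circular definition adds no meaning.
        problems.append("Definition only repeats the name")
    if term.get("Status") == "Accepted" and not term.get("Steward"):
        problems.append("Accepted term has no steward")
    return problems


issues = validate_term(
    {"Name": "Faculty", "Definition": "Faculty", "Status": "Accepted"}
)
for issue in issues:
    print(issue)
```

Returning a list of problems rather than a single boolean lets a workflow show the steward everything that needs fixing in one pass.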
17. How to execute and monitor?
• Status Types and Workflows
E.g., for Domains, Terms, Users, and later for Issues and Data Sharing
Agreements, we first define a “finite state machine” and then a set of
workflows that each define a transition between states. This means
workflows can trigger each other and form a complex chain.
BUSINESS SEMANTICS GLOSSARY
[Figure: status diagram; transitions are labeled with workflow numbers]
Status types: Candidate, In Progress, Under Review, Accepted, In Revision, Rejected, Deprecated
Entry point: Term requested on the domain page
Workflows
1 Propose Business Term
2 Edit Business Term
3 Onboarding Business Term
4 Deprecate Business Term
5 Reactivate Business Term
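The idea of status types plus workflows that each move an asset between states can be sketched as a transition table. The status and workflow names come from the slide, but the wiring between them is an assumption: the original diagram's arrows are not recoverable here, so the table below is illustrative only, and `run_workflow` is a hypothetical helper.

```python
# (current status, workflow) -> next status.
# ASSUMED wiring for illustration; not the diagram's actual transitions.
TRANSITIONS = {
    ("Candidate", "Edit Business Term"): "In Progress",
    ("In Progress", "Onboarding Business Term"): "Under Review",
    ("Under Review", "Onboarding Business Term"): "Accepted",
    ("Accepted", "Edit Business Term"): "In Revision",
    ("In Revision", "Onboarding Business Term"): "Under Review",
    ("Accepted", "Deprecate Business Term"): "Deprecated",
    ("Deprecated", "Reactivate Business Term"): "Candidate",
}


def propose_term():
    """Propose Business Term: a newly requested term starts as a Candidate."""
    return "Candidate"


def run_workflow(status, workflow, approved=True):
    """Apply one workflow to a status and return the new status."""
    key = (status, workflow)
    if key not in TRANSITIONS:
        raise ValueError(f"{workflow!r} is not allowed from status {status!r}")
    # A review that is not approved ends in Rejected (assumed behavior).
    if status == "Under Review" and not approved:
        return "Rejected"
    return TRANSITIONS[key]


# Workflows chained one after another, as the slide describes.
status = propose_term()
for wf in ["Edit Business Term",
           "Onboarding Business Term",
           "Onboarding Business Term"]:
    status = run_workflow(status, wf)
print(status)
```

Because every allowed move is a row in one table, adding a status type or workflow later (for Issues or Data Sharing Agreements, say) means adding rows rather than rewriting control flow.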
18. How is it to be governed? Onboarding Workflow
(Not Stanford content - illustrative example only)
19. How is it to be governed? Approval Workflow
(not Stanford content - illustrative example only)
20. Stanford DG Program Key Results
(from http://web.stanford.edu/dept/pres-provost/cgi-bin/dg/wordpress/wp-content/uploads/2014/11/Stanford_DS_CAIR_v2.pdf)
• Understand data from multiple
perspectives
• Central repository of verified information
(and better data infrastructure)
• Easier access to information; less reliance
on ‘oral tradition’
• Improved data quality, consistency
• Increased understanding; thoughtful
decision-making around data
21. SUDS Future Directions
• Continue building engagement around
data governance (define policy), in
addition to data stewardship (enforce
policy)
• Continue building engagement, especially
by executive-level leadership
• Continue increasing visibility and
consumption of definitions and other
metadata
22. George Washington University
(by courtesy of Ron Layne, GWU)
• Centralized
• Run by the DG Office division of IT
• Mapping data dictionaries, rules and metrics, and data sharing
agreements
• Integration with Informatica Data Quality
23. Flanders Research Information Space
• Providing Scientific Research Information and
Services
• Easy
• Transparent
• Open
• Timely
• Unambiguous
• Supported by Data Governance
• Qualitative metadata: e.g., definition for
project, funding codes, mappings,
classifications, etc.
• Roles and responsibilities for Information
Providers and Stiweto
• Collaborative workflows between Information
Providers and Stiweto
By courtesy of G. Van Grootel, EWI
25. The Data providers landscape
Universities
Research Institutes
Funders
Others
Strategic Research
Centers
University Colleges
By courtesy of G. Van Grootel, EWI
27. Traceability diagram
Node | Description
JRC (Joint Research Centre) | The Business Term representing the Funding Source
Zevende Kader Programma.. | The Business Term representing the parent Funding Source
3723 | Generation 1 Funding Code Value
258 | Generation 2 Funding Code Value
G3 | The Funding Stream Code Value
By courtesy of G. Van Grootel, EWI
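Traceability between such nodes can be sketched as a walk over named edges. The node names are those listed on the slide, but the direction and order of the links (code value to funding stream to funding source to parent funding source) are an assumption, since the diagram's edges are not reproduced here; `TRACES_TO` and `trace` are hypothetical names.

```python
# node -> the node it traces to. ASSUMED chain, for illustration only.
TRACES_TO = {
    "3723": "258",                      # Generation 1 -> Generation 2 code value
    "258": "G3",                        # code value -> funding stream code value
    "G3": "JRC (Joint Research Centre)",
    "JRC (Joint Research Centre)": "Zevende Kader Programma..",
}


def trace(node, edges=TRACES_TO):
    """Follow edges from a node until no further link exists."""
    path = [node]
    while path[-1] in edges:
        nxt = edges[path[-1]]
        if nxt in path:
            # Guard against a mis-entered circular mapping.
            raise ValueError("cycle in traceability edges")
        path.append(nxt)
    return path


for step in trace("3723"):
    print(step)
```

A walk like this answers the earlier "I've never seen this funding code" question by showing which business terms a raw code value ultimately belongs to.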
28. Conclusions
• Case by Case, success by success
• Identify key events and design workflow
‘chains’ to automate governance
• To support your specific use case and the
growing DG platform you need to extend
asset, relation, and attribute types
• Collaboration and business user
friendliness
• BOK http://compass.collibra.com
29. Questions For Audience
• What percentage of data users need to look up
the definition of a term?
• What percentage want to know where the data
around a term is stored?
• How many business terms do you have?
• Who is in charge of data quality /
governance?
• What percentage of data definition decisions
depend on the business?
Editor's Notes
Audience from various academic institutions
Before Collibra I was a researcher and assistant professor at three universities.
In fact, Collibra is a university spin-off valorising research on data governance, ontologies and the Semantic Web.
We should know how universities tick.
From my own experience as an employee and as a vendor, I think I know how university departments thrive as decentralized and autonomous entities.
Having no data governance does not mean that data quality cannot be managed well. It is just that globalization and increased data servicing between university entities make the quality and truth of data relative, and we increasingly have to rely on mutual trust.
Put yourself in the position of the user of your data, and consider these questions.
Stewardship activities today: scattered, no unified operating model, and no clear view of the results.
A good case to illustrate this is the Flemish Department of Economy, Science and Innovation (EWI).
Logos of all the university colleges after the merger.
Illustrates the implemented FRIS Metamodel in the DGC operational model.
Allows for formal named relations between the different FRIS Business leveling model concepts