2. Background
• Overlapping and common datasets are required across various business areas,
processes, and systems, such as geographic locations, customer lists, business
unit lists, and address postal codes
• Mismatches can occur across systems or departments when they are not well
integrated and work in silos
• Data keeps evolving over time and, if not properly checked, can cause
discrepancies
• A business activity in one corner of the business may affect other business
wings as well, but if those wings are not talking to each other, there is a problem
3. Background - 2
• Consider the following examples:
• In a food delivery business, only the system maintaining the customer records has
postal codes that are synced daily with the national postal database, while the other
systems have not been updated since inception. How can management produce
accurate revenue reports based on postal codes over the years?
• A database listing the doctors in a city is not integrated with the hospitals in
the city. How can the public know which hospital a specific cardiac doctor
works at?
• The finance department in a university has its own database, which is not integrated
with the student records system in the academics department. Imagine the problems
both departments will face.
• Such disparities in data can lead to problematic decision making, redundant and
manual work, incorrect reporting, and misleading figures
4. What is Master Data Management
• Managing shared data to meet organizational
goals, reduce risks associated with data
redundancy, ensure higher quality, and
reduce the costs of data integration
• Master data represents the core entities and
domains that are integral to managing and
running business processes and systems
• Master data management is about aligning
the processes that create and maintain that
information so it stays consistent and shared
• It provides a single view of the same
information, available to and from multiple
sources
5. What is Master Data Management - 2
• The goals include ensuring the availability of accurate,
current values while reducing the risks associated with
ambiguous identifiers
• This data can be shared across multiple deployments,
lines of business, processes, or systems that require
consistent and accurate information
• Master data is unique and, when referenced by other
data, rarely changes
• If changes do happen, they need to be propagated as
well
6. Master Data Management and Data
Governance
• Master data management requires Data Governance
• Data Governance is the creation of rules, the
execution of those rules, and the adjudication of
any violation of the rules
• Master data is a type of data that describes
subjects related to the ‘who,’ ‘what,’ and ‘where’
in business transactions, communications, and
events
• The rules created within Data Governance ensure
the quality and privacy of the master data. Because
the concepts of MDM and Data Governance are
labeled differently, they are often thought of as
mutually exclusive, but they are not
7. Master Data vs Metadata Management
• Metadata Management is a strategy to organize
contextual information about data from various
tools and systems across the modern data stack;
Master Data Management, on the other hand, is a
business function to identify, create, and manage
master data in an organization
• Master data builds a single source of truth for
business-critical data, while metadata adds
context and meaning to data
• There is no Master Data Management without
metadata from the underlying business
applications/systems
• Master Data Management occurs at the enterprise
level, whereas Metadata Management occurs at
the application level
8. Business Drivers
• The following are common drivers for a business to invest in
this activity:
• Reducing redundant data sets and saving the costs attached
to collecting and maintaining those data sets
individually
• Defining the base registry and the most accurate data source
in case of duplication
• Managing data quality
• Cutting down the costs of data integration
• Reducing the cost and risk of the data sharing architecture
9. Goals & Principles
• A business should have the following goals in scope for a
Master Data Management program:
• Access to 3Cs data (Consistent, Current & Complete)
• Enterprise-wide shareable data
• Refined data standards that reduce the cost of data
integration enterprise-wide
10. Goals & Principles - 2
• To achieve the above goals, the following principles shall be followed:
• Master data is owned by the organization, not by one
team or department
• Regular data quality monitoring is required
• Data stewards should be empowered to monitor data-related
activities and data quality
• A change-controlled system should be in place to track and
monitor critical changes to data
• A proper base registry for all duplicated data sets should be
defined and followed
11. Process & Activities
• Data Model Management
• Clear and consistent logical data definitions
• A comprehensive and centralised data dictionary
• Source systems and their associated data values must be mapped
clearly
• Data Acquisition
• A reliable and repeatable process should be defined both for
existing source systems and for integrating any new system
into the organization
• Execute initial data profiling to perform data quality
assessments
• Assess the cost of integrating the data source
• Assess the impact on current data rules
• Finalize DW metrics for the new data source
• Integrate the new data source with the Master Data
Management platform
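The initial data profiling step above can be sketched as a small script. This is a minimal illustration only; the `records` sample and its field names are hypothetical.

```python
# Minimal data-profiling sketch for an acquisition assessment.
# The records and field names below are hypothetical examples.
from collections import Counter

records = [
    {"id": 1, "name": "Alice", "postal_code": "75500"},
    {"id": 2, "name": "alice", "postal_code": "75500"},
    {"id": 3, "name": "Bob",   "postal_code": None},
]

def profile(records, field):
    """Report completeness and distinct-value counts for one field."""
    values = [r[field] for r in records]
    non_null = [v for v in values if v is not None]
    return {
        "completeness": len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

for field in ("name", "postal_code"):
    print(field, profile(records, field))
```

A low completeness score or an unexpectedly high distinct count on a key field is exactly the kind of signal that feeds the data quality assessment before integration.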
12. Process & Activities
• Cleanse, Standardize & Enrich
• A three-step phase in which the acquired data is first cleansed,
then standardized as per the defined codes, formats, or fields.
Lastly, the data is enriched, which can help resolve identity
issues
• Match and Merge
• Once the data is cleansed and enriched, the attributes are
matched and merged as per the business rules
• There are multiple techniques to perform each of these
activities
• Unify and Data Sharing
• The role of data stewards is to make sure that data is properly
populated
• Data is unified as per the business rules, and repository
quality is not compromised
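The cleanse, standardize, and enrich steps above can be sketched as a small pipeline. The field names, formatting rules, and the `REGION_BY_POSTAL` lookup table are illustrative assumptions, not a prescribed implementation.

```python
# Sketch of the cleanse -> standardize -> enrich phase for one record.
# Field names and the reference lookup table are hypothetical.
REGION_BY_POSTAL = {"75500": "Karachi South"}  # hypothetical enrichment source

def cleanse(record):
    """Trim whitespace and turn empty strings into None."""
    return {k: (v.strip() or None) if isinstance(v, str) else v
            for k, v in record.items()}

def standardize(record):
    """Apply defined formats: title-case names, 5-digit postal codes."""
    out = dict(record)
    if out.get("name"):
        out["name"] = out["name"].title()
    if out.get("postal_code"):
        out["postal_code"] = out["postal_code"].zfill(5)
    return out

def enrich(record):
    """Add derived attributes that help resolve identities later."""
    out = dict(record)
    out["region"] = REGION_BY_POSTAL.get(out.get("postal_code"))
    return out

raw = {"name": "  alice smith ", "postal_code": "75500"}
print(enrich(standardize(cleanse(raw))))
```

Keeping the three steps as separate functions mirrors the phase boundaries in the slide: each step can be tested and audited on its own.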
13. Matching Techniques
• Matching / Record Linkage: Records are grouped together based on similar values in particular fields, using
exact matching or fuzzy-logic matching strategies
• Black Box vs Business Rule: In a black-box technique, pre-built rules are defined and attributes are
mapped onto those rules to determine the output. The alternative is to define business rules and logic for
matching and classifying the attributes
• De-duplication techniques
• Deterministic vs Probabilistic
• Rule-based vs Score-based
• Symmetric union vs Hierarchical
• Original-to-original vs Original-to-Master
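The deterministic-vs-probabilistic contrast above can be sketched with a toy pair of records. The chosen match fields and the 0.85 similarity threshold are illustrative assumptions; real MDM platforms use richer scoring models.

```python
# Sketch contrasting deterministic (exact) and score-based (fuzzy) matching.
# The fields compared and the threshold are illustrative assumptions.
from difflib import SequenceMatcher

def deterministic_match(a, b):
    """Exact match on chosen identifier fields."""
    return a["postal_code"] == b["postal_code"] and a["name"] == b["name"]

def score_match(a, b, threshold=0.85):
    """Score-based match: fuzzy name similarity above a threshold."""
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return score >= threshold, round(score, 2)

r1 = {"name": "Alice Smith", "postal_code": "75500"}
r2 = {"name": "Alice Smyth", "postal_code": "75500"}

print(deterministic_match(r1, r2))  # exact match fails on the typo
print(score_match(r1, r2))          # fuzzy score still links the records
```

This is why score-based techniques catch duplicates that rule-based exact matching misses, at the cost of needing a tuned threshold to control false merges.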
14. Merging Techniques
• Merging/Unification
• Select the best-fit information at the field or record level
• Represent the golden record
• Track changes to incoming and outgoing golden record information as well
• Manual Match & Merge
• Data stewards may have to do some work manually
• Based on specific business rules and criteria, this process is performed manually
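Field-level survivorship into a golden record can be sketched as below. The "prefer the most recently updated non-null value" rule is one illustrative business rule among many possible ones, and the source records are hypothetical.

```python
# Sketch of field-level survivorship producing a golden record.
# "Most recently updated non-null value wins" is an assumed business rule.
def merge_golden(records):
    """Build a golden record by taking, per field, the non-null value
    from the most recently updated source record."""
    ordered = sorted(records, key=lambda r: r["updated"])  # oldest first
    golden = {}
    for rec in ordered:  # newer records overwrite older values
        for field, value in rec.items():
            if field != "updated" and value is not None:
                golden[field] = value
    return golden

sources = [
    {"name": "Alice Smith", "phone": None,       "updated": "2023-01-10"},
    {"name": "A. Smith",    "phone": "555-0101", "updated": "2024-06-02"},
]
print(merge_golden(sources))
```

Note how the phone number survives from the newer record while the older record still contributes nothing it holds only stale or null values for; records that fail such automatic rules are the ones routed to stewards for manual match and merge.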
15. Implementation Styles
• Centralised
• Data lives in both the source systems and the MDM
repository and is updated synchronously in both
directions
• Matches and physically stores the up-to-date
consolidated view of master data
• Central authoring of master data
• Consolidation
• Data is acquired from the source system(s) and lands
in the MDM repository, but is not updated back
• Matches and physically stores a consolidated view
of master data
• Good for reporting, analysis, and central reference
• Adopts the golden record concept
16. Implementation Styles - 2
• Registry
• Keeps only pointers to where the (base) data is
• On request, the data is fetched and processed
• Data is not physically sent back; it is cleansed
and processed in MDM on the assumption that
quality data is available in the source system
• Coexistence
• An offline mechanism is implemented to update
the source system(s) from the MDM repository
• Like consolidation, it also supports the golden
record concept
• More expensive than consolidation to implement,
as changes must also be implemented in the
source system(s)
17. Success Metrics
• The following metrics shall be tied to this activity:
• A dashboard showing data quality and the confidence % of data for
key attributes of a specific business domain/entity
• Trackable data-change activities. This also helps to identify
frequently changing attributes and highlight the risk against
those attributes
• Capture of data lineage
• Data stewards' ownership and responsibility
• Long-run total cost of ownership of the process
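The change-tracking metric above can be sketched as a simple frequency count over a change log. The log entries and field names are hypothetical; a real system would read these from an audit trail.

```python
# Sketch of tracking change activity per attribute to spot risky,
# frequently changing fields. The change-log entries are hypothetical.
from collections import Counter

change_log = [
    {"record_id": 1, "field": "phone"},
    {"record_id": 1, "field": "phone"},
    {"record_id": 2, "field": "phone"},
    {"record_id": 2, "field": "postal_code"},
]

def change_frequency(log):
    """Count changes per field; the most-changed fields carry more risk."""
    return Counter(entry["field"] for entry in log)

print(change_frequency(change_log).most_common())
```

Fields that dominate the count are candidates for tighter change control and closer steward attention.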