Data Masking
- A Developer’s guide
Technical Paper
Value. Accelerated.
Introduction
Plastic money is part of our lives, and holding several credit cards is often
seen as a mark of prestige. But how secure are we? What if someone gets into the
bank’s database and learns our personal information, such as address and mobile
number? The question matters because software developers sometimes have access
to the production database. That is where data scrubbing comes into the picture.
Have you noticed that when we pay credit card bills online, personal information
such as the card number and email ID is partially scrubbed?
Why is it done? Is it really necessary? How is it done? Why do we have to
concentrate on this? This paper focuses mainly on the need to scrub data, the
different ways of achieving it without affecting cost-cutting initiatives, and
the technical aspects that need to be considered when scrubbing data. It serves
as a guideline for managers who think from a developer’s perspective, for
programmers who want more technical knowledge for building data scrubbing tools,
and for clients who look at the scope of data masking from the enterprise’s
perspective.
Concept of Data Masking
Different people have different perceptions of data masking. Some call it
cleaning so-called ‘junk data’ from the database, some call it “data scrubbing”
and others “data sanitization”. Some use the term for correcting invalid data
entered into the database. In this paper, data scrubbing refers to masking
users’ personal details, such as card number, address and contact details, with
variable or static alphanumeric characters. It assures users that their
personal information is secure and not compromised.
Here is an overview of a typical banking cycle. The front end involves a
customer support executive and a bank customer. A bank customer accesses their
account details through the banking portal. A customer support executive also
has access to a customer’s personal details. A cloned version of the production
database, in scrubbed form, is made available to developers for carrying out
their testing.
Scope of Data Masking
Data masking touches each phase of a typical software development life cycle
(SDLC). During estimation, the effort for preparing scrubbed data and for
post-scrubbing work has to be taken into account. In the testing phase, the
suitability of the prepared scrubbed data should be considered. After
implementation, any report generated for end users should be produced only
after the data has been thoroughly scrubbed.
Even though scrubbing involves a lot of manual effort, there are cost-effective
alternatives. First, tier-I companies such as TCS and Infosys have set up
separate teams for this purpose, thereby providing the solution inside the
organization. Second, smaller (mid-tier) companies outsource the work to data
scrubbing service providers available in the market. Beyond these, there are
many tools that avoid manual involvement in this time-consuming process. For
example, Compuware’s File-AID is a tool that has been enhanced to scrub the data
in particular columns of a given file or database. It has been implemented in
many mainframe systems across enterprises to provide this flexibility.
But are organizations serious about scrubbing? Definitely yes, because clients
expect a certain level of integrity when they form a relationship with an
organization. So the personal data that the organization handles should be
secure enough to build trust and confidence with the client. As one client of a
data scrubbing service put it: “I can honestly say that I have been impressed
with the quality and turnaround time, making my business much more manageable”.
Technology aspect
Many technologies, such as Oracle, mainframes, PeopleSoft and SAS, are moving
quickly towards data masking. Some, like Oracle applications, SQL Server and
DB2 UDB, use dedicated data masking tools such as Data Masker and Oracle Data
Integrator. Irrespective of the technology, the process of developing a data
masking tool remains the same. For example, consider mainframes as the
environment. The data inside a DB2 database is stored in the form of LDS (linear
data set) files. An LDS file can be converted into a flat file, which can then
be taken as the input for the tool. The language can be chosen from COBOL, PL/I,
Assembler and REXX, depending on the available expertise, environment support,
the compilers provided by the client and the change-management tools in use,
such as CHGMAN, ENDEAVOR and ISPW.
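The flat-file step described above can be sketched in a few lines. The example below is written in Python purely for illustration (the paper itself suggests COBOL, PL/I, Assembler or REXX), and the record layout, field offsets and the choice of 'X' as the fill character are all assumptions rather than anything taken from a real copybook.

```python
from typing import NamedTuple


class Field(NamedTuple):
    """A hypothetical fixed-width field: name, start offset and length."""
    name: str
    start: int
    length: int


# Assumed record layout for illustration only; a real layout would come from
# the copybook or table definition supplied for the file being masked.
LAYOUT = [
    Field("EMP_ID", 0, 6),
    Field("EMP_FIRST_NAME", 6, 10),
    Field("EMP_LAST_NAME", 16, 10),
]
RECORD_LENGTH = 26

# Fields to overwrite; key fields such as EMP_ID are deliberately left alone.
MASKED_FIELDS = {"EMP_FIRST_NAME", "EMP_LAST_NAME"}


def mask_record(record: str, fill: str = "X") -> str:
    """Return the record with each masked field overwritten by `fill`.

    `fill` must be a single character so that field widths are preserved.
    Short records are padded with spaces to the full record length first.
    """
    chars = list(record.ljust(RECORD_LENGTH))
    for field in LAYOUT:
        if field.name in MASKED_FIELDS:
            end = field.start + field.length
            chars[field.start:end] = fill * field.length
    return "".join(chars)


def mask_lines(lines):
    """Mask every record in an iterable of flat-file lines."""
    return [mask_record(line.rstrip("\n")) for line in lines]
```

Because the masked output keeps the same record length and field positions, downstream programs that read the flat file should not need to change.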
There are different types of data masking. They are as follows:
1. File Masking: Sensitive information is sometimes stored in files. These
files need to be masked with test data. This is often the easiest of all the
types of masking.
2. Database Masking: During testing, developers do not need access to the
production data; they only need data that looks like real data. This gives rise
to the concept of database masking. As the name says, the database is masked
with test data and can then be used for testing purposes.
3. Report Masking: The most important of all is report masking, because it
involves the end users. All the data, from contact details to financial
details, has to be masked and the report encrypted with a password, since
reports may be viewed by third parties for validation purposes.
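As a minimal sketch of database masking, the example below builds a masked test copy of a table using Python's built-in sqlite3 module. The customer table, its columns, the sample rows and the masking rules are all made up for illustration; in practice the masked copy would live in a separate test database rather than next to the production table.

```python
import sqlite3


def build_masked_copy(conn: sqlite3.Connection) -> None:
    """Create customer_test as a masked copy of the customer table."""
    conn.execute("DROP TABLE IF EXISTS customer_test")
    conn.execute(
        """
        CREATE TABLE customer_test AS
        SELECT
            cust_id,                                  -- key kept as is
            'Test Customer ' || cust_id AS name,      -- substituted test name
            'XXXXXX' || substr(phone, -4) AS phone    -- keep last four digits
        FROM customer
        """
    )
    conn.commit()


# Hypothetical "production" data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha Rao", "9876543210"), (2, "John Smith", "9123456780")],
)

build_masked_copy(conn)
masked_rows = conn.execute(
    "SELECT cust_id, name, phone FROM customer_test ORDER BY cust_id"
).fetchall()
```

The masked rows keep the look of real data (an identifier, a name, a ten-character phone value), which is all that developers need for testing.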
Some important factors to consider when preparing a masking tool:
1. What data to mask: Whether it is a file or a database, not all the data
necessarily needs to be masked. Consider an EMPLOYEE table with the columns
EMP_ID, EMP_FIRST_NAME and EMP_LAST_NAME, where EMP_ID is the primary key. In
such a case, scrubbing the primary key will scramble the relationships in the
data, and the integrity of the DB2 database will be lost. So fields like
EMP_ID, SSN_NO and USER_ID should be left as is to avoid confusion.
2. Consideration of the database: On mainframes there are two possible types of
database: IMS (Information Management System) and DB2. DB2 gives the
flexibility to arrange data in a structured manner in rows and columns, whereas
IMS is arranged as an inverted tree structure and works on the basis of
parent-child relationships. All of this should be considered while developing
the tool.
3. The three ‘S’ rule: The basic rule in tool development is to maintain
secured, standardized and structured data.
• Secured: Data such as first name and last name should be replaced with test
names so that the real values are not exposed.
• Standardized: The data should be standardized across all tables. For example,
once the data is masked in the Employee table, the corresponding entries should
be changed in the Employee Manager table as well. This keeps the data uniform
across tables and makes it easy for the user to work with.
• Structured: Database relations such as referential integrity should be
considered to maintain structured data. In the above example, suppose
EMP_LAST_NAME takes part in a referential integrity relationship with the
EMPLOYEE table. In such cases, two options are available: either mask the data
with the same values in both tables, or leave it as is.
4. Masking strategies: Different masking techniques are available, and not all
data can be masked with the same technique. The type of masking has to be
chosen depending on the usability and importance of the field. Some of them are
as follows:
• Substitution: Fields like first and last name can be replaced with
pre-defined values.
• Encryption/Decryption: Data can be encrypted and decrypted. For example, in a
16-digit card number the middle 8 digits can be replaced with ‘X’ characters.
The same idea applies to password encryption.
• Shuffling: Fields like phone number and date of birth can be considered for
this type of masking.
• Nullifying: Fields that are rarely used in testing should be considered for
this type of masking. Partially truncating a field also makes it of little use
to anyone who reads it.
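The strategies above, together with the rule of keeping masked values uniform across tables, can be sketched as follows. This is an illustrative Python sketch, not the method of any particular tool: the pool of test names, the Substituter class and the sample rows are hypothetical. The card-number example follows the paper in replacing the middle 8 digits with 'X' characters, which is plain character replacement rather than real cryptographic encryption.

```python
import random

# Hypothetical pool of pre-defined test names for substitution.
TEST_NAMES = ["Alpha", "Bravo", "Charlie", "Delta", "Echo"]


class Substituter:
    """Substitution: map each real value to a pre-defined test value.

    The same input always maps to the same output, so a name masked in one
    table gets the same replacement in every other table.
    """

    def __init__(self, pool):
        self._pool = list(pool)
        self._mapping = {}

    def mask(self, value: str) -> str:
        if value not in self._mapping:
            index = len(self._mapping)
            base = self._pool[index % len(self._pool)]
            # Add a numeric suffix once the pool has been used up, so that
            # two different real values never share a replacement.
            suffix = index // len(self._pool)
            self._mapping[value] = base if suffix == 0 else f"{base}{suffix}"
        return self._mapping[value]


def mask_card_number(card: str) -> str:
    """Partial masking: hide the middle 8 digits of a 16-digit card number."""
    if len(card) != 16 or not card.isdigit():
        raise ValueError("expected a 16-digit card number")
    return card[:4] + "X" * 8 + card[12:]


def shuffle_column(values, seed=None):
    """Shuffling: reassign a column's existing values to different rows."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def nullify(_value):
    """Nullifying: drop the value for fields rarely needed in testing."""
    return None


# Usage: the key EMP_ID is left untouched, while the last name receives the
# same replacement in both (made-up) tables.
names = Substituter(TEST_NAMES)
employee = [{"EMP_ID": 101, "EMP_LAST_NAME": "Smith"}]
manager = [{"MGR_ID": 7, "EMP_ID": 101, "EMP_LAST_NAME": "Smith"}]
for row in employee + manager:
    row["EMP_LAST_NAME"] = names.mask(row["EMP_LAST_NAME"])
```

Keeping one Substituter instance for the whole run is what makes the masked values line up across tables; creating a new one per table would break that uniformity.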
The Darker Side
Just to play devil’s advocate: what if the data scrubbing tool has to contact a
server to scramble the information? Is it really safe? There are indeed risks
involved in using such tools. Some tools contact a server to run the built-in
code that scrambles the information, and in such cases the risk outweighs the
safety gained. But this need not happen: most tools are made to work on the
desktop rather than depend on servers.
As for manual approaches, outsourcing data scrubbing to service providers
contradicts the basic principle of masking, because the unmasked data is handed
to a third party. Manual masking is also not considered safe, since human error
is unavoidable. As long as the data is safe, it is up to organizations to choose
their own way, depending on client satisfaction.
Even though enterprises are aware of data masking, clients still give
organizations access to read production data. From a security standpoint this
is a concern, but removing developers’ production access would cause serious
problems: for any analysis, it is recommended that an SME from the development
team look into production for a better technical understanding of the client’s
requirement. So data masking cannot be enforced simply by removing access from
programmers or the associated team. This is why organizations still have
production access to customers’ information.
Data is also accessed by customer support staff, who interact directly with
customers and end users. These people have access to the personal as well as
the financial details of the customer. Even though the access is limited, it is
still a threat from the end user’s point of view.
Conclusion
All in all, it is our responsibility to maintain data confidentiality and
integrity inside our organization. Delivering what the client expects is the
distinguishing factor in building trust. Data needs to be standardized and well
managed across all enterprise systems; otherwise its integrity cannot be
trusted. Enterprises can even increase their revenue, mitigate risk and operate
more efficiently by incorporating data scrubbing into their project schedule.
Someone said, “we are looking for something better that we sometimes fail to
realize that we already have the best”. The same is the case with enterprises.
Nowadays organizations have many weapons in their arsenal, apart from data
masking, to win the interest of the client. The scope and visibility of data
masking are too small for it to show itself as a powerful weapon and attract
organizations. That is why organizations lean towards other mantras, such as
re-engineering initiatives, to impress the client. Finally, it is the
enterprise’s ethics that sustain its long-term relationship with the client.
References
Data masking, what you need to know for developers: http://www.datamasker.com/DataMasking_WhatYouNeedToKnow.pdf
Data masking services: http://www.dataentryindia.biz/data-entry-india/data-cleansing-data-scrubbing-services-india.html
Data masking for SQL: http://www.sqledit.com/scr/scr.pdf
Data masker for Oracle: http://www.oracle.com/technetwork/middleware/data-integrator/overview/odi-11g-new-features-overview-1622677.pdf
Data masking tool for mainframes (File-AID): http://www.compuware.com/mainframe-solutions/r/fileaid_mvs.pdf
***
Introduction to Matsuo Laboratory (ENG).pptx
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 

Data masking a developer's guide

  • 1. Data Masking - A Developer’s guide Technical Paper
  • 2. Introduction. Plastic money is a part of our lives, and holding more credit cards is often seen as a mark of prestige. But how secure are we? What if someone gets into the bank’s database and learns our personal information, such as address or mobile number? The question matters because software developers sometimes have access to the production database. That is where data scrubbing comes into the picture. Have you noticed that when we pay credit card bills online, personal information such as the card number and email ID is partially scrubbed? Why is it done? Is it really necessary? How is it done? Why should we concentrate on this? This paper focuses on why data needs to be scrubbed, the ways of achieving it without affecting cost-cutting initiatives, and the technical aspects to consider when scrubbing data. It serves as a guideline for managers who think from a developer’s perspective, for programmers who want more technical knowledge for building data scrubbing tools, and for clients who look at the scope of data masking from an enterprise’s perspective. Concept of Data Masking. Different people have different perceptions of data masking. Some call it cleaning so-called ‘junk data’ from the database; some call it “data scrubbing” and others “data sanitization”. Some use the term for correcting invalid data entered into the database. In this paper, data scrubbing refers to masking users’ personal details, such as card number, address and contact details, with variable or static alphanumeric characters. It assures users that their personal information is secure and not compromised.
  • 3. Here is an overview of a typical banking cycle. The front end involves a customer support executive and a bank customer. The bank customer accesses their account details through the banking portal. The customer support executive also has access to the customer’s personal details. A cloned copy of the production database, in scrubbed form, is made available to developers for testing. Scope of Data Masking. Data masking touches each phase of a typical SDLC. During estimation, the effort for preparing scrubbed data and for post-scrubbing work has to be taken into account. In the testing phase, the feasibility of the prepared scrubbed data should be considered. After implementation, any report generated for end users should be produced only after the data has been thoroughly scrubbed.
  • 4. Even though scrubbing involves a lot of manual effort, there are cost-effective alternatives. First, tier-I companies such as TCS and Infosys have set up separate teams for this purpose, providing the solution inside the organization. Second, smaller (mid-tier) companies outsource the work to data scrubbing service providers available in the market. Beyond these, there are many tools that avoid manual involvement in this time-consuming process. For example, Compuware’s File-AID is a tool enhanced to scrub the data in particular columns of a given file or database. It has been implemented in many enterprise mainframe systems to provide flexibility. But are organizations serious about scrubbing? A definite yes, because clients expect a certain level of integrity when they form a relationship with this
  • 5. organization. So the personal data that the organization handles should be secure enough to build trust and confidence with the client. As one client of a data scrubbing service put it: “I can honestly say that, I have been impressed with the quality and turnaround time, making my business much more manageable”. Technology aspect. Many technologies, such as Oracle, mainframes, PeopleSoft and SAS, are moving quickly towards data masking. Some, such as Oracle Applications, SQL Server and DB2 UDB, use data masking tools such as Data Masker and Oracle Data Integrator. Irrespective of the technology, the process of developing a data masking tool remains the same. For example, consider mainframes as the environment. The data inside a DB2 database is stored in the form of LDS files. An LDS file can be converted into a flat file, which can then be taken as the input for the tool. The language can be chosen from COBOL, PL/I, Assembler and REXX, depending on the available expertise, the environment support, the compilers provided by the client, and tooling such as CHGMAN, ENDEAVOR and ISPW. There are different types of data masking, as follows: 1. File Masking: Sensitive information is sometimes stored in files. These files need to be masked with test data. This is often the easiest of all the types of masking. 2. Database Masking: During testing, developers do not need access to the production data; they only need data that looks like real data. This gives rise to the concept of database masking. As the name says, the database is masked with test data and can then be used for testing purposes.
  • 6. 3. Report Masking: The most important of all is report masking, because it involves end users. All data, from contact details to financial details, has to be masked and encrypted with passwords, since reports may be viewed by third parties for validation purposes. Some important factors to consider when preparing a masking tool: 1. What data to mask: Whether the source is a file or a database, not all of the data needs to be masked. Consider an EMPLOYEE table with the columns EMP_ID, EMP_FIRST_NAME and EMP_LAST_NAME, where EMP_ID is the primary key. In that case, scrubbing the primary key would make the data inconsistent, and the integrity of the DB2 database would be lost. So fields such as EMP_ID, SSN_NO and USER_ID should be left as is. 2. Consideration of database: On mainframes there can be two types of database, IMS (Information Management System) and DB2. DB2 gives the flexibility to arrange data in a structured manner in rows and columns, whereas IMS arranges data in an inverted tree structure based on parent-child relationships. These differences should be considered while developing the tool. 3. The three ‘S’ rule: The basic rule in tool development is to maintain secured, standardized and structured data. - Secured: Data such as first name and last name should be replaced with test names to give the sense that the data is secured. - Standard: The data should be standardised across all the tables. For example, once the data is masked in an employee table, the corresponding entries should be changed in the Employee Manager table
  • 7. as well. This makes the data uniform across the tables and gives the user ease of access. - Structured: Database relations such as referential integrity should be considered to maintain structured data. In the above example, EMP_LAST_NAME acts as a referential link to the EMPLOYEE table. In such cases, two options are available: either mask the data with matching values in both tables, or leave it as is. 4. Masking strategies: Different masking techniques are available, and not all data can be masked with the same technique. The type of masking has to be chosen depending on the usability and importance of the field. Some of them are as follows: - Substitution: Fields such as first and last name can be replaced with pre-defined values. - Encryption/Decryption: Data can be encrypted and decrypted. For example, the middle 8 digits of a 16-digit card number can be masked with ‘X’ characters. The same idea can be applied to password encryption. - Shuffling: Fields such as phone number and date of birth can be considered for this type of masking.
  • 8. - Nullifying: Fields that are rarely used in testing should be considered for this type of masking. Partially truncating a field makes it much less useful. The Darker Side. To play devil’s advocate: what if the data scrubbing tool has to contact a server to scramble the information? Is that really safe? There are risks involved in using such tools. Some tools contact a server to run the built-in code that scrambles the information, and in those cases the risk is higher. But this is not always the case: most tools are made to work on desktops rather than depending on servers. As for manual approaches, outsourcing data scrubbing to service providers contradicts the basic principles of masking. Manual masking is not considered the safer way either, since human error is unavoidable. As long as the data is safe, it is up to organizations to choose their own way, depending on client satisfaction.
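The four masking strategies listed under item 4 (substitution, masking the middle digits of a card number, shuffling and nullifying) can be illustrated with a minimal Python sketch. This is not the paper's tool: the function names, the `TEST_NAMES` pool and the choice of Python are assumptions made for illustration only.

```python
import random

# Hypothetical pool of pre-defined replacement values (assumption).
TEST_NAMES = ["Alex", "Sam", "Jordan", "Taylor"]


def substitute_name(_real_name, rng):
    """Substitution: replace a real name with a pre-defined test value."""
    return rng.choice(TEST_NAMES)


def mask_card_number(card_number):
    """Hide the middle 8 digits of a 16-digit card number with 'X'."""
    if len(card_number) != 16 or not card_number.isdigit():
        raise ValueError("expected a 16-digit card number")
    return card_number[:4] + "X" * 8 + card_number[-4:]


def shuffle_column(values, rng):
    """Shuffling: reassign the existing values of a column to other rows."""
    shuffled = list(values)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled


def nullify(_value):
    """Nullifying: drop a value that testing rarely needs."""
    return None
```

Passing an explicit `random.Random` instance keeps a run repeatable when a seed is supplied, which can help when masked test data has to be regenerated.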
  • 9. Even though enterprises are aware of data masking, clients still give organisations access to read production data. From a security point of view it would seem right to remove developers’ production access, but doing so causes serious problems. For any analysis, it is recommended that an SME from the development team look into production, for a better technical understanding of the client’s requirement. So, as a regulatory step for data masking, one cannot simply remove access from the programmers or the associated team. This is why organisations still have production access to customers’ information. Data is also accessed by customer support staff, who interact directly with customers and end users. They have access to the personal as well as the financial details of the customer. Even though that access is limited, it is still a threat from the end user’s point of view. Conclusion. All in all, it is our responsibility to maintain data confidentiality and integrity inside our organization. Providing the edge that the client expects is the distinguishing factor in building trust. Data needs to be standardised and well managed across all enterprise systems; otherwise its integrity cannot be trusted. Enterprises can even increase their revenue, mitigate risk and operate more efficiently by incorporating data scrubbing into their project schedules. Someone said “we are looking for something better that we sometimes fail to realize, that we already have the best”. The same is the case with enterprises. Nowadays organisations have many weapons in their arsenal, apart from data masking, to grab the interest of the client. The scope and visibility of data masking are too small for organisations to see it as a powerful weapon.
That is why organizations lean towards other mantras, such as re-engineering initiatives, to impress the client. Finally, it is the enterprise’s ethics that sustain the long-term relationship between the client and itself.
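The requirement that masked data stay standardised across tables, with key fields such as EMP_ID left as is, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the `TEST_LAST_NAMES` pool and the use of an unsalted SHA-256 digest to pick a replacement are all hypothetical, and a real tool might use a lookup table or a keyed hash instead.

```python
import hashlib

# Hypothetical pool of replacement last names (assumption).
TEST_LAST_NAMES = ["Smith", "Jones", "Brown", "Green", "White"]


def mask_last_name(real_name):
    """Map a real last name to a test name, always the same one per input."""
    digest = hashlib.sha256(real_name.encode("utf-8")).digest()
    return TEST_LAST_NAMES[digest[0] % len(TEST_LAST_NAMES)]


def mask_rows(rows, fields_to_mask):
    """Return copies of the rows with only the listed fields masked.

    Fields not listed (for example the EMP_ID key) are copied unchanged,
    so relations between tables still line up after masking.
    """
    return [
        {
            field: (mask_last_name(value) if field in fields_to_mask else value)
            for field, value in row.items()
        }
        for row in rows
    ]
```

Because the mapping depends only on the input value, masking the EMPLOYEE table and the Employee Manager table separately still yields the same replacement for the same last name in both.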
  • 10. References
      Data masking, what you need to know for developers: http://www.datamasker.com/DataMasking_WhatYouNeedToKnow.pdf
      Data masking services: http://www.dataentryindia.biz/data-entry-india/data-cleansing-data-scrubbing-services-india.html
      Data masking for SQL: http://www.sqledit.com/scr/scr.pdf
      Data masker for Oracle: http://www.oracle.com/technetwork/middleware/data-integrator/overview/odi-11g-new-features-overview-1622677.pdf
      Data masking tool for mainframes (File-AID): http://www.compuware.com/mainframe-solutions/r/fileaid_mvs.pdf