2. Value. Accelerated.
Introduction
Plastic money is a part of our lives, and the more credit cards a person holds, the
more prestigious they appear. But how secure are we in this respect? What if
someone gets into the bank's database and learns our personal information, such
as our address and mobile number? After all, software developers sometimes
have access to the production database. Confused? That is where data
scrubbing comes into the picture. Have you ever noticed that when we pay
credit card bills online, personal information such as the card number and email
ID is partially scrubbed?
Why is it done? Is it really necessary? How is it done? Why should we pay
attention to it? This paper focuses on the necessity of scrubbing data, the
different ways of achieving it without undermining cost-cutting initiatives, and the
technical aspects that need to be considered when scrubbing data. It serves as a
guideline for managers who think from a developer's perspective, for
programmers who want deeper technical knowledge for building data scrubbing
tools, and for clients who view the scope of data masking from an enterprise
perspective.
Concept of Data Masking
Different people have different perceptions of data masking. Some call it
cleaning the so-called 'junk data' out of the database; some call it "data
scrubbing" and others "data sanitization". Some refer to it as the correction of
invalid data entered into the database. Here, data scrubbing refers to masking
users' personal details, such as card numbers, addresses and contact details, with
variable or static alphanumeric characters. It assures users that their personal
information is secure and has not been compromised.
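As a concrete illustration of the partial scrubbing described above, here is a minimal sketch in Python. The function name and formatting are ours, not from any particular masking product, and real systems follow the card brand's display rules for which digits may remain visible:

```python
def mask_card_number(card_number: str, visible_last: int = 4) -> str:
    """Replace all but the last `visible_last` digits with 'X'.

    Hypothetical helper: illustrates the kind of partial scrubbing a
    billing portal applies before echoing a card number back to a user.
    """
    digits = card_number.replace(" ", "").replace("-", "")
    masked = "X" * (len(digits) - visible_last) + digits[-visible_last:]
    # Re-group into blocks of four for readability.
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))

print(mask_card_number("4111 1111 1111 1234"))  # XXXX XXXX XXXX 1234
```

The original value never leaves the masking function intact, which is the essential property: what the user (or a developer) sees cannot be reversed into the real card number.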
Here is an overview of a typical banking cycle. The front end involves a customer
support executive and a bank customer. The bank customer accesses their
account details through the banking portal; the customer support executive also
has access to the customer's personal details. A cloned version of the production
database, in scrubbed form, is made available to developers for carrying out their
testing.
Scope of Data Masking
Data masking touches every phase of a typical SDLC. During estimation, the
effort for scrubbed-data preparation and post-scrubbing activities has to be
taken into account. In the testing phase, the feasibility of the prepared scrubbed
data should be verified. After implementation, any report generated for end
users should be thoroughly scrubbed before it is delivered.
Even though scrubbing involves a lot of manual effort, there are cost-effective
alternatives. First, tier-I companies such as TCS and Infosys have set up
dedicated teams for this very purpose, providing the solution in-house. Second,
smaller (mid-tier) companies outsource the work to the data scrubbing service
providers available in the market. Beyond these, there are many tools that
reduce manual involvement in this time-consuming process. For example,
Compuware's File-AID is a tool that has been enhanced to scrub the data in
particular columns of a given file or database; it has been deployed in many
mainframe systems across enterprises to provide this flexibility.
But are organizations serious about scrubbing? A definite yes, because clients
expect a certain level of integrity when they form a relationship with an
organization. The personal data the organization handles should be secure
enough to build trust and confidence with the client. As one client of a data
scrubbing service put it: "I can honestly say that I have been impressed with the
quality and turnaround time, making my business much more manageable."
Technology aspect
Many technology stacks, such as Oracle, mainframes, PeopleSoft and SAS, are
moving toward data masking at a great pace. Some platforms, such as Oracle
Applications, SQL Server and DB2 UDB, use dedicated data masking tools such
as Data Masker and Oracle Data Integrator. Irrespective of the technology, the
process of developing a masking tool remains the same. For example, consider a
mainframe environment. The data inside a DB2 database is stored in the form of
LDS files; an LDS file can be converted into a flat file, and that flat file can be
taken as the input to the tool. The implementation language can be chosen from
COBOL, PL/I, Assembler or REXX, depending on the available expertise, the
environment support and compilers provided by the client, and the change
management tools in use, such as ChangeMan, Endevor or ISPW.
There are different types of data masking. They are as follows:
1. File masking: Sensitive information is sometimes stored in the form of files.
   These files need to be masked with test data. This is often the easiest of all
   the masking types.
2. Database masking: During testing, developers do not need access to the
   production data itself; they only need data that looks real. This is the idea
   behind database masking: as the name says, the database is masked with
   test data and can then be used for testing purposes.
3. Report masking: The most important of all is report masking, because it
   directly involves the end users. All the data, from contact details to financial
   details, has to be masked and the reports password-protected, since there is
   a chance that reports are viewed by third parties for validation purposes.
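File masking (type 1 above) can be sketched for a fixed-width mainframe-style record. The record layout here is invented for illustration (columns 0-9 hold an account ID, 10-29 a customer name, 30 onward a date of birth); real layouts come from the copybook of the file being scrubbed:

```python
def mask_record(line: str) -> str:
    """Overwrite only the name field of a fixed-width record.

    Assumed layout (illustrative): [0:10] account id (kept, it is a key),
    [10:30] customer name (masked with test data), [30:] other fields.
    """
    return line[:10] + "TESTNAME".ljust(20) + line[30:]

records = ["0000001234JOHN DOE            19800101"]
scrubbed = [mask_record(r) for r in records]
print(scrubbed[0])
```

In practice the same function would be applied line by line while copying the flat file, so the scrubbed copy keeps the exact record length the downstream programs expect.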
Some important factors that need to be considered when building a masking tool:
1. What data to mask: Whether the source is a file or a database, not
   necessarily all of the data should be masked. Consider an EMPLOYEE table
   with EMP_ID, EMP_FIRST_NAME and EMP_LAST_NAME as columns, where
   EMP_ID is the primary key. Scrubbing the primary key would corrupt the
   relationships between rows and break the integrity of the DB2 database.
   Fields such as EMP_ID, SSN_NO and USER_ID should therefore be left as-is
   to avoid this confusion.
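The "mask some columns, keep the keys" rule above can be sketched as a small rule table. The column names follow the EMPLOYEE example; the replacement values are placeholders of our own choosing:

```python
# Masking rules: only columns listed here are scrubbed. Key columns
# such as EMP_ID are deliberately absent, so they pass through
# unchanged and referential integrity is preserved.
MASK_RULES = {
    "EMP_FIRST_NAME": lambda v: "TESTFIRST",
    "EMP_LAST_NAME": lambda v: "TESTLAST",
}

def scrub_row(row: dict) -> dict:
    """Apply the masking rule for each column, defaulting to identity."""
    return {col: MASK_RULES.get(col, lambda v: v)(val)
            for col, val in row.items()}

row = {"EMP_ID": 1001, "EMP_FIRST_NAME": "Alice", "EMP_LAST_NAME": "Smith"}
print(scrub_row(row))  # EMP_ID survives; the name columns become test data
```

Keeping the rules in one table also makes it easy to audit exactly which fields a given run of the tool will touch.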
2. Consideration of the database: On mainframes there are two common types
   of database: IMS (Information Management System) and DB2. DB2 arranges
   data in a structured manner, in rows and columns, whereas IMS arranges
   data as an inverted tree and operates on parent-child relationships. All of
   this should be considered while developing the tool.
3. The three 'S' rule: The basic rule in tool development is to keep the data
   secured, standardized and structured.
   Secured: Data such as first and last names should be replaced with test
   names so that the real data is never exposed.
   Standardized: The data should be standardized across all tables. For
   example, once the data is masked in the EMPLOYEE table, the
   corresponding entries should be changed in the EMPLOYEE_MANAGER table
   as well. This keeps the data uniform across tables and makes it easier for
   the user to work with.
   Structured: Database relationships such as referential integrity should be
   preserved to keep the data structured. In the example above, if
   EMP_LAST_NAME is referenced by another table as a foreign key into the
   EMPLOYEE table, two options are available: either mask the data with
   identical values in both tables, or leave it as-is.
4. Masking strategies: Different masking techniques are available, and not all
   data can be masked with the same technique. Depending on the usability
   and importance of the field, the appropriate technique has to be chosen.
   Some of them are as follows:
   Substitution: Fields such as first and last name can be replaced with
   predefined values.
   Encryption: Data can be encrypted, or partially hidden. Consider a 16-digit
   card number in which the middle eight digits are replaced with 'X'
   characters; password-based encryption of the whole value is another option.
   Shuffling: Fields such as phone number and date of birth can be considered
   for this type of masking.
   Nullifying: Fields that are rarely needed for testing should be considered for
   this type of masking. Partially truncating or blanking such a field leaves it of
   little use to anyone who sees it.
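Two of the strategies above, shuffling and nullifying, can be sketched as follows. The fixed seed is only there to make the shuffle repeatable in this demonstration; a production tool would draw fresh randomness per run:

```python
import random

def shuffle_digits(value: str, seed: int = 42) -> str:
    """Shuffle the characters of a field (e.g. a phone number).

    The masked value keeps the same characters and length, so it still
    'looks real' to test programs, but no longer identifies anyone.
    """
    chars = list(value)
    random.Random(seed).shuffle(chars)
    return "".join(chars)

def nullify(value: str) -> str:
    """Blank out a field that testing rarely needs."""
    return ""

print(shuffle_digits("9876543210"))
print(repr(nullify("rarely-used-field")))
```

Shuffling preserves the field's format (length, character set), which matters when downstream edit checks validate the data; nullifying is cheaper but only usable where programs tolerate empty fields.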
The Darker Side
Just to play devil's advocate: what if the data scrubbing tool has to contact a
server to scramble the information? Is that really safe? There are indeed risks
involved in using such tools. Some tools contact a server to run their built-in
scrambling code, and in such cases the risk outweighs the safety. But this is not
inevitable: most tools are designed to run on desktops rather than depend on
servers.
As for manual approaches, outsourcing data scrubbing to service providers can
contradict the basic principles of masking, and heavy manual involvement is not
considered safe, since human error is unavoidable. As long as the data is kept
safe, it is up to each organization to choose its own way, guided by client
satisfaction.
Even though enterprises are aware of data masking, clients still grant
organizations access to read production data. From a security point of view,
removing developers' production access altogether is a serious issue: when
analysis is needed, it is often necessary for an SME from the development team
to look into production to better understand the client's requirements. So, even
as a regulatory step toward data masking, one cannot simply remove access
from the programmers or their associated teams. This is why organizations still
retain production access to customer information.
The data is also accessed by customer support staff, who interact directly with
customers and end users. These people have access to the customer's personal
as well as financial details. Even though that access is limited, it still constitutes
a threat from the end user's point of view.
Conclusion
All in all, it is our responsibility to maintain data confidentiality and integrity
within our organizations. Once we provide the edge the client expects, that
becomes the distinguishing factor in earning trust. Data needs to be
standardized and well managed across all enterprise systems; otherwise its
integrity cannot be trusted. Enterprises can even increase revenue, mitigate risk
and operate more efficiently by incorporating data scrubbing into their project
schedules.
Someone said, "We are looking for something better, and we sometimes fail to
realize that we already have the best." The same is true of enterprises.
Nowadays organizations have many weapons in their arsenal to win client
interest apart from data masking, and the scope and visibility of data masking
are still too low for it to stand out as a powerful weapon on its own. That is why
organizations lean toward other mantras, such as re-engineering initiatives, to
impress the client. Finally, it is the enterprise's own ethics that stand between it
and the client in sustaining a long-term relationship.
References
Data masking: what you need to know (for developers):
http://www.datamasker.com/DataMasking_WhatYouNeedToKnow.pdf
Data masking services:
http://www.dataentryindia.biz/data-entry-india/data-cleansing-data-scrubbing-services-india.html
Data masking for SQL:
http://www.sqledit.com/scr/scr.pdf
Data masking for Oracle (Oracle Data Integrator):
http://www.oracle.com/technetwork/middleware/data-integrator/overview/odi-11g-new-features-overview-1622677.pdf
Data masking tool for mainframes (File-AID):
http://www.compuware.com/mainframe-solutions/r/fileaid_mvs.pdf
***