My keynote speech at the ISACA IIA Belgium software watch day in October 2014 in Brussels on the value of big data and data analytics for auditors and other assurance professionals
20. Big Data—Characteristics
Very large, distributed aggregations of loosely structured data, often incomplete & inaccessible:
• Petabytes/Exabytes of data,
• Millions/billions/trillions of records,
• Loosely structured, distributed data,
• Flat schemas with complex interrelationships,
• Coming from diverse sources:
  ◦ Social networks,
  ◦ Sensor networks,
  ◦ Servers (files, logs, chats, video),
  ◦ etc.,
• Often involving time-stamped events,
• Applications (Transactional or Analytical)
22. Fourth ‘v’: Validity = data quality & integration, connected to collaborative data governance.
Fifth ‘v’: Veracity = data integrity & the ability of an organization to trust the data and use it confidently to make crucial decisions.
33. Main management questions on big data:
1. Where should we store our data?
2. How are we going to protect our data?
3. How are we going to use our data safely & lawfully?
34. Core element of big data:
The full life cycle of information needs to be considered!
35. Big data perception issue:
Addressing big data risks & concerns can NOT be seen exclusively as an IT exercise!
36. Key elements to take into account with Big Data
1. Data Quality: shift from “quest for perfect data” to “fit-for-use data”
  ● Add practicality
  ● Reduce time, cost and effort
2. Data Volume: shift from “shiny object syndrome” to “managed analytics”
3. Project Budgets: shift from “business determining needs” to “business & IT working together”
4. User Proficiency & Knowledge Transfer: shift from “specialist expert analytics know best” to “business using tools & experts”
39. Key risk with Big Data: privacy & security
Privacy-related laws must be considered for Big Data projects. Responsibility for applying data privacy & security techniques falls on the individuals who capture and use the data and perform the analysis: they should determine & document how data privacy & security issues will be addressed before a Big Data project starts.
1. What is the goal of the Big Data project?
2. How will this data be used? Anyone who will access & hold this data during analysis must adhere to appropriate data governance standards.
3. Who will be able to access, review and analyze this data? Which user roles & responsibilities within the organization will have access to personal & sensitive information?
4. How will this data be secured to prevent unauthorized access?
5. How will this data be updated?
52. Role of Internal Audit in Managing Big Data
Establish the extent of the data assets and take a deep dive into everything that is available. Redundant or unimportant data can then be identified and reduced.
To manage data holdings effectively, an organization must first be aware of the location, condition and value of its data assets. Conducting a data audit provides this information, raising awareness of collection strengths and identifying weaknesses in data policies and management procedures.
The benefits of conducting an audit to manage big data effectively are:
• Monitor holdings and avoid big data leaks. Data hacking, social engineering and data leaks all plague companies; an audit can help a company identify areas where leakage is possible.
• Manage risks associated with big data loss and irretrievability. Unstructured data that lies untouched may never be retrieved; an audit can help identify such cases.
• Develop a big data strategy and implement robust big data policies. Big data requires robust management and proper structuring.
• Improve workflows and benefit from efficiency savings. Check where workflows are complex and time-consuming and where there is scope for improving efficiency.
• Realize the value of big data through improved access and reuse. Check whether there are data sets that have not been used in a while.
Source: http://www.data-audit.eu/docs/DAF_briefing_paper.pdf
53. Complex Big Data: Managing Big Data through Internal Audit
Most companies collect large volumes of data but lack comprehensive approaches to centralizing that information. Internal audit can help companies manage big data by streamlining and collating data effectively.
Big data issues that internal audit can help mitigate:
• Big Data Security: Maintaining effective data security is increasingly recognized as a critical risk area for organizations. Loss of control over data security can have severe ramifications for an organization, including regulatory penalties, loss of reputation, and damage to business operations and profitability. Auditing can help organizations secure and control the data they collect.
• Big Data Accessibility: Giving the right person access to big data at the right time is another challenge organizations face. Segregation of Duties (SoD) is an important aspect that internal audit (IA) can check.
• Big Data Quality: The more data one accumulates, the harder it is to keep everything consistent and correct. Internal audit can check the quality of big data.
• Big Data Understanding: Understanding and interpreting big data remains one of the primary concerns for many organizations. Auditors can help simplify an organization’s data.
Source: http://www.acl.com/pdfs/wp_AA_Best_Practices.pdf, http://smartdatacollective.com/brett-stupakevich/48184/4-biggest-problems-big-data
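Segregation of duties is one of the access checks internal audit can run over a full user population rather than a sample. A minimal Python sketch, assuming a user-to-roles extract and a conflict matrix; the role names and conflicting pairs are hypothetical examples, not a standard rule set.

```python
from itertools import combinations

# Hypothetical conflict matrix: pairs of roles one user should not hold together.
CONFLICTING_ROLES = {
    frozenset({"create_vendor", "approve_payment"}),
    frozenset({"post_journal", "approve_journal"}),
    frozenset({"grant_access", "use_privileged_access"}),
}


def sod_conflicts(user_roles: dict[str, set[str]]) -> dict[str, list[tuple[str, str]]]:
    """Return, for each user, the conflicting role pairs that user holds."""
    findings: dict[str, list[tuple[str, str]]] = {}
    for user, roles in user_roles.items():
        # Sorting the roles makes the pair order, and therefore the report, deterministic.
        hits = [
            pair
            for pair in combinations(sorted(roles), 2)
            if frozenset(pair) in CONFLICTING_ROLES
        ]
        if hits:
            findings[user] = hits
    return findings


# Usage with a hypothetical access extract.
access = {
    "alice": {"create_vendor", "approve_payment", "read_reports"},
    "bob": {"post_journal", "read_reports"},
}
print(sod_conflicts(access))  # {'alice': [('approve_payment', 'create_vendor')]}
```

In practice the conflict matrix would come from the organization's own SoD policy, and each hit is a finding to review rather than a confirmed breach, since compensating controls may exist.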
55. Big Data
• Big data = valuable asset & powerful tool with far-reaching impacts. Big data initiatives need visibility at board level & executive-level sponsors.
• The success of enterprises will depend on how they meet & deal with big data challenges & impacts.
• To harness value & deliver resilient and faster analytic solutions, enterprises must implement big data solutions using repeatable frameworks & processes coupled with good governance & risk management frameworks.
56. Any organization’s data = one of its most valuable assets. Without a way to obtain, cleanse, organize and evaluate its data, an organization is left with a vast, chaotic pool of 0s & 1s.
Big Data results can be used to:
◦ improve business efficiencies;
◦ verify process effectiveness;
◦ identify areas of key risk, fraud, errors or misuse;
◦ influence business decisions.
An organization can achieve the maximum benefit from Big Data if:
◦ the approach is aligned to the business,
◦ the (privacy & security) risks are managed,
◦ the process is effectively planned, designed, implemented, tested and governed.
57. “Even with infinite knowledge of past behaviour,
we often won’t have enough information to make
meaningful predictions about the future.
In fact, the more data we have, the more false
confidence we will have…
The important part is to understand what our
limits are and to use the best possible science to
fill in the gaps. All the data in the world will never
achieve that goal for us.”
Peter Fader, co-director of the Wharton Customer Analytics Initiative at the University of Pennsylvania and Professor of Marketing at Penn’s Wharton School of Business, in MIT’s Technology Review, May 2012
http://www.technologyreview.com/news/427786/is-there-big-money-in-big-data/
62. Big Data Governance questions
1. What principles, policies and frameworks are we going to establish to support the achievement of business strategy through big data?
2. Can we trust our sources of big data?
3. What structures and skills do we have to govern and manage IT?
4. What structures and skills do we have to govern big data privacy?
5. Do we have the right tools to meet our big data privacy requirements?
63. Big Data Governance questions
6. How do we verify the authenticity of the data?
7. Can we verify how the information will be used?
8. What decision options do we have regarding big data privacy?
9. What is the context for each decision?
10. Can we simulate the decisions and understand the consequences?
11. Will we record the consequences and use that information to improve our big data information gathering, context, analysis and decision-making processes?
64. Big Data Governance questions
12. How will we protect our sources, our processes and our decisions from theft and corruption?
13. Are we exploiting the insights we get from big data?
14. What information are we collecting without exposing the enterprise to legal and regulatory battles?
15. What actions are we taking that create trends that can be exploited by our rivals?
16. What policies are in place to ensure that employees keep stakeholder information confidential during and after employment?
65. Big Data Governance potential answers
• Senior management buy-in and evidence of continuous commitment
• Data anonymization/sanitization or de-identification
• Adequate, relevant, useful and current big data privacy policies, processes, procedures and supporting structures
• Appropriate data destruction, a comprehensive data management policy, clearly defined disposal ownership and accountability
• Compliance with legal and regulatory data requirements
• Continuous education and training on big data policies, processes and procedures
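De-identification can take several forms. One common technique is pseudonymization: direct identifiers are replaced by a keyed hash, so records stay linkable for analysis while names and IDs are hidden. A minimal Python sketch; the field names, the key value and the choice of HMAC-SHA256 are assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers in the data set.
DIRECT_IDENTIFIERS = {"name", "email", "national_id"}


def pseudonymize(record: dict, key: bytes) -> dict:
    """Return a copy of `record` with direct identifiers replaced by HMAC-SHA256 tokens."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS and value is not None:
            token = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
            out[field] = token[:16]  # truncated for readability; same input + key -> same token
        else:
            out[field] = value  # non-identifying attributes are kept as-is
    return out


# Usage: the key (hypothetical here) must be stored separately from the data set.
secret = b"hypothetical-key-held-by-data-custodian"
print(pseudonymize({"name": "Jane Doe", "country": "BE", "amount": 120.5}, secret))
```

Because the same key always yields the same token, joins across data sets still work. Pseudonymized data is not anonymous, though: whoever holds the key, or can combine the remaining attributes with other sources, may be able to re-identify individuals, so it is a weaker control than full anonymization.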
66. Big data control category: Approach & Understanding
• “Right tone at the top”.
• Data policy.
• Inventory of all data sources.
• Identify vulnerabilities in the data flow, including internal & external data sources and automated & manual processes.
• Data deficiency governance process: analysis of impact and probability, escalation to senior management where necessary, and a strategic or tactical resolution.
• Each vulnerability needs a data owner.
• Materiality criteria to identify the most relevant data sets.
• Escalation path for data deficiency management.
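The impact-and-probability analysis, materiality criteria, data ownership and escalation path can be combined into a simple triage rule. The sketch below is hypothetical: the 1 to 5 rating scales, scoring as impact times probability and the escalation threshold of 15 are assumptions, not values from this presentation; a real data policy would set its own.

```python
from dataclasses import dataclass

# Hypothetical rating scheme: impact and probability rated 1 (low) to 5 (high).
ESCALATION_THRESHOLD = 15  # assumed cut-off on the resulting 1-25 scale


@dataclass
class Deficiency:
    data_set: str
    owner: str        # accountable data owner
    impact: int       # 1..5
    probability: int  # 1..5
    material: bool    # outcome of the materiality criteria

    @property
    def score(self) -> int:
        return self.impact * self.probability


def triage(items: list[Deficiency]) -> dict[str, list[str]]:
    """Route deficiencies to senior-management escalation or local resolution."""
    queues: dict[str, list[str]] = {"escalate": [], "local": []}
    for d in sorted(items, key=lambda d: d.score, reverse=True):
        if not d.owner:
            raise ValueError(f"{d.data_set}: deficiency has no data owner")
        material_and_severe = d.material and d.score >= ESCALATION_THRESHOLD
        queues["escalate" if material_and_severe else "local"].append(d.data_set)
    return queues


issues = [
    Deficiency("customer_master", "CFO", impact=5, probability=4, material=True),
    Deficiency("web_logs", "CIO", impact=3, probability=3, material=True),
    Deficiency("marketing_list", "CMO", impact=5, probability=5, material=False),
]
print(triage(issues))  # {'escalate': ['customer_master'], 'local': ['marketing_list', 'web_logs']}
```

Note that the highest-scoring item stays local here because it fails the materiality test; whether that is the right outcome is exactly the kind of judgment the data policy has to make explicit.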
67. Big data control category: Confidentiality / Privacy
• Through the data risk management process, all sensitive data should be identified and appropriate controls put in place. Rules & regulations govern how sensitive data should be secured in storage & transit.
• Logical & physical access security controls are needed to prevent unauthorized access to sensitive data. These include classic IT General Controls (password settings, masking or partially masking sensitive data, periodic user access review, firewalls, server room door security, server access logs, administrative access privileges and screen saver lockout).
• Encryption technologies must be used to store & transfer highly sensitive information within & outside the enterprise.
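Partial masking of sensitive data, one of the IT General Controls listed on this slide, can be illustrated with a small display-time function. A minimal Python sketch, assuming the rule "show only the last four characters"; the rule and the sample value (a widely used test card number) are illustrative. Masking limits what users see on screens and in reports; it does not replace encryption in storage and transit.

```python
def mask_partial(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` letters/digits, keeping separators such as spaces."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    masked = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)  # separators and the trailing visible characters pass through
    return "".join(masked)


print(mask_partial("4111 1111 1111 1111"))  # **** **** **** 1111
```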
68. Big data control category: Quality
• Assess data against the accuracy, reliability, completeness and timeliness criteria defined in the data policy & associated standards.
• Data sourced from third parties: a contractually bound process to gain confidence over data quality, either through independent validation of data quality controls at the third party or through independent checks on any material data received.
• Ownership & responsibilities should be assigned for each material data set. Appropriate training should be rolled out to all relevant personnel to make them aware of their data-related responsibilities.
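Assessing data against such criteria lends itself to automated tests that report, per criterion, the share of records that pass. A minimal Python sketch: the field names, the 30-day timeliness window and the country list are hypothetical, and "accuracy" is approximated by a simple validity rule, since true accuracy usually requires comparison with an independent source.

```python
from datetime import datetime, timedelta

# Hypothetical rules standing in for those a data policy would define.
REQUIRED_FIELDS = ("customer_id", "country", "last_updated")
MAX_AGE = timedelta(days=30)                      # timeliness criterion
VALID_COUNTRIES = {"BE", "NL", "FR", "DE", "LU"}  # validity rule used as an accuracy proxy


def assess(records: list[dict], now: datetime) -> dict[str, float]:
    """Return the share of records (0.0 to 1.0) meeting each quality criterion."""
    if not records:
        raise ValueError("no records to assess")
    complete = timely = valid = 0
    for r in records:
        # Completeness: every required field present and non-empty.
        if all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS):
            complete += 1
        # Timeliness: updated within MAX_AGE of the assessment date.
        ts = r.get("last_updated")
        if ts is not None and now - ts <= MAX_AGE:
            timely += 1
        # Accuracy proxy: value falls within the allowed domain.
        if r.get("country") in VALID_COUNTRIES:
            valid += 1
    n = len(records)
    return {"completeness": complete / n, "timeliness": timely / n, "accuracy": valid / n}


sample = [
    {"customer_id": "C1", "country": "BE", "last_updated": datetime(2014, 10, 1)},
    {"customer_id": "C2", "country": "XX", "last_updated": datetime(2014, 1, 1)},
    {"customer_id": "", "country": "NL", "last_updated": None},
]
print(assess(sample, now=datetime(2014, 10, 15)))
```

Comparing each share with a tolerance set in the data policy turns the result into a pass/fail control that can be rerun on every load.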
69. Big data control category: Availability
• Reliable (tested) disaster recovery arrangements should be in place to ensure that data are available in accordance with the
  ◦ recovery point objective (RPO) criteria and
  ◦ recovery time objective (RTO) criteria
defined in a business impact analysis (BIA).
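The RPO bounds how much data may be lost (the time between the last recoverable copy and the incident), while the RTO bounds how long recovery may take. A minimal Python sketch of checking one disaster recovery test against both; the log fields, timestamps and targets are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrTest:
    last_backup: datetime  # last recoverable copy of the data before the incident
    incident: datetime     # start of the (simulated) outage
    restored: datetime     # moment the data and service were available again


def evaluate(test: DrTest, rpo: timedelta, rto: timedelta) -> dict[str, bool]:
    """Compare a disaster recovery test against RPO and RTO targets from the BIA."""
    data_loss_window = test.incident - test.last_backup  # how much data could be lost
    downtime = test.restored - test.incident             # how long data was unavailable
    return {"rpo_met": data_loss_window <= rpo, "rto_met": downtime <= rto}


result = evaluate(
    DrTest(
        last_backup=datetime(2014, 10, 1, 2, 0),
        incident=datetime(2014, 10, 1, 6, 0),
        restored=datetime(2014, 10, 1, 9, 30),
    ),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=3),
)
print(result)  # {'rpo_met': True, 'rto_met': False}
```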
70. Assurance considerations with Big Data
Using the same risk-based approach that determines the audit schedule, potential audit targets may be identified in the following areas:
1. Determine the operational effectiveness of the current control environment.
2. Determine the effectiveness of anti-fraud procedures & controls.
3. Identify business process errors.
4. Identify business process improvements & inefficiencies in the control environment.
5. Identify exceptions or unusual business rules.
6. Identify fraud.
7. Identify areas where poor data quality exists.
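Identifying exceptions, errors and possible fraud often starts with simple full-population tests. One classic example, not named in the source, is a potential duplicate payment test. A minimal Python sketch, assuming a payment extract with hypothetical field names; invoice references are normalized because duplicates frequently differ only in formatting.

```python
from collections import defaultdict


def normalize_invoice(ref: str) -> str:
    """Uppercase, drop spaces/punctuation and strip leading zeros from an invoice reference."""
    cleaned = "".join(ch for ch in ref.upper() if ch.isalnum())
    return cleaned.lstrip("0") or "0"


def potential_duplicates(payments: list[dict]) -> list[list[dict]]:
    """Group payments that share vendor, normalized invoice number and amount."""
    groups = defaultdict(list)
    for p in payments:
        key = (p["vendor"], normalize_invoice(p["invoice"]), round(p["amount"], 2))
        groups[key].append(p)
    # Only groups with more than one payment are exceptions to follow up.
    return [g for g in groups.values() if len(g) > 1]


payments = [
    {"vendor": "V001", "invoice": "INV-0042", "amount": 1250.00},
    {"vendor": "V001", "invoice": "inv 0042", "amount": 1250.00},
    {"vendor": "V002", "invoice": "INV-0042", "amount": 1250.00},
    {"vendor": "V001", "invoice": "INV-0043", "amount": 980.00},
]
for group in potential_duplicates(payments):
    print([p["invoice"] for p in group])  # ['INV-0042', 'inv 0042']
```

Each hit is an exception to investigate, not proof of error or fraud: the same amount may legitimately be paid twice to one vendor.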