The Cloud Playbook showcases how Booz Allen’s Cloud Analytics Reference Architecture can be utilized to build technology infrastructures that can withstand the weight of massive data sets - and deliver the deep insights organizations need to drive innovation.
3. 1.0 | Summary
1.0 | Summary
The problems of explosive data growth and how
cloud analytics provide the solution
page 4
2.0 | Differentiation
2.0 | Differentiation
An enormous amount of valuable information is out there, Introduces the Architecture and explains how
Booz Allen’s unique approach to people, processes,
waiting to be transformed into differentiating services. Booz Allen and technology gets the job done
page 10
Hamilton uses its Cloud Analytics Reference Architecture to build
3.0 | Depth
technology infrastructures that can withstand the weight of massive
3.0 | Depth
Takes the Architecture apart
layer by layer with detailed visuals that
datasets—and deliver the deep insights organizations need to show you how we frame a solution
page 16
drive innovation.
4.0 | Successes
4.0 | Successes
Presents real-world examples from the hundreds of
organizations who have successfully worked with Booz Allen
to implement an analytics solution using the Architecture
page 31
Prefer to read this on your iPad?
Search “Booz Allen” at the iTunes App Store,®
or simply scan the QR code.
3
4. 1.0
Summary
in this section
A majority of executives believe their
companies are unprepared to leverage
their data. We look at why that is and
how to change it.
5. 1.0 | Summary
Extracting True Insights
The Growing Data Analysis Gap
2.0 | Differentiation
We are living in the greatest age of information discovery the world has ever known.
According to recent industry research, we now generate more data every 2 days than we did from the dawn of early civilization
through the year 2003 combined. And data rates are still growing—approximately 40% each year.
Fueled in large part by the more than five billion mobile phones in use around the globe, our world is increasingly measured,
instrumented, monitored, and automated in ways that generate incredible amounts of rich and complex data. Unfortunately,
the number of big data analysts and the capabilities of traditional tools aren’t keeping pace with this unprecedented data growth.
At Booz Allen, we’ve watched this trend for some time now—we call it the “data analysis gap.” It’s clear that data has outstripped
3.0 | Depth
common analytics tools and staffing levels. In order to move forward, organizations must be able to analyze data on a massive
scale and quickly use it to provide deeper insights, create new products, and differentiate their services.
4.0 | Successes
sustainable challenging missed
opportunities
44 times
quantity
Data
By 2020, the amount of information in our economy will grow 44 times. Analysis
Very few organizations are prepared for this wave of data. Gap
2009 2020
time
(Source: IDC)
5
6. Preparing for
What’s Ahead
The ability to compete and win in the information economy will come from powerful analytics that draw insights and value from Done right, analytics hosted in
data, and from high-fidelity visualizations that present those insights in impactful, intuitive ways. Both will become key influencers the cloud will help:
of corporate decision making and consumer purchasing.
▶▶ Improve overall performance
Many of the world’s IT systems are not ready for the technology revolution happening as organizations seek to transform how they and efficiency
use data. Their infrastructures face three major challenges: ▶▶ Better understand customer and
employee needs
Volume Variety Velocity ▶▶ Translate data into actionable
Not enough storage capacity and Data comes in many different formats, Inability to process data in real time in intelligence and faster decision
analytical capabilities to handle which can be difficult and expensive order to extract the most value from it making
massive volumes of data to integrate ▶▶ Reduce IT costs
▶▶ mprove scalability to handle
I
To help organizations overcome these hurdles and prepare for what’s next, Booz Allen has pioneered strategies for the future growth
implementation of the Digital Enterprise—a way of using technology, machine-based analytics, and human-powered analysis to create
competitive and mission advantage. However, before you invest in a
cloud analytics solution, you should
fully understand the scope of what’s
involved and engage in the proper
A Framework for the Future
planning to ensure that all the right
elements will be in place.
Booz Allen has a framework for intelligently integrating cloud computing technology and advanced analytic capabilities,
called the Cloud Analytics Reference Architecture. The Architecture is designed to solve compute-intensive problems that
were previously out of reach for most organizations, including large-scale image processing, sensor data correlation, social
network analysis, encryption/decryption, data mining, simulations, and pattern recognition.
At the core of the Architecture are systems that accommodate petabytes of data at reasonable cost and allow analytics
to run at previously unattainable scales in reasonable amounts of time. However, human insights and action are still the
fundamental drivers.
The purpose of the Architecture is to allow machines to do 80% of the work—the mundane tasks they are best suited for—
and enable people to do the 20% of the work they do best, tasks that involve analysis and creativity.
7. 1.0 | Summary
The Transformative
Power of Cloud Analytics
2.0 | Differentiation
Booz Allen is the leader in the Analysis using standard cloud
emerging field of cloud analytics. computing solutions extends
Our unique approach combines basic analytic techniques to
cloud and other technologies large or very large datasets.
with superior analytic tradecraft This is a logical entry point for
to create breakthroughs in how cloud solutions because cloud
organizations capture, store, technology is the most efficient,
3.0 | Depth
correlate, pre-compute, and extract cost-effective way to run analytics
value from large sets of data. on large amounts of data.
To understand the power of Advanced analytics is where
cloud analytics, it helps to see predictive capabilities are brought
the progression from basic into the mix. It’s generally used
data analytics performed in to evaluate the future impact of
most organizations today. As an strategic decisions. However, it
4.0 | Successes
infrastructure is built out along represents a step back in terms
the continuum to cloud analytics, of the size of datasets that can
the size and scale of data it can be manipulated.
process increases along with the
ability to drive performance and Cloud analytics transcends
improve decision making. the limits of the other forms of
analysis. It delivers insights to
Basic data analysis usually answer previously unanswerable
happens in core business questions such as:
functions with smaller datasets.
Reports are usually created ▶▶ ow can we gain competitive
H
on a “one-off” basis, limited advantage in our market
to distribution within a specific space?
department to support routine
decision making. ▶▶ here can we save money
W
within our organization?
▶▶ ow should we turn our data
H
into a product?
7
8. From Data
to Digital Enterprise
Booz Allen clients have
a wide variety of data
analysis challenges and IT
infrastructures. Our flexible,
scalable Cloud Analytics STAGE 3
Reference Architecture has Significant improvements
three stages or entry points in performance are realized
to accommodate these when you achieve success
differences. STAGE 2 in managing the flow of
We begin to modernize information at scale and
In each stage, we enable applications to handle derive the fullest value from
shifts in technology STAGE 1 the demands of advanced your data.
investments while helping The focus is on saving analytics. Faster, reusable,
manage risk and maximize money and reducing risk. and more intuitive applications
the reward. That means
leveraging the assets you
already own and taking logical
You may have already
begun some of these
initiatives; we leverage
will enable everyone in your
organization to work smarter. 3 CLOUD ANALYTICS
steps to add what’s needed. what’s working now as we ▶ reate deep insight into
C
2 SMART DATA
discover new ways to relevant mission data
This is the only way to build a
increase efficiency.
structure to instrument data at scale
so you can truly experience ▶ sk and answer previously
A
▶ nhanced Enterprise
E
breakthrough analytics. unanswerable questions
1 IT EFFICIENCIES Data Architecture
▶ larify pedigree
C
(data tagging)
▶ Data center consolidation ▶ Multidimensional indexing
▶ Server and data consolidation ▶ Adopt distributed database
▶ Increased automation ▶ Reusable applications
▶ Modernized security
posture and metrics
▶ Reduced licensing costs
IT Maturity
9. 1.0 | Summary
A Better Approach
As the leaders in cloud analytics, Booz Allen has a proven approach
2.0 | Differentiation
delivered by some of the industry’s best talent. Here’s why we’re different:
Technical framework A LOOK AHEAD
Our Architecture combines the collective experience of thousands of people who have road tested
technologies from across the cloud solution landscape in hundreds of client organizations, ranging
from the U.S. Federal Government to commercial and international clients.
3.0 | Depth
Section 2.0
Best practices Differentiation: Introduces and diagrams
We have an exclusive set of lessons learned and breadth of technical knowledge that saves time and the Architecture, and explains how it
money while reducing risk. reflects Booz Allen’s unique approach.
You’ll also read about our core design
principles, extensive service offerings,
Core principles and technology choices.
4.0 | Successes
These are “rules of the road” we’ve developed to build the most effective solution with the highest Pages 10–15
return on investment. They encompass everything from how data should be stored to how to improve
relationships with the end users of your data.
Section 3.0
Depth: Takes the Architecture apart layer
Critical skill sets by layer with detailed visuals, design
We bring technologists as architecture and solutions specialists, domain experts who know your concepts, and recommended solutions
industry and your data, and data scientists who explore and examine data from disparate sources from the cloud vendor landscape. The
and recommend how best to use it. No one else in the industry offers a better combination of talent. section ends with a look at how security is
built into all levels. Pages 16–30
Vendor neutrality
Our approach utilizes a broad ecosystem of products and custom systems culled from an exhaustive Section 4.0
survey of available options. In the crowded, fragmented, and continually evolving landscape of cloud Successes: Presents real-world
solutions, we recommend only the best fit and value for your organization. examples from our extensive file of case
studies. We present the solutions and
challenges, describe and diagram the
implementations, and explain the results.
Pages 31–35
9
11. 1.0 | Summary
A Layered
Framework
Human Insights and Actions
2.0 | Differentiation
Enabled by customizable interfaces
The Booz Allen Cloud Analytics and visualizations of the data
Reference Architecture Visualization,
Reporting,
incorporates a wide range
Dashboards, and
of services to move from Query Interface
a technology infrastructure
with chaotic, distributed data
burdened by noise to large-
Analytics and Services
Services (SOA) Your tools for analysis, modeling,
scale data processing and
testing, and simulations
analytics characterized by
speed, precision, security,
3.0 | Depth
scalability, and cost efficiency.
Analytics and
Discovery
However, Booz Allen’s approach
is about much more than
infrastructure. We start with
your need to make better sense
and better use of your mission
Views and Indexes
data, and build from there.
4.0 | Successes
Streaming
Indexes
Data Management
The single, secure repository
for all of your valuable data
Data Lake
Metadata Tagging
Data Sources
Infrastructure
A FRAMEWORK FOR SECURITY Infrastructure/ The technology platform for storing
Management and managing your data
Page 30 details our security processes
11
12. Booz Allen
Cloud Analytics Service Offerings
cloud strategy and Cloud application migration advanced cloud analytics infrastructure
economics
Expertise in the assessment, prioritization, Delivery of scalable analytics platforms allowing Design and implementation of IaaS offerings
Delivery of strategy, technology, and economic architectural mapping, re-engineering, and the processing of information at extreme scale; to provide global access to data storage,
analysis for evaluating and planning all of the optimization of workloads that have high value and eDiscovery: high-volume, full text indexing, computing, and networking services on
business, technical, operational, and financial and are ready for migration to the cloud and context-based search of information demand through self-service portals
aspects of a cloud transition
cloud security software and platform vdi deployment and data center migration
integration and optimization
Unified risk management approach to define cloud Expertise in the secure implementation of SaaS and
security requirements, controls, and a continuous PaaS service delivery models, data migration, and Delivery of flexible and dynamic virtual desktop Identify critical factors, design, and execute
monitoring framework to address data protection, integration with existing enterprise infrastructure infrastructure to simplify management, reduce the transformation of legacy IT systems to
identity, privacy, regulatory, and compliance risks and applications licensing costs, and increase desktop security virtualized and cloud computing environments
and data protection requirements
13. 1.0 | Summary
Complex Ecosystem
We’ll help you navigate the crowded, fragmented, and continually evolving
2.0 | Differentiation
vendor ecosystem to design a best-of-breed solution for your organization.
3.0 | Depth
4.0 | Successes
Human Insights and Actions
Analytics and Services
Data Management
Infrastructure
Data Integration
13
14. Booz Allen
Cloud Analytics Core Principles
In-situ processing Use commodity hardware Schema on read
The Architecture demands that “you send the Hardware should be expected to fail as the If you have all the source data indexed and query-
question to the data,” because most big data normal condition. The Architecture supports able, plus the ability to create aggregations, then
processes are disk I/O-bound. In-situ processing both scalability and fault tolerance to achieve you can manage complex ontologies and demands
means that most of the computation is done optimal application load balancing. in a very efficient manner.
locally to the data, so that analytics run faster.
This can enhance existing analytic capabilities
and/or allow you to ask entirely new types
of questions.
Throw away nothing Economies of scale Change development process
Near-linear scalable hardware and software What used to be called service-oriented In order to develop a tight, iterative relationship
systems allow much more data to be stored, architecture (SOA) means that you can with your end users, you can develop/research
which enable reprocessing of historical data define the value and cost of services in a new capability in hours (not months), and the
with new algorithms and correlations that your enterprise, and plan your development process of discovery and integration with the rest
bring new insights. actions, either to reduce the cost of low-value of the enterprise begins much sooner, too.
components or increase the scale of high-
value components.
Data tagging
You can now afford to tag all of your data for
sensitivity or other controls (such as geographic).
This is the fastest, most reliable way to instrument
change across your entire Data Lake.
15. 1.0 | Summary
Deeper The Cloud Analytics Reference Architecture enables staff at all levels to quickly gather and act on granular insights
from all of your available data, regardless of its format or location. Below are some of the ways human insights and
Insights actions are enhanced by this new framework, which fosters greater collaboration and teamwork, and, ultimately,
delivers the highest business value from your information and your computing infrastructure.
2.0 | Differentiation
3.0 | Depth
Human Insights and Actions Analytics and Services Data Management Infrastructure
DecisionMakers, Analysts and Data Developers and Data System Administrators
Investigators, Interdictors, Scientists Scientists and IT Staff
4.0 | Successes
AND Analysts
▶▶ reate and use many views into the
C ▶▶ o longer constrained by years-old
N ▶▶ educe IT costs through commoditization
R
▶▶ eal-time alerting, situational awareness, and
R same data schemas and economies of scale
dissemination specific to their clearance level ▶▶ Automatically find trends and outliers ▶▶ C
atalog and index the data that is ▶▶ Meet long-term scalability requirements
▶▶ nvestigate and provide feedback
I ▶▶ valuate analysis methods to determine
E relevant today
on reporting and enhance best-of-breed tradecraft ▶▶ F
ree to create new views and
▶▶ Interact and search using tailored tools reporting metrics
▶▶ R
eference undiscovered trends in
original data
▶▶ A
pply advanced machine learning and
statistical methods
▶▶ In-situ hypothesis testing
15
15
16. 3.0
Depth
in this section
We diagram and describe each layer
of the Cloud Analytics Reference
Architecture, including our design
principles and technology choices.
17. 1.0 | Summary
Reference
Architecture
2.0 | Differentiation
Booz Allen’s Cloud Analytics layer 1
Reference Architecture provides
a holistic approach to people, Human Insights and Actions
processes, and technology in four Building on results and outputs from various analytical
tightly integrated layers. methods, multiple data visualizations can be created
in your new cloud analytics solution. These are used to
compose the interactive, real-time dashboard interfaces
Key Attributes your decision-makers and analysts need to make sense
Human Insights and Actions of your data.
By design, the Booz Allen Cloud
Analytics Reference Architecture:
3.0 | Depth
layer 2
▶▶ s reliable, allowing distributed
I Analytics and Services
storage and replication of bytes Both traditional and “Big Data” tools and software
across networks and hardware can operate on the information stored in your Data
that is assumed to fail at any time Lake, producing advanced specific analysis, modeling,
▶▶ A
llows for massive, world-scale testing, and simulations you need for decision making.
storage that separates metadata
from data Analytics and Services
4.0 | Successes
▶▶ S
upports a write-once,
sporadic append, read-many
usage structure layer 3
▶▶ S
tores records of various sizes, Data Management
from a few bytes up to a few Your Data Lake is a secure, distributed repository of a
terabytes in size wide variety of data sources. Security, metadata, and
▶▶ A
llows compute cycles to be indexing of Big Data are enabled by distributed key
easily moved to the data store, value systems (NoSQL), but the Architecture allows for
instead of moving the data to traditional relational databases as well.
a processer farm Data Management
layer 4
Infrastructure
This foundational layer allows for quick, streamlined,
low-risk deployment of the cloud implementation. The
plug-and-play, vendor-neutral framework is unique to
Booz Allen.
A FRAMEWORK FOR SECURITY
Infrastructure
Page 30 details our security processes
17
18. layer 1
Human Insights and Actions
architecture model
Monitoring Exploratory
Geospatial
Line Chart
Analytics and Services
19. 1.0 | Summary
Human Insights and Actions (continued)
PRINCIPLES AND TECHNOLOGIES
2.0 | Differentiation
In analytics solutions built on the Architecture, the data that’s available and the desired results drive the interfaces— TECHNOLOGY EXAMPLES
3.0 | Depth
not the other way around. When user communities and stakeholders aren’t restricted by their tools, they can perform
complex visualizations to identify patterns they previously couldn’t see. HTML5, JavaScript, OWF, Synapse Lightweight, custom web-based
applications and dashboards tailored
That freedom defines the guiding principles behind this first layer of the Architecture: to specific user communities or
stakeholders for data exploration, event
alerting, and monitoring, as well as
▶▶ Design and build the framework so that the desired data and analytic results define the visualization continuous quality improvement
▶▶ Reuse results and outputs of analytics across different visualizations
▶▶ ecouple the underlying analytics and data access from the visualizations and interfaces so that it’s possible
D
4.0 | Successes
to build customized, interactive dashboard interfaces composed of dynamically linked visualizations Commercial products (Splunk, Out-of-the-box, easy-to-build dashboards
Pentaho, Datameer Business for historical trending and real-time
Infographics, etc.) monitoring to analyze user transactions,
customer behavior, network patterns,
security threats, and fraudulent activity
Adobe Flex and Adobe Flash Despite the rise of HTML5, Adobe Flex
and Flash applications still remain
strong candidates for quickly building
and deploying rich user interfaces
19
20. layer 2
Analytics and Services
architecture model
Human Insights and Actions
Time Series Social Network
Analysis
R, SAS, Matlab, MapReduce, Hive,
Mathematica Pig, Hama
21. 1.0 | Summary
Analytics and Services (continued)
PRINCIPLES AND TECHNOLOGIES
2.0 | Differentiation
TECHNOLOGY EXAMPLES
Data Mining Data mining is used to discover patterns
in large datasets and draws from multiple
fields including artificial intelligence,
machine learning, statistics, and
database systems.
Frequently where data is concerned, the whole is greater than the sum of its parts. In the most strategic business
3.0 | Depth
Machine Learning Machine learning is used to learn
decisions, the ability to combine multiple types of analyses creates a holistic picture that can lead to much more
classifiers and prediction models in
valuable insight. With the Cloud Analytics Reference Architecture, you can implement different the absence of an expert and employs
types of analytical methods. many algorithms in the areas of
decision trees, association learning,
artificial neural networks, inductive logic
This integrated approach is an anchor for the guiding principles behind our Analytics and Services layer:
programming, support vector machines,
clustering, Bayesian networks, genetic
▶▶ llow both traditional and Big Data analysis tools and software to operate on a centralized repository of data
A algorithms, reinforcement learning, and
(the DataLake) representation learning.
4.0 | Successes
▶▶ Integrate results and outputs of analyses and visualize them on dashboards for decision making
▶▶ Decouple tools from the various types of analyses to make the system more extensible and adaptable
▶▶ nclude a service-oriented architecture layer to reuse results and outputs in many different ways relevant to
I Natural Language Processing (NLP) NLP is used to process unstructured
and semi-structured documents for
different stakeholders and decisionmakers the purposes of information retrieval,
▶▶ ncorporate Certified Catastrophe Risk Analysis (CCRA) to allow a variety of data analysis tools and software
I sentiment analysis, statistical machine
translation, and classification.
to be integrated and used; it also enables results and outputs of analyses to be visualized and used across
multiple interfaces
Network Analysis Network analysis using graph theory
and social network analysis are
used to understand association
and relationships between entities
of interests.
Statistical Analysis Traditional statistical methods using
univariate and multivariate analysis on
relatively small datasets are employed
to make inferences, test hypotheses,
and summarize data.
21
22. layer 2
Analytics and Services (continued)
discovering your data
Before working with Booz Allen, most clients faced a fundamental
challenge with data discovery. They didn’t know what data was actually
available or how to sort through all of it to identify the most important
business problems or trends it could reveal.
Search
Technical framework
Discovery is intimately related to search and analysis. All three feed into insight in a nonlinear fashion.
A search-discovery-analytics process that solves business problems without consuming disproportionate
resources meets these user needs:
▶▶ Real-time, ad hoc access to content
▶▶ Aggressive prioritization based on importance to the user and the business
▶▶ ata-driven decision making, which relies on the ability to try different approaches and ideas in order
D
Analytics Discovery
to discover previously unimagined insights
▶▶ Feedback/learning from the past intelligently applied to today’s data
How booz allen simplifies discovery
Other solutions require analysts to break down data into numerous subsets and samples before it can How does the Architecture support fast, efficient, and scalable search on entire datasets, not just samples?
be digested. This expensive, time-consuming process is one of the major roadblocks to turning data into
true business intelligence. ▶▶ ulk and soft real-time indexing enable the solution to handle billions of records with subsecond
B
search and faceting
Even though the Booz Allen Cloud Analytics Reference Architecture supports the most advanced ▶▶ arge-scale, cost-effective storage and processing capabilities accommodate “whole data”
L
analysis, it can also allow your staff to sift through all of your data on a basic level. Without tedious or consumption and analysis; in-memory caching of critical data ensures applications meet performance
sophisticated sampling and complex tools, they can discover what’s useful and what’s not useful for a requirements
specific business problem. ▶▶ LP and machine learning tools can scale to enhance discovery and analysis on very large datasets
N
23. 1.0 | Summary
Analytics and Services (continued)
HOW THE DATA SCIENCE LIFECYCLE DISTILLS INSIGHTS
2.0 | Differentiation
The data science lifecycle consists of three basic steps:
Step 1
First, data is sampled using a cloud analytics platform. This step may involve a sophisticated analytic
that runs in the cloud, such as one that crawls a social network to find people with certain types of
relationships with an individual or organization. This sampling can be done using either high-level query
3.0 | Depth
languages that are specially made for scalable cloud analytics or low-level developer interfaces.
Step 2
Next, a data scientist models the data sample in order to understand it better. This is usually done using
a statistical modeling environment on the data scientist’s workstation.
4.0 | Successes
Step 3
Finally, once a trend is established using the model, the data scientist works with analysts and domain
experts to explain the trend and yield insights.
This cycle is repeated until the data science team reaches actionable insights and intelligence that
can be presented to senior leadership for decision making purposes. Information may be delivered in
a visualization, dashboard, or written report.
23
25. 1.0 | Summary
Data Management (continued)
PRINCIPLES AND TECHNOLOGIES
2.0 | Differentiation
A central feature of the Architecture, the Data Lake delivers on the promise of cloud analytics to offer previously hidden insights and TECHNOLOGY EXAMPLES
drive better decisions. It’s a secure repository for data of all types and origins. Instead of precategorizing data, which restricts its
usability from the moment it enters your organization, the Architecture combines unstructured, structured, and streaming data types
3.0 | Depth
Hadoop Distributed File The primary open-source, distributed
and makes them available for many different forms of analysis. System (HDFS) storage system creates multiple
replicas of data blocks and
The following principles demonstrate how the Architecture enables your organization to use this repository of enterprise distributes them on compute nodes
throughout a cluster to enable
data to the best advantage: reliable, rapid computations.
▶▶ Provide inherent replication of the data through a distributed file system
▶▶ se distributed key value (NoSQL) data storage to enable security and metadata tagging at the data level Accumulo NoSQL store based on Google’s
U
BigTable design features cell-level
as well as indexing for specialized retrieval
4.0 | Successes
security access labels and a server-
▶▶ R
elax schema constraints and provide the flexibility to adapt to changing data sources and types with the side programming mechanism
schema-on-read approach of distributed key value data storage that can modify key/value pairs
▶▶ at various points in the data
Store the Data Lake on commodity hardware and scale linearly in performance and storage
management process.
▶▶ Don’t presummarize or precategorize data
▶▶ Enable rapid ingest of data, aggressive indexing, and dynamic question-focused datasets through scale
Hbase, Cassandra, MongoDB Open-source NoSQL databases
focused on a combination of
consistency, availability, and
partition tolerance.
Neo4j NoSQL scalable graph database
storing data in nodes and the
relationships of a graph.
25
26. layer 3
Data Management (continued)
DATA LAKE
Booz Allen works with organizations in corporate and government sectors that have an urgent
need to make sense of volumes of data from diverse sources, including those that had been
inaccessible or extremely difficult to utilize, such as streams from social networks. Now analysts
and decisionmakers can form new connections between all of this data to uncover previously
hidden trends and relationships.
Enterprise Machine-to- Transaction Sensor Data
Data machine logs
communication
Intrusion and
malware detection Fraud Detection
Cyber Government Defense Finance healthcare
Security Logs Regulatory Compliance Enhanced Situational Enhanced Email
Quarterly Filings Awareness Forecasting Reports
System Logs Financials
Press Articles
Individual organizations require different types of data. Not all types of data listed above may apply to every organization.
27. 1.0 | Summary
Enhanced Media Content
Booz Allen’s strategy and technology consultants are highly regarded subject matter experts. Through groundbreaking
conference keynotes, whiteboard talks, and papers, they help educate and shape the analytics industry.
2.0 | Differentiation
We invite you and your team to take advantage of the educational resources listed below to gain strategic insights
about the use of analytics, explore technical topics in depth, and stay on top of the latest trends.
Presentations Videos
Yahoo! Hadoop Summit: Biometric Databases and Hadoop Cloud Whiteboard Playlist
Invented and demonstrated methods for dense data correlation (e.g., imagery and Short instructional videos on a range of topics from introductory talks for executives
3.0 | Depth
biometrics) within a Hadoop distributed computing platform using new machine learning to tutorials for data analysts. Check back frequently for new material.
parallel methods.
Cloud Analytics for Executive Leadership
Yahoo! Hadoop Summit: Culvert—A Robust Framework for Secondary Indexing of Booz Allen Principal Josh Sullivan discusses how analysis of data can be used
Structured and Unstructured Data as a tool to provide insight to executives.
Demonstration of Booz Allen’s secondary indexing solutions and design patterns,
which support online index updates as well as a variation of the HIVE query language Informed Decision Making: Sampling Techniques for Cloud Data
over Accumulo and other BigTable-like databases to allow indexing one or more columns Booz Allen Data Scientist Ed Kohlwey explains how sampling large amounts
4.0 | Successes
in a table. of data can be useful for program managers to make informed decisions.
Slidecast: Hadoop World—Protein Alignment Developer Perspectives: The FuzzyTable Database
Demonstration of advanced analytics in using protein alignment sequences to identify Booz Allen Data Scientist Drew Farris explains how to use the FuzzyTable
disease markers using Hadoop, HBase, Accumulo, and novel machine learning concepts. biometrics database.
Slidecast: Innovative Cyber Defense with Cloud Analytics
Presentation on improving intelligence analysis through a hybrid cloud approach Workshop
to analytics, with descriptions and diagrams from Booz Allen client solutions. O’Reilly Strata Conference: Beyond MapReduce—Getting Creative with Parallel Processing
Technical discussion of MapReduce as an excellent environment for some parallel
Slidecast: Integrating Tahoe with Hadoop’s MapReduce computing tasks and the many ways to use a cluster beyond MapReduce.
Invented and demonstrated method to use least-authority encrypted file system as plugin
to HDFS within Hadoop cluster.
Learn More
Papers
Massive Data Analytics in the Cloud Scan the QR code, or go directly to:
Overview of the business impact of cloud computing, and how data clouds are boozallen.com/cloud
shaping new advances in intelligence analysis.
27
29. 1.0 | Summary
Infrastructure (continued)
PRINCIPLES AND TECHNOLOGIES
2.0 | Differentiation
3.0 | Depth
TECHNOLOGY EXAMPLES
Infrastructure is the foundation for any cloud implementation. What The following principles guide the infrastructure layer of
makes the Booz Allen Cloud Analytics Reference Architecture unique the Architecture:
is its plug-and-play, vendor-neutral framework. This framework not Amazon Web Services, Microsoft Cloud tool chain for provisioning,
only allows a greater range of choices in selecting resources and ▶▶ ake it easy to transform physical resources from legacy IT
M Azure, Puppet, VMware, vSphere configuration, orchestration, and
monitoring of virtual environment.
building services, it also allows for a faster, more streamlined, more systems to secure, virtualized data centers and trusted cloud
These tools provide the building blocks
secure, and lower risk deployment. computing environments for IaaS, PaaS, and foundation for
▶▶ mplement core services to provide the mechanisms to realize
I SaaS. Run multiple operating systems
4.0 | Successes
on-demand self-service, broad network access, resource pooling, and virtual network platforms on the
same hardware—sharing computing,
rapid elasticity, and measured service storage, and networking resources.
▶▶ mploy virtualization to increase utilization of existing assets and
E
resources, and improve operational effectiveness
▶▶ ngineer in-depth security to provide controls and continuous
E Security through VMware, McAfee, Protect assets—physical, logical, and
monitoring in order to fully address data protection, identity, Symantec, Cisco, TripWire, EnCase virtual—while automating governance
privacy, regulatory, and compliance risks and compliance.
29
29
30. Reference Architecture
Security Framework
Business
Assets to Be Protected Geography
Threats and Processes that Require Security Distributed Sites | Remote Workers | Jurisdictions
Organizational Security Time Dependencies
Governance | Supply Chain | Strategic Partnerships Transaction Throughput | Lifetimes and Deadlines
Business Layer
Conceptual
Business Attributes Technical and Management Security Strategies Roles and Responsibilities
Security Requirements to Support the Business
Trust Relationships Time Dependencies
a framework Control and Enablement Objectives
Security Domains, Boundaries, and Associations When Is Protection Relevant?
for security Resulting from Risk Assessment
The Architecture is designed to Business Layer
Logical
protect your data at rest and Business Information to Be Secured Interrelationships Security Services
in flight, with security controls Security and Risk Attributes
Authentication Confidentiality and Integrity
Protection | Strategic Partnerships
embedded in each layer. This Security Entities Management Policy
is obviously more than just
Business Layer
Physical
a technology challenge. We
Security-Related Data Structures Security Mechanisms Human Interface
understand the need to embed Tables | Messages | Pointers | Certificates | Signatures Encryption | Access Control | Digital Signatures Screen Formats | User Interaction
Access Control | Systems
new processes and training Security Rules Security Infrastructure
Physical Layout of Hardware, Software, and
Conditions | Practices | Procedures Time Dependencies
regimens so your staff handles Communication Lines Sequence of Processes and Sessions
sensitive data correctly. We also
advise you on how to secure your Business Layer
Component
Security IT Products Personnel Management Tools Time Dependencies
facilities and ensure that all off- Risk Management Identities | Roles | Functions | Access Controls | Lists Time Schedules | Clocks | Timers and Interrupts
premise facilities have the right Tools for Monitoring and Reporting
Security Process
Locator Tools
Dynamic Inventory of Nodes | Addresses and Locations
controls in place as well. Tools, Standards, and Protocols
Business Layer
Service Management
Service Delivery Management Management of Security Operations Management of Environment
Assurance of Operational Continuity Admin | Backups | Monitoring | Emergency Response Buildings | Sites | Platforms and Networks
Operational Risk Management Personnel Management Management Schedule
Risk Assessment | Monitoring and Reporting Account Provisioning | User Support | Management Security-Related | Calendar and Timetable
31. Case Studies
1.0 | Summary
2.0 | Differentiation
3.0 | Depth
4.0
4.0 | Successes
Successes
in this section
These case studies show how Booz
Allen uses superior technology and
analytics expertise to solve complex
problems for clients in a wide range
of corporate and government sectors.
31
32. Case Studies Example
One
Improving Intelligence Analysis
Mission Solutions
To fulfill their mission, this Booz Allen worked closely with scalable and flexible to support Data Management
organization requires data the client to adopt a data cloud future innovation and evolution The data sources had multiple
correlation, quick access to implementation by augmenting without reengineering. formats, were large in size,
analytic results, ad-hoc queries, the legacy relational databases and distressed with noise.
advanced scalable analytics, with cloud computing and Interfaces and Visualizations The solution created deep
and real-time alerting. analytics. The design focused Dashboards, web applications, insight through fusion of
To provide their analysts on keeping transactional- client applications, and rich different data types at scale.
with a continuous pipeline based queries in the current clients interfaced and integrated The solution enabled the
of prioritized, actionable relational databases, while with advanced analytics ability to follow the lineage or
information, they needed a doing the “heavy lifting” in infrastructure and legacy pedigree of the data, allowing
secure, scalable, automated the cloud and outputting the relational databases through a the client to map cost in
solution that would more interesting, processed, or SOA business logic layer. relation to the value of the data
quickly and precisely sift desired analytic results into or how well it is being used.
through large (and growing) relational data stores for quick Analytics and Services
volumes of complex data transactional access. The solution called for Infrastructure
characterized by a variety of predictive analytics to forecast The solution used Accumulo
formats and noise. In addition, With many existing systems potential events from existing (distributed key value
they needed to leverage their and applications dependent on data and anomaly detection to systems/NoSQL database)
existing analytics infrastructure the legacy relational database extract potentially significant for content normalization
in the new platform. for transactional queries of information and patterns. The and indexing, MapReduce as
data, Booz Allen pulled together solution leveraged the core the precomputation engine,
excess servers from the client’s principle of cloud analytics that and HDFS for scalable ingest
infrastructure to build a hybrid enables automated analysis and storage.
cloud solution. Also, as the techniques, precomputation,
client’s needs change to adapt and aggressive indexing.
to the mission, the solution is
Impact
Rather than simply focus on gaining IT efficiencies by using cloud technology for infrastructure, Booz Allen focused on applying cloud analytics
and in-depth understanding of the organization’s operational and mission needs to extract more value faster from massive datasets. The
new cloud solution provided immediate and striking improvements across the increasing volume of structured and unstructured data using
aggressive indexing techniques, on-demand analytics, and precomputed results for common analytics.
The final solution combined sophistication with scalability, moving the organization from a situation in which analysts stitched together sparse
bits of data to a platform for distilling real-time, actionable information from the full aggregation of data.
33. Case Studies Example
Two
1.0 | Summary
Planning and Responding to Disaster
2.0 | Differentiation
Mission Solutions
This organization, which Booz Allen developed a tool, Splunk, to mine through Data Management
Geotagging Link Analysis
is responsible for disaster framework to capture, normalize, and analyze vast amounts The solution framework
planning and response, found and transform open-source of data in real time, while captured live, streaming open-
that social media could provide media used to characterize outputting characterization source media such as Twitter
timely situational awareness for and forecast disaster events, and forecasting metrics of and RSS feeds. Data was
biological (and other disaster) in real time. The framework captured events. captured in Splunk and stored
events. They wanted a solution incorporated computational on AWS.
to better characterize and and analytical approaches Interfaces and Visualizations
3.0 | Depth
forecast emerging disaster to turn the noise from social The solution included
Risk Scoring Predictive Modeling events using social media data media into valuable information dashboards that characterized
as it streams in real time. With using algorithms such as term events captured in social
such a solution in place, the frequency-inverse document media. The visual analyses
organization could increase frequency (TF-IDF), natural include event extraction counts,
overall preparedness by language processing (NLP), time series counts, forecasting
leveraging event characterization and predictive modeling to counts, a symptom tag cloud,
to accurately predict the impact characterize and forecast the and geographical isolation.
4.0 | Successes
and improve the response. numbers of sick, dead, and
hospitalized, as well as to Analytics and Services
Provider In order to reach their goal, extract symptoms, geography, TF-IDF and NLP algorithms
Profile the organization needed higher and demographics for specific were used to classify and
levels of confidence in the illness events. extract relevant information
Clean, Validate, social media data on which they from the data. Booz Allen
Normalize, Integrate
would base their decisions. The solution framework developed predictive models
The specific challenges the was implemented in the for forecasting event frequency
Provider Online Cases/ new solution had to overcome cloud, taking advantage of and counts. The algorithms
Financial Registration Geolocation Licensing Exclusion Claims Activities Rulings
included data ingestion and the flexible computational were written in Python and
normalization, social media power and storage. The incorporated into Splunk
vocabulary, social media new cloud infrastructure located on Amazon Web
characterization, information allowed Booz Allen’s data Services (AWS).
extract, and geographical capturing and visualization
isolation of events.
Impact
The new Booz Allen solution, which builds upon current best practices in cyber terrorism, enables near real-time situational awareness through
a standalone surveillance system that captures, transforms, and analyzes massive volumes of social media data. By leveraging social media
data and analytics for more timely and accurate disaster characterization, the organization is able to more effectively plan and respond.
33
33