PwC Advisory
Apache Hadoop Summit 2016
The Future of Apache Hadoop
An Enterprise Architecture View
www.pwc.com/unlockdatapossibilities
2
Presenters
Oliver Halter
Partner, Information Strategy and Big Data
oliver.halter@pwc.com
Ritesh Ramesh
Chief Technologist, Global Data and Analytics
ritesh.ramesh@pwc.com
3
Contents
1. Trends
2. Challenges
3. Opportunities
4. Accelerating adoption through a Capability Driven Approach
5. Real life Case Studies/Lessons Learnt
4
PwC's global data & analytics surveys & trends
Sources: PwC, 2016 Global CEO Survey, January 2016; PwC, Global Data and Analytics Survey: 2016 Big Decisions™
73%: Data and analytics technologies generate the greatest return in terms of engagement with wider stakeholders
32%: Nearly one in three said developing or launching new products and services is their leading ‘big decision’. Does your data & analytics effectively support you?
5
Although we are increasingly seeing the use of Hadoop among mainstream companies, key barriers still remain to its holistic success and adoption as an enterprise platform
An enterprise is a complex system of components
Adoption Barriers
1. Incoherent Enterprise View
2. Overcrowded technology ecosystem
3. Lack of User Centricity
4. Siloed Ownership
6
We believe external market forces will propel enterprises to
embrace the Data Lake as a foundation of their data, analytics and
emerging technology strategies
Emerging Technology Platforms
1. Internet of Things
2. Artificial Intelligence
3. Digital
4. Modern Data Management
5. Analytics
6. Cyber Security
Enterprise Data Lake
1. Grow the Business
2. Optimize Spend
3. Innovate
4. Mitigate Risks
Connecting the dots between various strategic technology initiatives within the enterprise is going to be critical to capitalize on the opportunity.
7
There are lots of opportunities to innovate and accelerate
enterprise adoption of Hadoop by abstracting sophistication with
simplicity and superior end user experience
Existing Innovations enabling Acceleration
1. Cloud based Marketplaces and Solutions
2. Third Party on-demand, ‘Smart’ Data Wrangling solutions leveraging high performance components in Hadoop
3. Open Source Analytics and AI Libraries
4. Third Party ‘Hadoop in a Box’ integrated solutions
5. Vendor distributions and developer communities – well established
Opportunities to close the gaps
1. Data extraction and semantic text analytics libraries for complex data structures – nested XML, PDFs and unstructured data
2. Model Management and integration tools facilitating seamless interoperability with, or migration from, existing technology investments (data warehouses and applications)
3. Bringing Visualization to the data stored with Hadoop with native libraries and third party tools
4. Adaptive & Dynamic Workload Management
5. Native Data Masking and Encryption Features
8
Jumpstart/accelerate Hadoop journey with these 4 core tenets
1. Capability Driven
2. Heterogeneous
3. Right Fit
4. Flexible Operating Model
PwC’s Next Generation Information Architecture
Diagram labels: Third Party Tool Integration, Cloud Interoperability, Legacy Integration, Data Migration, On-Premise, Cloud, In-Memory, Disk based, NoSQL types, Support Model, Training, Use Cases/Demand Intake, Services Catalog, Business Adoption, Innovation Platform, Monetization, Analytics, Application Development, Enterprise Data Management
*https://www.pwc.com/us/infoarchitecture
9
Tenet 1: Capability Driven
Focus on capturing the current and future information and analytics needs of every business
function and external partners to drive the architecture
PwC’s Data Lake Capability Framework
1. Data Ingestion: Ability to ingest data in batch and real-time modes in various forms: databases, files, streams and queues
2. Data Quality/Integration: Modern data management technologies (ELT based, data wrangling etc.) used for cleansing, standardizing and integrating the data from multiple internal and external sources, leveraging the scalable computing platform
3. Data Architecture: Ability to manage and store data in normalized or denormalized structures on disk or in memory, in row, columnar or column-family based data stores (Hive, Spark, HBase, RDBMS etc.), depending on the use case
4. Metadata Management: Ability to track data sources ingested into the data lake, and to track data lineage and provenance of storage and processing activities
5. Analytics/Reporting/Visualization: Metrics, tools and processes required to visualize and comprehend data stored in the data stores in the form of reports, dashboards and scorecards for business users
6. Data Access: Ability to access stored data from the platform through a consistent and secure API
7. Security: Capabilities to secure personally identifiable information in the next generation platform and create role-based access for business users
8. Governance/Organization: Centralized and coordinated management of projects/activities, managing change and communication of key milestones and business benefits
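The Security and Data Access capabilities can be illustrated with a small sketch. This is only one possible reading of "secure personally identifiable information" and "role based access": the field names, role names, salt and the choice of a truncated SHA-256 token are all assumptions made for illustration, not anything specified in the presentation.

```python
import hashlib

# Hypothetical field names and roles, chosen only for illustration.
PII_FIELDS = {"name", "email", "ssn"}
ROLE_CAN_SEE_PII = {"analyst": False, "compliance_officer": True}


def mask_value(value: str, salt: str = "demo-salt") -> str:
    """Replace a value with a short, stable one-way token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked_" + digest[:12]


def read_record(record: dict, role: str) -> dict:
    """Return a copy of the record, masking PII unless the role may see it."""
    if role not in ROLE_CAN_SEE_PII:
        raise PermissionError(f"unknown role: {role}")
    if ROLE_CAN_SEE_PII[role]:
        return dict(record)
    return {
        key: mask_value(str(val)) if key in PII_FIELDS else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    customer = {"name": "Jane Doe", "email": "jane@example.com", "region": "West"}
    print(read_record(customer, "analyst"))
    print(read_record(customer, "compliance_officer"))
```

Because the token is deterministic, masked values can still be joined or counted across data sets, which is one reason such a scheme is sometimes preferred over blanking the field. A production platform would more likely rely on the native masking and encryption features of its chosen stack.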
10
Tenet 2: Heterogeneous
Hybrid set of both traditional and emerging technologies and platforms to acquire, store,
interlock and analyze internal and external data will be the norm going forward. Design
for simplicity and iteratively build your modular architecture with transition states towards
the target
Illustrative model from a national retailer
Sources of Known Value: Sales Transactions, Customer, Product, Physical Assets
Sources of Unproven Value: Call Center, Social Media, Web Clickstream, Mobile Interactions
Data Ingestion Layer: ETL Connectors, Sqoop, Kafka, Flume
Data Analytics/Visualization: Standardized Reporting, On-Demand/Adhoc Analytics, Modeling, API based Apps.
Other diagram components: ETL, ELT, Match-Merge Services, Metadata Management, Spark, Relational Schemas, Enterprise Data warehouse, Data Exchange, HDFS, RDD, HBase, Data Wrangling, Hive (Parquet), Enterprise Data Lake
Legend: Emerging – Open Source, Emerging – Licensed, Traditional – Licensed, Licensed+Open Source
11
Tenet 3: Right Fit
Enterprises need to develop a decision model which identifies the mix of ‘right fit’ open source
as well as commercial solution components, either hosted on the cloud or On Premise, based on
functionality and business needs
Illustrative
On Premise
- Build? Buy?
- Vendor Dist.? Constraints? Base Platform? End-End Stack?
- 3rd party Cloud/Tools?
- Security? Cloud integration?
- Pre-Requisites (Hardware, Drivers, Software Interoperability)
Cloud
- Build? Buy?
- Cloud Vendor? Vendor Dist. (IaaS)? Which Native Services (PaaS)?
- 3rd party Cloud/Tools?
- Security? On Premise Integration?
- Pre-Requisites (Hardware, Drivers, Software Interoperability)
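A decision model of this kind could be prototyped as a few explicit rules. The sketch below is purely illustrative: the three input flags and the two rules are assumptions invented for the example, and only the follow-up questions are taken from the decision tree above. A real model would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    # Hypothetical simplifications of "functionality and business needs".
    data_must_stay_on_site: bool
    has_platform_engineers: bool
    needs_elastic_capacity: bool


# Follow-up questions taken from the illustrative decision tree.
FOLLOW_UPS = {
    "on_premise": [
        "Vendor Dist.?", "Base Platform?", "End-End Stack?", "Cloud integration?",
    ],
    "cloud": [
        "Cloud Vendor?", "Vendor Dist. (IaaS)?", "Which Native Services (PaaS)?",
        "On Premise Integration?",
    ],
}


def recommend(needs: Needs) -> dict:
    """Apply two illustrative rules and return the branch plus open questions."""
    if needs.data_must_stay_on_site or not needs.needs_elastic_capacity:
        hosting = "on_premise"
    else:
        hosting = "cloud"
    sourcing = "build" if needs.has_platform_engineers else "buy"
    return {
        "hosting": hosting,
        "sourcing": sourcing,
        "open_questions": FOLLOW_UPS[hosting],
    }


if __name__ == "__main__":
    print(recommend(Needs(False, False, True)))
```

Writing the rules down this way keeps the model reviewable: each branch of the tree becomes a line that business and technology stakeholders can challenge.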
12
Tenet 4: Flexible Operating Model
Recognizes the sophistication and analytics maturity at a business function level and enables
the required capabilities with the necessary skills, processes, tools and support
Business Operating Model
1. Business alignment on how the Hadoop environment will operate. This includes defining:
- Services Catalog
- Service Level Agreements
- Tracking Usage, Benefits and Costs
- User Onboarding & Training
2. Defining the Business architecture
- Identify capability areas and opportunities to inform the Big Data Strategy
- Use Case Evaluation (risk, feasibility and business case)
- Prioritization criteria
- Demand / Intake process
- Business Roadmap
Technology Operating Model
1. Technology alignment on how the Hadoop environment will operate. This includes defining:
- Access Model (Self service vs. Controlled)
- Data acquisition and classification strategy
- Organization (Develop vs. Support)
- Technical Skills Training
2. Defining the Technology architecture
- Architecture Guiding Principles
- Leading practices for data acquisition, management and delivery
- Reference Architecture with solution patterns for the various use cases
- Storage and infrastructure Planning
- Security Model
13
Five step strategic approach to build a strong data lake foundation
Recognizes the sophistication and analytics maturity at a business function level and enables
the required capabilities with the necessary skills, processes, tools and support
1. Capabilities: Leveraging the client’s stated capabilities and PwC’s Capability framework with business interviews, analytical capabilities are captured and documented
2. Use Case Specifications: Define success criteria, information sources, dimensionality and information delivery mechanism for each use case. Each Use Case must be mapped to a set of Capabilities
3. Platform Architecture & Operating Model: Define end-end architecture components (‘lego blocks’) mapped to the capabilities identified, with leading practices for ingestion, management, analytics and visualization. Identify the organization, process and support structure required for agility
4. Architecture Patterns: Depict the architecture pattern at the use case level, leveraging the logical architecture ‘lego blocks’ and showing the information flow, respective technology components and integration touch points with the client’s systems
5. Strategic Roadmap for Execution: Organize the initiatives in a sequenced roadmap with scope, duration and dependencies under various themes
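Two of these steps lend themselves to simple mechanical checks: every use case must map to a set of capabilities, and roadmap initiatives must be sequenced by their dependencies. The sketch below assumes the eight capability names from the capability framework and uses a topological sort for sequencing. The function names and the sample data are hypothetical, and the sort is a standard technique swapped in for illustration rather than a method described in the presentation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

CAPABILITIES = {
    "Data Ingestion", "Data Quality/Integration", "Data Architecture",
    "Metadata Management", "Analytics/Reporting/Visualization",
    "Data Access", "Security", "Governance/Organization",
}


def unmapped_use_cases(use_cases: dict) -> list:
    """Return use cases with no capabilities or with names outside the framework."""
    return sorted(
        name for name, caps in use_cases.items()
        if not caps or not set(caps) <= CAPABILITIES
    )


def sequence(initiatives: dict) -> list:
    """Order initiatives so each one comes after everything it depends on.

    `initiatives` maps an initiative to the set of initiatives it depends on.
    """
    return list(TopologicalSorter(initiatives).static_order())


if __name__ == "__main__":
    # Hypothetical sample data.
    print(unmapped_use_cases({
        "Loan risk modeling": {"Data Ingestion", "Analytics/Reporting/Visualization"},
        "Promotion effectiveness": set(),
    }))
    print(sequence({
        "Self-service dashboards": {"Ingestion pipelines", "Security model"},
        "Security model": {"Ingestion pipelines"},
        "Ingestion pipelines": set(),
    }))
```

`static_order` raises `graphlib.CycleError` if two initiatives depend on each other, which is a useful early signal that the roadmap needs to be restructured.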
14
Case Study # 1 – Financial Services Provider – Risk Modeling for
their Loans Portfolio
Current State
• Lack of an integrated architecture and scalable technology infrastructure contributed to data management challenges
• The business analytics and modeling teams were looking for more self-sufficiency and process agility
• Lacked program leadership and program management discipline, specifically for third party services and solution providers
• Data acquisition and management processes lacked a consistent design and architecture and were heavily siloed on an application-by-application basis
Current Process – Adhoc Analysis – 8-10 hours
Components: Tableau, SAS, CSV Files
- Sources two CSV files (total ~3M rows of data)
- Aggregation logic performed – CSV data files exported
- No capability to look back at history past the last month of data
Future State
• The client developed a next generation information management and analytics platform which was more business centric, with an operating model that enables agility, self service, faster data management and deep analytics for the business stakeholders
• Data processing window was reduced from 8-10 hours to less than 30 minutes
• Business users were able to access more granular historical data for ad hoc analysis and analytics models
Future Process – Adhoc Analysis – < 30 minutes
Components: Hadoop Distributed File System, Hive, Spark, Tableau
- Aggregation and data transformation logic performed using HiveQL on 67M records and 36 columns (14.7 GB of data in Hive, 16.3 GB in memory in Spark SQL)
- Response time between 2s and ~1 min per filter sourcing live data via Spark SQL
Any trademarks included are trademarks of their respective owners and are not affiliated with, nor endorsed by, PricewaterhouseCoopers LLP, its subsidiaries or affiliates.
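The figures quoted in this case study can be turned into rough ratios. The short calculation below uses only the numbers stated above and assumes decimal gigabytes (1 GB = 10^9 bytes). Because the new window is given as "less than 30 minutes", the speedup values are lower bounds rather than measured results.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of the old processing window to the new one."""
    return before_minutes / after_minutes


# Old window of 8-10 hours against a new window of at most 30 minutes.
low = speedup(8 * 60, 30)    # 16.0
high = speedup(10 * 60, 30)  # 20.0

# Row volume grew from about 3M rows (two CSV files) to 67M records.
row_growth = 67_000_000 / 3_000_000

# Average stored size per record in Hive, assuming 1 GB = 10**9 bytes.
bytes_per_row = 14.7e9 / 67_000_000

if __name__ == "__main__":
    print(f"speedup: at least {low:.0f}x to {high:.0f}x")
    print(f"row growth: about {row_growth:.1f}x")
    print(f"average bytes per record in Hive: about {bytes_per_row:.0f}")
```

Taken together, the window shrank by a factor of at least 16 while the data volume analyzed grew by a factor of roughly 22.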
15
Case Study # 2 – Leading Retail Distribution Company – Trade
Promotion Effectiveness
500k SKU’s, 250k customers, 5k suppliers, 6k Fleets
Current State
• On-premise, rigid infrastructure with serial data processing
and limited capacity
• Delayed data availability reducing applicability to impactful
business decisions
• No integration with 3rd party data is causing pain points with
vendor collaboration and data access
Future State
• Flexible, scalable, cloud-based infrastructure enabling multi-stream data processing
• Near real-time data availability via Apache Spark data processing, providing valuable insights for decision making
• Easily supported visualization and reporting platforms accessible by internal users and vendors with simple access controls
16
How is PwC Creating Awareness and Driving Adoption in the Market?
Thought Leadership / Independent Research
Strategic Alliances
• Google
• Microsoft
• Oracle
• SAP
Data & Analytics @Scale - Client Delivery
17
Closing Thoughts
• We believe external market forces will propel enterprises to embrace the Data Lake as a
foundation of their data, analytics and emerging technology strategies
• Although barriers remain for adoption by mainstream enterprises, there are ample
opportunities for innovation and acceleration by abstracting sophistication with
simplicity and superior end user experience
• Enterprises should follow 4 core tenets* while developing their Next Generation
Information Architecture Platform
• Keep the 5 step strategic ‘capability driven’ approach in mind!!
• Thanks for attending the session – please contact us with any questions!
© 2016 PwC. All rights reserved. PwC refers to the US member firm or one of its subsidiaries or affiliates, and may sometimes refer to the PwC
network. Each member firm is a separate legal entity. Please see www.pwc.com/structure for further details.

More Related Content

What's hot

2019 Media and Entertainment Study
2019 Media and Entertainment Study2019 Media and Entertainment Study
2019 Media and Entertainment StudyL.E.K. Consulting
 
Cracking the Code on Consumer Fraud | Accenture
Cracking the Code on Consumer Fraud | AccentureCracking the Code on Consumer Fraud | Accenture
Cracking the Code on Consumer Fraud | Accentureaccenture
 
How fit is your capital allocation strategy?
How fit is your capital allocation strategy? How fit is your capital allocation strategy?
How fit is your capital allocation strategy? EY
 
A.T. Kearney 2017 State of Logistics Report: Accelerating into Uncertainty
A.T. Kearney 2017 State of Logistics Report: Accelerating into UncertaintyA.T. Kearney 2017 State of Logistics Report: Accelerating into Uncertainty
A.T. Kearney 2017 State of Logistics Report: Accelerating into UncertaintyKearney
 
World Economic Forum: The power of analytics for better and faster decisions ...
World Economic Forum: The power of analytics for better and faster decisions ...World Economic Forum: The power of analytics for better and faster decisions ...
World Economic Forum: The power of analytics for better and faster decisions ...PwC
 
18th Annual Global CEO Survey - Technology industry key findings
18th Annual Global CEO Survey - Technology industry key findings18th Annual Global CEO Survey - Technology industry key findings
18th Annual Global CEO Survey - Technology industry key findingsPwC
 
Navigating a digital-first home furnishings market
Navigating a digital-first home furnishings market Navigating a digital-first home furnishings market
Navigating a digital-first home furnishings market L.E.K. Consulting
 
Global Capital Confidence Barometer 21st edition
Global Capital Confidence Barometer 21st editionGlobal Capital Confidence Barometer 21st edition
Global Capital Confidence Barometer 21st editionEY
 
Medical Cost Trend: Behind the Numbers 2017
Medical Cost Trend: Behind the Numbers 2017Medical Cost Trend: Behind the Numbers 2017
Medical Cost Trend: Behind the Numbers 2017PwC
 
Unleashing Competitiveness on the Cloud Continuum | Accenture
Unleashing Competitiveness on the Cloud Continuum | AccentureUnleashing Competitiveness on the Cloud Continuum | Accenture
Unleashing Competitiveness on the Cloud Continuum | Accentureaccenture
 
Strategy Study 2014 | A.T. Kearney
Strategy Study 2014 | A.T. KearneyStrategy Study 2014 | A.T. Kearney
Strategy Study 2014 | A.T. KearneyKearney
 
The FDA and industry: A recipe for collaborating in the New Health Economy
The FDA and industry:  A recipe for collaborating in the New Health EconomyThe FDA and industry:  A recipe for collaborating in the New Health Economy
The FDA and industry: A recipe for collaborating in the New Health EconomyPwC
 
Top 8 Insights From the 2018 Beauty, Health & Wellness Survey
Top 8 Insights From the 2018 Beauty, Health & Wellness SurveyTop 8 Insights From the 2018 Beauty, Health & Wellness Survey
Top 8 Insights From the 2018 Beauty, Health & Wellness SurveyL.E.K. Consulting
 
Power transactions and trends Q2 2019
Power transactions and trends Q2 2019Power transactions and trends Q2 2019
Power transactions and trends Q2 2019EY
 
EY Price Point: global oil and gas market outlook, Q2 | April 2022
EY Price Point: global oil and gas market outlook, Q2 | April 2022EY Price Point: global oil and gas market outlook, Q2 | April 2022
EY Price Point: global oil and gas market outlook, Q2 | April 2022EY
 
Federal Technology Vision 2021: Full U.S. Federal Survey Findings | Accenture
Federal Technology Vision 2021: Full U.S. Federal Survey Findings | AccentureFederal Technology Vision 2021: Full U.S. Federal Survey Findings | Accenture
Federal Technology Vision 2021: Full U.S. Federal Survey Findings | Accentureaccenture
 
TMT Outlook 2017: A new wave of advances offer opportunities and challenges
TMT Outlook 2017:  A new wave of advances offer opportunities and challengesTMT Outlook 2017:  A new wave of advances offer opportunities and challenges
TMT Outlook 2017: A new wave of advances offer opportunities and challengesDeloitte United States
 
5 Opportunities in the Nutritional Supplements Industry
5 Opportunities in the Nutritional Supplements Industry5 Opportunities in the Nutritional Supplements Industry
5 Opportunities in the Nutritional Supplements IndustryL.E.K. Consulting
 
Pursuing Customer Inspired Growth
Pursuing Customer Inspired GrowthPursuing Customer Inspired Growth
Pursuing Customer Inspired GrowthKearney
 
McKinsey - Covid 19 - Global Auto Consumer Insights - November 2020
McKinsey -  Covid 19 - Global Auto Consumer Insights - November 2020McKinsey -  Covid 19 - Global Auto Consumer Insights - November 2020
McKinsey - Covid 19 - Global Auto Consumer Insights - November 2020Martin Hattrup
 

What's hot (20)

2019 Media and Entertainment Study
2019 Media and Entertainment Study2019 Media and Entertainment Study
2019 Media and Entertainment Study
 
Cracking the Code on Consumer Fraud | Accenture
Cracking the Code on Consumer Fraud | AccentureCracking the Code on Consumer Fraud | Accenture
Cracking the Code on Consumer Fraud | Accenture
 
How fit is your capital allocation strategy?
How fit is your capital allocation strategy? How fit is your capital allocation strategy?
How fit is your capital allocation strategy?
 
A.T. Kearney 2017 State of Logistics Report: Accelerating into Uncertainty
A.T. Kearney 2017 State of Logistics Report: Accelerating into UncertaintyA.T. Kearney 2017 State of Logistics Report: Accelerating into Uncertainty
A.T. Kearney 2017 State of Logistics Report: Accelerating into Uncertainty
 
World Economic Forum: The power of analytics for better and faster decisions ...
World Economic Forum: The power of analytics for better and faster decisions ...World Economic Forum: The power of analytics for better and faster decisions ...
World Economic Forum: The power of analytics for better and faster decisions ...
 
18th Annual Global CEO Survey - Technology industry key findings
18th Annual Global CEO Survey - Technology industry key findings18th Annual Global CEO Survey - Technology industry key findings
18th Annual Global CEO Survey - Technology industry key findings
 
Navigating a digital-first home furnishings market
Navigating a digital-first home furnishings market Navigating a digital-first home furnishings market
Navigating a digital-first home furnishings market
 
Global Capital Confidence Barometer 21st edition
Global Capital Confidence Barometer 21st editionGlobal Capital Confidence Barometer 21st edition
Global Capital Confidence Barometer 21st edition
 
Medical Cost Trend: Behind the Numbers 2017
Medical Cost Trend: Behind the Numbers 2017Medical Cost Trend: Behind the Numbers 2017
Medical Cost Trend: Behind the Numbers 2017
 
Unleashing Competitiveness on the Cloud Continuum | Accenture
Unleashing Competitiveness on the Cloud Continuum | AccentureUnleashing Competitiveness on the Cloud Continuum | Accenture
Unleashing Competitiveness on the Cloud Continuum | Accenture
 
Strategy Study 2014 | A.T. Kearney
Strategy Study 2014 | A.T. KearneyStrategy Study 2014 | A.T. Kearney
Strategy Study 2014 | A.T. Kearney
 
The FDA and industry: A recipe for collaborating in the New Health Economy
The FDA and industry:  A recipe for collaborating in the New Health EconomyThe FDA and industry:  A recipe for collaborating in the New Health Economy
The FDA and industry: A recipe for collaborating in the New Health Economy
 
Top 8 Insights From the 2018 Beauty, Health & Wellness Survey
Top 8 Insights From the 2018 Beauty, Health & Wellness SurveyTop 8 Insights From the 2018 Beauty, Health & Wellness Survey
Top 8 Insights From the 2018 Beauty, Health & Wellness Survey
 
Power transactions and trends Q2 2019
Power transactions and trends Q2 2019Power transactions and trends Q2 2019
Power transactions and trends Q2 2019
 
EY Price Point: global oil and gas market outlook, Q2 | April 2022
EY Price Point: global oil and gas market outlook, Q2 | April 2022EY Price Point: global oil and gas market outlook, Q2 | April 2022
EY Price Point: global oil and gas market outlook, Q2 | April 2022
 
Federal Technology Vision 2021: Full U.S. Federal Survey Findings | Accenture
Federal Technology Vision 2021: Full U.S. Federal Survey Findings | AccentureFederal Technology Vision 2021: Full U.S. Federal Survey Findings | Accenture
Federal Technology Vision 2021: Full U.S. Federal Survey Findings | Accenture
 
TMT Outlook 2017: A new wave of advances offer opportunities and challenges
TMT Outlook 2017:  A new wave of advances offer opportunities and challengesTMT Outlook 2017:  A new wave of advances offer opportunities and challenges
TMT Outlook 2017: A new wave of advances offer opportunities and challenges
 
5 Opportunities in the Nutritional Supplements Industry
5 Opportunities in the Nutritional Supplements Industry5 Opportunities in the Nutritional Supplements Industry
5 Opportunities in the Nutritional Supplements Industry
 
Pursuing Customer Inspired Growth
Pursuing Customer Inspired GrowthPursuing Customer Inspired Growth
Pursuing Customer Inspired Growth
 
McKinsey - Covid 19 - Global Auto Consumer Insights - November 2020
McKinsey -  Covid 19 - Global Auto Consumer Insights - November 2020McKinsey -  Covid 19 - Global Auto Consumer Insights - November 2020
McKinsey - Covid 19 - Global Auto Consumer Insights - November 2020
 

Similar to Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architecture View

When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Big data journey to the cloud maz chaudhri 5.30.18
Big data journey to the cloud   maz chaudhri 5.30.18Big data journey to the cloud   maz chaudhri 5.30.18
Big data journey to the cloud maz chaudhri 5.30.18Cloudera, Inc.
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US InformationJulian Tong
 
The Eco-System of AI and How to Use It
The Eco-System of AI and How to Use ItThe Eco-System of AI and How to Use It
The Eco-System of AI and How to Use Itinside-BigData.com
 
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav MisraFrom Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav MisraMolly Alexander
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...Denodo
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)Syaifuddin Ismail
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?Xpand IT
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunitiesBigdata Meetup Kochi
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
Athira mp cv_latest - copy
Athira mp cv_latest - copyAthira mp cv_latest - copy
Athira mp cv_latest - copyAthira MP
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 

Similar to Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architecture View (20)

When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Big data journey to the cloud maz chaudhri 5.30.18
Big data journey to the cloud   maz chaudhri 5.30.18Big data journey to the cloud   maz chaudhri 5.30.18
Big data journey to the cloud maz chaudhri 5.30.18
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
The Eco-System of AI and How to Use It
The Eco-System of AI and How to Use ItThe Eco-System of AI and How to Use It
The Eco-System of AI and How to Use It
 
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav MisraFrom Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunities
 
Evaluation guide to Streaming Analytics
Evaluation guide to Streaming AnalyticsEvaluation guide to Streaming Analytics
Evaluation guide to Streaming Analytics
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Athira mp cv_latest - copy
Athira mp cv_latest - copyAthira mp cv_latest - copy
Athira mp cv_latest - copy
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 

Apache Hadoop Summit 2016: The Future of Apache Hadoop, an Enterprise Architecture View

  • 1. PwC Advisory. Apache Hadoop Summit 2016. The Future of Apache Hadoop: An Enterprise Architecture View. www.pwc.com/unlockdatapossibilities
  • 2. Presenters: Oliver Halter, Partner, Information Strategy and Big Data (oliver.halter@pwc.com); Ritesh Ramesh, Chief Technologist, Global Data and Analytics (ritesh.ramesh@pwc.com)
  • 3. Contents: (1) Trends; (2) Challenges; (3) Opportunities; (4) Accelerating adoption through a Capability Driven Approach; (5) Real life Case Studies/Lessons Learnt
  • 4. PwC's global data & analytics surveys & trends. Sources: PwC, 2016 Global CEO Survey, January 2016; PwC, Global Data and Analytics Survey: 2016 Big Decisions™. 73%: Data and Analytics Technologies generate the greatest return in terms of engagement with wider stakeholders. 32%: Nearly one in three said developing or launching new products and services is their leading ‘big decision’. Does your data & analytics effectively support you?
  • 5. Although we are increasingly seeing the use of Hadoop among mainstream companies, key barriers still remain for its holistic success and adoption as an enterprise platform. An enterprise is a complex system of components. Adoption Barriers: (1) Incoherent Enterprise View; (2) Overcrowded technology ecosystem; (3) Lack of User Centricity; (4) Siloed Ownership
  • 6. We believe external market forces will propel enterprises to embrace the Data Lake as a foundation of their data, analytics and emerging technology strategies. Emerging Technology Platforms: 1. Internet of Things; 2. Artificial Intelligence; 3. Digital; 4. Modern Data Management; 5. Analytics; 6. Cyber Security. Enterprise Data Lake: 1. Grow the Business; 2. Optimize Spend; 3. Innovate; 4. Mitigate Risks. Connecting the dots between various strategic technology initiatives within the enterprise is going to be critical to capitalize on the opportunity.
  • 7. There are lots of opportunities to innovate and accelerate enterprise adoption of Hadoop by abstracting sophistication with simplicity and superior end user experience. Existing Innovations enabling Acceleration: Cloud based Marketplaces and Solutions; Third Party on-demand, ‘Smart’ Data Wrangling solutions leveraging high performance components in Hadoop; Open Source Analytics and AI Libraries; Third Party ‘Hadoop in a Box’ integrated solutions; Vendor distributions and developer communities (well established). Opportunities to close the gaps: Data extraction and semantic text analytics libraries for complex data structures (Nested XMLs, PDFs and Unstructured Data); Model Management and integration tools facilitating seamless interoperability or migration from existing technology investments (data warehouses and applications); Bringing Visualization to the data stored with Hadoop with native libraries and third party tools; Adaptive & Dynamic Workload Management; Native Data Masking and Encryption Features
  • 8. Jumpstart/accelerate Hadoop journey with these 4 core tenets: (1) Capability Driven; (2) Heterogeneous; (3) Right Fit; (4) Flexible Operating Model. PwC’s Next Generation Information Architecture* (diagram labels): Third Party Tool Integration, Cloud Interoperability, Legacy Integration, Data Migration, On-Premise, Cloud, In-Memory, Disk based, NoSQL types, Support Model, Training, Use Cases/Demand Intake, Services Catalog, Business Adoption, Innovation Platform, Monetization, Analytics, Application Development, Enterprise Data Management. *https://www.pwc.com/us/infoarchitecture
  • 9. Tenet 1: Capability Driven. Focus on capturing the current and future information and analytics needs of every business function and external partners to drive the architecture. PwC’s Data Lake Capability Framework: (1) Data Ingestion: Ability to ingest data in batch & real time modes in various forms (Databases, Files, Streams and Queues); (2) Data Quality/Integration: Modern data management technologies (ELT based, Data wrangling etc.) used for cleansing, standardizing and integrating the data from multiple internal and external sources leveraging the scalable computing platform; (3) Data Architecture: Ability to manage and store data in normalized or denormalized structures on disk, in-memory, row vs. columnar vs. column family based data stores (Hive, Spark, HBase, RDBMS etc.) depending on the use cases; (4) Metadata Management: Ability to track data sources ingested into the data lake, track data lineage and provenance of storage and processing activities; (5) Analytics/Reporting/Visualization: Metrics, Tools and processes required to visualize and comprehend data stored in the data stores in form of reports, dashboards and scorecards for business users; (6) Data Access: Ability to access stored data from the Platform through a consistent & secure API; (7) Security: Capabilities to secure personally identifiable information in the next generation platform and create role based access to business users; (8) Governance/Organization: Centralized and coordinated management of projects/activities, managing change and communication of key milestones and business benefits
  • 10. Tenet 2: Heterogeneous. Hybrid set of both traditional and emerging technologies and platforms to acquire, store, interlock and analyze internal and external data will be the norm going forward. Design for simplicity and iteratively build your modular architecture with transition states towards the target. Illustrative model from a national retailer (diagram labels). Sources of Known Value: Sales Transactions, Customer, Product, Physical Assets. Sources of Unproven Value: Call Center, Social Media, Web Clickstream, Mobile Interactions. Data Ingestion Layer: ETL Connectors, Sqoop, Kafka, Flume. Processing and management: ETL, ELT, Match-Merge Services, Metadata Management, Spark, Data Wrangling, Data Exchange. Storage: Relational Schemas, Enterprise Data warehouse, Enterprise Data Lake (HDFS, RDD, HBase, Hive (Parquet)). Data Analytics/Visualization: Standardized Reporting, On-Demand/Adhoc Analytics, Modeling, API based Apps. Legend: Emerging (Open Source); Emerging (Licensed); Traditional (Licensed); Licensed+Open Source
  • 11. Tenet 3: Right Fit. Enterprises need to develop a decision model which identifies the mix of ‘right fit’ open source as well as commercial solution components, either hosted on the cloud or On Premise, based on functionality and business needs. Illustrative decision questions. On Premise: Build? Buy? Vendor Dist.? Constraints? Base Platform? End-End Stack? 3rd party Cloud/Tools? Security? Cloud integration? Pre-Requisites (Hardware, Drivers, Software Interoperability). Cloud: Build? Buy? 3rd party Cloud/Tools? Security? On Premise Integration? Pre-Requisites (Hardware, Drivers, Software Interoperability). Cloud Vendor? Vendor Dist. (IaaS)? Which Native Services (PaaS)?
  • 12. Tenet 4: Flexible Operating Model. Recognizes the sophistication and analytics maturity at a business function level and enables the required capabilities with the necessary skills, processes, tools and support. Business Operating Model: 1. Business alignment on how the Hadoop environment will operate. This includes defining: - Services Catalog - Service Level Agreements - Tracking Usage, Benefits and Costs - User Onboarding & Training 2. Defining the Business architecture: - Identify capability areas and opportunities to inform the Big Data Strategy - Use Case Evaluation (risk, feasibility and business case) - Prioritization criteria - Demand / Intake process - Business Roadmap. Technology Operating Model: 1. Technology alignment on how the Hadoop environment will operate. This includes defining: - Access Model (Self service vs. Controlled) - Data acquisition and classification strategy - Organization (Develop vs. Support) - Technical Skills Training 2. Defining the Technology architecture: - Architecture Guiding Principles - Leading practices for data acquisition, management and delivery - Reference Architecture with solution patterns for the various use cases - Storage and infrastructure Planning - Security Model
  • 13. Five step strategic approach to build a strong data lake foundation. Recognizes the sophistication and analytics maturity at a business function level and enables the required capabilities with the necessary skills, processes, tools and support. (1) Capabilities: Leveraging client’s stated capabilities and PwC’s Capability framework with business interviews, analytical capabilities are captured and documented. (2) Use Case Specifications: Define success criteria, information sources, dimensionality and information delivery mechanism for each use case. Each Use Case must be mapped to a set of Capabilities. (3) Platform Architecture & Operating Model: Define end-end architecture components (‘lego blocks’) mapped to the capabilities identified with leading practices for ingestion, management, analytics and visualization. Identifies the organization, process and support structure required for agility. (4) Architecture Patterns: Depict the architecture pattern at the use case level, leveraging the logical architecture ‘lego blocks’ and showing the information flow, respective technology component and integration touch point with client’s systems. (5) Strategic Roadmap for Execution: Organize the initiatives in a sequenced roadmap with scope, duration and dependencies under various themes
  • 14. Case Study #1 (Financial Services Provider): Risk Modeling for their Loans Portfolio. Current State: • Lack of an integrated architecture and scalable technology infrastructure contributed to data management challenges • The business analytics and modeling teams were looking for more self-sufficiency and process agility • Lacked program leadership and program management discipline specifically for third party services and solution providers • Data Acquisition and management processes lacked a consistent design and architecture and were heavily siloed on an application-by-application basis. Current Process (Adhoc Analysis, 8-10 hours): CSV Files, SAS, Tableau. Sources two CSV files (total ~3 M rows of data); aggregation logic performed and CSV data files exported; no capability to look back at history past the last month of data. Future State: • The client developed a next generation information management and analytics platform which was more business centric with an operating model that enables agility, self service, faster data management and deep analytics for the business stakeholders • Data processing window was reduced from 8-10 hours to less than 30 minutes • Business Users were able to access more granular historical data for ad hoc analysis and analytics models. Future Process (Adhoc Analysis, < 30 minutes): Hadoop Distributed File System, Hive, Spark, Tableau. Aggregation and Data transformation logic performed using HiveQL on 67M records and 36 columns (14.7 GB of data in Hive, 16.3 GB in memory in Spark SQL); response time between 2s and ~1 min per filter sourcing live data via Spark SQL. Any trademarks included are trademarks of their respective owners and are not affiliated with, nor endorsed by, PricewaterhouseCoopers LLP, its subsidiaries or affiliates.
  • 15. Case Study #2 (Leading Retail Distribution Company): Trade Promotion Effectiveness. Scale: 500k SKUs, 250k customers, 5k suppliers, 6k fleets. Current State: • On-premise, rigid infrastructure with serial data processing and limited capacity • Delayed data availability reducing applicability to impactful business decisions • No integration with 3rd party data, causing pain points with vendor collaboration and data access. Future State: • Flexible, scalable, cloud-based infrastructure enabling multi-stream data processing • Near real-time data availability via Apache Spark data processing providing valuable insights for decision making • Easily supported visualization and reporting platforms accessible by internal users and vendors with simple access controls
  • 16. How is PwC Creating Awareness and Driving Adoption in the Market: Thought Leadership / Independent Research; Strategic Alliances (Google, Microsoft, Oracle, SAP); Data & Analytics @Scale - Client Delivery
  • 17. Closing Thoughts: • We believe external market forces will propel enterprises to embrace the Data Lake as a foundation of their data, analytics and emerging technology strategies • Although barriers remain for adoption by mainstream enterprises, there are ample opportunities for innovation and acceleration by abstracting sophistication with simplicity and superior end user experience • Enterprises should follow 4 core tenets* while developing their Next Generation Information Architecture Platform • Keep the 5 step strategic ‘capability driven’ approach in mind! • Thanks for attending the session. Please contact us with any questions!
  • 18. © 2016 PwC. All rights reserved. PwC refers to the US member firm or one of its subsidiaries or affiliates, and may sometimes refer to the PwC network. Each member firm is a separate legal entity. Please see www.pwc.com/structure for further details.
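
Slide 13 says each use case must be mapped to a set of capabilities, and slide 9 names eight capabilities in the Data Lake Capability Framework. A minimal sketch of how such a mapping could be checked is given below. The `UseCase` class, the helper functions, and the sample use cases are hypothetical illustrations and are not taken from the deck; only the eight capability names come from slide 9.

```python
from dataclasses import dataclass, field

# Capability names as listed on the capability framework slide (slide 9).
CAPABILITIES = frozenset({
    "Data Ingestion",
    "Data Quality/Integration",
    "Data Architecture",
    "Metadata Management",
    "Analytics/Reporting/Visualization",
    "Data Access",
    "Security",
    "Governance/Organization",
})


@dataclass
class UseCase:
    """Hypothetical record for one use case and the capabilities it needs."""
    name: str
    capabilities: set = field(default_factory=set)


def validate_use_case(use_case):
    """Return a list of problems; an empty list means the mapping is valid."""
    problems = []
    if not use_case.capabilities:
        problems.append(f"{use_case.name}: not mapped to any capability")
    for cap in sorted(use_case.capabilities - CAPABILITIES):
        problems.append(f"{use_case.name}: unknown capability '{cap}'")
    return problems


def capability_coverage(use_cases):
    """Count how many use cases depend on each known capability."""
    counts = {cap: 0 for cap in sorted(CAPABILITIES)}
    for uc in use_cases:
        for cap in uc.capabilities & CAPABILITIES:
            counts[cap] += 1
    return counts


# Hypothetical use cases, loosely named after the two case studies.
use_cases = [
    UseCase("Loan portfolio risk modeling",
            {"Data Ingestion", "Data Architecture",
             "Analytics/Reporting/Visualization"}),
    UseCase("Trade promotion effectiveness",
            {"Data Ingestion", "Data Access", "Security"}),
    UseCase("Unmapped idea"),
]

if __name__ == "__main__":
    for uc in use_cases:
        for problem in validate_use_case(uc):
            print(problem)
    print(capability_coverage(use_cases))
```

Counting coverage per capability is one possible way to see which architecture components serve the most use cases, which may help when sequencing a roadmap; the deck itself does not prescribe this calculation.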