Hitachi Data Systems Hadoop Solution
Customers are seeing exponential growth of unstructured data, from their social media websites to their operational sources. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, a software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, is giving customers a new option to tackle data growth and deploy big data analysis to better understand their business.
Hitachi Data Systems is launching its latest Hadoop reference architecture, which is pre-tested with the Cloudera Hadoop distribution to provide faster time to market for customers deploying Hadoop applications. HDS, Cloudera, and Hitachi Consulting will present together and explain how to get you there.
Attend this WebTech and learn how to:
• Solve big data problems with Hadoop.
• Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data.
• Implement Hadoop using the HDS Hadoop reference architecture.
For more information on the Hitachi Data Systems Hadoop Solution, please read our blog: http://blogs.hds.com/hdsblog/2012/07/a-series-on-hadoop-architecture.html
HITACHI DATA SYSTEMS HADOOP SOLUTION
WEBTECH EDUCATIONAL SERIES
3. PRESENTERS
Shankar Radhakrishnan, Solutions Manager, Hitachi Data Systems
Sai Saiprabhu, Director, Specialized Services, Hitachi Consulting
Art Vancil, Big Data Senior Manager, Hitachi Consulting
Daniel Templeton, Partner Manager, Cloudera
5. Enterprise Data Evolution
[Figure axis: amount of data]
• Data collection & reporting
• Process data faster
• Store data more cost-effectively
• Simplify infrastructure
• Combine data from across the business
• Ask new questions immediately
• Enable new real-time applications
CREATE COMPETITIVE ADVANTAGE
IMPROVE OPERATIONAL EFFICIENCY
6. Data Has Changed in the Last 30 Years
[Figure: data growth from 1980 to 2012, driven by end-user applications, the Internet, mobile devices, and sophisticated machines. Structured data: 10%; unstructured data: 90%.]
7. Data Management Strategies Have Stayed the Same
• Raw data on SAN, NAS and tape
• Data moved from storage to compute
• Relational models with predesigned schemas
11. Too Much Data, Too Many Sources
• Can’t ingest fast enough
• Costs too much to store
• Exists in different places
• Archived data is lost
15. Can’t Use It The Way You Want To
• Analysis and processing takes too long
• Data exists in silos
• Can’t ask new questions
• Can’t analyze unstructured data
17. The Cloudera Approach
Meet enterprise demands with a new way to think about data.
THE OLD WAY: COMPLEX, FRAGMENTED, COSTLY
Multiple platforms for multiple workloads
• Data silos by department or line of business (LOB)
• Lots of data stored in expensive specialized systems
• Analysts pull select data into the EDW
• No one has a complete view
THE CLOUDERA WAY: SIMPLIFIED, UNIFIED, EFFICIENT
Single data platform to support BI, reporting & app serving
• Bulk of data stored on a scalable, low-cost platform
• Perform end-to-end workflows
• Specialized systems reserved for specialized workloads
• Provide data access across departments or LOBs
18. Hadoop complements the Data Warehouse
[Figure labels: OLTP; Enterprise Applications; Business Intelligence; Data Warehouse: Query (High $/Byte); Operational BI; CLOUDERA: Store, Query, Transform, ETL, Math, Load, Archive; Archival Data, Exploration, Analytics.]
19. Cloudera Enterprise: The Platform for Big Data
A Revolutionary Solution Built on Apache Hadoop
Workflow: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
Components: CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT
• Brings storage & compute together
• Works with every type of data
• Changes the economics of data management
20. CDH4
Big Data Storage, Processing & Analytics Based on Apache Hadoop
• Store: Land structured and unstructured data in a scalable, cost-effective repository
• Process & Analyze: Transform data in parallel and query at the speed of thought
• Integrate: Interoperate with existing platforms, systems and applications
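To make "transform data in parallel" concrete, here is a minimal, single-process sketch of the MapReduce pattern that Hadoop processing is built around: a map step emits key/value pairs for each input split, a shuffle groups them by key, and a reduce step aggregates each group. It only simulates the data flow in plain Python; it does not use any Hadoop API, and the input splits are made-up sample text.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every whitespace-separated token.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group every emitted value under its key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(splits):
    # On a cluster, each split would go to its own map task on a different node;
    # here all splits are processed sequentially in one Python process.
    mapped = chain.from_iterable(
        map_phase(line) for split in splits for line in split
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two made-up input splits, each a list of lines.
    splits = [["big data on Hadoop"], ["Hadoop scales out", "big clusters"]]
    counts = word_count(splits)
    print(counts["hadoop"], counts["big"])  # prints: 2 2
```

Because every map call depends only on its own split and every reduce call only on one key's values, both phases can be spread across many servers; only the shuffle moves data between them.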
21. Cloudera Manager
End-to-End Administration for CDH
• Deploy: Install, configure & start your cluster in 3 simple steps
• Configure & Optimize: Ensure optimal settings for all hosts & services
• Monitor, Diagnose & Report: Find & fix problems quickly, view current & historical activity & resource usage
22. Cloudera Navigator
Data Management Layer for Cloudera Enterprise
• Audit & Access Control (AVAILABLE NOW): Ensuring appropriate permissions and reporting on data access for compliance
• Exploration & Lineage (COMING SOON): Finding out what data is available, what it looks like and where it came from
• Lifecycle Management (COMING SOON): Migration of data based on policies
23. Cloudera Support
Our Team of Experts on Call to Help You Meet Your SLAs
• Extend Your Team: Get a dedicated team at your disposal to help you solve problems quickly
• Leverage the Experts: Take advantage of our expertise to make sure your cluster operates at its best
• Influence Roadmaps: Get advocacy with the open source community to build the features and functionality you need
24. Cloudera Enterprise: The Best Hadoop-Based Platform
CDH4
• The only solution with real-time query (Impala)
• The only solution with HDFS high availability
• The most widely deployed & proven
• The broadest ecosystem of certified partners
• 100% open source & built for the enterprise
Cloudera Manager
• Management for the complete Hadoop system
• The most mature & functionally advanced
• The easiest to use, with built-in intelligence
• Integration with enterprise monitoring tools
Cloudera Navigator
• The only data management tool for Hadoop
• Cloudera Navigator 1.0: Data audit & access control
Cloudera Support
• Dedicated team with a global presence
• Contributors and committers for every part of CDH
• Tens of thousands of nodes under management across industries
27. HADOOP APPLICATION EXAMPLE: GENOME ANALYSIS
National Institute of Genomics, Japan
Challenge: Speed up analysis of genome data from next-generation sequencers (4 PB of data)
Solution:
‒ 115-node Hadoop cluster using Hitachi Compute Rack servers
‒ Reliable and scalable solution
28. PROACTIVE MAINTENANCE AT HITACHI SERVER DIVISION
Data sources: user inquiry, hardware auditing log, call center log, maintenance report, CRM customer data, sales/financial data, distribution/stock data, location information, server log, operation history, BOM data, production data of business system
Challenge:
• Proactive hardware maintenance from logs, call center data, and product information
• Leverage historical data for future product development
Solution: Hadoop + SAP HANA + SAP Visual Intelligence
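Combining server logs with product information, as in the challenge above, is the kind of job Hadoop commonly handles with a reduce-side join: each input is keyed by a shared identifier and tagged with its source, and the reducer pairs up records that share a key. The sketch below simulates that in plain Python. The serial-number key, record layouts, and sample values are hypothetical and do not describe Hitachi's actual schema or the SAP HANA part of the solution.

```python
from collections import defaultdict

# Hypothetical inputs keyed by a server serial number.
server_log = [
    ("SN-001", "fan speed warning"),
    ("SN-002", "ECC memory error"),
    ("SN-001", "temperature high"),
]
bom_data = [("SN-001", "fan module A"), ("SN-002", "DIMM 16GB")]


def tag(source, records):
    # Map: key each record by serial number and remember where it came from.
    return [(serial, (source, value)) for serial, value in records]


def reduce_side_join(*tagged_streams):
    # Shuffle: gather every tagged record that shares a serial number.
    grouped = defaultdict(list)
    for stream in tagged_streams:
        for serial, tagged in stream:
            grouped[serial].append(tagged)
    # Reduce: pair each log event with each BOM entry for the same serial.
    joined = {}
    for serial, items in grouped.items():
        events = [value for source, value in items if source == "log"]
        parts = [value for source, value in items if source == "bom"]
        joined[serial] = [(event, part) for event in events for part in parts]
    return joined


if __name__ == "__main__":
    result = reduce_side_join(tag("log", server_log), tag("bom", bom_data))
    for serial, pairs in result.items():
        print(serial, pairs)
```

Tagging records with their source lets one reducer tell the two inputs apart, so data sets that live in different systems can be joined once they land in HDFS.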
29. INFRASTRUCTURE REQUIREMENTS FOR HADOOP
Drivers: DATA GROWTH, COST, COMPLEXITY
• Cost-effective for low-fidelity data
• Increase efficiency and utilization of resources and meet required service levels
• Hardware less prone to failures
• Easy to manage
• Scale out to handle petabytes of unstructured and semi-structured data
• Keep data closer to CPU
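"Keep data closer to CPU" refers to data locality: Hadoop tries to run each map task on a node that already stores a replica of the task's input block, so the block is read from local disk rather than over the network. A simplified, greedy version of that preference is sketched below. It illustrates the idea only and is not the JobTracker's actual scheduling algorithm; the node names, block IDs, and slot counts are made up.

```python
def assign_map_tasks(block_replicas, free_slots):
    """Greedily assign one map task per HDFS block, preferring data-local nodes.

    block_replicas: block id -> list of nodes that hold a replica of that block
    free_slots: node -> number of free map slots on that node
    Returns block id -> (node, "local" or "remote").
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    assignment = {}
    for block, replicas in block_replicas.items():
        # First choice: a node that already stores this block (no network read).
        node = next((n for n in replicas if slots.get(n, 0) > 0), None)
        kind = "local"
        if node is None:
            # Fallback: any node with a free slot; the block must be read remotely.
            node = next((n for n, free in slots.items() if free > 0), None)
            kind = "remote"
        if node is None:
            raise RuntimeError(f"no free slot for block {block}")
        slots[node] -= 1
        assignment[block] = (node, kind)
    return assignment


if __name__ == "__main__":
    replicas = {
        "blk1": ["dn1", "dn2", "dn3"],
        "blk2": ["dn1", "dn2", "dn4"],
        "blk3": ["dn1", "dn2", "dn3"],
    }
    slots = {"dn1": 1, "dn2": 1, "dn3": 0, "dn4": 1}
    print(assign_map_tasks(replicas, slots))
```

In this example the first two blocks run locally, but by the time blk3 is scheduled every node holding its replicas is busy, so it lands on dn4 and must be streamed across the network. More replicas and more free slots per data node make that fallback rarer.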
30. HADOOP IN THE ENTERPRISE: ARCHITECTURE
One Platform for All Data, All Applications
[Figure: enterprise architecture with Hadoop alongside a data warehouse, an RDB, and real-time computers (streaming). Inputs: business apps; outside services (connect to Facebook for CRM, etc.); other big data sources (email, audio, documents, etc.). Outputs: business intelligence dashboard for CxOs, data scientists, and business users / customers. Also labeled: Data Connector; Hitachi Strength and Focus.]
31. INTRODUCING HITACHI REFERENCE ARCHITECTURE FOR HADOOP
• Pretested and validated for interoperability, performance, and scalability
• Flexible: customize to fit the application
• Pre-validated using Cloudera, the leading Hadoop distribution (certification in progress)
• Complementary to existing Hitachi platforms for block, file, and object storage
• Seamless management integration with other Hitachi solutions
[Figure: ENTERPRISE-READY INFRASTRUCTURE FOR HADOOP. Name Node + Job Tracker, Secondary Name Node, and Management node on a LAN, with data nodes each running DataNode (HDFS) and TaskTracker.]
32. REFERENCE ARCHITECTURE: HARDWARE COMPONENTS
Qty | Form factor | Component | Description
1 | 1U | Management node | Hitachi server CR 210H: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 5 x 3.5-inch 3TB NL-SAS 7200 RPM
1 | 2U | HDFS master name node (name node, job tracker) | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
1 | 2U | Secondary name node | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
As needed | 2U | Data nodes (data node, task tracker) | Hitachi server CR 220S: 2 x quad-core E2600 series, 64GB main memory, 2 x GigE (onboard), 12 x 3.5-inch 3TB NL-SAS 7200 RPM
2 | 1U or 2U | Ethernet switches (10 GbE network) | Cisco Nexus 5548: 48 x GigE / 10GigE; or Brocade VDX 6720-60: 40 x GigE / 10GigE, form factor 2U
[Figure: 42U rack with Switch-1 and Switch-2 (1U) and CR220S servers (2U) with internal HDDs]
Why Compute Rack Servers?
• High density (2U), high processing power (2 CPU sockets), large data storage (12 HDD)
• Redundant power supplies
• Eco-friendly power saving capabilities
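The data node specification above supports a quick sizing estimate. The sketch below takes 12 x 3TB drives per data node from the table and adds two assumptions that are not part of the reference architecture: HDFS's default replication factor of 3, and a hypothetical 25% of raw disk reserved for intermediate MapReduce output, logs, and the OS. Adjust both to your own configuration and workload.

```python
import math

# Figures taken from the data node row of the hardware table.
DISKS_PER_NODE = 12
DISK_TB = 3

# Assumptions, not part of the reference architecture:
REPLICATION = 3           # HDFS default replication factor (dfs.replication)
RESERVED_FRACTION = 0.25  # hypothetical share of raw disk kept for temp data, logs, OS


def raw_tb_per_node(disks=DISKS_PER_NODE, disk_tb=DISK_TB):
    return disks * disk_tb


def usable_tb_per_node(replication=REPLICATION, reserved=RESERVED_FRACTION):
    # Usable HDFS capacity = raw capacity minus the reserve, divided by the replica count.
    return raw_tb_per_node() * (1 - reserved) / replication


def data_nodes_needed(dataset_tb, replication=REPLICATION, reserved=RESERVED_FRACTION):
    # Round up: a partially used node still has to be bought.
    return math.ceil(dataset_tb / usable_tb_per_node(replication, reserved))


if __name__ == "__main__":
    print(raw_tb_per_node(), usable_tb_per_node(), data_nodes_needed(100))
```

Under these assumptions each data node contributes 36 TB raw and 9 TB of usable HDFS capacity, so a 100 TB data set needs 12 data nodes. Lowering replication raises usable capacity but reduces tolerance to disk and node failures.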
33. REFERENCE ARCHITECTURE: SOFTWARE COMPONENTS
Tested Software
Component | Version | Description
Operating system | 6.3 | Red Hat or CentOS 64-bit Linux distribution
Hadoop distribution | CDH4 | Cloudera Hadoop distribution
Hadoop management | 4.0.1 | Cloudera Manager
Management framework | n/a | Hitachi Compute Systems Manager
[Figure: Name Node + Job Tracker, HA Name Node, and Management node on a LAN, with data nodes each running DataNode (HDFS) and TaskTracker.]
Reference architecture white paper targeted for June 2013
34. WHY HITACHI FOR HADOOP INFRASTRUCTURE
• Enterprise-ready RAS (reliability, availability, serviceability) for Hadoop
‒ Less worry about hardware failure, more focus on business value
• Seamless management integration with Hitachi solutions
‒ Lower opex
• Competitive pricing with commodity hardware
‒ Lower capex
• One platform solution for all your data volumes, velocity, and types
‒ Lower TCO, faster ROI for your big data initiatives
36. HITACHI CONSULTING
As the global consulting company of Hitachi, Ltd., Hitachi Consulting brings business visions to life through in-depth industry expertise combined with innovative technology solutions and services.
From articulating strategy through deploying and maintaining applications, Hitachi Consulting helps clients quickly realize measurable business value and achieve sustainable ROI.
The Hitachi Consulting client base includes 35 percent of the Fortune 100 and 25 percent of the Fortune Global 100, along with many mid-market leaders. With offices in North America, Europe, the Middle East, and Asia, the company employs more than 5,000 professionals, with delivery centers in India and China for global delivery scale.
37. WHAT DO WE SEE WITH OUR CLIENTS?
[Figure labels: Business objectives refinement; Technology adoption without disruption; Data science practice adoption; Business intelligence jump start with big data technologies; Emerging businesses; Business intelligence practice adoption.]
38. DO YOU NEED AN EXECUTIVE SPONSOR?
The Internet has driven most businesses, across almost every industry, to demand better information much faster than ever before.
Examples: Retailers can influence the next shopping visit based on analytics; Amazon can tailor a shopping visit on a variety of dimensions (personalization, price incentives, product combinations, etc.). How will similar dynamics impact your company?
Perhaps your company has not yet started using Hadoop for big data initiatives. Or perhaps you are stuck in "discovery mode," trying to find that golden-nugget big idea from big data. If your company is like mine, you will not be given permission to simply play with Hadoop for months on end.
In most companies, your time spent on a project needs to be backed by someone with a budget who wants to get something done. Let's look at successful methods to secure your big data executive sponsorship.
39. HOW DO I GET STARTED?
• Award-winning luck #1: Your executive brings you the justification for big data.
• Award-winning luck #2: Your subject matter expert and your data scientist pore over the data until they find the “golden nugget” of justification.
If you have no budget for big data, then perhaps you are waiting for a stroke of luck? Stop waiting, and begin now to collaborate with your business consultant to discover the data value and the “essence” of your big data business opportunity.
40. THE NITTY-GRITTY DETAILS
• CEO/CSO: Predict the future
• COO: Optimize the business process
• CMO: Nurture the customer relationship
• CFO/CTO: Deliver faster and cheaper
Hitachi helps you to choose your big data solution by targeting the message to your sponsor’s role and asking the BIG QUESTIONS.
41. FOR EXAMPLE
A high-end disk storage manufacturer collects daily performance data from its customers’ storage devices, but cannot effectively analyze it BECAUSE OF THE VOLUME.
The big questions to ask: If we stored the data in Hadoop, then:
• Could we detect operational patterns that predict device failure worldwide?
• Could we anticipate the failure AND suggest a replacement without downtime?
• Could we sell the data analysis back to the customer for a fee?
• Could we reduce the support effort by delivering proactive notifications?
• How much revenue would we gain, or how much cost would we eliminate?
42. SOLUTION SELECTION FRAMEWORK
The solution discovery and evaluation process is a top-down survey of organizational leadership, followed by a prioritization and ranking based upon business value and organizational priorities.
[Figure: funnel narrowing All Possible Solutions and Purposes to Feasible Solutions, then to the Prioritized Big Data Solution Selection.]
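The filter-then-rank flow of this framework can be expressed as a simple weighted score. In this sketch the 1-5 ratings, the 60/40 weights, the feasibility flag, and the example solution names are all hypothetical placeholders for whatever criteria the leadership survey actually produces.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    business_value: int  # hypothetical 1-5 rating from the leadership survey
    priority_fit: int    # hypothetical 1-5 alignment with organizational priorities
    feasible: bool       # hypothetical outcome of a feasibility review


def prioritize(candidates, value_weight=0.6, priority_weight=0.4):
    # Step 1: narrow all possible solutions down to the feasible ones.
    feasible = [c for c in candidates if c.feasible]
    # Step 2: rank by a weighted score; ties are broken alphabetically by name.
    scored = [
        (value_weight * c.business_value + priority_weight * c.priority_fit, c.name)
        for c in feasible
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    ideas = [
        Candidate("Failure prediction", 5, 3, True),
        Candidate("Customer 360 view", 4, 5, True),
        Candidate("Real-time pricing", 5, 5, False),
        Candidate("Log archive", 2, 2, True),
    ]
    print(prioritize(ideas))
```

Screening for feasibility before scoring keeps an attractive but unrealistic idea, like the hypothetical "Real-time pricing" here, from crowding out solutions the organization can actually deliver.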
43. SPONSOR CONVERSATIONS: ESTABLISHED BUSINESS INTELLIGENCE ENVIRONMENT
• Specific use cases that address chosen pain points to be tackled using big data capabilities
• Measures that show how the use cases alleviate current pain points
• External expertise needed to augment your big data jump start
• Action plan to implement prioritized use cases and evaluate larger adoption of big data capabilities
Executive sponsor buy-in
Executive sponsor oversight
Funding
44. LEVERAGE BIG DATA CAPABILITIES
[Figure: big data capabilities that extend the existing data management environment and introduce new analytic capabilities. Labels: Extend historical transactions availability; Extend data staging, volume processing and complex data processing; Extend complex data processing; Ability to process large volumes; Flexibility and complexity management; Leverage emerging capabilities.]
45. BIG DATA TECHNOLOGIES: ADOPTION STRATEGY
Protect existing investments that are already in the right place. Introduce big data technologies to enable new and evolving business needs.
[Figure: Enterprise Analytics adoption roadmap with four numbered stages: Protect investments as needed; Streamline as the environment matures; Expand as demand grows; Introduce new capabilities (introduce, consolidate and expand new capabilities). Elements: existing transactional sources and social media sources; structured data management and existing data management (batch or stream); current augmentation to structured data management (limited); Big Data Appliance; stream and organize; existing and sporadic analytic capabilities; big volume data analyses, high velocity data analyses, and unstructured data analyses.]
46. SPONSOR CONVERSATIONS: EMERGING BUSINESS INTELLIGENCE ENVIRONMENT
• Business intelligence competencies needed to attain and sustain a competitive edge
• Measures that help monitor business operations alignment with business strategies
• External expertise needed to augment your big data and business intelligence jump start
• Action plan to implement and evaluate larger adoption of big data business intelligence capabilities
Executive sponsor buy-in
Executive sponsor oversight
Funding
47. NEXT STEPS
• Hitachi Unified Compute Platform for Business Analytics web page
• http://www.hds.com/products/hitachi-unified-compute-platform/business-analytics.html
• Contact your HDS sales rep for more information
49. UPCOMING WEBTECHS
WebTechs
‒ Take SAP HANA From Proof of Value Through Production Deployment,
June 20, 9 a.m. PT, noon ET
‒ A Cloud You Can Trust–Improve Datacenter Efficiency and Agility, June 26,
9 a.m. PT, noon ET
Check www.hds.com/webtech for:
• Links to the recording, the presentation, and Q&A (available next week)
• Schedule and registration for upcoming WebTech sessions