An organization’s information is spread across multiple repositories, on-premises and in the cloud, with limited ability to correlate information and derive insights. The Smart Content Hub solution from HP and Hortonworks enables a shared content infrastructure that transparently synchronizes information with existing systems and offers an open standards-based platform for deep analysis and data monetization.
- Leverage 100% of your data: Text, images, audio, video, and many more data types can be automatically consumed and enriched using HP Haven (powered by HP IDOL and HP Vertica), making it possible to integrate this valuable content and insights into various line of business applications.
- Democratize and enable multi-dimensional content analysis: Empower your analysts, business users, and data scientists to search and analyze Hadoop data with ease, using the 100% open source Hortonworks Data Platform.
- Extend the enterprise data warehouse: Synchronize and manage content from content management systems, and crack open the files in whatever format they happen to be in.
- Dramatically reduce complexity with an enterprise-ready SQL engine: Tap into the richest analytics that support JOINs, complex data types, and other capabilities only available with HP Vertica SQL on the Hortonworks Data Platform.
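As a rough illustration of what that looks like in practice, a single standard SQL JOIN can combine warehouse records with content managed in HDP. This is a sketch rather than a reference implementation; the connection details and table names are hypothetical, and it assumes the vertica_python client against a Vertica SQL on Hadoop deployment:

```python
# Illustrative sketch only: host, credentials, and table names are hypothetical.
# Assumes the vertica_python client and a Vertica SQL on Hadoop deployment.
import vertica_python

conn_info = {
    'host': 'vertica.example.com',   # hypothetical host
    'port': 5433,
    'user': 'analyst',
    'password': '...',
    'database': 'analytics',
}

# Join structured warehouse records against content synchronized into HDP,
# using standard SQL rather than hand-written MapReduce.
QUERY = """
    SELECT c.customer_id, c.region, COUNT(d.doc_id) AS doc_count
    FROM warehouse.customers AS c
    JOIN hdp_content.documents AS d ON d.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
    ORDER BY doc_count DESC
    LIMIT 10;
"""

conn = vertica_python.connect(**conn_info)
try:
    cur = conn.cursor()
    cur.execute(QUERY)
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```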
Speakers:
- Ajay Singh, Director, Technical Channels, Hortonworks
- Will Gardella, Product Management, HP Big Data
Before we dive into Hadoop and its role within the modern data architecture, let’s set the context for why Hadoop has become important.
Existing approaches for data management have become both technically and commercially impractical.
Technically – these systems were never designed to store or process vast quantities of data.
Commercially – the licensing structures of the traditional approach are no longer feasible.
These two challenges, combined with the rate at which data is being produced, created the need for a new approach to data systems. If we fast-forward another 3 to 5 years, more than half of the data under management within the enterprise will be from these new data sources.
Enter Hadoop.
Faced with this challenge, the team at Yahoo conceived and created Apache Hadoop. They were convinced that contributing the platform to an open community would speed innovation, so they open sourced the technology within the governance of the Apache Software Foundation (ASF). This introduced two significant advantages.
Not only could they manage new data types at scale, but they now had a commercially feasible approach.
However, there were still significant challenges. The first generation of Hadoop was:
- designed and optimized for batch-only workloads,
- it required dedicated clusters for each application, and,
- it didn’t integrate easily with many of the existing technologies present in the data center.
Also, like any emerging technology, Hadoop still had to reach the level of readiness the enterprise requires.
After running Hadoop at scale at Yahoo, the team spun out to form Hortonworks with the intent to address these challenges and make Hadoop enterprise ready.
Hortonworks has a singular focus: enabling Apache Hadoop as an enterprise data platform for any app and any data type.
We were founded in 2011 by 24 developers from Yahoo, where Hadoop was conceived to address data challenges at internet scale. What we now know of as Hadoop really started in 2005, when a team at Yahoo was directed to build out a large-scale data storage and processing technology that would allow them to improve their most critical application, Search.
Their challenge was essentially two-fold. First, they needed to capture and archive the contents of the internet, and then process the data so that users could search through it effectively and efficiently. Clearly, traditional approaches were both technically (due to the size of the data) and commercially (due to the cost) impractical. The result was the Apache Hadoop project, which delivered large-scale storage (HDFS) and processing (MapReduce).
Today we are over 600 employees and have partnered with over 900 companies that are leaders in the data center.
We have also been fortunate to achieve very significant customer adoption, with over 230 customers as of Q3 2014, spanning nearly every vertical.
Hortonworks was founded with the sole intent to make Hadoop an enterprise data platform. With YARN as its foundation, HDP delivers a centralized architecture with true multi-tenancy for data processing and shared services for Security, Governance, and Operations to satisfy enterprise requirements, all deeply integrated and certified with leading datacenter technologies.
We are uniquely focused on this transformation of Hadoop, and we do our work completely in open source. This is all predicated on our leadership in the community, which enables us not only to best support users of the platform but also to uniquely represent customer requirements within this open, thriving community.
Our product, the Hortonworks Data Platform (or HDP for short) is a completely open source, enterprise-grade data platform that’s comprised of dozens of Apache open source projects including Apache Hadoop and YARN at its center.
We have a comprehensive engineering, testing, and certification process that integrates and packages all of these components into a cohesive platform that the enterprise can consume and deploy at scale. And our model enables us to proactively incorporate new innovations and new open source projects into HDP as they emerge.
To ensure the highest quality, we have a test suite, unique to Hortonworks, comprised of tens of thousands of system and integration tests that we run at scale on a regular basis, including on the world’s largest Hadoop clusters at Yahoo! as part of our co-development relationship.
While our pure-play competitors focus on proprietary components for security, operations, and governance, we invest in new open source projects that address these areas.
For example, earlier in 2014 we acquired a small company called XA Secure that provided a comprehensive security and administration product. We then contributed the technology wholesale to open source as Apache Ranger.
Since our security, operations and governance technologies are open source projects, our partners are able to work with us on those projects to ensure deep integration within our joint solution architectures.
As the information era continues to generate massive volumes of data in different formats, organizations are looking for more efficient means of storing and analyzing that data in a standardized way across the different lines of business. Many are turning to Hadoop, which lends itself nicely to this problem by offering efficiencies that other platforms don’t. There are still problems, though.
There are multiple dimensions of complexity when trying to get insights from data stored in a Hadoop system that’s leveraged at scale.
1) There are, of course, the types of analysis that need to be done, each with their own set of requirements and subtle complexities. Does this department or business need a predictive engine? A prescriptive one? Does the data and data model support the kinds of questions I need to ask of the data? There’s also not a whole lot of analytics that can be used or enabled without significant effort. For the most part, Hadoop lets you store the data as is. There are some open source engines and data stores on top of Hadoop that help you ask the hard questions, but they all use a different set of tools and APIs.
2) Then there’s the processing and delivery of results. What is the delivery/consumption model that works best for the problems I’m looking to solve? Does it align with the types of analysis I want to perform?
3) Most importantly, there are data considerations. The many data types being used in the wild require fundamentally different methods to access and manage the information inside. Machine data, human information, and structured data typically require fundamentally different analytical approaches, which in turn require separate analytics engines. So data is really everything: all analytics decisions hinge on whether we can access what’s inside and what we can do with it.
4) Coupled with the skill set of the business users and the batch-oriented processing of Hadoop, this leaves most organizations with a model that forces them to innovate slowly, use case by use case, rather than dealing with the root of the issue: finding a way to access and collectively analyze all the data efficiently, through standardized, real-time, self-service procedures that are uniform across all the data.
So the issues are now on the table. Hadoop is a powerful toolset that gives enterprises a means to an end when it comes to understanding and acting on their data. The question is: how do I give it that extra edge? The short answer is IDOL. Now let’s talk about how IDOL helps to fill that void.
Key Messages:
One of the largest challenges in getting value from Hadoop investments is the disconnect between business users and data scientists.
They speak different languages
Hard to collaborate on the same data due to lack of tools
Business users often have subject matter expertise, but don’t know technical data science concepts.
Data scientists know how to manipulate the data and extract value, but don’t know the nuances of the business
IDOL enables both Business Users and Data Scientists with:
Interactive Exploration of the Data
Non-SQL graphical navigation of data
Collaboration Features to share insights
Powerful customer examples to lead off
Market/industry landscape/trends (what is today’s reality?)
What problems does this cause for the customer?
What do you need to do to fix the problem? (here are 3-5 requirements)
What are some issues with traditional solutions? (talk about challenges of human information, keyword search, etc)
What is the answer? Our solution, powered by IDOL (what is HP enterprise search, what is IDOL)
Why do people like you choose our solution? (powered by IDOL, Gartner MQ, KPIs, features, IDOL is key to HP’s strategy)
Illustrate today vs tomorrow with our technology
Summary slide
So in the end, using MapReduce, we’re able to translate the fetch/indexing activities into a discrete set of concurrent tasks within the racks running Hadoop. This accomplishes three things (a conceptual sketch follows this list):
1) It translates the HP connectivity processing into a Hadoop best practice by harnessing MapReduce. Instead of the point-and-shoot architecture of most connectors, we’re building a native plugin that can be used to process and analyze your data within the Hadoop ecosystem. IDOL becomes a native plugin for Hadoop. It also turns what can be exhausting and complex code to write into configuration-driven analytics processing.
2) By conforming to Hadoop best practices, we’re able to create a faster and more efficient means of processing the data so that it can be sent off to IDOL for analysis the same way it has been in the past. We’re not just using IDOL’s distribution anymore; we’re also leveraging the MPP capabilities of Hadoop to do the heavy lifting for us.
3) IDOL is able to incorporate its industry-leading analytics capabilities into out-of-the-box functions that can be turned on via configuration rather than through complex programmatic integration. We all know IDOL’s ingestion pipeline can do many things, but being able to leverage those functions in a streamlined, configuration-driven manner has huge advantages over the more brute-force programming methodologies employed by other vendors. Many people don’t want to have to code their way through these issues; just enable the features and that’s it.
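To make the pattern concrete, here is a minimal, hypothetical sketch of the idea described above: a Hadoop Streaming mapper that fans document preparation out across the cluster and emits records a downstream step could hand to IDOL for enrichment and indexing. The record layout and field names are illustrative assumptions, not the actual HP connector code.

```python
#!/usr/bin/env python
# mapper.py - illustrative Hadoop Streaming mapper, NOT the actual HP/IDOL connector.
# Assumes each input record is "doc_id <TAB> raw_text"; every mapper task processes
# its share of documents in parallel across the cluster.
import json
import sys

for line in sys.stdin:
    line = line.rstrip('\n')
    if not line:
        continue
    doc_id, _, raw_text = line.partition('\t')
    record = {
        'reference': doc_id,
        'word_count': len(raw_text.split()),  # trivial stand-in for real enrichment
        'content': raw_text,
    }
    # Emit one prepared record per document for a downstream indexing step.
    sys.stdout.write('%s\t%s\n' % (doc_id, json.dumps(record)))
```

Submitted through the standard streaming jar (for example, hadoop jar hadoop-streaming.jar -input /docs -output /prepared -mapper mapper.py -file mapper.py), the cluster does the heavy lifting rather than a single connector host, and the connector’s job reduces to configuring which enrichments run.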
There’s a lot of value there, and that’s just the connectors. But the connector is just part of the story, i.e. the data processing (ETL) and preparation before the data is finally loaded into IDOL – an important job, but just one part of the architecture. Once the data is in IDOL, that’s when the really interesting things happen, because that’s when we start to expose the powerful functions and capabilities of the platform. Stateful functions like retrieval, classification, clustering, and many more become available to both explore and analyze your data in real time. Let’s look at the big picture now…
Key Messages:
Unlike other technologies that simply read HDFS as a file-system, IDOL is integrated deeply into the Hadoop architecture
Takes advantage of MPP compute power of Hadoop
Deals with multi-tenancy and data with different security rights and privileges
Advanced analytics for all data-types
So now we’ll take a look at a use case that is becoming more and more common as different organizations adopt Hadoop and look to streamline data storage and analysis across the different lines of business that IT needs to support.
Key Messages:
Large diversified healthcare company, acting as both a payer and a provider
Claims are the lifeblood of their operations; they used traditional data warehouse, BI, and statistical tools
Challenges:
Business SMEs have knowledge of payment processes but are not data scientists
Report generation took a long time: 30–45 days
Did not speak the same language
Constant pressure to reduce Fraud, Waste, and Abuse
Payment Integrity was an early user of analytics – identified as a high-ROI target for Hadoop and analytics
Challenging because patterns of providers and fraud are constantly changing
Changes in regulations and contracts, plus errors in data entry and process, can result in incorrect payments
The government estimates that $50B of the $500B spent on Medicare is lost to FWA; private health insurers are also affected
IDOL solved this problem by providing self-service analytics to business users and data scientists.
Hadoop is being used to scale out to all payment systems
New data sources and use-cases being added constantly
Enabling a wide variety of lines-of-business
Has potential for very big impact on the organization
So the issues are now on the table. Hadoop on its own isn’t enough. How do I create a real-time, efficient, all-encompassing, and multitenant environment to glean all the valuable insights contained within Hadoop? By pairing IDOL alongside Hadoop, you can leverage IDOL to:
Supercharge your analytics: Instead of writing complicated and time-consuming MapReduce or YARN scripts that are mostly batch-oriented, use the real-time advanced analytics techniques built directly into IDOL.
Democratize data and analysis: IDOL also offers something very unique for Hadoop. By reducing the complexities of data processing to configuration and offering a common analytics API, analysis and data management become self-service functions through a standardized RESTful API that is simple and easy to use (see the sketch after this list). Business intelligence is enabled across a wider set of content.
Leverage 100% of the data for analysis: By ingesting data into IDOL, you’re not just able to execute the analytics faster, you’re able to expand the scope of your analytics to cover more data types beyond the most common. Later I’ll show you how we can take the standard keyword-counter example using Hadoop and turn it on its head by simply asking IDOL or leveraging some of its core libraries.
Reduce costs and complexity: Think about even the easiest problems to solve with Hadoop. Give me your best Hadoop technician and I’ll show you someone who needs a few hours, if not a couple of days, to write scripts that work, and with batch-oriented processing nothing ever works the first time. IDOL, by contrast, enables you to ask complex questions and get real-time answers. Getting answers faster saves time and money that your people can spend making decisions against data they now fully understand.
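To make the keyword-counter contrast above concrete, here is a rough sketch of what the real-time, self-service side looks like once the content is in IDOL. The host, port, action name, and parameters are hypothetical stand-ins for an IDOL-style HTTP query interface, not documented API calls; the batch alternative would be a MapReduce job written, scheduled, and debugged for each new question.

```python
# Illustrative sketch only: endpoint, action name, and parameters are hypothetical
# stand-ins for an IDOL-style HTTP query interface, not documented API calls.
import requests

IDOL_HOST = 'http://idol.example.com:9000'  # hypothetical content engine

def keyword_hits(term, max_results=10):
    """Ask the engine, in real time, which documents mention a term --
    the question a batch keyword-count MapReduce job would answer offline."""
    response = requests.get(
        IDOL_HOST + '/action=Query',
        params={'text': term, 'maxresults': max_results, 'responseformat': 'json'},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == '__main__':
    print(keyword_hits('payment integrity'))
```

The point of the sketch is the shape of the interaction: one self-service REST call against content that has already been ingested and enriched, rather than a new script per question.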