2. Inflection Point - Data Management goes Open Source
Infrastructure Layer
Data Layer
Analytics Layer
Apps Layer (Solutions)
Disruption Impact
Workload Optimization
3. Building Blocks for Workload-Optimized Big Data
HPE Confidential - for HPE and Channel Partners only
Active Archive
• Multi-temperature storage with data governance and federated queries
• Denser TB/rack U, lower $/TB for long-term storage
Data Lakes
• Ingestion of multiple types/sources of data
• Batch, Interactive, and Real-time workloads
• Different infrastructure requirements
Data Warehouse Modernization
• Data staging & landing zone
• Batch processing
• Traditional and rack-density-optimized form factors
Use Cases:

ProLiant DL300 series – Traditional 1U/2U design
• Building block for traditional Hadoop workloads

Apollo 4530 – Density-optimized platform block for traditional Hadoop workloads
• Same spindle/core ratios

Apollo 4200 – Storage-optimized block
• Foundation for data lakes
• Double the storage density of the traditional platform

Apollo 4510 – Densest storage block
• Online archival
• Object storage
4. A Big Data Journey…
ETL Offload
Archival
Deep Learning
Event Processing
In Memory Analytics
5. HP Big Data Reference Architecture
Elastic Platform for Analytics
• Event Processing – Low Latency Compute – Moonshot m710x
• In Memory Analytics – Big Memory Compute – Apollo xl170r w/ 512 GB memory
• Archival Storage – Apollo 4200 w/ 6 TB HDD
• ETL Offload – High Latency Compute – Apollo xl170 w/ 256 GB memory
• Deep Learning – HPC Compute – Apollo xl190r w/ GPUs
• HDFS Storage – Data Lake – Apollo 4200 w/ 3 TB HDD
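The workload-to-platform pairings on this slide can be restated as a simple lookup table. The sketch below is purely illustrative (the dict, function, and phrasing are mine, not an HPE API); it just encodes the slide's mapping of workloads to node roles and building blocks.

```python
# BDRA mapping from the slide: workload -> (node role, hardware building block).
# Illustrative restatement only; names and formatting are my own.
BDRA = {
    "event processing":    ("low-latency compute",  "Moonshot m710x"),
    "in-memory analytics": ("big-memory compute",   "Apollo xl170r, 512 GB RAM"),
    "archival storage":    ("dense storage",        "Apollo 4200, 6 TB HDDs"),
    "etl offload":         ("high-latency compute", "Apollo xl170, 256 GB RAM"),
    "deep learning":       ("HPC compute",          "Apollo xl190r + GPUs"),
    "hdfs storage":        ("data lake",            "Apollo 4200, 3 TB HDDs"),
}

def platform_for(workload: str) -> str:
    """Look up the BDRA node role and platform for a workload name."""
    role, platform = BDRA[workload.lower()]
    return f"{role}: {platform}"

assert platform_for("Deep Learning") == "HPC compute: Apollo xl190r + GPUs"
```

The point of the table is the slide's thesis: big data is a collection of workloads, each deserving its own optimized building block rather than one homogeneous cluster.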
7. The Coming Landscape
– Non-Volatile Memory
– More than fast – byte addressable and persistent
– Photonics
– Optical Networking will make most NVM equidistant
– Some Implications on Big Data
– 90% of a database write transaction is eliminated
– A Shuffle …isn’t
– HPE is contributing changes to Spark with HDP
– Favored Algorithms might change
– Graph and matrix inversion based algorithms
HPE’s “The Machine”
A shared-something architecture
8. Platform Investigations for Workload Optimized Big Data
Silicon Acceleration – Software and Hardware
• Multicore x86 CPU, GPGPU, FPGA, SoC/ASIC
Big Data/HPC/Cloud Integration
Composed Big Data
Meaning-Aware Storage
• Push work into storage
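"Push work into storage" means evaluating a predicate inside the storage layer so that only matching rows cross the interconnect, instead of shipping the whole dataset to the compute node and filtering there. A minimal, self-contained Python sketch of the idea follows; all names are hypothetical, and real systems do this in storage firmware or via columnar-format pushdown rather than in application code.

```python
# Conceptual sketch of predicate pushdown. A list of dicts stands in for a
# remote storage node; the counts track how many rows "cross the wire".
ROWS = [{"id": i, "temp": i % 100} for i in range(10_000)]

def scan_then_filter(predicate):
    """Traditional path: storage returns everything; compute filters."""
    fetched = list(ROWS)                     # full dataset crosses the interconnect
    return [r for r in fetched if predicate(r)], len(fetched)

def pushdown_scan(predicate):
    """Pushdown path: the predicate is evaluated inside the storage layer."""
    matched = [r for r in ROWS if predicate(r)]
    return matched, len(matched)             # only matching rows cross the wire

hot = lambda r: r["temp"] > 95
plain, moved_plain = scan_then_filter(hot)
pushed, moved_pushed = pushdown_scan(hot)
assert plain == pushed                       # same answer either way
assert moved_pushed < moved_plain            # far less data movement
```

The editor's note later in the deck ("hardware pushdown built into every spindle – think predicate evaluation") is exactly this trade: identical results, drastically reduced data movement.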
9. HPE’s Own HDP Deployment – Modernizing Data Architecture
Millions in Savings and Significantly Improved Analytics
EA Dashboards & Reporting
- Dedicated satellite
- Marketplace interface
- Certified reports/data
- Enterprise consumption platforms
Satellite Analytics Clusters
- Super user + enterprise data
- Provisioned via project interlock
- Serves analytics tools
- Domain (BU) zones and refineries (ad-hoc jobs)
- Synchronized via Hadoop replication
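"Synchronized via Hadoop replication" in practice typically means a tool such as DistCp copying changed files from the core to each satellite. The toy Python sketch below illustrates that one-way, copy-if-changed semantics; dicts stand in for HDFS namespaces, and this is an illustration of the pattern, not HPE's actual pipeline (the deck's own notes say the synchronization was still to be tested).

```python
# Toy model of one-way replication from the Data Lake Core to a satellite
# cluster, in the spirit of `hadoop distcp -update`: copy a file only when it
# is missing or stale on the destination. All paths/values are made up.
def replicate(core: dict, satellite: dict) -> list:
    """Copy missing or changed files to the satellite; return what moved."""
    copied = []
    for path, content in core.items():
        if satellite.get(path) != content:
            satellite[path] = content
            copied.append(path)
    return copied

core = {"/data/certified/orders.csv": "v2",
        "/data/certified/parts.csv": "v1"}
sat = {"/data/certified/orders.csv": "v1"}      # stale copy, parts.csv missing

moved = replicate(core, sat)
assert moved == ["/data/certified/orders.csv", "/data/certified/parts.csv"]
assert sat == core                              # satellite now mirrors the core
```

A second run copies nothing, which is what makes scheduled incremental syncs cheap.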
Data Lake Core
- Hadoop nucleus
- Enterprise refinery
- Certified enterprise data
- No direct consumption for general users
- Full dataset discovery via limited YARN containers
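One plausible way to enforce "limited YARN containers" for ad-hoc discovery jobs, so they cannot starve the enterprise refinery, is a small Capacity Scheduler queue with a hard ceiling. This is an assumption on my part (the deck does not say how the limit is implemented); the queue names below are illustrative, but the property names are standard capacity-scheduler.xml settings.

```xml
<!-- capacity-scheduler.xml sketch: a small "discovery" queue caps ad-hoc
     full-dataset jobs; the "refinery" queue keeps most of the cluster.
     Queue names are illustrative. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>refinery,discovery</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.refinery.capacity</name>
  <value>90</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.discovery.capacity</name>
  <value>10</value>
</property>
<property>
  <!-- Hard ceiling: discovery never grows past 15%, even when the cluster is idle -->
  <name>yarn.scheduler.capacity.root.discovery.maximum-capacity</name>
  <value>15</value>
</property>
```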
Foundation for HPE’s Go-Forward Data Strategy
• Democratizing Analytics
  • Open up analytics innovation through self-service consumption and governance
• Single E2E connected Data Platform
  • Serve up enterprise data w/ unprecedented speed, accuracy, simplicity, and flexibility
10. HPE and Hortonworks Team Up
• Alliance partner for 2+ years
• HPE invested $50M in Hortonworks
• HPE CTO/EVP Martin Fink is on the Board of Hortonworks
• Close collaboration from Engineering to GTM
• Technical Collaboration
• YARN Node Labels (JIRA YARN-796)
• Spark Optimized Shuffle for big memory
• LLAP performance validation
– Together we’re driving Hadoop Forward
• More Open
• More Secure
• Optimized for Performance
Many of the world’s largest enterprises put their trust in the HPE-Hortonworks team!
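For context on the YARN Node Labels collaboration (YARN-796): node labels let the ResourceManager steer containers to specific node types, which is what makes an asymmetric BDRA cluster (big-memory nodes, GPU nodes, storage-dense nodes) schedulable. A minimal yarn-site.xml sketch, with illustrative paths, hostnames, and label names:

```xml
<!-- yarn-site.xml sketch: enable node labels so containers can target
     workload-optimized nodes. Values are illustrative. -->
<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS location where the ResourceManager persists node-label mappings -->
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>hdfs:///yarn/node-labels</value>
</property>
<!-- Then label nodes from the CLI (hostnames illustrative):
     yarn rmadmin -addToClusterNodeLabels "bigmem,storage"
     yarn rmadmin -replaceLabelsOnNode "node1.example.com=bigmem" -->
```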
11. Learn More Here at Hadoop Summit!
A New “Sparketecture” for Modernizing Your Data Warehouse
Wednesday, 11:30 AM, Room 210C
Demos @ Booth 501
Play the Hadoop Trivia Game and Win! – HPE Booth
12. Thank you
Catch HPE Session at 11:30am Wed, Room 210C
Visit the HPE booth, complete a quiz & win a prize
Editor's Notes
** HPE brings solutions across ALL of the Elements in this diagram (Apps Layer examples: Smart Metering, Smart Cars [ES examples])
** Of course, we’re known first and foremost for our INFRASTRUCTURE STACK, and we won’t disappoint today as we walk through this a little
** BUT, the disruption that we’re seeing regularly is between the infrastructure and data layers of the stack
Disruption = change from old (RAID integrated super intelligently into the server) to the NEW (a whole new set of optimizations for today’s data management software technologies)
Disruption = can also mean the shift from traditional DB and BI to Open Source (SQL had 14 releases in 25 years; Hadoop 65 releases in 7 years) at the Data Layer
Impact: Impact is EVERYWHERE. Revolutionizing the collection of data, and the nature of business intelligence that can be generated as a result
This is “Today” in the continuum of Today → Tomorrow → Someday
ProLiant Traditional 1U/2U design, essentially the gold standard
Apollo 4530 Optimized for physical density
Apollo 4510 Optimized for long term storage (densest storage)
Apollo 4200 Optimized specifically for storage density extensions to “traditional”
Every customer I’ve visited in recent memory was on some version of this journey. Note that BIG DATA IS NOT A WORKLOAD – IT IS A COLLECTION OF WORKLOADS EACH WITH ITS OWN REQUIREMENTS. As customers implement each new use case, they implement a new instance of an infrastructure stack, and end up with a dog’s breakfast on their floor
ETL OFFLOAD “Hey, here’s an opportunity to make use of all that data we’re collecting and feed it into our traditional BI” (balanced systems)
EVENT PROCESSING More CPU, all flash, trying to make fast decisions in “click stream” timeframes (high perf)
ARCHIVAL Data, data, so much data. But DON’T THROW IT AWAY! (low compute, dense capacity)
DEEP LEARNING / IN MEMORY ANALYTICS: same thing
And every silo has 3 copies of data; My engineers like to remind me that this is probably the WRONG KIND OF CLUSTER
There must be a better way! “Some of the guys in my lab….”
** So we came up with this idea. We contributed some changes to YARN, for example. We proved that splitting the stack to have disaggregated optimized storage nodes not only works, but works better…..
Result is BDRA – a convergence of building block, all open sourced, to (ideally) eliminate data siloes, and purpose built compute nodes per BIG DATA / BIG ANALYTICS workload.
So far, everything is on the truck today – available, call 1-800-HPE
But here comes some stuff that might be in our future…. Things we’re poking around at… Can’t make any promises, but DAMN it’s cool what you see walking the halls at HPE…
NVM and MEMORY CENTRIC is SO COOL
PHOTONICS makes most memory equidistant
What if a SHUFFLE……..ISN’T !?
(1) WHAT IF – you had hardware acceleration built in to every node, and natively integrated with Hadoop or SPARK, etc.?
(2) WHAT IF – you had hardware PUSHDOWN built into every spindle behind every node – think predicate evaluation
(3) WHAT IF – we pulled some of our assets from years of enterprise-level experience in HPC and Cloud and integrated it with Big Data and Big Analytics!
(4) FINALLY what if your entire data center infrastructure were COMPOSABLE programmable hardware
Full dataset discovery – possible on data lake core but resources will be limited to protect Enterprise Refinery
Analytics tools and platforms that will use the Satellite Analytics Clusters include Vertica, Spark, R, NoSQL, and even traditional RDBMSs.
Hadoop synchronization still to be tested