Enterprise-grade Big Data
Chris Eidler
VP, Solutions R&D, HPE
Inflection Point - Data Management goes Open Source
Infrastructure Layer
Data Layer
Analytics Layer
Apps Layer (Solutions)
Disruption Impact
Workload Optimization
Building Blocks for Workload-Optimized Big Data
Active Archive
• Multi-temperature storage with data governance and federated queries
• Denser TB per rack unit, lower $/TB for long-term storage
Data Lakes
• Ingestion of multiple types/sources of data
• Batch, interactive, and real-time workloads
• Different infrastructure requirements
Data Warehouse Modernization
• Data staging & landing zone
• Batch processing
• Traditional and rack-density-optimized form factors
Use Cases:
ProLiant DL300 series: traditional 1U/2U design
• Building block for traditional Hadoop workloads
Apollo 4530: density-optimized platform block for traditional Hadoop workloads
• Same spindle/core ratios
Apollo 4200: storage-optimized block
• Foundation for data lakes
• Double the storage density of traditional platforms
Apollo 4510: densest storage block
• Online archival
• Object storage
A Big Data Journey…
ETL Offload
Archival
Deep Learning
Event Processing
In Memory Analytics
HP Big Data Reference Architecture
Elastic Platform for Analytics
• Event Processing: low-latency compute (Moonshot m710x)
• In-Memory Analytics: big-memory compute (Apollo XL170r with 512 GB memory)
• Archival Storage: Apollo 4200 with 6 TB HDDs
• ETL Offload: high-latency compute (Apollo XL170 with 256 GB memory)
• Deep Learning: HPC compute (Apollo XL190r with GPUs)
• HDFS Storage / Data Lake: Apollo 4200 with 3 TB HDDs
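The workload-to-building-block pairings above amount to a lookup table. The sketch below expresses them as one; the dictionary layout and helper function are illustrative, not an HPE artifact, and the platform details are simply those listed on the slide.

```python
# Illustrative only: the Elastic Platform for Analytics pairings as a lookup
# table. EPA_BUILDING_BLOCKS and building_block() are hypothetical helpers.
EPA_BUILDING_BLOCKS = {
    "event_processing":    ("Low-latency compute",  "Moonshot m710x"),
    "in_memory_analytics": ("Big-memory compute",   "Apollo XL170r, 512 GB memory"),
    "archival_storage":    ("Archival storage",     "Apollo 4200, 6 TB HDDs"),
    "etl_offload":         ("High-latency compute", "Apollo XL170, 256 GB memory"),
    "deep_learning":       ("HPC compute",          "Apollo XL190r with GPUs"),
    "hdfs_data_lake":      ("HDFS storage",         "Apollo 4200, 3 TB HDDs"),
}

def building_block(workload: str) -> str:
    """Return 'platform (tier)' for a named workload."""
    tier, platform = EPA_BUILDING_BLOCKS[workload]
    return f"{platform} ({tier})"
```

The point of the architecture is exactly this decoupling: each workload maps to a tier-specific node type rather than one uniform server.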
What If….
Opportunities for Platform Optimization
The Coming Landscape
– Non-Volatile Memory
– More than fast: byte-addressable and persistent
– Photonics
– Optical networking will make most NVM equidistant
– Some implications for Big Data
– 90% of a database write transaction is eliminated
– A shuffle …isn't
– HPE is contributing changes to Spark with HDP
– Favored algorithms might change
– Graph- and matrix-inversion-based algorithms
HPE’s “The Machine”
A shared-something architecture
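To make the "a shuffle …isn't" point concrete: on a shared, byte-addressable memory fabric, the map side can publish the addresses of its partitions instead of serializing and shipping the bytes themselves. The toy model below is our own illustration of that idea, not HPE's actual Spark/HDP changes; `shared_pool` stands in for fabric-attached memory.

```python
# Toy illustration (not HPE's Spark work): with a shared memory pool, a
# "shuffle" exchanges references to partitions rather than copying data.

shared_pool = {}  # stands in for fabric-attached, byte-addressable memory

def map_side_write(task_id, records, num_parts=2):
    """Writer places partitioned records once; returns only their 'addresses'."""
    addrs = {}
    for key, value in records:
        part = ord(key[0]) % num_parts            # deterministic toy partitioner
        shared_pool.setdefault((task_id, part), []).append((key, value))
        addrs[part] = (task_id, part)             # a reference, not a byte copy
    return addrs

def reduce_side_read(part, all_addrs):
    """Reducer dereferences addresses; no bytes are re-serialized or shipped."""
    rows = []
    for addrs in all_addrs:
        if part in addrs:
            rows.extend(shared_pool[addrs[part]])
    return rows
```

In a disk-and-network shuffle, every record in `shared_pool` would instead be serialized, written, transferred, and deserialized; here the "movement" is a dictionary lookup.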
Platform Investigations for Workload Optimized Big Data
Silicon Acceleration
• Multicore x86 CPU, GPGPU, FPGA, SoC/ASIC
Big Data/HPC/Cloud Integration
Composed Big Data
• Software and hardware
Meaning-Aware Storage
• Push work into storage
HPE’s Own HDP Deployment – Modernizing Data Architecture
Millions in Savings and Significantly Improving Analytics
EA Dashboards & Reporting
- Dedicated satellite
- Marketplace interface
- Certified reports/data
- Enterprise consumption platforms
Satellite Analytics Clusters
- Super user + enterprise data
- Provisioned via project interlock
- Services analytics tools
- Domain (BU) zones and refineries (ad-hoc jobs)
- Synchronized via Hadoop replication
Data Lake Core
- Hadoop nucleus
- Enterprise refinery
- Certified enterprise data
- No direct consumption for general users
- Full dataset discovery via limited YARN containers
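The "synchronized via Hadoop replication" bullet in the satellite-cluster list typically means scheduled DistCp jobs pushing certified data from the core to each satellite. A hedged sketch follows; DistCp is a standard Hadoop tool, but the hostnames and paths here are placeholders, not HPE's actual layout.

```shell
# Hypothetical sync job: mirror certified core data to a satellite cluster.
# -update copies only changed files; -delete removes files absent at the source.
hadoop distcp -update -delete \
  hdfs://datalake-core-nn:8020/data/certified \
  hdfs://satellite-a-nn:8020/data/certified
```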
Foundation for HPE's Go-Forward Data Strategy
• Democratizing analytics
• Open up analytics innovation through self-service consumption and governance
• Single E2E connected data platform
• Serve up enterprise data with unprecedented speed, accuracy, simplicity, and flexibility
HPE and Hortonworks Team Up
• Alliance partner for 2+ years
• HPE invested $50M in Hortonworks
• HPE CTO/EVP Martin Fink is on the Board
of Hortonworks
• Close collaboration from Engineering to GTM
• Technical Collaboration
• YARN node labels (JIRA YARN-796)
• Spark optimized shuffle for big memory
• LLAP performance validation
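For context on the node-labels item: YARN node labels (YARN-796) let a cluster mark heterogeneous nodes (for example, big-memory or GPU hosts) and steer specific containers onto them, which is what makes mixed building-block clusters like the ones above schedulable. A sketch of the usual setup follows; the label name `bigmem`, the hostname, and the job script are placeholders, and the commands assume a running YARN cluster.

```shell
# Hypothetical node-labels setup (YARN-796); names are placeholders.

# 1. Define a cluster-level label
yarn rmadmin -addToClusterNodeLabels "bigmem(exclusive=true)"

# 2. Attach the label to a large-memory node
yarn rmadmin -replaceLabelsOnNode "node1.example.com=bigmem"

# 3. Ask Spark-on-YARN to place the AM and executors on labeled nodes
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=bigmem \
  --conf spark.yarn.executor.nodeLabelExpression=bigmem \
  my_job.py
```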
– Together we're driving Hadoop forward
• More Open
• More Secure
• Optimized for Performance
Many of the world’s largest enterprises put their trust in the HPE-Hortonworks team!
Learn More Here at Hadoop Summit!
A New "Sparketecture" for Modernizing your Data Warehouse
Wednesday, 11:30AM, Room 210C
Demos @ Booth 501
Play the Hadoop Trivia
Game and Win! – HPE Booth
Thank you
Catch HPE Session at 11:30am Wed, Room 210C
Visit the HPE booth, complete a quiz & win a prize
HPE Keynote Hadoop Summit San Jose 2016

  • 2. Inflection Point - Data Management goes Open Source Infrastructure Layer Data Layer Analytics Layer Apps Layer (Solutions) Disruption Impact Workload Optimization
  • 3. Building Blocks for Workload-Optimized Big Data HPE Confidential - for HPE and Channel Partners only 3 Active Archive • Multi-temperate storage with data governance and federated queries • Denser TB/rack u, lower $/TB for long term storage Data Lakes • Ingestion of multiple types / sources of data • Batch, Interactive, Real-time workloads • Different infrastructure requirements Data Warehouse Modernization • Data Staging & landing zone • Bach processing • Traditional and rack density optimized form factors Use Cases: ProLiant DL300 series Apollo 4530 Apollo 4200 Traditional 1U/2U design • Building block for traditional Hadoop workloads Density optimized platform block for traditional Hadoop workloads • Same spindle/core ratios Storage optimized block • Foundation for Data lakes • Double the storage density of traditional platform Apollo 4510 Densest Storage block • Online Archival • Object storage
  • 4. A Big Data Journey… ETL Offload Archival Deep Learning Event Processing In Memory Analytics
  • 5. HP Big Data Reference Architecture Elastic Platform for Analytics Event Processing Low Latency Compute Moonshot m710x In Memory Analytics Big Memory Compute Apollo xl170r w 512G memory Archival Storage Apollo 4200 w 6TB HDD ETL Offload High Latency Compute Apollo xl170 w 256G memory Deep Learning HPC Compute Apollo xl190r w GPUs HDFS Storage Data Lake Apollow 4200 w 3TB HDD
  • 6. What If…. Opportunities for Platform Optimization
  • 7. The Coming Landscape – Non-Volatile Memory – More than fast – byte addressable and persistent – Photonics – Optical Networking will make most NVM equidistant – Some Implications on Big Data – 90% of a database write transaction is eliminated – A Shuffle …isn’t – HPE is contributing changes to Spark with HDP – Favored Algorithms might change – Graph and matrix inversion based algorithms Confidential HPE’s “The Machine” A shared something architecture
  • 8. Platform Investigations for Workload Optimized Big Data Confidential Silicon Acceleration Big Data/HPC/Cloud integration Composed Big Data Multicore x86 CPU GPGPU FPGA SoC/ASIC Software Hardware Meaning Aware Storage Push work into storage
  • 9. HPE's Own HDP Deployment – Modernizing Data Architecture
Millions in savings and significantly improved analytics
Data Lake Core
– Hadoop nucleus
– Enterprise refinery
– Certified enterprise data
– No direct consumption for general users
– Full dataset discovery via limited YARN containers
Satellite Analytics Clusters
– Super user + enterprise data
– Provisioned via project interlock
– Services analytics tools
– Domain (BU) zones and refineries (ad-hoc jobs)
– Synchronized via Hadoop replication
EA Dashboards & Reporting
– Dedicated satellite
– Marketplace interface
– Certified reports/data
– Enterprise consumption platforms
Foundation for HPE's Go-Forward Data Strategy
• Democratizing analytics
• Open up analytics innovation through self-service consumption and governance
• Single E2E connected data platform
• Serve up enterprise data with unprecedented speed, accuracy, simplicity and flexibility
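"Full dataset discovery via limited YARN containers" maps naturally onto YARN's CapacityScheduler: give general users a capped queue so ad-hoc discovery jobs can scan the lake without starving the enterprise refinery. A hedged capacity-scheduler.xml fragment (the queue names and percentages are illustrative, not HPE's actual configuration):

```xml
<!-- capacity-scheduler.xml: cap the ad-hoc discovery queue so the
     enterprise refinery keeps guaranteed capacity. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>refinery,discovery</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.refinery.capacity</name>
  <value>80</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.discovery.capacity</name>
  <value>20</value>
</property>
<property>
  <!-- Hard ceiling: discovery jobs can never take more than 30% of the
       cluster, even when the refinery queue is idle. -->
  <name>yarn.scheduler.capacity.root.discovery.maximum-capacity</name>
  <value>30</value>
</property>
```

Guaranteed capacities under a parent queue must sum to 100; `maximum-capacity` is the elasticity ceiling that keeps discovery from crowding out certified workloads.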
  • 10. HPE and Hortonworks Team Up
• Alliance partner for 2+ years
• HPE invested $50M in Hortonworks
• HPE CTO/EVP Martin Fink is on the Board of Hortonworks
• Close collaboration from engineering to GTM
• Technical collaboration:
– YARN node labels (JIRA YARN-796)
– Spark optimized shuffle for big memory
– LLAP performance validation
Together we're driving Hadoop forward:
• More open
• More secure
• Optimized for performance
Many of the world's largest enterprises put their trust in the HPE-Hortonworks team!
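YARN node labels (the YARN-796 work the slide cites) are what let a scheduler pin workloads to hardware classes in a heterogeneous, BDRA-style cluster, e.g. routing memory-hungry Spark jobs to big-memory nodes. A configuration sketch, assuming a hypothetical label `bigmem`, node name, and queue; these names are illustrative, not from the deck:

```shell
# Register a cluster-level node label and attach it to a big-memory node
# (NodeManager address is host:port).
yarn rmadmin -addToClusterNodeLabels "bigmem"
yarn rmadmin -replaceLabelsOnNode "bigmem-node-01:45454=bigmem"

# capacity-scheduler.xml: allow a queue to schedule onto labeled nodes.
#   <property>
#     <name>yarn.scheduler.capacity.root.analytics.accessible-node-labels</name>
#     <value>bigmem</value>
#   </property>

# Submit a Spark job that requests the labeled nodes for its containers.
spark-submit --master yarn --queue analytics \
  --conf spark.yarn.am.nodeLabelExpression=bigmem \
  --conf spark.yarn.executor.nodeLabelExpression=bigmem \
  app.py
```

The label expression is evaluated per container request, so a single cluster can keep, say, in-memory analytics on 512GB nodes while archival jobs land on storage-dense ones.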
  • 11. Learn More Here at Hadoop Summit!
– A New "Sparketecture" for Modernizing Your Data Warehouse: Wednesday, 11:30 AM, Room 210C
– Demos @ Booth 501
– Play the Hadoop Trivia Game and Win! – HPE Booth
  • 12. Thank you
– Catch the HPE session at 11:30 AM Wednesday, Room 210C
– Visit the HPE booth, complete a quiz & win a prize

Editor's Notes

  1. ** HPE brings solutions across ALL of the elements in this diagram (Apps Layer examples: smart metering, smart cars [ES examples]) ** Of course, we're known first and foremost for our INFRASTRUCTURE STACK, and we won't disappoint today as we walk through this a little ** BUT, the disruption that we're seeing regularly is between the infrastructure and data layers of the stack. Disruption = change from the old (RAID integrated super-intelligently into the server) to the NEW (a whole new set of optimizations for today's data management software technologies). Disruption can also mean the shift from traditional DB and BI to open source (SQL had 14 releases in 25 years; Hadoop 65 releases in 7 years). At the Data Layer, the impact is EVERYWHERE: revolutionizing the collection of data, and the nature of the business intelligence that can be generated as a result.
  2. This is "Today" in the continuum of Today, Tomorrow, Someday. ProLiant: traditional 1U/2U design, essentially the gold standard. Apollo 4530: optimized for physical density. Apollo 4510: optimized for long-term storage (densest storage). Apollo 4200: optimized specifically for storage-density extensions to "traditional".
  3. Every customer I've visited in recent memory was on some version of this journey. Note that BIG DATA IS NOT A WORKLOAD – IT IS A COLLECTION OF WORKLOADS, EACH WITH ITS OWN REQUIREMENTS. As customers implement each new use case, they implement a new instance of an infrastructure stack, and end up with a dog's breakfast on their floor. ETL OFFLOAD: "Hey, here's an opportunity to make use of all that data we're collecting and feed it into our traditional BI" (balanced systems). EVENT PROCESSING: more CPU, all flash, trying to make fast decisions in "click stream" timeframes (high perf). ARCHIVAL: data, data, so much data – but DON'T THROW IT AWAY! (low compute, dense capacity). DEEP LEARNING and IN-MEMORY ANALYTICS: same thing. And every silo has 3 copies of the data; my engineers like to remind me that this is probably the WRONG KIND OF CLUSTER.
  4. There must be a better way! "Some of the guys in my lab…" ** So we came up with this idea. We contributed some changes to YARN, for example. We proved that splitting the stack to have disaggregated, optimized storage nodes not only works, but works better… The result is BDRA – a convergence of building blocks, all open sourced, to (ideally) eliminate data silos, with purpose-built compute nodes per BIG DATA / BIG ANALYTICS workload.
  5. So far, everything is on the truck today – available, call 1-800-HPE But here comes some stuff that might be in our future…. Things we’re poking around at… Can’t make any promises, but DAMN it’s cool what you see walking the halls at HPE…
  6. NVM and MEMORY CENTRIC is SO COOL PHOTONICS makes most memory equidistant What if a SHUFFLE……..ISN’T !?
  7. (1) WHAT IF – you had hardware acceleration built into every node, natively integrated with Hadoop or Spark, etc.? (2) WHAT IF – you had hardware PUSHDOWN built into every spindle behind every node – think predicate evaluation. (3) WHAT IF – we pulled some of our assets from years of enterprise-level experience in HPC and Cloud and integrated them with Big Data and Big Analytics! (4) FINALLY – what if your entire data center infrastructure were COMPOSABLE: programmable hardware.
  8. Full dataset discovery – possible on data lake core but resources will be limited to protect Enterprise Refinery Analytics tools and platforms that will use the Satellite Analytics Clusters include Vertica, Spark, R, NoSQL, and even traditional RDBMSs. Hadoop synchronization still to be tested