#ibmedge© 2016 IBM Corporation
Create a Colder Storage Tier for Hadoop & Spark Using IBM Elastic Storage Server & HDFS Transparency
Ted Hoover / September 19, 2016
The History of Spectrum Scale
This infographic is the genealogy of IBM Spectrum Scale, from its birth as a digital
media server and HPC research project to its place as a foundational element in the
IBM Spectrum Storage family. It highlights key milestones in the product history,
usage, and industry to convey that Spectrum Scale may have started as GPFS, but it
is so much more now. IBM has invested in the enterprise features that make it easy
to use, reliable and suitable for mission critical storage of all types.
Unified data access with File and Object Based Storage
Rolling Upgrades
File Placement Optimization
Global Active File Management
Advanced Routing & Caching Services
Spectrum Scale services
PCS / IBM Confidential
Commodity Hardware
Sync & Async Replication
Flash Acceleration
Network performance monitoring
Native Encryption And Secure Erase
Common Management
Cloud Ready
High speed scanning engine
Transparent policy Driven data migration
Storage Resource Pools
POSIX, HDFS, SMB/CIFS, NFS, Swift/S3
Global Namespace
Archive Integration
Simplify Management
Software-Defined Agility
Enable Global Collaboration
Enterprise Features & Flexibility
Reduce Complexity
Redefining Unified Storage
Challenge
Managing Data Growth
• Lowering data costs
• Managing data retrieval & app support
• Protecting business data
Unified Scale-out Data Lake
• File In/Out, Object In/Out; Analytics on demand.
• High-performance native protocols
• Single Management Plane
• Cluster replication & global namespace
• Enterprise storage features across file, object & HDFS
SSD, Fast Disk, Slow Disk, Tape
Spectrum Scale
NFS, SMB, POSIX, Swift/S3, HDFS
Store everywhere. Run anywhere.
Analytics without complexity
Challenge
Separate storage systems for ingest, analysis, results
• HDFS requires locality aware storage (namenode)
• Data transfer slows time to results
• Different frameworks & analytics tools use data differently
HDFS Transparency
• Map/Reduce on shared, or shared nothing storage
• No waiting for data transfer between storage systems
• Immediately share results
• Single ‘Data Lake’ for all applications
• Enterprise data management
• Archive and Analysis in-place
Ingest
Object File
Direct Access
POSIX
Raw Data
Analysis
Spectrum Scale 4.2.1 for Big Data Oceans
extending HDFS for the enterprise
An enterprise HDFS filesystem
• Expand use of Shared Nothing Clusters
• Simplicity of Storage Rich Servers with
enterprise features
• Advanced Routing (AFM), encryption,
QoS, compression
• Mix cluster types
• Shared Nothing = traditional HDFS style
• Centralized Storage = traditional enterprise
Other clients
Storage Servers
Storage
Store Everywhere. Run Anywhere.
Standard
commands
& protocols
Shared Nothing Clusters
Application specifies
hdfs://namenode:9001
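As a rough illustration of what the application-supplied URI carries, the sketch below uses Python's standard urllib.parse to split an HDFS URI into the NameNode host and port. The host name namenode and port 9001 come from the slide; the function name namenode_endpoint is hypothetical. Host and port must follow exactly two slashes: with three, the authority is empty and the host is read as part of the path.

```python
from urllib.parse import urlparse


def namenode_endpoint(uri):
    """Return (host, port) of the NameNode named in an hdfs:// URI."""
    parts = urlparse(uri)
    # An empty authority (for example after three slashes) leaves hostname unset.
    if parts.scheme != "hdfs" or not parts.hostname:
        raise ValueError(f"not an HDFS URI with a NameNode authority: {uri!r}")
    return parts.hostname, parts.port


if __name__ == "__main__":
    print(namenode_endpoint("hdfs://namenode:9001"))
```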
Spectrum Scale 4.2.1 for Big Data Oceans
extending HDFS across Clusters
Extending the Filesystem
• Run analytics across multiple HDFS
and/or Spectrum Scale clusters
• No need to move the data
• Build Data Oceans on demand
Store Everywhere. Run Anywhere.
Disk
IBM Spectrum Scale HDFS Transparency Connector
Disk Disk Disk
viewfs://clusterX:/hadoop/hdfs/file1
viewfs://clusterY:/hadoop/gpfs/file1
hdfs://nn2.node.net:8020/hadoop/gpfs
hdfs://nn1.node.net:8020/hadoop/hdfs
Cluster X Cluster Y
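One way to picture the federated namespace is as a client-side mount table that maps each path prefix to the cluster that actually stores it. The following is a minimal sketch under that assumption, pairing /hadoop/hdfs with nn1 and /hadoop/gpfs with nn2 as the URIs on the slide suggest. It is a simplified model, not Hadoop's ViewFs implementation; MOUNT_TABLE and resolve are hypothetical names.

```python
# Simplified model of client-side mount-table resolution.
# Mount points and targets are taken from the URIs shown on the slide.
MOUNT_TABLE = {
    "/hadoop/hdfs": "hdfs://nn1.node.net:8020/hadoop/hdfs",
    "/hadoop/gpfs": "hdfs://nn2.node.net:8020/hadoop/gpfs",
}


def resolve(path, table=None):
    """Map a path in the federated namespace to the backing cluster URI."""
    table = MOUNT_TABLE if table is None else table
    # Try the longest mount point first so nested mounts win over parents.
    for mount in sorted(table, key=len, reverse=True):
        if path == mount or path.startswith(mount + "/"):
            return table[mount] + path[len(mount):]
    raise FileNotFoundError(f"no mount point covers {path}")


if __name__ == "__main__":
    print(resolve("/hadoop/gpfs/file1"))
```

The application only ever sees the left-hand paths, so data can move between clusters by changing the mount table rather than the application.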
Use Case 1: Federate ESS with Existing HDFS Filesystem
viewfs://clusterX:/hadoop/hdfs/file1
viewfs://clusterY:/hadoop/gpfs/file1
hdfs://nn1.node.net:8020/hadoop/hdfs
IBM Spectrum Scale HDFS Transparency Connector
hdfs://nn2.node.net:8020/hadoop/gpfs
Improve Hadoop Cluster
Utilization
• Manually move less frequently
accessed data to an ESS tier
• Applications can still seamlessly access data
that has been moved
Commands:
$ hadoop distcp viewfs://clusterX:/hadoop/hdfs/file1 \
    viewfs://clusterY:/hadoop/gpfs/file1
$ hadoop fs -rm viewfs://clusterX:/hadoop/hdfs/file1
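Because the move is a copy followed by a delete, a script that automates it should delete the source only after the copy has succeeded. The sketch below is one possible wrapper, assuming the hadoop CLI is on the PATH; tier_down and its run parameter are hypothetical names, and run exists only so the command runner can be swapped out for testing.

```python
import subprocess


def tier_down(src, dst, run=subprocess.run):
    """Copy src to dst with distcp, then delete src only if the copy succeeded."""
    copy = run(["hadoop", "distcp", src, dst])
    if copy.returncode != 0:
        # Leave the source untouched so no data is lost on a failed copy.
        raise RuntimeError(
            f"distcp failed with exit code {copy.returncode}; source kept"
        )
    # check=True makes subprocess.run raise if the delete itself fails.
    run(["hadoop", "fs", "-rm", src], check=True)
```

A call such as tier_down("viewfs://clusterX:/hadoop/hdfs/file1", "viewfs://clusterY:/hadoop/gpfs/file1") would then mirror the two commands above.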
Use Case 2: Federate ESS with an Existing Spectrum Scale
Filesystem
/gpfs/fs1
Extending a Spectrum Scale
Filesystem
• Add an ESS tier to an existing FPO
cluster and use ILM policies to migrate
data to ESS tier
• Data is still accessible from FPO cluster
RULE 'FPO_USE' SET POOL 'fpodata' REPLICATE (2) FOR FILESET ('fpodata')
RULE 'FPO_TO_SHARESTORAGE' MIGRATE FROM POOL 'fpodata' TO POOL 'datapool'
    WHERE CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL '10' MINUTES
RULE 'default' SET POOL 'datapool'
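The migration rule above selects files in the 'fpodata' pool whose last modification is more than ten minutes old. As a rough model of that predicate only (not of how the Spectrum Scale policy engine is implemented), the same test can be written in Python; should_migrate and THRESHOLD are hypothetical names.

```python
from datetime import datetime, timedelta

THRESHOLD = timedelta(minutes=10)  # mirrors INTERVAL '10' MINUTES in the rule


def should_migrate(modification_time, now, pool):
    """True when a file in 'fpodata' has been unmodified for longer than THRESHOLD."""
    return pool == "fpodata" and (now - modification_time) > THRESHOLD


if __name__ == "__main__":
    now = datetime(2016, 9, 19, 12, 0)
    print(should_migrate(now - timedelta(minutes=11), now, "fpodata"))
```

The comparison is strict, so a file modified exactly ten minutes ago stays in the FPO pool until the next policy run.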
/gpfs/fs2
Single Name Space
Demo
IBM Spectrum Protect
Use Case 3: Spectrum Scale Provides Easy Integration with
Enterprise Backup Tools
IBM Spectrum Scale HDFS Transparency Connector
Protecting Business Data
• Use ESS warm data tier with
Spectrum Protect for backup
• Simplified backup administration
tools
• Scalable performance
• Optimized data protection
viewfs://clusterX:/hadoop/hdfs/file1
viewfs://clusterY:/hadoop/gpfs/file1
hdfs://nn1.node.net:8020/hadoop/hdfs hdfs://nn2.node.net:8020/hadoop/gpfs
Use Case 4: Spectrum Scale Provides Easy Integration with
Enterprise Archiving Tools
Protecting Business Data
• Use ESS warm data tier with
Spectrum Archive to tape
• Powerful policy engine
• Information Lifecycle Management
• Fast metadata ‘scanning’ and data
movement
• Automated data migration based on
thresholds
• Users not affected by data migration
• Single namespace
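Threshold-based migration, as listed above, can be pictured as follows: once a pool fills past a high-water mark, the coldest files are moved out until occupancy falls to a low-water mark. This is a minimal sketch of that idea only. The 80% and 60% marks, the (name, size, last_access) tuple layout, and the function name pick_files_to_migrate are all assumptions for illustration, not values or interfaces from the product.

```python
def pick_files_to_migrate(files, capacity, high=0.80, low=0.60):
    """Return names of files to migrate out of a pool.

    files: list of (name, size, last_access) tuples; a smaller last_access
    means the file was accessed longer ago.
    """
    used = sum(size for _, size, _ in files)
    if used <= high * capacity:
        return []  # below the high-water mark: nothing to do
    chosen = []
    # Oldest access time first, so the coldest data leaves the pool first.
    for name, size, _ in sorted(files, key=lambda f: f[2]):
        if used <= low * capacity:
            break
        chosen.append(name)
        used -= size
    return chosen
```

Users keep addressing the files by the same path throughout, since only the storage pool behind the single namespace changes.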
IBM Spectrum Scale HDFS Transparency Connector
viewfs://clusterX:/hadoop/hdfs/file1
viewfs://clusterY:/hadoop/gpfs/file1
hdfs://nn1.node.net:8020/hadoop/hdfs hdfs://nn2.node.net:8020/hadoop/gpfs
IBM Spectrum Archive
IBM Spectrum Protect
Use Case 5: Spectrum Scale Provides Easy Integration with
Enterprise Backup and Archiving tools
/gpfs/fs2
Protecting Business Data
• Optionally, Spectrum Protect and
Spectrum Archive can be used
directly with Spectrum Scale FPO
/gpfs/fs1
Single Name Space
IBM Spectrum Archive
Client Use Case 1: Unified Analytic/Workflow Pipelines
Ingest Analyze Export Visualize
POSIX HDFS NFS Object
RDBMS POSIX
Data Lake HDFS
Analyze HDFS
Dashboard POSIX
Report POSIX
SMB Share
NFS
Object
DB2 on Shared Nothing Cluster (FPO)
Hadoop on Shared Nothing Cluster (FPO)
Storage
ESS
Structured data warehouse
Warehouse extension for
unstructured data
Compute Cluster
SAS Analytics Dashboard &
Reporting
Client Use Case 2: Life Sciences with HPC and Hadoop/Spark
Storage Storage Storage
ESS based
shared storage
cluster
HPC Compute Cluster
File A Event on shared
pool
And
File D Event on
FPO pool
File B on shared
pool
LSF Job 2
LSF Hadoop Job
File F on FPO pool
LSF Spark Job
LSF Job 1
File C on FPO pool
File E on FPO pool
LSF Job 5
File F replicated to
a remote Spectrum
Scale Server
Client Use Case 3: HPC and Data Analytics
Storage Storage Storage
ESS based
shared storage
cluster
HPC Compute Cluster
Ingest
POSIX
&
Object
Analyze
POSIX
iterate
Simulate
HDFS
Analyze
Analyze
HDFS
HDFS
Summary: Big Data Oceans extending HDFS across Clusters
Unified Data
Repository, Support
Multiple Analytics
Federate ESS with
Existing HDFS
Filesystem
Federate ESS with
Existing Spectrum
Scale Filesystem
HDFS Transparency Connector Single Name Space
Improve Hadoop Cluster Utilization
• Manually move less frequently
accessed data to an ESS tier
• Applications can still seamlessly access data that
has been moved
Expand use of Shared Nothing
Clusters
• Simplicity of Storage Rich Servers with
enterprise features
• Advanced Routing (AFM), encryption,
QoS, compression
• Mix cluster types
• Backup and Archive Support
Extending the Filesystem
• Run analytics across multiple HDFS
and/or Spectrum Scale clusters
• No need to move the data
• Build Data Oceans on demand
Disk
HDFS Transparency Connector
Disk Disk Disk
Spectrum Scale User Group
• The Spectrum Scale User Group is free
to join and open to all using, interested
in using or integrating Spectrum Scale.
• Join the User Group activities to meet
your peers and get access to experts
from partners and IBM.
• Next meetings:
- APAC: October 14, Melbourne
- Global at SC16: November 13, 1pm to 5pm, Salt Lake City
• Web page: http://www.spectrumscale.org/
• Presentations: http://www.spectrumscale.org/presentations/
• Mailing list: http://www.spectrumscale.org/join/
• Contact: http://www.spectrumscale.org/committee/
• Meet Bob Oesterlin (US Co-Principal) at Edge2016: Robert.Oesterlin@nuance.com
Thank You
Notices and Disclaimers
Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission
from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of
initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS
DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE
USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY.
IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our
warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers
have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in
which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials
and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or
their specific situation.
It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and
interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such
laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
Notices and Disclaimers Cont'd.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not
tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the
ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual
property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®,
FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG,
Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,
StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business
Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

More Related Content

What's hot

S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5Tony Pearson
 
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...xKinAnx
 
IBM Storage for Analytics, Cognitive and Cloud
IBM Storage for Analytics, Cognitive and CloudIBM Storage for Analytics, Cognitive and Cloud
IBM Storage for Analytics, Cognitive and CloudTony Pearson
 
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...Sandeep Patil
 
Cleversafe august 2016
Cleversafe august 2016Cleversafe august 2016
Cleversafe august 2016Joe Krotz
 
Genomics Deployments - How to Get Right with Software Defined Storage
 Genomics Deployments -  How to Get Right with Software Defined Storage Genomics Deployments -  How to Get Right with Software Defined Storage
Genomics Deployments - How to Get Right with Software Defined StorageSandeep Patil
 
IBM's Cloud Storage Options
IBM's Cloud Storage OptionsIBM's Cloud Storage Options
IBM's Cloud Storage OptionsTony Pearson
 
S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5Tony Pearson
 
SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)Odinot Stanislas
 
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...xKinAnx
 
Has Your Data Gone Rogue?
Has Your Data Gone Rogue?Has Your Data Gone Rogue?
Has Your Data Gone Rogue?Tony Pearson
 
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsIBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsTony Pearson
 
Consolidating File Servers into the Cloud
Consolidating File Servers into the CloudConsolidating File Servers into the Cloud
Consolidating File Servers into the CloudBuurst
 
IBM Spectrum Scale on the Cloud
IBM Spectrum Scale on the CloudIBM Spectrum Scale on the Cloud
IBM Spectrum Scale on the CloudTony Pearson
 
IBM general parallel file system - introduction
IBM general parallel file system - introductionIBM general parallel file system - introduction
IBM general parallel file system - introductionIBM Danmark
 
12 Architectural Requirements for Protecting Business Data in the Cloud
12 Architectural Requirements for Protecting Business Data in the Cloud12 Architectural Requirements for Protecting Business Data in the Cloud
12 Architectural Requirements for Protecting Business Data in the CloudBuurst
 
Migrate Existing Applications to AWS without Re-engineering
Migrate Existing Applications to AWS without Re-engineeringMigrate Existing Applications to AWS without Re-engineering
Migrate Existing Applications to AWS without Re-engineeringBuurst
 
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)SoftLayer Storage Services Overview (for Interop Las Vegas 2015)
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)Michael Fork
 

What's hot (20)

S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5
 
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 6 spectrumscale el...
 
IBM Storage for Analytics, Cognitive and Cloud
IBM Storage for Analytics, Cognitive and CloudIBM Storage for Analytics, Cognitive and Cloud
IBM Storage for Analytics, Cognitive and Cloud
 
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
Proactive Threat Detection and Safeguarding of Data for Enhanced Cyber resili...
 
Cleversafe august 2016
Cleversafe august 2016Cleversafe august 2016
Cleversafe august 2016
 
Genomics Deployments - How to Get Right with Software Defined Storage
 Genomics Deployments -  How to Get Right with Software Defined Storage Genomics Deployments -  How to Get Right with Software Defined Storage
Genomics Deployments - How to Get Right with Software Defined Storage
 
IBM's Cloud Storage Options
IBM's Cloud Storage OptionsIBM's Cloud Storage Options
IBM's Cloud Storage Options
 
S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5
 
Ibm power systems hpc cluster
Ibm power systems hpc cluster Ibm power systems hpc cluster
Ibm power systems hpc cluster
 
SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)SNIA : Swift Object Storage adding EC (Erasure Code)
SNIA : Swift Object Storage adding EC (Erasure Code)
 
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
 
Has Your Data Gone Rogue?
Has Your Data Gone Rogue?Has Your Data Gone Rogue?
Has Your Data Gone Rogue?
 
Iaas storage-170302090824
Iaas storage-170302090824Iaas storage-170302090824
Iaas storage-170302090824
 
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsIBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
 
Consolidating File Servers into the Cloud
Consolidating File Servers into the CloudConsolidating File Servers into the Cloud
Consolidating File Servers into the Cloud
 
IBM Spectrum Scale on the Cloud
IBM Spectrum Scale on the CloudIBM Spectrum Scale on the Cloud
IBM Spectrum Scale on the Cloud
 
IBM general parallel file system - introduction
IBM general parallel file system - introductionIBM general parallel file system - introduction
IBM general parallel file system - introduction
 
12 Architectural Requirements for Protecting Business Data in the Cloud
12 Architectural Requirements for Protecting Business Data in the Cloud12 Architectural Requirements for Protecting Business Data in the Cloud
12 Architectural Requirements for Protecting Business Data in the Cloud
 
Migrate Existing Applications to AWS without Re-engineering
Migrate Existing Applications to AWS without Re-engineeringMigrate Existing Applications to AWS without Re-engineering
Migrate Existing Applications to AWS without Re-engineering
 
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)SoftLayer Storage Services Overview (for Interop Las Vegas 2015)
SoftLayer Storage Services Overview (for Interop Las Vegas 2015)
 

Viewers also liked

Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanDatabricks
 
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterSpark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterDataWorks Summit
 
Dynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDataWorks Summit
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...gethue
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideIBM
 

Viewers also liked (15)

Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu Kasinathan
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterSpark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
 
Dynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark Application
 
SocSciBot(01 Mar2010) - Korean Manual
SocSciBot(01 Mar2010) - Korean ManualSocSciBot(01 Mar2010) - Korean Manual
SocSciBot(01 Mar2010) - Korean Manual
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Proxy Servers
Proxy ServersProxy Servers
Proxy Servers
 
Proxy Server
Proxy ServerProxy Server
Proxy Server
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting Guide
 

Similar to Hadoop and Spark Analytics over Better Storage

Hortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum ScaleHortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum ScaleAbhishek Sood
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Doug O'Flaherty
 
IBM Spectrum Scale Overview november 2015
IBM Spectrum Scale Overview november 2015IBM Spectrum Scale Overview november 2015
IBM Spectrum Scale Overview november 2015Doug O'Flaherty
 
IBM Platform Computing Elastic Storage
IBM Platform Computing  Elastic StorageIBM Platform Computing  Elastic Storage
IBM Platform Computing Elastic StoragePatrick Bouillaud
 
From limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyFrom limited Hadoop compute capacity to increased data scientist efficiency
From limited Hadoop compute capacity to increased data scientist efficiencyAlluxio, Inc.
 
Data Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudData Orchestration Platform for the Cloud
Data Orchestration Platform for the CloudAlluxio, Inc.
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAlluxio, Inc.
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
IBM Spectrum Scale ECM - Winning Combination
IBM Spectrum Scale  ECM - Winning CombinationIBM Spectrum Scale  ECM - Winning Combination
IBM Spectrum Scale ECM - Winning CombinationSasikanth Eda
 
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...
VMworld 2013: Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cl...VMworld
 
EMC Isilon Multitenancy for Hadoop Big Data Analytics
EMC Isilon Multitenancy for Hadoop Big Data AnalyticsEMC Isilon Multitenancy for Hadoop Big Data Analytics
EMC Isilon Multitenancy for Hadoop Big Data AnalyticsEMC
 
Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAlluxio, Inc.
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio, Inc.
 
Macroview Netapp Overview
Macroview Netapp OverviewMacroview Netapp Overview
Macroview Netapp OverviewAlex Tsui
 
Breaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIBreaking the Silos: Storage for Analytics & AI
Breaking the Silos: Storage for Analytics & AIDataWorks Summit
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSandeep Patil
 

Hadoop and Spark Analytics over Better Storage

  • 1. #ibmedge© 2016 IBM Corporation Create a Colder Storage Tier for Hadoop & Spark Using IBM Elastic Storage Server & HDFS Transparency Ted Hoover / September 19, 2016
  • 2. #ibmedge The History of Spectrum Scale This infographic is the genealogy of IBM Spectrum Scale, from its birth as a digital media server and HPC research project to its place as a foundational element in the IBM Spectrum Storage family. It highlights key milestones in the product history, usage, and industry to convey that Spectrum Scale may have started as GPFS, but it is so much more now. IBM has invested in the enterprise features that make it easy to use, reliable, and suitable for mission-critical storage of all types.
  • 3. #ibmedge Spectrum Scale services: Unified data access with File and Object Based Storage; Rolling Upgrades; File Placement Optimization; Global Active File Management; Advanced Routing & Caching Services; Commodity Hardware; Sync & Async Replication; Flash Acceleration; Network performance monitoring; Native Encryption and Secure Erase; Common Management; Cloud Ready; High speed scanning engine; Transparent policy-driven data migration; Storage Resource Pools; POSIX, HDFS, SMB/CIFS, NFS, Swift/S3; Global Namespace; Archive Integration; Simplify Management; Software-Defined Agility; Enable Global Collaboration; Enterprise Features & Flexibility
  • 4. #ibmedge Reduce Complexity Redefining Unified Storage Challenge: Managing Data Growth • Lowering data costs • Managing data retrieval & app support • Protecting business data Unified Scale-out Data Lake • File In/Out, Object In/Out; Analytics on demand • High-performance native protocols • Single Management Plane • Cluster replication & global namespace • Enterprise storage features across file, object & HDFS SSD, Fast Disk, Slow Disk, Tape. Spectrum Scale: NFS, SMB, POSIX, Swift/S3, HDFS
  • 5. #ibmedge IBM Systems Store everywhere. Run anywhere. Analytics without complexity Challenge: Separate storage systems for ingest, analysis, results • HDFS requires locality-aware storage (namenode) • Data transfer slows time to results • Different frameworks & analytics tools use data differently HDFS Transparency • Map/Reduce on shared or shared-nothing storage • No waiting for data transfer between storage systems • Immediately share results • Single ‘Data Lake’ for all applications • Enterprise data management • Archive and Analysis in-place Ingest, Object, File, Direct Access, POSIX, Raw Data, Analysis
  • 6. #ibmedge Spectrum Scale 4.2.1 for Big Data Oceans extending HDFS for the enterprise An enterprise HDFS filesystem • Expand use of Shared Nothing Clusters • Simplicity of Storage Rich Servers with enterprise features • Advanced Routing (AFM), encryption, QoS, compression • Mix cluster types • Shared Nothing = traditional HDFS style • Centralized Storage = traditional enterprise Other clients Storage Servers Storage Store Everywhere. Run Anywhere. Standard commands & protocols Shared Nothing Clusters Application specifies hdfs://namenode:9001
  • 7. #ibmedge Spectrum Scale 4.2.1 for Big Data Oceans extending HDFS across Clusters Extending the Filesystem • Run analytics across multiple HDFS and/or Spectrum Scale clusters • No need to move the data • Build Data Oceans on demand Store Everywhere. Run Anywhere. Disk IBM Spectrum Scale HDFS Transparency Connector Disk Disk Disk viewfs://clusterX:/hadoop/hdfs/file1 viewfs://clusterY:/hadoop/gpfs/file1 hdfs://nn2.node.net:8020/hadoop/gpfs hdfs://nn1.node.net:8020/hadoop/hdfs Cluster X Cluster Y
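The federated namespace on slide 7 can be pictured as a mount table that maps each path prefix to the namenode URI serving it. The sketch below is illustrative only: the `MOUNT_TABLE` dictionary and the `resolve` function are hypothetical names, and the two entries are assumed from the URIs shown on the slide rather than taken from a real cluster configuration.

```python
# Hypothetical mount table, modelled on the URIs shown on the slide.
MOUNT_TABLE = {
    "/hadoop/hdfs": "hdfs://nn1.node.net:8020/hadoop/hdfs",  # native HDFS cluster
    "/hadoop/gpfs": "hdfs://nn2.node.net:8020/hadoop/gpfs",  # Spectrum Scale via HDFS Transparency
}


def resolve(view_path, mount_table=MOUNT_TABLE):
    """Map a path in the federated namespace to the URI that serves it."""
    # Try the longest mount point first so nested mounts pick the most specific target.
    for mount in sorted(mount_table, key=len, reverse=True):
        if view_path == mount or view_path.startswith(mount + "/"):
            return mount_table[mount] + view_path[len(mount):]
    raise KeyError(f"no mount point covers {view_path!r}")


if __name__ == "__main__":
    print(resolve("/hadoop/hdfs/file1"))
    print(resolve("/hadoop/gpfs/file1"))
```

The application only ever sees the federated path. Which cluster answers is decided by the table, which is why data can sit on either side without the application changing.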
  • 8. #ibmedge Use Case 1: Federate ESS with Existing HDFS Filesystem viewfs://clusterX:/hadoop/hdfs/file1 viewfs://clusterY:/hadoop/gpfs/file1 hdfs://nn1.node.net:8020/hadoop/hdfs IBM Spectrum Scale HDFS Transparency Connector hdfs://nn2.node.net:8020/hadoop/gpfs Improve Hadoop Cluster Utilization • Manually move less frequently accessed data to an ESS tier • Applications can still seamlessly access data that has been moved Commands: $ hadoop distcp viewfs://clusterX:/hadoop/hdfs/file1 viewfs://clusterY:/hadoop/gpfs/file1 $ hadoop fs -rm viewfs://clusterX:/hadoop/hdfs/file1
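The two commands on slide 8 are order-sensitive: the source should be removed only once the copy has succeeded. A minimal sketch of that sequencing follows. The helper names `tiering_commands` and `move_to_cold_tier` are hypothetical, and the `run` parameter is an assumption added so the logic can be exercised without a Hadoop installation.

```python
import subprocess


def tiering_commands(src, dst):
    """Return the copy and delete commands from the slide as argument lists."""
    return [
        ["hadoop", "distcp", src, dst],
        ["hadoop", "fs", "-rm", src],
    ]


def move_to_cold_tier(src, dst, run=subprocess.call):
    """Copy src to dst, then delete src only if the copy exited with status 0.

    `run` takes an argument list and returns an exit status. It defaults to
    subprocess.call and can be swapped for a stub in a dry run.
    """
    copy_cmd, delete_cmd = tiering_commands(src, dst)
    if run(copy_cmd) != 0:
        return False  # keep the source: the copy did not complete
    return run(delete_cmd) == 0


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    def echo(cmd):
        print(" ".join(cmd))
        return 0

    move_to_cold_tier(
        "viewfs://clusterX:/hadoop/hdfs/file1",
        "viewfs://clusterY:/hadoop/gpfs/file1",
        run=echo,
    )
```

Checking the exit status of the copy before the delete is the only safeguard shown here. A production script would likely also verify the destination before removing the source.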
  • 9. #ibmedge Use Case 2: Federate ESS with an Existing Spectrum Scale Filesystem /gpfs/fs1 Extending a Spectrum Scale Filesystem • Add an ESS tier to an existing FPO cluster and use ILM policies to migrate data to the ESS tier • Data is still accessible from the FPO cluster RULE 'FPO_USE' SET POOL 'fpodata' REPLICATE (2) FOR FILESET ('fpodata') RULE 'FPO_TO_SHARESTORAGE' MIGRATE FROM POOL 'fpodata' TO POOL 'datapool' WHERE CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL '10' MINUTES RULE 'default' SET POOL 'datapool' /gpfs/fs2 Single Name Space
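The MIGRATE rule on slide 9 selects files in pool 'fpodata' whose last modification is more than 10 minutes old. The sketch below mirrors only that selection condition in Python. It is not how the Spectrum Scale policy engine is implemented or invoked, and the `(name, pool, modification_time)` record layout and the function name `files_to_migrate` are hypothetical.

```python
from datetime import datetime, timedelta


def files_to_migrate(files, now, from_pool="fpodata", max_age=timedelta(minutes=10)):
    """Return the names of files that the rule's WHERE clause would select.

    files: iterable of (name, pool, modification_time) tuples (hypothetical layout).
    A file qualifies when it is in from_pool and now - modification_time > max_age.
    """
    return [
        name
        for name, pool, mtime in files
        if pool == from_pool and now - mtime > max_age
    ]


if __name__ == "__main__":
    now = datetime(2016, 9, 19, 12, 0, 0)
    sample = [
        ("old.dat", "fpodata", now - timedelta(minutes=30)),   # old enough, right pool
        ("new.dat", "fpodata", now - timedelta(minutes=5)),    # too recent
        ("done.dat", "datapool", now - timedelta(hours=2)),    # already in the target pool
    ]
    print(files_to_migrate(sample, now))
```

With a 10-minute interval, as on the slide, newly written data stays on the FPO pool only briefly. A longer interval would keep warm data close to compute for longer.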
  • 11. #ibmedge IBM Spectrum Protect Use Case 3: Spectrum Scale Provides Easy Integration with Enterprise Backup Tools IBM Spectrum Scale HDFS Transparency Connector Protecting Business Data • Use ESS warm data tier with Spectrum Protect for backup • Simplified backup administration tools • Scalable performance • Optimized data protection viewfs://clusterX:/hadoop/hdfs/file1 viewfs://clusterY:/hadoop/gpfs/file1 hdfs://nn1.node.net:8020/hadoop/hdfs hdfs://nn2.node.net:8020/hadoop/gpfs 10
  • 12. #ibmedge Use Case 4: Spectrum Scale Provides Easy Integration with Enterprise Archiving Tools Protecting Business Data • Use ESS warm data tier with Spectrum Archive to tape • Powerful policy engine • Information Lifecycle Management • Fast metadata ‘scanning’ and data movement • Automated data migration based on threshold • Users not affected by data migration • Single namespace IBM Spectrum Scale HDFS Transparency Connector viewfs://clusterX:/hadoop/hdfs/file1 viewfs://clusterY:/hadoop/gpfs/file1 hdfs://nn1.node.net:8020/hadoop/hdfs hdfs://nn2.node.net:8020/hadoop/gpfs IBM Spectrum Archive
  • 13. #ibmedge IBM Spectrum Protect Use Case 5: Spectrum Scale Provides Easy Integration with Enterprise Backup and Archiving tools /gpfs/fs2 Protecting Business Data • Optionally Spectrum Protect and Spectrum Archive can be used directly with Spectrum Scale FPO /gpfs/fs1 Single Name Space 12 IBM Spectrum Archive
  • 14. #ibmedge Client Use Case 1: Unified Analytic/Workflow Pipelines Ingest Analyze Export Visualize POSIX HDFS NFS Object RDBMS (POSIX) Data Lake (HDFS) Analyze (HDFS) Dashboard (POSIX) Report (POSIX) SMB Share NFS Object DB2 on Shared Nothing Cluster (FPO) Hadoop on Shared Nothing Cluster (FPO) Storage ESS Structured data warehouse Warehouse extension for unstructured data Compute Cluster SAS Analytics Dashboard & Reporting
  • 15. #ibmedge Client Use Case 2: Life Sciences with HPC and Hadoop/Spark 14 Storage Storage Storage ESS based shared storage cluster HPC Compute Cluster File A Event on shared pool And File D Event on FPO pool File B on shared pool LSF Job 2 LSF Hadoop Job File F on FPO pool LSF Spark Job LSF Job 1 File C on FPO pool File E on FPO pool LSF Job 5 File F replicated to a remote Spectrum Scale Server
  • 16. #ibmedge Client Use Case 3: HPC and Data Analytics 15 Storage Storage Storage ESS based shared storage cluster HPC Compute Cluster Ingest POSIX & Object Analyze POSIX iterate Simulate HDFS Analyze Analyze HDFS HDFS
  • 17. #ibmedge Summary: Big Data Oceans extending HDFS across Clusters Unified Data Repository, Support Multiple Analytics Federate ESS with Existing HDFS Filesystem Federate ESS with Existing Spectrum Scale Filesystem HDFS Transparency Connector Single Name Space Improve Hadoop Cluster Utilization • Manually move less frequently accessed data to an ESS tier • Applications can still seamlessly access data that has been moved Expand use of Shared Nothing Clusters • Simplicity of Storage Rich Servers with enterprise features • Advanced Routing (AFM), encryption, QoS, compression • Mix cluster types • Backup and Archive Support Extending the Filesystem • Run analytics across multiple HDFS and/or Spectrum Scale clusters • No need to move the data • Build Data Oceans on demand Disk HDFS Transparency Connector Disk Disk Disk
  • 18. #ibmedge Spectrum Scale User Group • The Spectrum Scale User Group is free to join and open to all using, interested in using or integrating Spectrum Scale. • Join the User Group activities to meet your peers and get access to experts from partners and IBM. • Next meetings: - APAC: October 14, Melbourne - Global at SC16 : November 13 1pm to 5pm, Salt Lake City • Web page: http://www.spectrumscale.org/ • Presentations: http://www.spectrumscale.org/presentations/ • Mailing list: http://www.spectrumscale.org/join/ • Contact: http://www.spectrumscale.org/committee/ • Meet Bob Oesterlin (US Co-Principal) at Edge2016: Robert.Oesterlin@nuance.com
  • 19. © 2016 IBM Corporation #ibmedge Thank You
  • 20. #ibmedge Notices and Disclaimers Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. 
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law
  • 21. #ibmedge Notices and Disclaimers Cont. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. 
A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.