2. #ibmedge
The History of Spectrum Scale
This infographic is the genealogy of IBM Spectrum Scale, from its birth as a digital
media server and HPC research project to its place as a foundational element in the
IBM Spectrum Storage family. It highlights key milestones in the product's history,
usage, and industry to convey that Spectrum Scale may have started as GPFS, but it
is much more now. IBM has invested in the enterprise features that make it easy
to use, reliable, and suitable for mission-critical storage of all types.
3. #ibmedge
Unified data access with
File and Object Based
Storage
Rolling
Upgrades
File Placement
Optimization
Global Active File
Management
Advanced Routing &
Caching Services
Spectrum Scale services
Commodity Hardware
Sync & Async
Replication
Flash
Acceleration
Network performance
monitoring
Native Encryption
And Secure Erase
Common
Management
Cloud Ready
High speed
scanning engine
Transparent policy-driven
data migration
Storage Resource
Pools
POSIX
HDFS
SMB/CIFS
NFS
Swift/S3
Global Namespace
Archive
Integration
Simplify Management
Software-Defined Agility
Enable Global
Collaboration
Enterprise
Features &
Flexibility
4. #ibmedge
Reduce Complexity
Redefining Unified Storage
Challenge
Managing Data Growth
• Lowering data costs
• Managing data retrieval & app support
• Protecting business data
Unified Scale-out Data Lake
• File In/Out, Object In/Out; Analytics on demand.
• High-performance native protocols
• Single Management Plane
• Cluster replication & global namespace
• Enterprise storage features across file, object & HDFS
SSD
Fast Disk
Slow Disk
Tape
Spectrum Scale
NFS SMB POSIX Swift/S3 HDFS
5. #ibmedge IBM Systems
Store everywhere. Run anywhere.
Analytics without complexity
Challenge
Separate storage systems for ingest, analysis, results
• HDFS requires locality-aware storage (namenode)
• Data transfer slows time to results
• Different frameworks & analytics tools use data differently
HDFS Transparency
• MapReduce on shared or shared-nothing storage
• No waiting for data transfer between storage systems
• Immediately share results
• Single ‘Data Lake’ for all applications
• Enterprise data management
• Archive and Analysis in-place
Ingest
Object File
Direct Access
POSIX
Raw Data
Analysis
6. #ibmedge
Spectrum Scale 4.2.1 for Big Data Oceans
extending HDFS for the enterprise
An enterprise HDFS filesystem
• Expand use of Shared Nothing Clusters
• Simplicity of Storage Rich Servers with
enterprise features
• Advanced Routing (AFM), encryption,
QoS, compression
• Mix cluster types
• Shared Nothing = traditional HDFS style
• Centralized Storage = traditional enterprise
Other clients
Storage Servers
Storage
Store Everywhere. Run Anywhere.
Standard
commands
& protocols
Shared Nothing Clusters
Application specifies
hdfs://namenode:9001
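As a rough illustration of what the application actually specifies, the sketch below splits an HDFS-style URI into the namenode endpoint and the file path using only Python's standard library. The helper name split_hdfs_uri and the path /data/sample.txt are made up for this example and do not come from the slides.

```python
from urllib.parse import urlsplit


def split_hdfs_uri(uri):
    """Return (host, port, path) for an hdfs:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs URI: {uri!r}")
    if not parts.netloc:
        # With an empty authority (for example three slashes after the
        # scheme) the intended host name ends up inside the path instead.
        raise ValueError(f"no namenode in URI: {uri!r}")
    return parts.hostname, parts.port, parts.path


# Hypothetical file path on the namenode endpoint shown on the slide.
host, port, path = split_hdfs_uri("hdfs://namenode:9001/data/sample.txt")
print(host, port, path)
```

The point of the sketch is only that the client names an endpoint and a path; where the blocks physically live is decided behind that endpoint.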
7. #ibmedge
Spectrum Scale 4.2.1 for Big Data Oceans
extending HDFS across Clusters
Extending the Filesystem
• Run analytics across multiple HDFS
and/or Spectrum Scale clusters
• No need to move the data
• Build Data Oceans on demand
Store Everywhere. Run Anywhere.
Disk
IBM Spectrum Scale HDFS Transparency Connector
Disk Disk Disk
viewfs://clusterX:/hadoop/hdfs/file1
viewfs://clusterY:/hadoop/gpfs/file1
hdfs://nn1.node.net:8020/hadoop/hdfs
hdfs://nn2.node.net:8020/hadoop/gpfs
Cluster X Cluster Y
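The mapping drawn on this slide can be pictured as a small mount table that translates each viewfs:// path into the hdfs:// location that backs it. The following is a toy sketch of that idea in Python, not Hadoop's ViewFs implementation; the dictionary MOUNT_TABLE and the function resolve are hypothetical names, and the table simply copies the four URIs shown above.

```python
# Hypothetical mount table mirroring the two mappings drawn on the slide.
MOUNT_TABLE = {
    "viewfs://clusterX:/hadoop/hdfs": "hdfs://nn1.node.net:8020/hadoop/hdfs",
    "viewfs://clusterY:/hadoop/gpfs": "hdfs://nn2.node.net:8020/hadoop/gpfs",
}


def resolve(view_uri, table=MOUNT_TABLE):
    """Map a viewfs URI to its backing hdfs URI by longest-prefix match."""
    best = None
    for prefix in table:
        # Match the mount point itself or anything below it.
        if view_uri == prefix or view_uri.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point covers {view_uri!r}")
    return table[best] + view_uri[len(best):]


print(resolve("viewfs://clusterX:/hadoop/hdfs/file1"))
print(resolve("viewfs://clusterY:/hadoop/gpfs/file1"))
```

Because the application only ever sees the viewfs:// name, the data behind a mount point can sit on a native HDFS cluster or on Spectrum Scale without the application changing.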
8. #ibmedge
Use Case 1: Federate ESS with Existing HDFS Filesystem
viewfs://clusterX:/hadoop/hdfs/file1
viewfs://clusterY:/hadoop/gpfs/file1
hdfs://nn1.node.net:8020/hadoop/hdfs
IBM Spectrum Scale HDFS Transparency Connector
hdfs://nn2.node.net:8020/hadoop/gpfs
Improve Hadoop Cluster
Utilization
• Manually move less frequently
accessed data to an ESS tier
• Applications can still seamlessly access
data that has been moved
Commands:
$ hadoop distcp viewfs://clusterX:/hadoop/hdfs/file1 \
    viewfs://clusterY:/hadoop/gpfs/file1
$ hadoop fs -rm viewfs://clusterX:/hadoop/hdfs/file1
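When the two commands above are scripted, it is usually worth making the delete conditional on the copy. The sketch below is one possible wrapper, assuming the hadoop CLI is on the PATH; the function name tier_to_ess and the injectable run parameter are illustrative choices, not part of any IBM or Hadoop tooling.

```python
import subprocess


def tier_to_ess(src, dst, run=subprocess.run):
    """Copy src to dst with distcp, then remove src only if the copy succeeded.

    `run` is injectable so the logic can be exercised without a Hadoop
    installation; by default it is subprocess.run.
    """
    copy = run(["hadoop", "distcp", src, dst])
    if copy.returncode != 0:
        return False  # leave the source in place if the copy failed
    remove = run(["hadoop", "fs", "-rm", src])
    return remove.returncode == 0
```

A call such as tier_to_ess("viewfs://clusterX:/hadoop/hdfs/file1", "viewfs://clusterY:/hadoop/gpfs/file1") would then mirror the manual steps on the slide, returning False instead of deleting anything if distcp reports an error.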
9. #ibmedge
Use Case 2: Federate ESS with an Existing Spectrum Scale
Filesystem
/gpfs/fs1
Extending a Spectrum Scale
Filesystem
• Add an ESS tier to an existing FPO
cluster and use ILM policies to migrate
data to ESS tier
• Data is still accessible from FPO cluster
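/* Place new files of fileset 'fpodata' in the FPO pool with two replicas */
rule 'FPO_USE' SET POOL 'fpodata' REPLICATE (2) FOR FILESET ('fpodata')
/* Migrate files not modified for more than 10 minutes to 'datapool' */
rule 'FPO_TO_SHARESTORAGE' MIGRATE FROM POOL 'fpodata' TO POOL 'datapool'
    where CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL '10' MINUTES
/* Place all other new files in 'datapool' */
rule 'default' SET POOL 'datapool'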
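To make the effect of the migrate rule concrete, here is a small Python simulation of its condition: a file in pool 'fpodata' whose modification time is more than 10 minutes old moves to 'datapool'. This only mimics the rule's logic for illustration and is not how Spectrum Scale evaluates policies; the function name target_pool and the timestamp used are arbitrary assumptions.

```python
from datetime import datetime, timedelta


def target_pool(current_pool, modified, now, max_age=timedelta(minutes=10)):
    """Return the pool a file would be in after the migrate rule is applied."""
    if current_pool == "fpodata" and now - modified > max_age:
        return "datapool"
    return current_pool


# Arbitrary reference time chosen for the example.
now = datetime(2016, 9, 19, 12, 0, 0)
print(target_pool("fpodata", now - timedelta(minutes=30), now))  # file older than 10 minutes
print(target_pool("fpodata", now - timedelta(minutes=5), now))   # file modified recently
```

Files already in 'datapool' are returned unchanged, which matches the rule only selecting from pool 'fpodata'.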
/gpfs/fs2
Single Name Space
11. #ibmedge
IBM Spectrum Protect
Use Case 3: Spectrum Scale Provides Easy Integration with
Enterprise Backup Tools
IBM Spectrum Scale HDFS Transparency Connector
Protecting Business Data
• Use ESS warm data tier with
Spectrum Protect for backup
• Simplified backup administration
tools
• Scalable performance
• Optimized data protection
viewfs://clusterX:/hadoop/hdfs/file1
viewfs://clusterY:/hadoop/gpfs/file1
hdfs://nn1.node.net:8020/hadoop/hdfs hdfs://nn2.node.net:8020/hadoop/gpfs
12. #ibmedge
Use Case 4: Spectrum Scale Provides Easy Integration with
Enterprise Archiving Tools
Protecting Business Data
• Use ESS warm data tier with
Spectrum Archive to tape
• Powerful policy engine
• Information Lifecycle Management
• Fast metadata ‘scanning’ and data
movement
• Automated data migration based on
threshold
• Users not affected by data migration
• Single namespace
IBM Spectrum Scale HDFS Transparency Connector
viewfs://clusterX:/hadoop/hdfs/file1
viewfs://clusterY:/hadoop/gpfs/file1
hdfs://nn1.node.net:8020/hadoop/hdfs hdfs://nn2.node.net:8020/hadoop/gpfs
IBM Spectrum Archive
13. #ibmedge
IBM Spectrum Protect
Use Case 5: Spectrum Scale Provides Easy Integration with
Enterprise Backup and Archiving tools
/gpfs/fs2
Protecting Business Data
• Optionally Spectrum Protect and
Spectrum Archive can be used
directly with Spectrum Scale FPO
/gpfs/fs1
Single Name Space
IBM Spectrum Archive
14. #ibmedge
Client Use Case 1: Unified Analytic/Workflow Pipelines
Ingest Analyze Export Visualize
POSIX HDFS NFS Object
RDBMS: POSIX
Data Lake: HDFS
Analyze: HDFS
Dashboard: POSIX
Report: POSIX
SMB Share
NFS
Object
DB2 on Shared Nothing Cluster (FPO)
Hadoop on Shared Nothing Cluster (FPO)
Storage
ESS
Structured data warehouse
Warehouse extension for
unstructured data
Compute Cluster
SAS Analytics Dashboard &
Reporting
15. #ibmedge
Client Use Case 2: Life Sciences with HPC and Hadoop/Spark
Storage Storage Storage
ESS based
shared storage
cluster
HPC Compute Cluster
File A Event on shared
pool
And
File D Event on
FPO pool
File B on shared
pool
LSF Job 2
LSF Hadoop Job
File F on FPO pool
LSF Spark Job
LSF Job 1
File C on FPO pool
File E on FPO pool
LSF Job 5
File F replicated to
a remote Spectrum
Scale Server
16. #ibmedge
Client Use Case 3: HPC and Data Analytics
Storage Storage Storage
ESS based
shared storage
cluster
HPC Compute Cluster
Ingest
POSIX
&
Object
Analyze
POSIX
iterate
Simulate
HDFS
Analyze
Analyze
HDFS
HDFS
17. #ibmedge
Summary: Big Data Oceans extending HDFS across Clusters
Unified Data
Repository, Support
Multiple Analytics
Federate ESS with
Existing HDFS
Filesystem
Federate ESS with
Existing Spectrum
Scale Filesystem
HDFS Transparency Connector Single Name Space
Improve Hadoop Cluster Utilization
• Manually move less frequently
accessed data to an ESS tier
• Applications can still seamlessly access data
that has been moved
Expand use of Shared Nothing
Clusters
• Simplicity of Storage Rich Servers with
enterprise features
• Advanced Routing (AFM), encryption,
QoS, compression
• Mix cluster types
• Backup and Archive Support
Extending the Filesystem
• Run analytics across multiple HDFS
and/or Spectrum Scale clusters
• No need to move the data
• Build Data Oceans on demand
Disk
HDFS Transparency Connector
Disk Disk Disk
18. #ibmedge
Spectrum Scale User Group
• The Spectrum Scale User Group is free
to join and open to all using, interested
in using or integrating Spectrum Scale.
• Join the User Group activities to meet
your peers and get access to experts
from partners and IBM.
• Next meetings:
- APAC: October 14, Melbourne
- Global at SC16: November 13, 1pm to 5pm, Salt Lake City
• Web page: http://www.spectrumscale.org/
• Presentations: http://www.spectrumscale.org/presentations/
• Mailing list: http://www.spectrumscale.org/join/
• Contact: http://www.spectrumscale.org/committee/
• Meet Bob Oesterlin (US Co-Principal) at Edge2016: Robert.Oesterlin@nuance.com
21. #ibmedge
Notices and Disclaimers Cont’d.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not
tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the
ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual
property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®,
FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG,
Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®,
StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business
Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.