1. © Copyright IBM Corporation 2015
Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
sDE2784
Data Footprint Reduction – Understanding IBM Storage Efficiency Options
Tony Pearson
Master Inventor and Senior IT Specialist
IBM Corporation
2.
Abstract
Data Footprint Reduction is the catchall term for a variety of technologies designed to help reduce storage costs. This session will cover four techniques for data footprint reduction: thin provisioning, space-efficient snapshots, data deduplication and real-time compression.
Come to this session to learn how these technologies work, which IBM storage products provide these capabilities and how they will benefit your data center.
3.
This week with Tony Pearson
Day        Time     Topic
Monday     10:30am  Software Defined Storage -- Why? What? How? (repeats Tuesday)
           03:00pm  IBM's Cloud Storage Options (repeats Wednesday)
           04:30pm  Data Footprint Reduction – Understanding IBM Storage Efficiency Options
Tuesday    10:30am  Software Defined Storage -- Why? What? How?
           12:30pm  What Is Big Data? Architectures and Practical Use Cases
           01:45pm  IBM Smarter Storage Strategy (repeats Wednesday)
Wednesday  09:00am  New Generation of Storage Tiering: Less Management Lower Investment and Increased Performance
           10:30am  IBM Smarter Storage Strategy
           12:30pm  IBM's Cloud Storage Options
           01:45pm  IBM Spectrum Scale (Elastic Storage) Offerings
Thursday   12:30pm  The Pendulum Swings Back -- Understanding Converged and Hyperconverged Environments
Friday     09:00am  IBM Spectrum Storage Integration with OpenStack
5.
Why Space is Over-Allocated
Scenario 1
• Space requirements under-estimated
• Running out of space requires larger volume
• New request may take weeks to accommodate
  • Application outage if not addressed in time
• Data must be moved to the larger volume
  • Application outage during data movement
Scenario 2
• Space requirements over-estimated
• Capacity lasts for years
• No data migration
• No application outages
• No penalties
When faced with this dilemma, most will err on the side of over-estimating
6.
Fully Allocated vs. Thin Provisioning
Fully Allocated
• Host sees fully allocated amount
• Actual data written
• Allocated but unused space dedicated to this host, wasted space
Thin Provisioning
• Host sees full virtual amount
• Physical Space Allocated: actual data written
• Empty space available to others
7.
Blocks, Grains, Extents and Volumes/LUNs
Host sees a volume or LUN that consists of blocks numbered 0 to nnnnnnnnnn
• Block – typically 512 or 4096 bytes
• Grain – range of 1 or more blocks
• Extent – allocation unit, one or more grains
• Volume/LUN – one or more extents
8.
Thin Provisioning – Coarse and Fine Grain
[Figure: a 100-block volume (blocks 00-99) drawn as a 10 x 10 grid and shown three ways: Fully Allocated, Coarse-Grain, Fine-Grain]
Blocks 0, 55, and 99 written
• Fully Allocated: all 10 extents allocated
• Coarse-Grain: only 3 extents allocated
• Fine-Grain: only 1 extent allocated, holding Grain 00-01, Grain 54-55 and Grain 98-99
An extent is a range of 10 blocks (for example, 90-99); a grain is a range of 2 blocks
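The extent counts in this example can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the layout implied by the slide (100 blocks, 10 blocks per extent, 2 blocks per grain); the function names are hypothetical and this is not any product's allocation code.

```python
import math

VOLUME_BLOCKS = 100      # blocks 00-99, as in the slide
BLOCKS_PER_EXTENT = 10   # e.g. blocks 90-99 form one extent
BLOCKS_PER_GRAIN = 2     # e.g. blocks 54-55 form one grain


def extents_fully_allocated():
    # Every extent of the volume is allocated up front.
    return math.ceil(VOLUME_BLOCKS / BLOCKS_PER_EXTENT)


def extents_coarse_grain(written_blocks):
    # Coarse-grain: allocate the whole extent that contains each written block.
    return len({block // BLOCKS_PER_EXTENT for block in written_blocks})


def extents_fine_grain(written_blocks):
    # Fine-grain: allocate only the touched grains and pack them into extents.
    grains = {block // BLOCKS_PER_GRAIN for block in written_blocks}
    grains_per_extent = BLOCKS_PER_EXTENT // BLOCKS_PER_GRAIN
    return math.ceil(len(grains) / grains_per_extent)


written = [0, 55, 99]
print(extents_fully_allocated())      # 10
print(extents_coarse_grain(written))  # 3
print(extents_fine_grain(written))    # 1
```

The three written blocks fall in three different extents, so coarse-grain allocation needs 3 extents, while the three touched grains fit together in a single extent under fine-grain allocation.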
9.
How IBM has Implemented Thin Provisioning
                  DS8000    XIV      SVC and Storwize    DCS3700, DCS3860
Type              Coarse    Fine     Fine                Fine
Allocation Unit   1 GB      17 GB    16 MB to 8 GB       4 GB
Grain size        -         1 MB     32-256 KB           64 KB
10.
Thin Provisioning
Advantages
• Just-in-time allocation increases utilization percentage
• Eliminates the pressure to make accurate space estimates
• Dynamically expand volume without impacting applications or rebooting server
• Reduces the data footprint and lowers costs
• Shifts focus from volumes to storage pool capacity
Objections
• Not all file systems are cooperative or friendly
  • Deletion of files does not free space for others
  • “sdelete” writes zeros over deleted file space
• Other implementations may impact I/O performance
• May not support same set of features, copy services, or replication
• “Selling more tickets than seats”
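The “selling more tickets than seats” objection can be made concrete with a little arithmetic at the storage pool level. This is a minimal sketch, assuming a hypothetical pool and hypothetical volume sizes; none of the numbers or names come from the slides.

```python
def pool_report(physical_gb, volumes):
    """volumes is a list of (virtual_size_gb, written_gb) pairs (hypothetical data)."""
    virtual_gb = sum(virtual for virtual, _ in volumes)
    written_gb = sum(written for _, written in volumes)
    return {
        # How many "tickets" were sold per physical "seat".
        "overcommit_ratio": virtual_gb / physical_gb,
        # Fraction of the physical pool actually consumed by written data.
        "pool_utilization": written_gb / physical_gb,
        # Physical space still available to any volume in the pool.
        "free_gb": physical_gb - written_gb,
    }


# Three thin-provisioned volumes sharing a 1000 GB pool (illustrative values).
report = pool_report(1000, [(500, 120), (800, 300), (700, 80)])
print(report)
```

With thin provisioning, the number to watch is the pool's free space rather than each volume's size: the hosts see 2000 GB in total, but the pool only runs out when written data approaches 1000 GB.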
12.
Space-Efficient Copies
Source: 100 GB allocated, 40 GB written
Destination 1, Destination 2, Destination 3
• Traditional Copies: 300 GB
• Space-Efficient Copies: 30 GB (typical 10%)
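The 300 GB and 30 GB figures follow from three destination copies of a 100 GB source. A minimal sketch of that arithmetic, assuming each space-efficient copy reserves a fixed percentage of the source's allocated size (the function name is hypothetical):

```python
def copy_space_gb(source_allocated_gb, copies, reserve_percent=100):
    # A traditional copy reserves 100% of the source's allocated size;
    # a space-efficient copy reserves only the expected changed percentage.
    return copies * source_allocated_gb * reserve_percent / 100


print(copy_space_gb(100, 3))                      # traditional copies: 300.0
print(copy_space_gb(100, 3, reserve_percent=10))  # space-efficient copies: 30.0
```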
13.
Storwize family - FlashCopy
FlashCopy: volume-level point-in-time copy with any mix of thin and fully-allocated volumes
• Up to 256 targets per source volume, linked by FlashCopy relationships
Incremental FlashCopy: volume-level point-in-time copy
• Start incremental FlashCopy: data copied as normal
• Some data changed by apps
• Later, start incremental FlashCopy again: only changed data copied by background copy
Cascaded FlashCopy: copy the copies
• Disk0: source
• Disk1: FlashCopy target of Disk0
• Disk2: FlashCopy target of Disk1
• Disk3: FlashCopy target of Disk1
• Disk4: FlashCopy target of Disk3
[Figure: FlashCopy maps between the disks; the labels Map 1, Map 2 and Map 4 are visible]
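The idea behind the incremental copy (first pass copies everything, later passes copy only what changed) can be sketched with a set of changed grains. This is a simplified illustration, not the Storwize implementation: the class and method names are hypothetical, and it ignores the copy-on-write behavior that preserves the point-in-time image.

```python
class IncrementalCopy:
    """Toy model: track which grains changed since the last copy was started."""

    def __init__(self, source):
        self.source = list(source)                 # grain contents of the source
        self.target = [None] * len(self.source)    # grain contents of the target
        self.changed = set(range(len(self.source)))  # first copy: all grains pending

    def write(self, grain, data):
        # An application write updates the source and marks the grain as changed.
        self.source[grain] = data
        self.changed.add(grain)

    def start(self):
        # Background copy: move only the grains marked as changed, then reset.
        copied = sorted(self.changed)
        for grain in copied:
            self.target[grain] = self.source[grain]
        self.changed.clear()
        return copied


copy = IncrementalCopy(["a", "b", "c", "d"])
print(copy.start())   # first start copies every grain: [0, 1, 2, 3]
copy.write(2, "C")    # some data changed by apps
print(copy.start())   # later start copies only the changed grain: [2]
```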
14.
Space-efficient Copies
Advantages
• Supports both fully-allocated and thin-provisioned sources
• Reduces the data footprint and lowers costs
• Allows you to keep more copies online
• Allows you to take copies more frequently
• Can be used as checkpoint copies during batch processing
Objections
• Some implementations may impact I/O performance
• Other implementations require that you estimate the maximum percentage changed
  • Typically 10-20%
  • Exceeding the reserved space invalidates destination copy
16.
Storage Optimization: Data Deduplication
Data deduplication reduces capacity requirements by storing only one unique instance of the data on disk and creating pointers for duplicate data elements
1. Data elements are evaluated to determine a unique signature for each
2. Signature values are compared to identify all duplicates
3. Duplicate data elements are eliminated and replaced with pointers to the reference element
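The three steps above can be sketched generically in a few lines. This example assumes fixed-size chunks and SHA-256 signatures, which is one common hash-based approach chosen here for illustration; it is not the ProtecTIER HyperFactor algorithm, and the function names are hypothetical.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    store = {}      # signature -> the single stored instance of that chunk
    pointers = []   # one signature per original chunk, in order
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        signature = hashlib.sha256(chunk).hexdigest()  # step 1: compute signature
        if signature not in store:                     # step 2: compare signatures
            store[signature] = chunk
        pointers.append(signature)                     # step 3: keep a pointer only
    return store, pointers


def rehydrate(store, pointers) -> bytes:
    # Follow the pointers to rebuild the original data byte for byte.
    return b"".join(store[signature] for signature in pointers)


data = b"A" * 4096 * 3 + b"B" * 4096   # four chunks, only two of them unique
store, pointers = deduplicate(data)
print(len(pointers), "chunks referenced,", len(store), "chunks stored")
assert rehydrate(store, pointers) == data
```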
17.
ProtecTIER Data Deduplication Advantages
Performance
• Measured performance over 2,800 MB/s inline deduplication backup rate
• Over 3,600 MB/s restore rate
Capacity
• Up to 1 PB physical capacity per cluster
• Reduces required disk capacity by up to 25 times
Enterprise-Class Data Integrity
• Binary diff process during dedupe designed for the highest data integrity
High Availability Cluster
• Active-active cluster eliminates single points of failure
18.
IBM ProtecTIER – HyperFactor algorithm
[Figure: backup servers send a new data stream through an FC switch to the TS7650G; HyperFactor checks it against existing data using a memory-resident index, and the “filtered” data goes to the repository on the storage arrays]
Only 4 GB needed to map 1 PB of physical disk!
19.
Significantly Reduces Replication Bandwidth
[Figure: Primary Site (two backup servers, ProtecTIER Gateway, physical capacity shown against represented capacity) connected over an IP-based WAN link to Secondary Site (backup server, ProtecTIER Gateway, physical capacity, tape library)]
• Deduplication enables a large amount of data to be replicated with significantly less bandwidth
• Virtual cartridges can be copied to physical tape at DR site
21.
Virtual Desktop Infrastructure (VDI)
VDI represents only 5% of Flash deployment capacity*
Deduplication and Compression can achieve 90% savings for VDI workloads
Atlantis ILIO™ Server-Side Optimization Software
• Eliminates the storage problem at the source
• Lower cost per desktop with better performance
  • Less than $200 stateless desktop
  • Less than $300 persistent desktop
• Proven at scale in the largest desktop virtualization deployments in the world
• Enterprise-class reliability with automated deployment and HA/DR
[Figure: ILIO Diskless VDI and XenApp running on a hypervisor (ESX, XenServer, Hyper-V) and server hardware with RAM as cache, backed by IBM FlashSystem; ILIO functions shown: Application Analysis, Inline Deduplication, Content-Aware IO Processing, Compression, Coalescing (IO Blender Fix); connectivity labels: NFS or iSCSI; NFS, iSCSI, Fibre Channel or Local Disk]
* Source: The Adoption of and Leading Use Cases for Solid State Storage by Enterprise Customers, IDC, September 2013, IDC #242808
22.
Data Deduplication
Advantages
• Designed for backups and VDI
• Can offer up to 25x data footprint reduction (96% savings)
• Allows more backup copies to remain on disk for faster restores
• Reduces cost of disk backup repositories
• Available with a variety of interfaces, including VTL, Symantec OST, CIFS and NFS
Objections
• Dealing with hash collisions
  • May require byte-for-byte comparisons or keeping secondary copy of data
• Hash-based systems do not scale
• Other systems have slow restores
  • Re-hydrating data back to normal
• Primary active data may not dedupe very well
  • Your mileage may vary
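The “25x” and “96% savings” figures are two ways of stating the same reduction. A minimal sketch of the conversion in both directions (the function names are hypothetical):

```python
def savings_from_ratio(ratio):
    # A ratio of N:1 means only 1/N of the original capacity is needed.
    return 1 - 1 / ratio


def ratio_from_savings(savings):
    # Inverse conversion: savings is a fraction between 0 and 1.
    return 1 / (1 - savings)


print(f"{savings_from_ratio(25):.0%}")    # 25x reduction is 96% savings
print(f"{ratio_from_savings(0.80):.0f}x") # 80% savings is about 5x
print(f"{ratio_from_savings(0.95):.0f}x") # 95% savings is about 20x
```

The relationship is not linear: going from 80% to 95% savings quadruples the reduction ratio, which is why ratios and percentages should not be compared directly.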
24.
Lossy vs. Lossless Methods
Lossy: “Good enough?”
• Used with music, photos, video, medical images, scanned documents, fax machines
• Compress, then Decompress: does not return data back to its original contents
Lossless: “Exactly the same”
• Used with databases, emails, spreadsheets, office documents, source code
• Compress, then Decompress: returns data back to its original contents
25.
How Compression Works
• Lempel-Ziv lossless compression builds a dictionary of repeated phrases: sequences of two or more characters that can be represented with fewer bits
• In the above excerpt from Lord of the Rings, all of the red text represents repeated sequences eligible for compression
Source: The Lempel Ziv Algorithm, Christian Zeeh, 2003
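The effect of repeated phrases can be observed with Python's standard zlib module, which implements DEFLATE (an LZ77 variant combined with Huffman coding). This is a small illustration of lossless Lempel-Ziv-style compression in general, not of any IBM product; the sample text is made up for the example.

```python
import zlib

# Made-up sample text with many repeated phrases.
phrase = "thin provisioning saves space; compression saves space; deduplication saves space; "
original = (phrase * 8).encode("utf-8")

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
# Lossless: decompression returns exactly the original contents.
assert restored == original
assert len(compressed) < len(original)
```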
26.
Data Footprint Reduction
                          Active Data      Backup Data
Real-time Compression     40-80% (Best)    40-80%
Data Deduplication        20-30%           80-95% (Best)
Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written
– Short patterns for frequent data
– Longer patterns for infrequent data
– Can achieve 40 to 80 percent reduction in storage capacity for active data
Data deduplication is a method of reducing storage needs by eliminating duplicate copies of data
– Store only one unique instance of the data
– Redundant data replaced with pointer
– Can achieve 80 to 95 percent reduction in storage capacity for backup data
27.
Compressed Volumes based on Thin Provisioning
• Full: actual data written, plus allocated but unused space dedicated to this host, wasted until written to
• Thin Provisioning: host sees full virtual amount; physical space allocated for actual data written
• Thin Provisioning with Compression: host sees full virtual amount; physical space allocated, up to 80% reduction from actual data written
28.
FIVO vs. VIFO
Fixed Input, Variable Output
• WAN transmission
• Sequential tape
• IBM Tivoli Storage Manager
• zip, tar, etc.
[Figure: data chunks 1-6 of equal size become compressed data chunks 1-6 of varying size]
Variable Input, Fixed Output
Random Access Compression Engine™ (RACE)
• SAN Volume Controller
• Storwize V7000 and V7000 Unified
• FlashSystem V9000
• XIV Storage System
[Figure: data chunks 1-6 of varying size become compressed data chunks 1-6 of equal size]
29.
Compression for Disk data
Traditional Approaches
• The work to “update” a file may involve many more I/Os
• Data blocks shift
  • Negative impact to deduplication
• No notion of data location, data is processed sequentially
[Figure: a file with blocks A-I is stored as compressed file ABC DEF GHI; after block E is replaced by MN, the new compressed file is ABC DMN FGH I: blocks shift]
Real-time Compression
• The work to “update” a file takes about the same or fewer I/Os
• Only modified block changed
  • Enhances deduplication
• Data location via map
[Figure: a file with blocks A-I is stored as compressed file ABC DEF GHI; after block E is replaced by MN, the new compressed file is ABC DEF1 GHI MN: identical blocks]
30.
IBM Real-time Compression for File and Block level
For File and Block-level access
• IBM Storwize V7000 Unified
• To estimate space savings for file-level storage, use: Real-time Compression Appliance Scan Tool
For Block-only access
• SAN Volume Controller
• Storwize V7000
• FlashSystem V9000
• XIV Storage System – NEW
• To estimate space savings for block-level storage, use: Comprestimator Tool
31.
IBM Real-time Compression – Estimated Savings
IBM’s Random-Access Compression Engine (RACE) delivers excellent capacity savings for a variety of data types:
Databases (DB2, Oracle, etc.)                          ~80%
Virtual Servers (VMware, etc.)
  Linux and Windows virtual guest images               50% to 70%
Microsoft Office
  2003                                                 ~60%
  2007 or later                                        ~20%
CAD/CAM engineering drawings                           ~70%
IBM Comprestimator tool can be used to evaluate expected compression benefits for specific environments
• This pre-sales tool is available to estimate compression savings. Percentage savings shown are typical results, based on client experiences; your mileage may vary.
• http://www14.software.ibm.com/webapp/set2/sas/f/comprestimator/home.html
45-day Free Trial of Compression available
Source: IBM internal tests and field results
32. Compression Acceleration Cards –
Intel® QuickAssist Technology
• Intel QuickAssist technology integrated into new Compression Acceleration cards
  • Used to offload the LZ compression and decompression processing
• Each node supports up to two Compression Acceleration cards
  • SVC uses 4 parallel compression threads per card
• To use compressed volumes, nodes require at least:
  • SVC 2145-DH8 or next generation Storwize V7000
  • 64 GB of Cache Memory per node
  • One Compression Acceleration card
• When compression is enabled, 38 GB is used as a Compression Cache
• Optionally upgrade each node to contain a second Compression Acceleration card
  • Upgrade recommended when normal data working set > 32 TB
33. 7.3.0 Software Stack
New Dual Layer Cache Architecture
• First major update to cache since 2003
• Flexible design for plug and play style cache algorithm enhancements in the future
• “SVC” like L2 cache for advanced functions
• Upper Cache – simple write cache
• Lower Cache – algorithm intelligence
  • Understands mdisks
• Shared buffer space between two layers
[Figure: software stack diagram with components SCSI Target, Forwarding, Replication, Upper Cache, FlashCopy, Mirroring, Thin Provisioning, Compression, Virtualization, Easy Tier 3, Lower Cache, RAID, SCSI Initiator; Interface Layer (Fibre Channel, iSCSI, FCoE, SAS, PCIe); Configuration, Peer Communications, Clustering; three components are marked “New”]
* Only 4F2 hardware limited to running no later than 5.1 Software due to 32bit CPU
34. Turbo Compression Explained
                                                          Store more    IOPS              Response time
Real Time Compression [RtC]                               Store more    Limited effect    Limited effect
Auto Tiering [Easy Tier and Flash Technology]             No effect     More IOPS         Faster response
Turbo Compression [RtC + Easy Tier and Flash Technology]  Store more    More IOPS         Faster response
Turbo Compression may double the net usability of existing infrastructures
Turbo Compression tests: Oracle TPC-C (07/2013) [2% Flash Capacity]
• 4x Compression
• 2.1x IOPS Throughput
• ½x Response time
at a fraction of the cost of traditional means
35. Turbo Compression for Tiered Flash/Disk Pools
• Easy Tier (no compression)
  • 1 Volume, 100 GB
  • 4% Flash (4 GB) serves 23% of IOPS (assumption: skew = 7)
  • HDD Tier: 77% of IOPS
• Compression (RtC) (assumption: 66% savings)
  • 12% of the data, once compressed, fits in 4 GB
  • 12% of the data serves 60% of IOPS
  • HDD Tier: 40% of IOPS
• Turbo Compression
  • Pool IOPS capability nearly doubled without adding any Flash
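The 12% figure and the “nearly doubled” claim can be checked with simple arithmetic. This is a minimal sketch under two stated assumptions: compression savings apply uniformly to the data placed on Flash, and the HDD tier is the bottleneck, so pool IOPS capability scales inversely with the share of IOPS the HDD tier must serve. The IOPS percentages are taken from the slide as given rather than derived from the skew curve, and the function names are hypothetical.

```python
def data_fraction_in_flash(flash_fraction, compression_savings):
    # With 66% savings, each GB of Flash holds 1 / (1 - 0.66) GB of original data.
    return flash_fraction / (1.0 - compression_savings)


def pool_iops_gain(hdd_share_before, hdd_share_after):
    # Assumption: the HDD tier limits the pool, so serving a smaller share of
    # the IOPS from HDD raises total pool capability proportionally.
    return hdd_share_before / hdd_share_after


fraction = data_fraction_in_flash(0.04, 0.66)
gain = pool_iops_gain(0.77, 0.40)
print(f"Data held in Flash: {fraction:.1%}")   # roughly 12% of the volume
print(f"Pool IOPS gain: {gain:.2f}x")          # a little under 2x
```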
[Figure: Cumulative IOps vs. Capacity. X axis: Capacity % (0% to 100%); Y axis: IO % (0% to 120%). Marked points: 4% of capacity serves 23% of IO; with RtC, 12% of capacity serves 60% of IO (TC)]
36.
Compressing Existing Data
• Copy 0: fully-allocated or thin-provisioned volume
• Copy 1: compressed volume
• Volume mirror: only non-zero blocks copied
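Copying only the non-zero blocks is what lets the mirror reclaim space that was allocated but never written. A minimal sketch of that filter, assuming 512-byte blocks and an in-memory source; it illustrates the idea only and is not the volume mirroring code of any product (the function name is hypothetical).

```python
def mirror_non_zero_blocks(source: bytes, block_size: int = 512):
    copy = {}  # block number -> block contents; absent blocks read as zeros
    for offset in range(0, len(source), block_size):
        block = source[offset:offset + block_size]
        if any(block):  # True if at least one byte in the block is non-zero
            copy[offset // block_size] = block
    return copy


# Four blocks, only the second one contains data.
source = bytes(512) + b"data".ljust(512, b"\x00") + bytes(1024)
copy = mirror_non_zero_blocks(source)
print(sorted(copy))   # [1]
```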
37.
XIV Compression & Snapshot Views
Comprestimator tool built into IBM XIV 11.6 GUI
Right click to compress volume
Snapshot usage now reporting per volume
38.
Compression
Advantages
• Can be used for data transmission, tape and disk data
• Supports both file-based and block-based disk storage
• Real-time compression can be used with Databases, CAD/CAM and Virtual Machines with no impact to application performance
• Can offer up to 80% data footprint reduction savings
• Real-time Compression is “Dedupe-Friendly” and combines well with Thin Provisioning
Objections
• Some implementations are post-process
  • Stores uncompressed data first, compresses later
• Other implementations impact application performance and/or consume substantial CPU resources
• Benefits vary by data type, and whether applications do their own compression or encryption
  • Your mileage may vary
39. Summary
• Data Footprint Reduction technologies have been around for many years
• Algorithms are stable, mature, and well-understood by the IT industry
• Data is returned byte-for-byte identical to what was originally stored
• Implementations can vary greatly between vendors and products
• IBM’s implementations tend to have faster performance, offer better scalability, are easier to use and have a lower TCO
40.
Some great prizes
to be won!
Please fill out an evaluation!
Session: sDE2784
42.
IBM Tucson Executive Briefing Center
• Tucson, Arizona is home for
storage hardware and software
design and development
• IBM Tucson Executive
Briefing Center offers:
• Technology briefings
• Product demonstrations
• Solution workshops
• Take a video tour
• http://youtu.be/CXrpoCZAazg
43.
About the Speaker
Tony Pearson is a Master Inventor and Senior managing consultant for the IBM System Storage™ product line. Tony joined
IBM Corporation in 1986 in Tucson, Arizona, USA, and has been there ever since. In his current role, Tony presents briefings
on storage topics covering the entire System Storage product line, and topics related to Cloud, Analytics and Social media. He
interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for
IBM’s integrated set of storage software, hardware and virtualization products.
Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners
every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine and #1
most read IBM blog on IBM’s developerWorks. The blog has been published into a series of books, Inside System Storage:
Volumes I through V.
Over the years, Tony has worked in development, marketing and consulting positions for various storage hardware and
software products. Tony has a Bachelor of Science degree in Software Engineering and a Master of Science degree in
Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and
software products.
9000 S. Rita Road
Bldg 9032 Floor 1
Tucson, AZ 85744
+1 520-799-4309 (Office)
tpearson@us.ibm.com
Tony Pearson
Master Inventor,
Senior IT Specialist
IBM System Storage™
44.
Email:
tpearson@us.ibm.com
Twitter:
twitter.com/az99Øtony
Blog:
ibm.co/Pearson
Books:
www.lulu.com/spotlight/99Ø_tony
IBM Expert Network on Slideshare:
www.slideshare.net/az99Øtony
Facebook:
www.facebook.com/tony.pearson.16121
Linkedin:
www.linkedin.com/profile/view?id=103718598
Additional Resources from Tony Pearson
45.
Continue growing your IBM skills
ibm.com/training provides a
comprehensive portfolio of skills and career
accelerators that are designed to meet all
your training needs.
• Training in cities local to you - where and
when you need it, and in the format you want
• Use IBM Training Search to locate public training classes
near to you with our five Global Training Providers
• Private training is also available with our Global Training
Providers
• Demanding a high standard of quality –
view the paths to success
• Browse Training Paths and Certifications to find the
course that is right for you
• If you can’t find the training that is right for you
with our Global Training Providers, we can help.
• Contact IBM Training at dpmc@us.ibm.com
Global Skills Initiative
46.
Trademarks and Disclaimers
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library
is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel
Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a
registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a
registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell
Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and
the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance
characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such
products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not
tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the
supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with
respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a
good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending
upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that
an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business
Partner for the most current pricing in your geography.
Photographs shown may be engineering prototypes. Changes may be incorporated in production models.
© IBM Corporation 2015. All rights reserved.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the
World Wide Web at http://www.ibm.com/legal/copytrade.shtml.
ZSP03490-USEN-00