How to integrate OpenStack Swift to your "legacy" system

Copyright © 2016 NTT DATA Corporation
April 26, 2016
NTT DATA Corporation
@ OpenStack Summit 2016 Austin

Disclaimer
• All product names, service names, software names, and other marks are
trademarks or registered trademarks of their respective companies.
• The purpose of this presentation is to share the knowledge gained
from our system integration projects using Swift.
• The presenters and NTT DATA Corporation provide this information on an
as-is basis and accept no responsibility for any results obtained by
acting on the information in this presentation material.

About us
• Who are we?
– Takashi Kajinami, Masaaki Nakagawa, Masahiro Ikeda
– We are all platform engineers
• OSS professional sector in NTT DATA
• Our main themes
– Cloud platform using OpenStack
– Private cloud with OpenStack
– Cloud storage with Swift or Sheepdog

Agenda
• What is Swift, and why do we use it?
• What is the problem?
• How to solve the problem?
• Case 1: Using Swift as backup storage
• Case 2: Providing a conventional file server on Swift
• Conclusion

OpenStack Swift ~ Distributed object storage ~
• The object storage project within OpenStack
• Distributed object storage with features similar to Amazon S3
• REST API (GET/PUT/…) over HTTP
① Block Storage (Cinder)
② Object Storage (Swift)
The image is quoted from http://www.openstack.org/software/
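
As a rough illustration of what "REST API over HTTP" means for a client, the sketch below builds (but does not send) the PUT and GET requests for a single object. The storage URL, container, object name, and token are hypothetical placeholders; only the URL layout (account URL, then container, then object) and the X-Auth-Token header follow Swift's API.

```python
import urllib.parse
import urllib.request


def build_object_request(storage_url, container, obj, method, token, body=None):
    """Build (without sending) an HTTP request for one Swift object.

    storage_url is the account URL handed out by the auth service,
    e.g. http://host:8080/v1/AUTH_xxx (hypothetical here).
    """
    path = f"{urllib.parse.quote(container)}/{urllib.parse.quote(obj)}"
    url = f"{storage_url.rstrip('/')}/{path}"
    request = urllib.request.Request(url, data=body, method=method)
    # Swift authenticates each request with a token header.
    request.add_header("X-Auth-Token", token)
    return request


# Hypothetical values, for illustration only.
STORAGE_URL = "http://swift.example.com:8080/v1/AUTH_demo"
put_req = build_object_request(STORAGE_URL, "backups", "vm01.img", "PUT",
                               "dummy-token", body=b"image bytes")
get_req = build_object_request(STORAGE_URL, "backups", "vm01.img", "GET",
                               "dummy-token")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Passing either request to urllib.request.urlopen would perform the upload or download against a real cluster. The absence of any file-system call in this flow is exactly what legacy applications stumble over.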

3 key features of Swift
1. Durability
2. Scalability
3. Openness

Durability
• Protects data from failures at every level: datacenter, rack, node, and disk

Scalability
• Flexibly adapts to the growth of data
[Figure: projected data volume after 1, 2, and 5 years: 10TB, 100TB, then 150TB? or 3PB?]

Openness
• Free from dependence on specific hardware models and their maintenance periods
[Figure: hardware from vendors A, B, and C]

3 key features of Swift
• Durability: protects data from failures at every level (datacenter, software, hardware, disk)
• Scalability: flexibly adapts to the growth of data
• Openness: no vendor lock-in

Swift is great!
Swift is the best solution for building a durable, scalable,
and open storage platform.
But in many projects, we faced difficulties when
trying to integrate Swift into the existing system.

What happened to us
One day, a salesperson came to me with a question.
Sales: Hey, our customer is planning to replace their storage
and is looking for a way to protect their data from disasters.
Do you have any good solutions?
Engineer: OK. Swift is the best solution for that! It has an excellent “global cluster”
feature, which provides geographically replicated storage!
Sales: Great! How can we use that storage? Does it provide NFS or iSCSI?
Engineer: No. Swift provides a REST API over HTTP.
Sales: Oh... They want to replace only their storage and keep their existing
applications, which require a conventional file system interface.
Engineer: OK. Give me some time. I’ll find a solution for that.

What is the issue?
Swift is a very good storage solution for achieving
• disaster recovery (geographically replicated)
• massive scalability (more than a PB)
• low operational cost (self-healing from disk failures)
and so on.
However, many customers don’t change their applications at the same time,
and in many cases these applications don’t support a REST API.

How to solve the problem?

Overview
We’d like to introduce two use cases in this presentation:
Case 1: Using Swift as the backup storage of a legacy backup server
Case 2: Using Swift as the storage for a file server

Case 1: Using Swift as the backup storage of a legacy backup server
○Overview
• Upgrading the backup system of a public agency
• This system backs up data of various sizes
e.g. user data, user virtual machine images, VM host images
• The support contract for this system's backup storage is about to expire
• The goal of this project is to replace the backup storage device
○Requirements
• The backed-up data size ranges from hundreds of TB to 1 PB
• Data is backed up from multiple DCs
• These DCs are located far away from each other
• Backup data should be stored redundantly
• Backup storage should be placed in multiple datacenters
○Notes
• The backup system uses proprietary backup software
• It is a strong requirement to keep using the backup software. We should not
change anything except the storage

Points for design
• The backed-up data size ranges from hundreds of TB to 1 PB
→ Scalable
• Data comes from multiple DCs / DCs are located far away from each other
→ Multi region
• Backup data should be stored redundantly / Backup storage should be
placed in multiple datacenters
→ Global cluster
Swift is a good fit for this customer.

System design derived from these points
[Figure: a system design that satisfies the points above. Site A (Region A) and Site B (Region B) each have a user system, a backup server, a Swift proxy, and storage nodes. Each proxy uses read/write affinity, and data is replicated between the two regions.]
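
The read/write affinity in this design is configured per proxy. The sketch below generates the relevant proxy-server.conf section for each site. The option names (sorting_method, read_affinity, write_affinity) are Swift proxy-server settings, while the mapping of Region A to region 1 and Region B to region 2 and the priority value 100 are assumptions made for illustration.

```python
import configparser
import io


def proxy_affinity_config(local_region):
    """Return a config object that makes a proxy prefer its own region.

    local_region is the Swift region number of the site where the proxy
    runs (assumed here: 1 for Region A, 2 for Region B).
    """
    cfg = configparser.ConfigParser()
    cfg["app:proxy-server"] = {
        "use": "egg:swift#proxy",
        # Sort candidate storage nodes by the affinity rules below.
        "sorting_method": "affinity",
        # Lower number = higher priority; unlisted regions come last.
        "read_affinity": f"r{local_region}=100",
        # Write to local-region nodes first; replication then moves
        # copies to the remote region asynchronously.
        "write_affinity": f"r{local_region}",
    }
    return cfg


def render(cfg):
    """Serialize the config in INI form, as it would appear on disk."""
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


site_a = proxy_affinity_config(1)
site_b = proxy_affinity_config(2)
print(render(site_a))
print(render(site_b))
```

With settings like these, a backup server writes to and reads from storage in its own site, and cross-site traffic is left to Swift's replication.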

That design was difficult to realize
Because the backup software has low compatibility with Swift, we had
some issues to clear before we could use Swift.
[Figure: user data, VMs, and virtualization hosts are backed up by backup servers to a tape device or block storage. The backup software does not support the Swift API and requires a tape device or a server-specific virtual tape device on block storage. This storage is the part we want to change to Swift.]

Experimental approach: mounting Swift on the file system
We mounted Swift on the backup server with cloudfuse (※), in place of the block storage
※https://github.com/redbo/cloudfuse
[Figure: each backup server mounts Swift through cloudfuse at a mount point (e.g. /mnt/swift) and places its virtual tape devices there. cloudfuse talks to the Swift proxy, which stores the data on the storage nodes.]

Issues of mounting Swift for the legacy backup system
There are two issues that must be solved for the backup process to proceed.

Issue 1: Initialization of the virtual tape device fails
Reason:
• The tape device is renamed during initialization
• cloudfuse does not support the rename operation
Plan:
• Improve cloudfuse to support rename
• Select another component similar to cloudfuse
• Workaround: create the device at another location

Issue 2: Swift doesn’t support the append operation
Reason:
• The backup server appends backup data to the virtual tape device
Plan:
• Use the DLO plugin
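
Both issues come from operations that an object store does not offer natively. The sketch below only plans the HTTP requests a gateway could send to emulate them. It shows Swift's general mechanisms (server-side copy followed by delete for rename, and a Dynamic Large Object manifest for append), not the internals of cloudfuse or of the DLO plugin mentioned above, which the slides do not describe. Container names, object names, and the segment naming scheme are hypothetical.

```python
def plan_rename(container, old_name, new_name):
    """Emulate rename: server-side copy to the new name, then delete the old."""
    return [
        ("PUT", f"/{container}/{new_name}",
         {"X-Copy-From": f"/{container}/{old_name}", "Content-Length": "0"}),
        ("DELETE", f"/{container}/{old_name}", {}),
    ]


def segment_name(object_name, index):
    """Segment names are zero padded so that name order equals append order."""
    return f"{object_name}/{index:08d}"


def plan_append(container, segment_container, object_name, chunks_so_far):
    """Emulate append with a Dynamic Large Object (DLO).

    Each appended chunk is stored as a new segment object. A zero-byte
    manifest object points at the segment prefix, and Swift concatenates
    all segments under that prefix, in name order, when it is read.
    """
    requests = [
        ("PUT",
         f"/{segment_container}/{segment_name(object_name, chunks_so_far)}",
         {}),
    ]
    if chunks_so_far == 0:
        # The manifest only has to be created once, with the first chunk.
        requests.append(
            ("PUT", f"/{container}/{object_name}",
             {"X-Object-Manifest": f"{segment_container}/{object_name}/"}))
    return requests


# Hypothetical names; paths are relative to the account storage URL.
for request in plan_rename("tapes", "vtape0.tmp", "vtape0"):
    print(request)
for chunk_index in range(3):
    for request in plan_append("tapes", "tapes_segments", "vtape0", chunk_index):
        print(request)
```

Note that neither emulation is atomic, and a rename of a large object costs a full server-side copy. That trade-off is one reason a workaround such as creating the device elsewhere can be attractive.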

Summary of case 1
【Requirement】
• We’d like to use Swift as the backend storage of a legacy backup system.
【Issues】
• The backup software doesn’t support the Swift API
• The backup software requires the virtual tape device to be located on a
block device
【Attempt】
• Mounting Swift on the file system by using cloudfuse
【Result】
We succeeded in running a backup through to completion, by:
• Applying a workaround to avoid the failure when initializing the virtual tape device
• Using the DLO plugin to support the append operation

Case 2: Using Swift as the storage for a file server
Overview
• Renewing a file server that is accessed by users in
geographically separated areas
Requirements
1. Users store data in local storage without any overhead for replication
2. All users share the same files
3. Support file-based protocols for writing files of various sizes

Req1. Users store data in local storage
• To keep access latency low
• Since users are distributed, storage also needs to be located in
geographically separated areas

Req2. All users share the same files
• Bundle the local storages into one virtual storage so that the same files are shared.
• After a file is written to one storage, it is replicated to all the other
storages
[Figure: (1) a file is written to one storage, (2) it is replicated to the others, (3) it can be read from any of them]

Req3. Support file-based protocols for writing files of various sizes
• To reuse the many existing legacy applications
• To avoid the high costs and risks of developing new applications
• To serve as a file server, low-latency access to small files is required
[Figure: local storage accessed over CIFS/NFS, holding many small files (up to MB) and some big files (MB and up)]

Options of storage software

Swift has two issues in meeting the requirements

Issue1. File system interface
• Users need to access the storage through file-based protocols such as CIFS/NFS
• But Swift doesn’t support any protocol except its REST API
[Figure: CIFS/NFS access to Swift is NG (REST API only)]

Issue2. Small file optimization
• To serve as a file server, low-latency access to small files is required
• But Swift is not good at processing many small files, because it is
optimized for storing large amounts of data in bigger files
[Figure: accessing many small files (up to MB) through Swift's REST API is slow]

Solutions for the two issues
Issue 1: File system interface
Solution:
• Cloud Storage Gateway
Issue 2: Small file optimization
Solutions:
• Data management with small block size
• Storage cache

Solution1. File system interface
• Cloud Storage Gateway (CSG)
– Gateway software that translates seamlessly between cloud storage APIs
such as a REST API and standard file-based interfaces such as CIFS/NFS
[Figure: clients use CIFS/NFS to reach the Cloud Storage Gateway, which uses the REST API to reach Swift]

Solution2. Small file optimization
• Data management with small block size
– Optimizes the latency of random access to many small files
• Storage cache
– Local storage on the CSG that holds the data users have accessed
– If the CSG already has the requested file in its storage cache, it returns the
file to the user directly
– Since there is no access to Swift, the latency is very low
[Figure: (1) a read request for File A reaches the Cloud Storage Gateway, (2) the gateway responds with File A from its cache without accessing Swift]
Do Swift + CSG meet the requirements?
• Although a CSG solves the two issues, it is important to confirm whether
the CSG has a clustering feature
[Figure: the two issues are marked as solved; the remaining item is marked "?" (depends on the CSG)]

We used Fobas CSC
• Fobas CSC (Cloud Storage Cache)
• One of the CSG products (※1)
• Proprietary software
• Features
1 Data management with small block size
2 Storage cache
3 Loosely Cluster, which enables sharing the same files across multiple locations
4 ACLs, which enable access control with LDAP and Active Directory
5 Support for many protocols: CIFS, NFS, WebDAV, FTP, and iSCSI
6 High security: data is encrypted with AES256
and more …
※1 http://www.fobas.jp/

Loosely Cluster
• Bundles the file systems of multiple Fobas CSC instances into one virtual file system
• Enables sharing the same files across multiple locations at the CSG layer
[Figure: Europe and North America each run Fobas CSC on top of Swift. Loosely Cluster replicates metadata between the Fobas CSC instances, and Global Cluster replicates data between the Swift sites.]

Swift + Fobas CSC can meet every requirement

Architecture
• Each location has Swift + Fobas CSC
• Since they are clustered and replicate data, users can share the same files.
[Figure: several locations, each with a CSG in front of Swift; files written at one location are replicated to the others]

Performance
• Overview of the performance test
- All data is in the storage cache, which is on SSD
- The benchmark is the fileserver workload in filebench
(https://sourceforge.net/projects/filebenchs)
• The performance is good enough for a file server when the protocol is NFS
- Compared with the performance of an ordinary file server
- ※ ◎: better, ○: almost the same, △: worse

                           Average throughput [ops/sec]   Average latency [msec]
Ordinary file server       2000～5000                      5～20
Swift + Fobas CSC (NFS)    ◎                              ○
Swift + Fobas CSC (CIFS)   △                              △

Three limitations
1. The performance of the CIFS protocol isn’t very good
• We think its performance has not been tuned much so far, since CSG developers
tend to focus on flexibility, scalability, and availability
• As the CSG market matures, this problem should be solved
2. The performance becomes much worse when data is not in the cache
• The latency increases by that of Swift, because the CSG has to get the data from Swift
• Design the size of the storage cache carefully, weighing performance against cost
efficiency while considering how the data tends to be used
3. Integration has both a good point and a bad point
• The good point is that we get software-specific features such as
Global Cluster
• The bad point is that availability becomes lower, because the number of possible
failure points increases as the number of components increases
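
Limitations 2 and 3 can be made concrete with a little arithmetic. The sketch below assumes a simple model that is not taken from the presentation: a cache miss costs the cache latency plus the Swift latency, and the layers fail independently, so the availability of the whole stack is the product of the layer availabilities. All numbers are hypothetical.

```python
import math


def expected_latency_ms(hit_ratio, cache_ms, swift_ms):
    """Average read latency for a given cache hit ratio.

    On a miss the gateway first misses in its cache and then fetches
    from Swift, so the latency grows by Swift's latency.
    """
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * cache_ms + miss_ratio * (cache_ms + swift_ms)


def series_availability(*components):
    """Availability of layers that must all work (independent failures assumed)."""
    return math.prod(components)


# Hypothetical figures: 5 ms from cache, 100 ms from Swift.
for hit_ratio in (1.0, 0.9, 0.5):
    latency = expected_latency_ms(hit_ratio, cache_ms=5.0, swift_ms=100.0)
    print(f"hit ratio {hit_ratio:.0%}: {latency:.1f} ms")

# Hypothetical figures: Swift alone vs. Swift behind a gateway.
print(f"Swift only:      {series_availability(0.999):.4%}")
print(f"Swift + gateway: {series_availability(0.999, 0.999):.4%}")
```

Under this model, even a modest drop in hit ratio dominates the average latency, and every added layer multiplies in another availability factor below 1. Both effects argue for sizing the cache from measured access patterns and for keeping the number of layers small.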

Swift has great potential to expand its user base
• The market size of CSG is increasing rapidly
– $11M (2010) → $74M (2012) → $860M? (2016) (※2)
– The reasons are
• Users want to store growing data in inexpensive storage
• Users want to continue using legacy systems
• The demand for the combination of Swift and CSG will probably increase
– Although proprietary software has many useful features, it is not optimized
for Swift
– How about developing a CSG optimized for Swift?
(for example, developing a plugin for NFS-Ganesha or SMB!?)
※2 http://www.technavio.com/report/global-data-center-cloud-storage-gateway-market

Summary of second use case
• We integrated Swift as the back-end storage of a file server that is accessed
by users in geographically separated areas
• Because Swift doesn’t support file-based protocols and is not good at
processing small files with low latency, we combined Swift with a CSG to
meet the requirements. The performance is good enough.
• Since Swift is superior to other options in its global cluster feature, Swift and
CSG are a very good combination for building a file server with that feature.

Cloud gateways sometimes help us
• Case 1: Storage for an existing backup solution
cloudfuse: REST API -> FUSE (Linux filesystem) -> Backup solution
• Case 2: File sharing service between several geographic locations
Fobas CSC: REST API -> Fobas CSC (CIFS/NFS) -> Clients (Windows/Mac)
A cloud gateway is a very good starting point for integrating Swift into your
existing system
But it often limits the benefits that Swift provides
• The more layers we add, the more possible failure points appear
• Cloud-native applications get all the benefits of Swift

Future vision
• Define and share successful migration stories
• Starting point:
• Gateway solutions to integrate Swift into existing applications
• Ideal goal:
• Cloud-native solutions
-> In the end, you can bring all the benefits of cloud technologies into your system.
• Improve cloud gateways
• Improve stability and usability
• Scalable and highly available NFS/CIFS storage using nfs-ganesha and
CIFS clustering


Why are “Multi Endpoints” important?
• If not supported
• Users must access one specific endpoint
• Latency is very high for many users
[Figure: NG, with a single endpoint for all users]

Why is “asynchronous replication” important?
• If not supported (synchronous replication only)
• Users have to wait for the replication to finish
• The latency is much higher than with asynchronous replication
[Figure: (1) write, (2) synchronous replication, (3) ack; slow]
