Copyright © 2015 NTT DATA Corporation
2015/10/27
NTT DATA Corporation
Know-how of Challenging Deploy/Operation NTT DOCOMO's Mail
Cloud System Powered by OpenStack Swift
Abstract
Docomo mail is a 24/7 cloud mail system accessed by over 20 million
people. The system stores users' mail archives in OpenStack Swift, with
petabyte-scale capacity, deployed by NTT DATA.
We have been operating this service successfully since Sep 2014 without any
downtime. In this session, we present the actual issues and challenges we
faced and overcame.
Today’s contents and presenters
○Project Overview
Changes in the Japanese mobile market and an overview of this project
– Project Manager : Sosuke Kakehi
○Migration process
Process of migrating Swift into the existing docomo mail system
– OpenStack Swift Engineer : Masaaki Nakagawa
○Technical challenges
Swift technical challenges on this project
– OpenStack Engineer : Ryosei Kasai
○Operating session
Large-scale Swift operation
– OpenStack Swift Engineer : Masaaki Nakagawa
Project Overview
Project Overview
1 NTT Docomo's Cloud Mail System
2 Project Background
3 Customer Requirements
Cloud Mail System
NTT Docomo's Cloud Mail System - System Summary
• Docomo Mail - NTT Docomo’s Cloud Mail Service
• Over 20 million users
• Powered by OpenStack Swift
[Figure: smartphones and tablet PCs read later mail from high-performance storage; archived mail is stored to OpenStack Swift object storage.]
NTT Docomo's Cloud Mail System - System Scale
• Geographically distributed Swift cluster
• Over 6.4 petabytes of logical capacity
• Hundreds of servers
[Figure: Sites 1 to 4, with proxy nodes and storage nodes in Regions 1 to 3.]
Project Background
Shift from “Feature phone” to “Smart phone”
[Figure: on smartphones and tablet PCs, services handle documents, text, photos, music, movies, and applications.]
E-mail data size has increased
Project Background
[Figure: high-end storage units added one after another, each with its own cost.]
Extend the high-end storage, extend, extend
= expensive: cost, cost, cost
Customer Requirements
• High availability
• Low cost
• High scalability
• Disaster recovery
• etc.
OSS (software storage) + IA servers
Adopt OpenStack Swift
Migration session
Overview of migration session
NTT DOCOMO launched the docomo mail service in Oct 2013, and Swift was
introduced into the docomo mail system in Jan 2015. The user-facing docomo
mail service was not stopped during the migration to Swift.
In this section, I would like to give an overview of the docomo mail system and
the migration process.
Timeline (older to later):
• Oct 2013: docomo mail service in
• May 2014: test users start to use Swift
• Jan 2015: Swift service in
• Oct 2015: general users start test use of Swift
Swift migration session
System construction overview
[Figure: the Internet connects to the docomo mail frontend server (proxy of block storage and Swift). High-speed block storage holds later user mail; Swift (a proxy and storage nodes) holds archived user mail.]
Swift migration session
Mail access flow
[Figure: access devices reach the docomo mail frontend server (proxy of block storage and Swift) over the Internet; block storage holds user mail, and Swift (a proxy and storage nodes) holds archived user mail.]
User mail will be archived/stored to Swift
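The access flow can be sketched as a frontend that serves later mail from block storage and archived mail from the object store. This is a minimal sketch under assumptions: `MailFrontend`, its methods, the dict-backed stores, and the age-based archiving rule are all hypothetical stand-ins, not the actual docomo mail implementation or the Swift API.

```python
from dataclasses import dataclass, field


@dataclass
class MailFrontend:
    """Hypothetical frontend that proxies block storage and an object store."""
    block: dict = field(default_factory=dict)    # stand-in for high-speed block storage
    archive: dict = field(default_factory=dict)  # stand-in for Swift

    def store(self, mail_id, received_day, body):
        # New mail always lands on block storage first.
        self.block[mail_id] = (received_day, body)

    def archive_older_than(self, cutoff_day):
        # Assumed rule: mail received before cutoff_day moves to the archive.
        old_ids = [m for m, (day, _) in self.block.items() if day < cutoff_day]
        for mail_id in old_ids:
            self.archive[mail_id] = self.block.pop(mail_id)
        return len(old_ids)

    def read(self, mail_id):
        # Later mail is served from block storage, archived mail from the archive.
        if mail_id in self.block:
            return "block", self.block[mail_id][1]
        if mail_id in self.archive:
            return "archive", self.archive[mail_id][1]
        raise KeyError(mail_id)


frontend = MailFrontend()
frontend.store("m1", received_day=1, body="old message")
frontend.store("m2", received_day=90, body="new message")
moved = frontend.archive_older_than(cutoff_day=30)
```

The client never needs to know which tier holds a message; `read` hides that decision, which is the role the slides give to the frontend server.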
Swift migration session
System construction (before Swift was installed)
[Figure: Internet, docomo mail frontend server, and block storage holding both user mail and archived user mail.]
Swift migration session
Migration 1st step – deploy Swift and test
• Deploy Swift
• Failure testing
• Tuning
[Figure: Swift (a proxy and storage nodes) is deployed next to the block storage, which still holds all user mail and archived user mail.]
Swift migration session
Migration 2nd step – copy test users’ archived mail
• Copy test users’ archived mail to Swift
• General users’ mail is not copied
[Figure: block storage keeps user mail and archived user mail; copies of the test users’ archived mail appear on the Swift storage nodes.]
Swift migration session
Migration 3rd step – copy general users’ archived mail
• Copy general users’ archived mail to Swift
• Keep all mail archives on block storage as a safeguard against Swift trouble
[Figure: archived user mail is present on both the block storage and the Swift storage nodes.]
Swift migration session
Migration 4th step – launch service
[Figure: Internet, docomo mail frontend server, block storage, and Swift (a proxy and storage nodes) holding user mail and archived user mail, now in service.]
Conclusion of migration session
• Initially, docomo mail had only block storage
• We needed to deploy Swift and migrate to it with no downtime
• To achieve this, we divided the migration into 4 steps
– Deploy
– Copy test users' mail to Swift
– Copy general users' mail to Swift while keeping it on block storage
– System durability check
• We achieved a migration with no service downtime
As mentioned, we overcame several technical challenges during the migration. In
the next session, Mr. Kasai will introduce them.
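The staged copy can be sketched as follows. This is only an illustration under assumptions: `copy_cohort`, `verify_cohort`, the user names, and the nested dicts standing in for block storage and Swift are hypothetical, and the real migration tooling is not described in the slides.

```python
def copy_cohort(users, block, swift):
    """Copy the archived mail of the given users to swift, keeping the block copy."""
    copied = 0
    for user in users:
        for mail_id, body in block.get(user, {}).items():
            swift.setdefault(user, {})[mail_id] = body
            copied += 1
    return copied


def verify_cohort(users, block, swift):
    """Return True when every archived mail of the cohort exists unchanged in swift."""
    return all(swift.get(u, {}) == block.get(u, {}) for u in users)


block = {"test1": {"a": "x"}, "user1": {"b": "y"}, "user2": {"c": "z"}}
swift = {}                       # step 1: deployed and tested, still empty
test_users = ["test1"]
general_users = ["user1", "user2"]

step2 = copy_cohort(test_users, block, swift)      # step 2: test users only
general_untouched = "user1" not in swift           # general users not copied yet
step3 = copy_cohort(general_users, block, swift)   # step 3: general users
ready = verify_cohort(test_users + general_users, block, swift)  # gate before launch
```

Because `copy_cohort` never deletes from `block`, the original archive remains available as a fallback at every step, which is what allows the service to keep running throughout.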
Technical session
Our Technical Challenges
1 Durability assurance
2 Geographically distributed cluster
3 Quality
Challenge 1: Durability assurance
• Quality requirements in Japan
• This system needs very high quality.
• Everything should be under control
• System design for normal situations
• System design for failure situations
→ Even on a distributed system
• Analyze every behavior before building the system
Recovery tests across a variety of failure patterns
• Variety of failure patterns
(1) The point of failure
• Disk, NIC, Process, Node, …
(2) The number of failures
• 1, 2, 3, 4, …
(3) The range of failures
• 1 node, multiple nodes/zones/regions, …
100s of test cases!!
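The three axes multiply quickly, which a short enumeration makes visible. This is a sketch under assumptions: the concrete values on each axis, the pruning rule, and the case numbering are hypothetical and do not reproduce the presenters' actual test plan or their case IDs.

```python
from itertools import product

points = ["disk", "nic", "process", "node"]          # (1) point of failure
counts = [1, 2, 3, 4]                                 # (2) number of failures
scopes = ["one node", "multiple nodes",
          "multiple zones", "multiple regions"]       # (3) range of failures


def is_consistent(point, count, scope):
    # Assumed pruning rule: a single failure cannot span more than one node.
    return not (count == 1 and scope != "one node")


cases = [
    {"id": f"#{i:03d}", "point": p, "count": c, "range": s}
    for i, (p, c, s) in enumerate(
        (combo for combo in product(points, counts, scopes) if is_consistent(*combo)),
        start=1,
    )
]
```

Even this small grid yields 52 cases; adding more failure points or larger failure counts pushes the matrix into the hundreds.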
[Figure: example test-case diagrams (Case #001, #101, #201, #301, #501), each showing a proxy and storage nodes; the larger cases group storage nodes into Zone1 and Zone2 within Region 1.]
Result of recovery test
• Extreme durability and recoverability of Swift
• Swift rarely loses data. Only a precisely targeted combination of failures or a
major disaster can cause data loss.
Challenge 2: Geographically distributed cluster
• Geographically distributed Swift cluster to realize disaster recovery
• Important points to evaluate for global distribution
1. Client request
2. Durability
[Figure: Site 1 (proxy) and Sites 2 to 4 (storage) connected by a private network, at distances of 300 km or more.]
Pseudo-global cluster
• Pseudo-global cluster with simulated network latency
• Proxy and 3 storage regions placed in different locations
• 10 to 200 msec latency between locations, simulated with tc
• TL msec latency one way, 2*TL msec latency round trip
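One plausible way to inject such latency is the netem queueing discipline of tc. The slides only say that tc was used, so the netem qdisc, the interface name `eth0`, the sampled latency values, and the helper names below are assumptions for illustration.

```python
def netem_command(device, one_way_ms):
    """Build a tc command that adds a fixed egress delay on one interface."""
    return f"tc qdisc add dev {device} root netem delay {one_way_ms}ms"


def round_trip_ms(one_way_ms):
    # With TL msec added in each direction, a round trip sees 2 * TL msec.
    return 2 * one_way_ms


# Sample points inside the 10 to 200 msec range used in the test.
plan = {tl: netem_command("eth0", tl) for tl in (10, 50, 100, 200)}
```

Applying the delay on the egress side of every node gives each link TL msec one way, so a request and its response together pay 2*TL msec.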
[Figure: a proxy and storage regions 1 to 3 with 10 to 200 msec of simulated latency on every link between them; inset of the client, proxy, and storage region 1 path with TL msec of latency in each direction.]
2 points of Pseudo-global cluster testing
1. Client request
• Object PUT/GET/DELETE from client
• Error rate
• Turnaround time for 1 request
• Throughput
• Latency between proxy and storage
2. Durability
• Auto recovery by object-replicator
• Error rate
• Turnaround time of 1 sync process
• Throughput
• Latency between storages
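The three metrics named above (error rate, turnaround time, throughput) can be collected with a small harness. This is a hedged sketch: `measure` and `fake_put` are hypothetical names, and `fake_put` merely simulates a request with a 10 ms delay and an artificial failure pattern rather than talking to a real Swift cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure(request, n_requests, concurrency):
    """Run `request` n_requests times and report error rate, turnaround, throughput."""
    def timed(i):
        start = time.perf_counter()
        try:
            request(i)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, range(n_requests)))
    wall = time.perf_counter() - wall_start

    errors = sum(1 for ok, _ in results if not ok)
    return {
        "error_rate": errors / n_requests,
        "mean_turnaround_s": sum(t for _, t in results) / n_requests,
        "throughput_rps": n_requests / wall,
    }


def fake_put(i):
    # Hypothetical stand-in for an object PUT: 10 ms of simulated latency,
    # and every 10th request fails.
    time.sleep(0.01)
    if i % 10 == 0:
        raise IOError("simulated failure")


report = measure(fake_put, n_requests=20, concurrency=4)
```

Running the same harness at several latency settings and concurrency levels gives the per-request and aggregate views that the two tests compare.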
[Figure: (1) a client sends PUT and GET requests through the proxy to storage regions 1 to 3; (2) storage regions 1 to 3 synchronize with each other.]
Test1: Client request
Object PUT/GET/DELETE from client
• No error caused by latency
• Degradation of turnaround time
• No throughput degradation for concurrent requests
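Why turnaround time can degrade while aggregate throughput does not is easier to see with a simple model. The model below is an assumption added for illustration, not a result from the presentation: each in-flight request delivers one object per turnaround, and the network link caps the total. The function name and all numbers are hypothetical.

```python
def modeled_throughput(concurrency, object_bytes, turnaround_s, bandwidth_bytes_per_s):
    # Offered load grows with concurrency; the link bandwidth is the ceiling.
    offered = concurrency * object_bytes / turnaround_s
    return min(offered, bandwidth_bytes_per_s)


MB = 1_000_000
low_latency = modeled_throughput(64, 1 * MB, 0.25, 100 * MB)
high_latency = modeled_throughput(64, 1 * MB, 0.5, 100 * MB)
single_request = modeled_throughput(1, 1 * MB, 0.5, 100 * MB)
```

In this model, doubling the turnaround time leaves the 64-way concurrent throughput pinned at the bandwidth limit, while a single request stream stays far below it.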
[Figure: two plots. Turnaround time vs. latency for PUT/GET and DELETE; throughput vs. concurrency, bounded by the limitation of network bandwidth.]
Test2: Durability
Auto recovery by object-replicator
• No error caused by latency
• Performance degradation of one process
• No throughput degradation for concurrent processes
[Figure: two plots. Recovery performance after a failure vs. latency; throughput vs. concurrency, bounded by the limitation of network bandwidth.]
Challenge 3: Quality
1. Software Quality
• Do all processes work well?
• Account / Container / Object
• server / replicator / updater / reaper
2. System Quality
• Is our system working well?
• All nodes
• All APIs
Software quality
→ Source code analysis and customization
• Official patches (listed below)
• Original patches
→ Strict testing of all processes
Our official patches
1 Add process name checking into swift-init
2 Prevent redundant commenting by drive-audit
3 Remove invalid connection checking in db_replicator
4 Add timestamp checking in AccountBroker.is_status_deleted
5 Fix error log of proxy-server when cache middleware is disabled
and more …
System quality
• Automated testing tools for
1. APIs : all Swift APIs, including error cases
2. Nodes : all Swift nodes
• Extended Tempest and a checking tool
[Figure: Tempest (test all APIs) and the checking tool (test all nodes) run against the proxy servers and storage servers.]
Our solutions
Challenge: Solutions
1 Durability assurance: recovery tests across a variety of failure patterns
2 Geographically distributed cluster: performance tests of frontend/backend with a pseudo-global Swift cluster
3 Quality: source code analysis and customization; automated testing
Operating session
Overview of operating session
The operation scheme of docomo mail is highly confidential.
Instead, we would like to introduce the operation of NTT DATA's Swift solution.
The docomo mail system uses a customized NTT DATA Swift solution.
Operating session
Large scale system makes operation costly
[Figure: a large-scale Swift cluster and its operating tasks: scale-out, management, repair, and tuning.]
Operating session
Reduce the amount of operating work
• Parallel access (pssh / pscp)
• Automatic deployment (kickstart)
• Tuning (svn / puppet)
• Master repository
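The idea behind parallel access tools such as pssh can be sketched in a few lines: fan one command out to every node and collect per-host results. This is an illustration only; `run_on_all`, `fake_runner`, and the host names are hypothetical, and the runner simulates a remote call instead of opening real SSH sessions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_all(hosts, command, runner, max_parallel=32):
    """Run `command` on every host in parallel and collect results per host."""
    def one(host):
        try:
            return host, True, runner(host, command)
        except Exception as exc:
            return host, False, str(exc)

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return {host: (ok, out) for host, ok, out in pool.map(one, hosts)}


def fake_runner(host, command):
    # Hypothetical stand-in for a remote call; one node is unreachable.
    if host == "storage-03":
        raise ConnectionError("unreachable")
    return f"{host}: ran {command!r}"


hosts = [f"storage-{i:02d}" for i in range(1, 6)]
results = run_on_all(hosts, "uptime", fake_runner)
failed = sorted(h for h, (ok, _) in results.items() if not ok)
```

With hundreds of servers, the operator looks only at the `failed` list rather than at each node in turn, which is where the saving in work comes from.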
Operating session
Reduce operation frequency
[Figure: failure types placed by their effect on service: disk failure, node down, server process down, and backend process down (e.g., the auditor process).]
Operating session
Stop monitoring low-priority items
Monitoring alerts are replaced by periodic performance checks
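A priority-based split between immediate alerts and periodic checks might look like the sketch below. The classification table is an assumption made for illustration; the slides do not state which failure types are treated as service-affecting, and `triage` and the host names are hypothetical.

```python
from collections import defaultdict

# Assumed classification for illustration only; real priorities depend on the deployment.
AFFECTS_SERVICE = {
    "server process down": True,
    "node down": True,
    "disk failure": False,
    "backend process down": False,   # e.g. the auditor process
}


def triage(events):
    """Split events into immediate alerts and items for the periodic check."""
    buckets = defaultdict(list)
    for host, kind in events:
        # Unknown event kinds are treated as alerts, the safer default.
        target = "alert" if AFFECTS_SERVICE.get(kind, True) else "periodic"
        buckets[target].append((host, kind))
    return dict(buckets)


events = [
    ("proxy-01", "server process down"),
    ("storage-07", "disk failure"),
    ("storage-02", "backend process down"),
]
plan = triage(events)
```

Only the first event would page an operator here; the other two wait for the next periodic check, which reduces how often someone has to respond.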
Conclusion of operating session
• Swift consists of many nodes
• Operating costs of a Swift system tend to be high
• NTT DATA has know-how for reducing Swift operation cost
– Using parallelized operation tools
– Customizing monitoring priorities
– Changing monitored items to periodic checks
Conclusion of this presentation
We introduced the usage, challenges, and operation of OpenStack Swift in the
docomo mail service system
• System migration with no service downtime
• Three technical achievements
• Reduced operating cost
Docomo mail has been in service with no downtime.
If you have any questions, please come to the NTT booth.
○Attention
All company names, product names, and service names
mentioned are trademarks or registered trademarks of the
respective companies

More Related Content

What's hot

Cloudcamp Athens 2011 Presenting Heroku
Cloudcamp Athens 2011 Presenting HerokuCloudcamp Athens 2011 Presenting Heroku
Cloudcamp Athens 2011 Presenting Heroku
Savvas Georgiou
 

What's hot (20)

Delivering Container-based Apps to IoT Edge devices
Delivering Container-based Apps to IoT Edge devicesDelivering Container-based Apps to IoT Edge devices
Delivering Container-based Apps to IoT Edge devices
 
Approaching hyperconvergedopenstack
Approaching hyperconvergedopenstackApproaching hyperconvergedopenstack
Approaching hyperconvergedopenstack
 
How Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichHow Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar Leibovich
 
Node.js at Joyent: Engineering for Production
Node.js at Joyent: Engineering for ProductionNode.js at Joyent: Engineering for Production
Node.js at Joyent: Engineering for Production
 
Libpcap
LibpcapLibpcap
Libpcap
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
SecurityPI - Hardening your IoT endpoints in Home.
SecurityPI - Hardening your IoT endpoints in Home. SecurityPI - Hardening your IoT endpoints in Home.
SecurityPI - Hardening your IoT endpoints in Home.
 
Kernel Recipes 2015: Kernel packet capture technologies
Kernel Recipes 2015: Kernel packet capture technologiesKernel Recipes 2015: Kernel packet capture technologies
Kernel Recipes 2015: Kernel packet capture technologies
 
Trove Updates - Kilo Edition
Trove Updates - Kilo EditionTrove Updates - Kilo Edition
Trove Updates - Kilo Edition
 
Deploying datacenters with Puppet - PuppetCamp Europe 2010
Deploying datacenters with Puppet - PuppetCamp Europe 2010Deploying datacenters with Puppet - PuppetCamp Europe 2010
Deploying datacenters with Puppet - PuppetCamp Europe 2010
 
Instrumenting the real-time web: Node.js in production
Instrumenting the real-time web: Node.js in productionInstrumenting the real-time web: Node.js in production
Instrumenting the real-time web: Node.js in production
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
An Introduce of OPNFV (Open Platform for NFV)
An Introduce of OPNFV (Open Platform for NFV)An Introduce of OPNFV (Open Platform for NFV)
An Introduce of OPNFV (Open Platform for NFV)
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
VPC Implementation In OpenStack Heat
VPC Implementation In OpenStack HeatVPC Implementation In OpenStack Heat
VPC Implementation In OpenStack Heat
 
VMware ESXi - Intel and Qlogic NIC throughput difference v0.6
VMware ESXi - Intel and Qlogic NIC throughput difference v0.6VMware ESXi - Intel and Qlogic NIC throughput difference v0.6
VMware ESXi - Intel and Qlogic NIC throughput difference v0.6
 
Cloudcamp Athens 2011 Presenting Heroku
Cloudcamp Athens 2011 Presenting HerokuCloudcamp Athens 2011 Presenting Heroku
Cloudcamp Athens 2011 Presenting Heroku
 
LISA18: Hidden Linux Metrics with Prometheus eBPF Exporter
LISA18: Hidden Linux Metrics with Prometheus eBPF ExporterLISA18: Hidden Linux Metrics with Prometheus eBPF Exporter
LISA18: Hidden Linux Metrics with Prometheus eBPF Exporter
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizations
 
SGX Trusted Execution Environment
SGX Trusted Execution EnvironmentSGX Trusted Execution Environment
SGX Trusted Execution Environment
 

Similar to OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift

NFV Infrastructure Manager with High Performance Software Switch Lagopus
NFV Infrastructure Manager with High Performance Software Switch Lagopus NFV Infrastructure Manager with High Performance Software Switch Lagopus
NFV Infrastructure Manager with High Performance Software Switch Lagopus
Hirofumi Ichihara
 
Acceleration Technology: Taking Media File Transfers From Days to Minutes
Acceleration Technology: Taking Media File Transfers From Days to MinutesAcceleration Technology: Taking Media File Transfers From Days to Minutes
Acceleration Technology: Taking Media File Transfers From Days to Minutes
FileCatalyst
 

Similar to OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift (20)

How to integrate OpenStack Swift to your "legacy" system
How to integrate OpenStack Swift to your "legacy" systemHow to integrate OpenStack Swift to your "legacy" system
How to integrate OpenStack Swift to your "legacy" system
 
Media processing with serverless architecture
Media processing with serverless architectureMedia processing with serverless architecture
Media processing with serverless architecture
 
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
 
Distributed application usecase on docker
Distributed application usecase on dockerDistributed application usecase on docker
Distributed application usecase on docker
 
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...
 
Next Steps in the SDN/OpenFlow Network Innovation
Next Steps in the SDN/OpenFlow Network InnovationNext Steps in the SDN/OpenFlow Network Innovation
Next Steps in the SDN/OpenFlow Network Innovation
 
44CON London 2015 - Inside Terracotta VPN
44CON London 2015 - Inside Terracotta VPN44CON London 2015 - Inside Terracotta VPN
44CON London 2015 - Inside Terracotta VPN
 
Effective IoT System on Openstack
Effective IoT System on OpenstackEffective IoT System on Openstack
Effective IoT System on Openstack
 
Building managedprivatecloud kvh_vancouversummit
Building managedprivatecloud kvh_vancouversummitBuilding managedprivatecloud kvh_vancouversummit
Building managedprivatecloud kvh_vancouversummit
 
Global EC Cluster Updates (OpenStack Mitaka Swift Design Summit)
Global EC Cluster Updates (OpenStack Mitaka Swift Design Summit)Global EC Cluster Updates (OpenStack Mitaka Swift Design Summit)
Global EC Cluster Updates (OpenStack Mitaka Swift Design Summit)
 
Building a Reliable Remote Communication Device with Java ME8 [CON2285]
Building a Reliable Remote Communication Device with Java ME8 [CON2285]Building a Reliable Remote Communication Device with Java ME8 [CON2285]
Building a Reliable Remote Communication Device with Java ME8 [CON2285]
 
NFV Infrastructure Manager with High Performance Software Switch Lagopus
NFV Infrastructure Manager with High Performance Software Switch Lagopus NFV Infrastructure Manager with High Performance Software Switch Lagopus
NFV Infrastructure Manager with High Performance Software Switch Lagopus
 
Triton: A peer-assisted cloud storage systems
Triton: A peer-assisted cloud storage systems Triton: A peer-assisted cloud storage systems
Triton: A peer-assisted cloud storage systems
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to Kubernetes
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to Kubernetes
 
NTTs Journey with Openstack-final
NTTs Journey with Openstack-finalNTTs Journey with Openstack-final
NTTs Journey with Openstack-final
 
Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming Replication
Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming ReplicationBuilding Tungsten Clusters with PostgreSQL Hot Standby and Streaming Replication
Building Tungsten Clusters with PostgreSQL Hot Standby and Streaming Replication
 
SYN207: Newest and coolest NetScaler features you should be jazzed about
SYN207: Newest and coolest NetScaler features you should be jazzed aboutSYN207: Newest and coolest NetScaler features you should be jazzed about
SYN207: Newest and coolest NetScaler features you should be jazzed about
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systems
 
Acceleration Technology: Taking Media File Transfers From Days to Minutes
Acceleration Technology: Taking Media File Transfers From Days to MinutesAcceleration Technology: Taking Media File Transfers From Days to Minutes
Acceleration Technology: Taking Media File Transfers From Days to Minutes
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift

  • 1. Copyright © 2015 NTT DATA Corporation 2015/10/27 NTT DATA Corporation Know-how of Challlenging Deploy/Operation NTT DOCOMO's Mail Cloud System Powered by OpenStack Swift
  • 2. 2Copyright © 2015 NTT DATA Corporation Abstract Docomo mail is 24/7 cloud mail system which has accesses from over 20 million people. This mail system stores user's mail archive in OpenStack Swift with Peta Byte scale capacity deployed by NTT DATA. We have been successfully operating this service since Sep 2014 without any downtime. In this session, we'll present the actual issues and challenges we have faced and conquered.
  • 3. 3Copyright © 2015 NTT DATA Corporation Today’s contents and presenter ○Project Overview Changes of Japanese mobile situation and abstraction of this project – Project Manager : Sosuke Kakehi ○Migrate process Process of migrating swift to existed docomo mail system – OpenStack Swift Engineer : Masaaki Nakagawa ○Technical challenges Swift technical challenges on this project – OpenStack Engineer : Ryosei Kasai ○Operating session Large scale swift operation – OpenStack Swift Engineer : Masaaki Nakagawa
  • 4. Copyright © 2013 NTT DATA Corporation 4 Project Overview
  • 5. 5Copyright © 2015 NTT DATA Corporation Project Overview 1 NTT Docomo's Cloud Mail System 2 Project Background 3 Customer Requirements
  • 6. 6Copyright © 2015 NTT DATA Corporation Cloud Mail System NTT Docomo's Cloud Mail System - System Summary • Docomo Mail - NTT Docomo’s Cloud Mail Service • Over 20 million users • Powered by OpenStack Swift High Performance Storage Object Storage OpenStack Swift Later Mail Tablet PCSmart Phone Archived Mail Stored to Swift
  • 7. 7Copyright © 2015 NTT DATA Corporation NTT Docomo's Cloud Mail System - System Scale • Geographically Distributed Swift Cluster • Over 6.4 Peta Byte Logical Capacity • Over Hundreds of Servers Site2 Site3 Site4 Site1 Proxy Node Storage Node Region1 Storage Node Region2 Storage Node Region3
  • 8. 8Copyright © 2015 NTT DATA Corporation Project Background Shift from “Feature phone” to “Smart phone” Service Service Service Service Smart Phone / Tablet PC Service Documents Text Photos Music MovieApplication E-mail Data Size was increased
  • 9. 9Copyright © 2015 NTT DATA Corporation Cost CostCost Cost CostCost Project Background High-end Storage High-end Storage High-end Storage High-end Storage High-end Storage Extend the High-end Storage, extend, extend = expensive cost, cost, cost High-end Storage
  • 10. 10Copyright © 2015 NTT DATA Corporation Customer Requirements High Availability Low Cost High Scalability OSS(Software Storage) + IA Server Disaster Recovery etc Adopt OpenStack Swift
  • 11. Copyright © 2013 NTT DATA Corporation 11 Migrate session
  • 12. 12Copyright © 2015 NTT DATA Corporation Overview of migration session NTT DOCOMO has launched docomo mail service since Oct 2013, and swift was installed docomo mail system at Jan 2015. When we migrated swift to docomo mail system, docomo mail did not stop user service. In this section, I would like to introduce overall of docomo mail system and migration process. laterolder Oct, 2013 docomo mail service in Jan, 2015 Swift service in May, 2014 test user start to use swift Oct, 2015 General user start to test use Swift
  • 13. 13Copyright © 2015 NTT DATA Corporation swift (archived mail holder) High speed block storage (later mail holder) Swift migrate session System construction overview Docomo mail frontend server (proxy of block storage and swift) Proxy Storage Storage Storage Internet archived user mail archived user mail archived user mail user mail user mail user mail
  • 14. 14Copyright © 2015 NTT DATA Corporation Swift migrate session Mail access flow Docomo mail frontend server (proxy of block storage and swift) Block Storage Proxy Storage Storage Storage Internet archived user mail archived user mail archived user mail access device user mail user mail user mail User mail will be archived/stored to swift
  • 15. 15Copyright © 2015 NTT DATA Corporation Swift migrate session System construction (before swift installed) Docomo mail frontend server Block Storage Internet archived user mail archived user mail user mail
  • 16. 16Copyright © 2015 NTT DATA Corporation Swift migrate session Migration 1st step – deploy swift and test Docomo mail frontend server Block Storage Proxy Storage Storage Storage Internet • Deploy swift • Trouble test • Tuning archived user mail archived user mail user mail
  • 17. 17Copyright © 2015 NTT DATA Corporation Swift migrate session Migration 2nd step – copy test user’s archived mail Docomo mail frontend server Block Storage Proxy Storage Storage Storage Internet Copy test user’s archived mail General user’s mail is not copied archived user mail archived user mail archived user mail archived user mail archived user mail user mail
  • 18. Swift migrate session: Migration 3rd step: copy general users' archived mail. General users' archived mail is moved to Swift, while the whole mail archive is kept as a safeguard against Swift trouble. [Diagram: docomo mail frontend server, Block Storage, and the Swift Proxy and Storage nodes; archived user mail appears on both Block Storage and Swift]
  • 19. Swift migrate session: Migration 4th step: launch service. [Diagram: docomo mail frontend server, Block Storage holding user mail, and the Swift Proxy and Storage nodes holding archived user mail]
  • 20. Conclusion of migrate session • At first, docomo mail had only block storage • We needed to deploy Swift and migrate to it with no downtime • To achieve this, we divided the migration into 4 steps: (1) deploy; (2) copy test users' mail to Swift; (3) copy general users' mail to Swift while keeping the block storage; (4) system durability check • We achieved a migration with no service downtime. As mentioned, the migration also involved several technical challenges; Mr. Kasai introduces them in the next session.
  • 21. Technical session
  • 22. Our Technical Challenges: 1. Durability assurance; 2. Geographically distributed cluster; 3. Quality
  • 23. Challenge 1: Durability assurance • Quality requirement in Japan: this system needs very high quality • Everything should be under control: system design for the normal situation and system design for failure situations, even on a distributed system • Analyze every behavior before building the system
  • 24. Recovery test in a variety of failure patterns • Variety of failure patterns: (1) the point of failure: disk, NIC, process, node, …; (2) the number of failures: 1, 2, 3, 4, …; (3) the range of failures: 1 node, multiple nodes/zones/regions, … • Hundreds of test cases! [Diagram: example cases #001, #101, #201, #301 and #501, each drawn on a cluster of a Proxy and Storage nodes, some grouped into Zone1, Zone2, … within Region 1]
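The three axes on this slide (point, number, and range of failures) can be combined mechanically to enumerate test cases. The following is a minimal sketch, assuming illustrative axis values taken from the slide's examples; the project's real test matrix and case numbering are not given in the presentation, and the names `FailureCase` and `enumerate_cases` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

# Axis values are illustrative, taken from the slide's examples.
# The actual lists used in the project are not given in the source.
POINTS = ["disk", "nic", "process", "node"]
COUNTS = [1, 2, 3, 4]
RANGES = ["single node", "multiple nodes", "multiple zones", "multiple regions"]


@dataclass(frozen=True)
class FailureCase:
    case_id: int  # hypothetical sequential id, not the slide's numbering
    point: str    # which component fails
    count: int    # how many components fail at once
    scope: str    # how widely the failures are spread


def enumerate_cases(points=POINTS, counts=COUNTS, ranges=RANGES):
    """Return every combination of the three failure axes that makes sense."""
    cases = []
    for point, count, scope in product(points, counts, ranges):
        # A single failure cannot span several nodes, zones, or regions.
        if count == 1 and scope != "single node":
            continue
        cases.append(FailureCase(len(cases) + 1, point, count, scope))
    return cases


if __name__ == "__main__":
    cases = enumerate_cases()
    print(len(cases), "cases")
    print(cases[0])
```

Even these short example lists produce dozens of combinations, which suggests how a real matrix with more components and placements reaches hundreds of cases.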
  • 25. Result of recovery test • Swift showed extreme durability and recoverability • Swift rarely loses the data stored in it. Only precisely targeted failures or a great disaster can cause data loss.
  • 26. Challenge 2: Geographically distributed cluster • A geographically distributed Swift cluster to realize disaster recovery • Important points to evaluate for global distribution: 1. Client request; 2. Durability [Diagram: Site 1 (Proxy) and Sites 2, 3 and 4 (Storage) connected by a private network, with distances of 300 km or more between sites]
  • 27. Pseudo-global cluster • A pseudo-global cluster with simulated network latency • The Proxy and 3 Storage regions are placed in different locations • 10 to 200 msec of latency between locations, simulated by tc • TL msec of latency one way means 2*TL msec for a round trip [Diagram: Client, Proxy, and Storage regions 1 to 3, with 10 to 200 msec of latency on every link between locations; TL msec in each direction between Proxy and Storage region 1]
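The slide only says that the latency was simulated "by tc". A common way to do this is the `netem` queueing discipline, so the sketch below assumes netem was used; the interface name `eth0`, the sampled latency values, and the helper names are likewise assumptions. If every host delays its own outgoing traffic by TL, a packet is delayed TL in each direction, which gives the 2*TL round trip from the slide.

```python
def round_trip_ms(one_way_ms: int) -> int:
    """Round-trip latency when each direction is delayed by one_way_ms."""
    return 2 * one_way_ms


def netem_command(interface: str, one_way_ms: int) -> str:
    """Build a tc command that delays all egress traffic on `interface`.

    Assumption: netem is the mechanism; the slide names only `tc`.
    """
    if one_way_ms < 0:
        raise ValueError("latency must not be negative")
    return f"tc qdisc add dev {interface} root netem delay {one_way_ms}ms"


if __name__ == "__main__":
    # Sample points inside the 10 to 200 msec range from the slide.
    for tl in (10, 50, 100, 200):
        print(f"TL={tl}ms, RTT={round_trip_ms(tl)}ms:", netem_command("eth0", tl))
```

Attaching the delay to the root of an interface slows all traffic leaving that host, including traffic that stays inside one location. A test bed that needs different delays per destination would need filters in addition to this.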
  • 28. 2 points of pseudo-global cluster testing: 1. Client request: object PUT/GET/DELETE from the client • Error rate • Turnaround time for 1 request • Throughput • Latency between proxy and storage; 2. Durability: auto recovery by object-replicator • Error rate • Turnaround time of 1 sync process • Throughput • Latency between storages [Diagram: for test 1, a Client sends PUT and GET through the Proxy to Storage regions 1 to 3; for test 2, Storage regions 1 to 3 replicate among themselves]
  • 29. Test1: Client request: object PUT/GET/DELETE from the client • No errors caused by latency • Turnaround time degrades as latency increases • No throughput degradation for concurrent requests [Graphs: turnaround time against latency for PUT/GET and DELETE; throughput against concurrency at different latencies, up to the limitation of network bandwidth]
  • 30. Test2: Durability: auto recovery by object-replicator • No errors caused by latency • Performance of a single process degrades as latency increases • No throughput degradation for concurrent processes [Graphs: failure recovery performance against latency; throughput against concurrency at different latencies, up to the limitation of network bandwidth]
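One way to read the two test results together: latency lengthens each individual request or sync process, but total throughput can be kept up by running more of them at once, until network bandwidth becomes the limit. The sketch below is a simplified model of that relationship (Little's law with a bandwidth cap), not the measured data from the presentation. All numbers and function names in it are assumptions chosen for illustration.

```python
def turnaround_s(service_s: float, one_way_latency_s: float, round_trips: int = 1) -> float:
    """Time for one operation: local work plus network round trips."""
    return service_s + round_trips * 2 * one_way_latency_s


def throughput_per_s(concurrency: int, turnaround: float, bandwidth_limit_per_s: float) -> float:
    """Operations per second for `concurrency` parallel workers.

    Each worker completes 1/turnaround operations per second; the total
    cannot exceed what the network bandwidth allows.
    """
    return min(concurrency / turnaround, bandwidth_limit_per_s)


if __name__ == "__main__":
    limit = 100.0  # hypothetical bandwidth-bound ceiling, operations per second
    for latency_ms in (10, 100, 200):
        t = turnaround_s(service_s=0.05, one_way_latency_s=latency_ms / 1000)
        row = [throughput_per_s(c, t, limit) for c in (1, 10, 100)]
        print(f"latency={latency_ms}ms turnaround={t:.2f}s throughput={row}")
```

In this model a higher latency lowers the throughput of a single worker, while a sufficiently high concurrency reaches the same bandwidth-bound ceiling regardless of latency.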
  • 31. Challenge 3: Quality: 1. Software quality: do all processes work well? • Account / Container / Object • server / replicator / updater / reaper; 2. System quality: is our system working well? • All nodes • All APIs
  • 32. Software quality • Source code analysis and customization: official patches (listed below) and original patches • Strict testing of all processes, and more. Our official patches: 1. Add process name checking into swift-init; 2. Prevent redundant commenting by drive-audit; 3. Remove invalid connection checking in db_replicator; 4. Add timestamp checking in AccountBroker.is_status_deleted; 5. Fix error log of proxy-server when cache middleware is disabled
  • 33. System quality • Automated testing tools for: 1. APIs: all Swift APIs, including error cases; 2. Nodes: all Swift nodes • Implemented with an extended Tempest and a checking tool [Diagram: Tempest and the checking tool run against the proxy servers and storage servers; labels: test all APIs, test all nodes]
  • 34. Our solutions (challenge: solution): 1. Durability assurance: recovery test in a variety of failure patterns; 2. Geographically distributed cluster: performance test of frontend/backend with a pseudo-global Swift cluster; 3. Quality: source code analysis and customization, and automated testing
  • 35. Operating session
  • 36. Overview of operating session: The operation scheme of docomo mail is highly confidential, so this session introduces the operation of the NTT DATA Swift solution instead. The docomo mail system uses the NTT DATA Swift solution with customization.
  • 37. Operating session: A large-scale system makes operation costly. [Diagram: large-scale Swift surrounded by operating tasks: scale out, management, repair, tuning]
  • 38. Operating session: Reduce operating work amount • Parallel access (pssh / pscp) • Automatic deploy (kickstart) • Tuning (svn / puppet) [Diagram: a master repository feeding the nodes]
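The idea behind parallel access is to issue one command to many nodes at the same time instead of logging in to each node in turn. The slide names pssh and pscp for this. The sketch below is not pssh: it is a minimal Python illustration of the same fan-out pattern, assuming key-based ssh access to every node. The function names and host names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_over_ssh(host, command):
    """Run `command` on `host` through the system ssh client.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so one unreachable or misconfigured node cannot block the run.
    """
    done = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=60,
    )
    return done.returncode, done.stdout


def run_on_all(hosts, command, runner=run_over_ssh, max_workers=32):
    """Run one command on every host concurrently; return {host: result}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as `hosts`.
        results = pool.map(lambda host: runner(host, command), hosts)
        return dict(zip(hosts, results))


if __name__ == "__main__":
    # Hypothetical host names; replace with real storage node addresses.
    for host, (code, out) in run_on_all(["storage01", "storage02"], "uptime").items():
        print(host, code, out.strip())
```

The `runner` parameter exists so the fan-out logic can be exercised without a network, for example by passing a function that returns canned results.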
  • 39. Operating session: Reduce operation frequency. [Diagram labels: disk failure; node down; server process down; backend process down (e.g. the auditor process); service affect]
  • 40. Operating session: Stop monitoring low-priority items. [Diagram labels: monitoring alert; periodic performance check]
  • 41. Conclusion of operating session • Swift consists of many nodes • System operating costs of Swift tend to be high • NTT DATA has know-how to reduce Swift operation cost: using parallelized operation tools; customizing monitoring priority; changing monitoring items to periodic checks
  • 42. Conclusion of this presentation: We introduced the usage, challenges, and operation of OpenStack Swift in the docomo mail service system • System migration with no service downtime • Three technical achievements • Reduced operating cost. Docomo mail has been in service with no downtime. If you have any questions, please come to the NTT booth. ○Attention: All company names, product names, and service names mentioned are trademarks or registered trademarks of the respective companies.
  • 43. Copyright © 2011 NTT DATA Corporation Copyright © 2015 NTT DATA Corporation