Cloud Resilience
Fault Injection for Increased Resilience
Jorge Cardoso
(jorge.cardoso@huawei.com)
Huawei European Research Center
Riesstraße 25, 80992 München
The Butterfly Effect Project
OpenStack Munich - Cloud Resilience &
Experiences with OpenStack
Wednesday, April 13, 2016
6:30 PM
1
FusionSphere from Huawei
#6
2
News from OpenStack
06 April 2016
3
FAILURES ARE INEVITABLE!
THE BEST WE CAN DO IS BE
PREPARED FOR THEM AND LEARN
FROM THEM
TEST, REPAIR, LEARN & PREDICT !
4
Unplanned downtime
is caused by*
software bugs … 27%
hardware … 23%
human error … 18%
network failures … 17%
natural disasters … 8%
*Marcus, E., and Stern, H. Blueprints for High Availability: Designing Resilient Distributed Systems. John Wiley & Sons, Inc., 2003.
5
Google's 2007 study found annualized failure
rates (AFRs) for drives
1 year old 1.7%
3 year old >8.6%
Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso. 2007. Failure trends in a large disk drive population. In Proceedings of
the 5th USENIX conference on File and Storage Technologies (FAST '07). USENIX Association, Berkeley, CA, USA, 2-2.
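To put the failure rates above in operational terms, the rough sketch below multiplies a fleet size by the AFR to estimate how many drives fail per year. The fleet size of 10,000 drives is a made-up figure for illustration, and the ">8.6%" value is treated as a lower bound.

```python
def expected_failures_per_year(drive_count: int, afr: float) -> float:
    """Expected number of drive failures in one year for a fleet with the given AFR."""
    if drive_count < 0 or not 0.0 <= afr <= 1.0:
        raise ValueError("drive_count must be >= 0 and afr must be in [0, 1]")
    return drive_count * afr


FLEET_SIZE = 10_000  # hypothetical fleet size, not taken from the study
AFR_BY_AGE = {
    "1 year old": 0.017,                 # 1.7% from the slide
    "3 years old (lower bound)": 0.086,  # >8.6% from the slide
}

if __name__ == "__main__":
    for age, afr in AFR_BY_AGE.items():
        failures = expected_failures_per_year(FLEET_SIZE, afr)
        print(f"{age}: about {failures:.0f} failed drives per year")
```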
6
Why does using a cloud infrastructure require
advanced approaches for resiliency?
One reason [Netflix]: the lack of control over the underlying
hardware, and the inability to configure it to try to ensure 100%
uptime
7
Technology Trends
GOOGLE TRENDS
CLOUD AVAILABILITY
CLOUD FAILURE
8
• Chaos Monkey
  - Randomly terminates instances in a cluster
• Chaos Gorilla
  - Simulates an Availability Zone becoming unavailable
• Chaos Kong
  - Simulates an entire region outage
• Latency Monkey
  - Introduces latency to network packets to simulate degradation of the EC2 network
• Janitor Monkey
  - Cleans up unused resources
• Security Monkey
  - Analyzes and notifies on security profile changes
Netflix: Chaos Monkey
AWS recently recommended firms using
its infrastructure test their resilience by
using Chaos Monkey to induce failures
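The core idea of randomly terminating instances can be sketched in a few lines. This is only an illustration and not Netflix's implementation: the function name `chaos_monkey_round`, the `terminate` callback, and the instance ids are all assumptions, and nothing here talks to a real cloud API.

```python
import random
from typing import Callable, List, Optional


def chaos_monkey_round(instances: List[str],
                       terminate: Callable[[str], None],
                       rng: random.Random,
                       probability: float = 1.0) -> Optional[str]:
    """Terminate at most one randomly chosen instance; return its id, or None."""
    # Skip the round if there is nothing to kill or the dice say "not this time".
    if not instances or rng.random() >= probability:
        return None
    victim = rng.choice(instances)
    terminate(victim)
    return victim


if __name__ == "__main__":
    cluster = ["web-1", "web-2", "web-3"]  # hypothetical instance ids

    def fake_terminate(instance_id: str) -> None:
        # Stand-in for a real "terminate instance" call.
        cluster.remove(instance_id)
        print("terminated", instance_id)

    chaos_monkey_round(cluster, fake_terminate, random.Random())
    print("survivors:", cluster)
```

Passing the random generator and the terminate callback in as arguments keeps the sketch testable without touching real infrastructure.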
9
Netflix: Chaos Monkey
Fewer alerts
for ops team
Amazon EC2 and Amazon RDS Service
Disruption in the US East Region
April 29, 2011
September 20th, 2015
Amazon’s DynamoDB service experienced an availability issue in their US-EAST-1
Transfer traffic
to east region
10
A program designed to increase resilience by purposely injecting
major failures
Discover flaws and subtle dependencies
Amazon AWS: GameDay
“That seems totally bizarre on the face of it, but as you dig down, you end up finding
some dependency no one knew about previously […] We’ve had situations where we
brought down a network in, say, São Paulo, only to find that in doing so we broke our
links in Mexico.”
11
• Google DiRT (Disaster Recovery Test)
  - Annual disaster recovery & testing exercise
  - 8 years since inception
  - Multi-day exercise triggering (controlled) failures in systems and processes
• Premise
  - 30-day incapacitation of headquarters following a disaster
  - Other offices and facilities may be affected
• When
  - “Big disaster”: Annually for 3-5 days
  - Continuous testing: Year-round
• Who
  - 100s of engineers (Site Reliability, Network, Hardware, Software, Security, Facilities)
  - Business units (Human Resources, Finance, Safety, Crisis response etc.)
Google: DiRT
http://flowcon.org/dl/flowcon-sanfran-2014/slides/KripaKrishnan_LearningContinuouslyFromFailures.pdf
12
Goal
-- Butterfly Effect System --
Enables automatic testing and repair of OpenStack and cloud
applications
CLOUD APPLICATION
HUAWEI FusionSphere
The system works by intentionally injecting different failures, testing the ability to
survive them, and learning how to predict and repair failures preemptively
Failure
Repair
Test
13
Use Case: OpenStack Resiliency
Kill cinder database
(Simulate update failure)
Introduce delay in messages
(Full-scale traffic shows where
the real bottlenecks are)
Operation Error
OPENSTACK_KEYSTONE_URL = "http://%s:5000/v2.0" % OPENSTACK_HOST
Operation Error
/etc/nova/nova.conf
Delete: auth_strategy=keystone
Remove driver to HD
Remove access to NFS
(Simulate hardware failure)
Best way to avoid failure: Fail constantly
The main testing framework of OpenStack is called Tempest, an open-source project with more than 2000 tests. It does black-box testing only (tests access only the public interfaces).
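The "operation error" faults on this slide, such as deleting auth_strategy=keystone from /etc/nova/nova.conf, can be sketched as a small text transformation. The helper below is a hypothetical illustration: the name `remove_option` and the sample configuration are assumptions, and it works on a string rather than on the real file so that it is safe to run anywhere.

```python
from typing import List, Tuple


def remove_option(config_text: str, option: str) -> Tuple[str, List[str]]:
    """Drop every uncommented 'option = value' line.

    Returns the faulty configuration text and the removed lines,
    so the fault can be undone later.
    """
    kept, removed = [], []
    for line in config_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key == option:
            removed.append(line)
        else:
            kept.append(line)
    return "\n".join(kept) + "\n", removed


# Made-up excerpt in the style of nova.conf, for illustration only.
SAMPLE = "[DEFAULT]\nauth_strategy=keystone\ndebug=False\n"

if __name__ == "__main__":
    faulty, removed_lines = remove_option(SAMPLE, "auth_strategy")
    print(faulty)
    print("removed:", removed_lines)
```

In a real experiment one would keep a backup copy of the file and restart the affected service after injecting and after repairing the fault.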
14
Use Case 1: Increasing Reliability
Public Cloud
Damage
Pattern
Butterfly Effect
Fix configurations
Fix bugs
Replace hardware
Upgrade memory
Fault Type
15
Use Case 2: Run Book Automation (RBA)
Public Cloud Incident Management
Is this really
an incident?
Major Incident
Procedure
Butterfly Effect
Fault Type
Damage
Pattern
Recovery
Script
16
MONITORING: Nagios, Zabbix, Cacti, StackTach, Synaps, Monasca
CONFIGURATION AUTOMATION: Ansible, CFEngine, Chef, Puppet, Salt, Heat
FAULT-INJECTION ENGINES: DestroyStack, FSaaS, ChaosMonkey, AnarchyApe
FAULT LIBRARIES AND PLANS: pyCallGraph, Intellect, RunDeck, Nose
DATA VISUALIZATION: Kibana, Graylog2, Grafana
DAMAGE DETECTION: Tempest, Nose
DATA STORAGE: ElasticSearch, OpenTSDB, Neo4J, Graphite, Cassandra, Redis
DATA AGGREGATION: Logstash, Collectd, Flume, Fluentd, Heka, Ceilometer
MANUAL REPAIR: Bash, Python, Chef, Puppet
AUTOMATED REPAIR: jCOLIBRI, myCBR, Puppet, Rundeck, (R)?ex, Chef
DATA PROCESSING: Hadoop, Pig, Hive, Spark, Storm
OPERATIONS ANALYTICS: Statsd, R, Pandas, Weka, Machine Learning
ALERTING: Errbit, Honeybadger, Nagios, Zabbix, OpenPager, Riemann
DATA SOURCE: Log files, Collectd plugin, FlumeNG, OpenStack tables, Zabbix agent, Nagios plugin
DATA TRANSPORT: rsyslog, ZeroMQ
Components of a Solution
[Figure: seven-step cycle. (1) Design & Deploy Test Infrastructure, (2) Design & Execute Fault-Injection Plan, (3) Monitoring Facilities, (4) Identify Damages, (5) Repair & Learn, (6) Predict Future Errors, (7) Automatic Repair]
17
Technological Overview
• (1) Design & Deploy Test Environment
  - Customizable, automated OpenStack deployment
  - FusionServer RH2288 + VirtualBox + Vagrant + RDO
• (2) Design & Execute Fault-Injection Plan
  - Language = Python (no DSL yet)
  - Fault Engine = based on BPM
  - Fault Plan = Workflow paradigm
• (3) Monitoring Facilities
  - Monasca (from HP, Rackspace, IBM)
  - Visualization with Grafana
• (4) Damage Detection
  - OpenStack Tempest
  - 1200 tests (but only API testing :( )
• (5) Repair & Learn
  - …
• (6) Predict Future Errors
  - …
• (7) Automated Repair
  - …
18
• Design & Deploy Test Environment
  - Customizable, automated OpenStack deployment
  - FusionServer RH2288 + VirtualBox + Vagrant + RDO
Deploy Test Environment
2 hours to deploy OpenStack infrastructure with 32 VMs
19
Faults to Inject
• Disk temporarily unavailable
  - unmount a disk
  - wait for replicas to regenerate
  - remount the disk with the data intact
  - wait for replicas to regenerate
  - extra replicas from handoff nodes should get removed
• Disk replacement
  - unmount a disk
  - wait for replicas to regenerate
  - delete the disk and remount it
  - wait for replicas to regenerate
  - extra replicas from handoff nodes should get removed
• Expected failure
  - damage three disks at the same time (more if the replica count is higher)
  - check that the replicas didn’t regenerate even after some time period
  - fail if the replicas regenerated
  - this tests if the tests themselves are correct
• VM failures
  - send VM creation request
  - find compute node where request was scheduled
  - damage the compute server
  - check if the VM creation was re-scheduled to another node
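The disk scenarios above can be rehearsed against a toy in-memory model before running them on a real cluster. The class below is a deliberately simplified assumption: `ToyObjectStore` and its methods are hypothetical, it tracks a single object, and it does not reproduce the real replication algorithm of any storage service.

```python
class ToyObjectStore:
    """Tiny model of replica placement for one object (illustration only)."""

    def __init__(self, disks, replica_count=3):
        self.replica_count = replica_count
        self.primary = list(disks[:replica_count])  # disks that should hold a replica
        self.handoff = list(disks[replica_count:])  # spare disks used during failures
        self.mounted = set(disks)
        self.holding = set(self.primary)            # disks that currently store a replica

    def unmount(self, disk):
        self.mounted.discard(disk)

    def remount(self, disk, data_intact=True):
        self.mounted.add(disk)
        if not data_intact:
            self.holding.discard(disk)  # a replaced disk comes back empty

    def available_replicas(self):
        return len(self.holding & self.mounted)

    def regenerate(self):
        """One replication pass: refill primaries, use handoffs, drop extras."""
        if self.available_replicas() == 0:
            return  # no surviving copy to replicate from
        for disk in self.primary:
            if disk in self.mounted:
                self.holding.add(disk)
        for disk in self.handoff:
            if self.available_replicas() >= self.replica_count:
                break
            if disk in self.mounted:
                self.holding.add(disk)
        # Once every primary is healthy again, handoff copies are no longer needed.
        if all(d in self.mounted and d in self.holding for d in self.primary):
            self.holding -= set(self.handoff)


if __name__ == "__main__":
    store = ToyObjectStore(["d1", "d2", "d3", "d4", "d5"])
    store.unmount("d1")
    store.regenerate()
    print("after unmount:", sorted(store.holding & store.mounted))
    store.remount("d1")
    store.regenerate()
    print("after remount:", sorted(store.holding))
```

The "expected failure" case maps to unmounting all three primary disks at once: `regenerate()` then has no copy to work from, so the replica count stays at zero.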
3
Inject Faults
20
Damage Detection
The main testing framework of OpenStack is called Tempest, an open-source project with more than 2000 tests. It does black-box testing only (tests access only the public interfaces).
Network tests
• create keypairs
• create security groups
• create networks
Compute tests
• create a keypair
• create a security group
• boot an instance
Swift tests
• create a volume
• get the volume
• delete the volume
Identity tests
…
Cinder tests
…
Glance tests
…
echo "$ tempest init cloud-01"
echo "$ cp tempest/etc/tempest.conf cloud-01/etc/"
echo "$ cd cloud-01"
echo "Next is the full test suite:"
echo "$ ostestr -c 3 --regex '(?!.*\[.*\bslow\b.*\])(^tempest\.(api|scenario))'"
echo "Next is the minimum basic test:"
echo "$ ostestr -c 3 --regex '(?!.*\[.*\bslow\b.*\])(^tempest\.scenario\.test_minimum_basic)'"
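The selection pattern passed to ostestr can be tried out in plain Python to see which test ids it keeps. The sketch assumes the pattern is the usual Tempest filter with its backslash escapes in place (exclude anything tagged "slow", keep ids starting with tempest.api or tempest.scenario); the test ids are made up for illustration.

```python
import re

# Negative lookahead rejects ids whose bracketed tag list contains the word "slow";
# the second group keeps only ids under tempest.api or tempest.scenario.
FULL_SUITE = re.compile(r"(?!.*\[.*\bslow\b.*\])(^tempest\.(api|scenario))")


def select_tests(test_ids):
    """Return the ids the filter would select, in their original order."""
    return [test_id for test_id in test_ids if FULL_SUITE.search(test_id)]


SAMPLE_IDS = [  # hypothetical test ids
    "tempest.api.compute.test_servers.ServersTest.test_list[id-1,smoke]",
    "tempest.scenario.test_minimum_basic.TestMinimumBasic.test_run[id-2,slow]",
    "tempest.cli.test_help.HelpTest.test_help[id-3]",
]

if __name__ == "__main__":
    for test_id in select_tests(SAMPLE_IDS):
        print(test_id)
```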
21
Zabbix and ELK
22
Monasca
• Overview: Uses the Keystone OpenStack Identity Service for authentication, authorization and multi-tenancy. Monasca integrates with several other OpenStack services such as Heat for auto-scaling and Ceilometer for monitoring OpenStack resources.
• Apache Kafka: A high-throughput distributed messaging system. Kafka is a central component in Monasca and provides the infrastructure for all internal communications between components.
• Apache Storm: A free and open source distributed realtime computation system. Apache Storm is used in the Monasca Threshold Engine.
• InfluxDB: An open-source distributed time series database with no external dependencies. InfluxDB is one of the supported databases for storing metrics and alarm history.
• MySQL: MySQL is one of the supported databases for the Monasca Config Database.
• Grafana: An open source, feature rich metrics dashboard and graph editor. Support for Monasca as a data source in Grafana has been added.
• Anomaly Detection: Engine implements real-time streaming anomaly detection. Two algorithms: Numenta Platform for Intelligent Computing (NuPIC) and Kolmogorov-Smirnov (K-S) Two Sample Test. Uses StackTach for realtime streaming.
• Performance: 3 HP ProLiant SL390s G7 servers + InfluxDB cluster = 25K-30K metrics/sec; monasca-api > 150K metrics/sec for a 3-node cluster with load balancing; for more performance use the HP Vertica database.
See https://www.openstack.org/assets/presentation-media/Monasca-Deep-Dive-Paris-Summit.pdf
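The Kolmogorov-Smirnov two-sample test mentioned in the anomaly detection bullet compares the empirical distributions of two windows of a metric. The sketch below computes only the K-S statistic (the largest gap between the two empirical CDFs) in pure Python. It is not Monasca's engine: the function names, the 0.5 threshold, and the cpu.user_perc values are assumptions chosen for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(baseline: Sequence[float], recent: Sequence[float]) -> float:
    """Two-sample K-S statistic: max distance between the two empirical CDFs."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(baseline), sorted(recent)
    distance = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def looks_anomalous(baseline: Sequence[float], recent: Sequence[float],
                    threshold: float = 0.5) -> bool:
    """Flag the recent window when the K-S statistic exceeds a chosen threshold."""
    return ks_statistic(baseline, recent) > threshold


if __name__ == "__main__":
    baseline_cpu = [12.0, 14.5, 13.2, 15.1, 12.8, 14.0]  # made-up cpu.user_perc values
    recent_cpu = [55.3, 61.0, 58.7, 13.9, 60.2, 57.5]
    print("K-S statistic:", ks_statistic(baseline_cpu, recent_cpu))
    print("anomalous:", looks_anomalous(baseline_cpu, recent_cpu))
```

A full test would also turn the statistic into a p-value that accounts for the sample sizes; a fixed threshold is used here only to keep the example short.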
Grafana (compute_instance_create_time)
Anomaly Detection (cpu.user_perc)
23
Application Domains
24
Join the Cause!
• Internship positions for MSc students
  - Fault injection, fault models, fault libraries, fault plans, break and rebuild systems all day long, …
• OpenStack engineer positions
  - Rapid prototyping of cool ideas: propose it today, code it, and show it running in 3 months…
• Innovative PoCs
  - Solving difficult challenges of real problems using quick and dirty prototyping
Copyright©2015 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product
portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive
statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time
without notice.
HUAWEI ENTERPRISE ICT SOLUTIONS A BETTER WAY

More Related Content

What's hot

A survey-report-on-cloud-computing-testing-environment
A survey-report-on-cloud-computing-testing-environmentA survey-report-on-cloud-computing-testing-environment
A survey-report-on-cloud-computing-testing-environmentshritosh kumar
 
Hybrid Cloud Monitoring - Datatdog
Hybrid Cloud Monitoring - DatatdogHybrid Cloud Monitoring - Datatdog
Hybrid Cloud Monitoring - DatatdogChase Thompson
 
HPC + Ai: Machine Learning Models in Scientific Computing
HPC + Ai: Machine Learning Models in Scientific ComputingHPC + Ai: Machine Learning Models in Scientific Computing
HPC + Ai: Machine Learning Models in Scientific Computinginside-BigData.com
 
Cloud computing lab open stack
Cloud computing lab open stackCloud computing lab open stack
Cloud computing lab open stackarunuiet
 
Dependability assessments of reliable services in a private cloud environment
Dependability assessments of reliable services in a private cloud environmentDependability assessments of reliable services in a private cloud environment
Dependability assessments of reliable services in a private cloud environmentKPOST
 
Extending Grids with Cloud Resource Management for Scientific Computing
Extending Grids with Cloud Resource Management for Scientific ComputingExtending Grids with Cloud Resource Management for Scientific Computing
Extending Grids with Cloud Resource Management for Scientific ComputingBharat Kalia
 
Shuttle: Intrusion Recovery in Paas
Shuttle: Intrusion Recovery in PaasShuttle: Intrusion Recovery in Paas
Shuttle: Intrusion Recovery in PaasDário Nascimento
 
Advanced Research Computing at York
Advanced Research Computing at YorkAdvanced Research Computing at York
Advanced Research Computing at YorkMing Li
 
An Analysis Of Cloud ReliabilityApproaches Based on Cloud Components And Reli...
An Analysis Of Cloud ReliabilityApproaches Based on Cloud Components And Reli...An Analysis Of Cloud ReliabilityApproaches Based on Cloud Components And Reli...
An Analysis Of Cloud ReliabilityApproaches Based on Cloud Components And Reli...ijafrc
 
Morales-Capstone-IDS.IPS Deployment_revision1
Morales-Capstone-IDS.IPS Deployment_revision1Morales-Capstone-IDS.IPS Deployment_revision1
Morales-Capstone-IDS.IPS Deployment_revision1Jeremy Morales
 
App Performance Tip: Sharing Flash Across Virtualized Workloads
App Performance Tip: Sharing Flash Across Virtualized WorkloadsApp Performance Tip: Sharing Flash Across Virtualized Workloads
App Performance Tip: Sharing Flash Across Virtualized WorkloadsDataCore Software
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Otávio Carvalho
 
IRJET- Implementation of Cloud Energy Saving System using Virtual Machine...
IRJET-  	  Implementation of Cloud Energy Saving System using Virtual Machine...IRJET-  	  Implementation of Cloud Energy Saving System using Virtual Machine...
IRJET- Implementation of Cloud Energy Saving System using Virtual Machine...IRJET Journal
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
 
An Extensible Architecture for Avionics Sensor Health Assessment Using DDS
An Extensible Architecture for Avionics Sensor Health Assessment Using DDSAn Extensible Architecture for Avionics Sensor Health Assessment Using DDS
An Extensible Architecture for Avionics Sensor Health Assessment Using DDSSumant Tambe
 

What's hot (20)

A survey-report-on-cloud-computing-testing-environment
A survey-report-on-cloud-computing-testing-environmentA survey-report-on-cloud-computing-testing-environment
A survey-report-on-cloud-computing-testing-environment
 
Hybrid Cloud Monitoring - Datatdog
Hybrid Cloud Monitoring - DatatdogHybrid Cloud Monitoring - Datatdog
Hybrid Cloud Monitoring - Datatdog
 
Dynamix IoT 2012
Dynamix IoT 2012Dynamix IoT 2012
Dynamix IoT 2012
 
HPC + Ai: Machine Learning Models in Scientific Computing
HPC + Ai: Machine Learning Models in Scientific ComputingHPC + Ai: Machine Learning Models in Scientific Computing
HPC + Ai: Machine Learning Models in Scientific Computing
 
Cloud computing lab open stack
Cloud computing lab open stackCloud computing lab open stack
Cloud computing lab open stack
 
Dependability assessments of reliable services in a private cloud environment
Dependability assessments of reliable services in a private cloud environmentDependability assessments of reliable services in a private cloud environment
Dependability assessments of reliable services in a private cloud environment
 
Extending Grids with Cloud Resource Management for Scientific Computing
Extending Grids with Cloud Resource Management for Scientific ComputingExtending Grids with Cloud Resource Management for Scientific Computing
Extending Grids with Cloud Resource Management for Scientific Computing
 
Shuttle: Intrusion Recovery in Paas
Shuttle: Intrusion Recovery in PaasShuttle: Intrusion Recovery in Paas
Shuttle: Intrusion Recovery in Paas
 
Advanced Research Computing at York
Advanced Research Computing at YorkAdvanced Research Computing at York
Advanced Research Computing at York
 
An Analysis Of Cloud ReliabilityApproaches Based on Cloud Components And Reli...
An Analysis Of Cloud ReliabilityApproaches Based on Cloud Components And Reli...An Analysis Of Cloud ReliabilityApproaches Based on Cloud Components And Reli...
An Analysis Of Cloud ReliabilityApproaches Based on Cloud Components And Reli...
 
Morales-Capstone-IDS.IPS Deployment_revision1
Morales-Capstone-IDS.IPS Deployment_revision1Morales-Capstone-IDS.IPS Deployment_revision1
Morales-Capstone-IDS.IPS Deployment_revision1
 
M2M infrastructure using Docker
M2M infrastructure using DockerM2M infrastructure using Docker
M2M infrastructure using Docker
 
App Performance Tip: Sharing Flash Across Virtualized Workloads
App Performance Tip: Sharing Flash Across Virtualized WorkloadsApp Performance Tip: Sharing Flash Across Virtualized Workloads
App Performance Tip: Sharing Flash Across Virtualized Workloads
 
HADRFINAL13112016
HADRFINAL13112016HADRFINAL13112016
HADRFINAL13112016
 
92 494
92 49492 494
92 494
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
 
IRJET- Implementation of Cloud Energy Saving System using Virtual Machine...
IRJET-  	  Implementation of Cloud Energy Saving System using Virtual Machine...IRJET-  	  Implementation of Cloud Energy Saving System using Virtual Machine...
IRJET- Implementation of Cloud Energy Saving System using Virtual Machine...
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
 
An Extensible Architecture for Avionics Sensor Health Assessment Using DDS
An Extensible Architecture for Avionics Sensor Health Assessment Using DDSAn Extensible Architecture for Avionics Sensor Health Assessment Using DDS
An Extensible Architecture for Avionics Sensor Health Assessment Using DDS
 
Awalin viz sec
Awalin viz secAwalin viz sec
Awalin viz sec
 

Viewers also liked

OpenStack at NTT Resonant: Lessons Learned in Web Infrastructure
OpenStack at NTT Resonant: Lessons Learned in Web InfrastructureOpenStack at NTT Resonant: Lessons Learned in Web Infrastructure
OpenStack at NTT Resonant: Lessons Learned in Web InfrastructureTomoya Hashimoto
 
NTTs Journey with Openstack-final
NTTs Journey with Openstack-finalNTTs Journey with Openstack-final
NTTs Journey with Openstack-finalshintaro mizuno
 
OpenStack Summit 2016 Austin 参加報告
OpenStack Summit 2016 Austin 参加報告OpenStack Summit 2016 Austin 参加報告
OpenStack Summit 2016 Austin 参加報告kimura50
 
AWS Summit 2016 「新規事業 "auでんき”をクラウドスピードでサービスイン」
AWS Summit 2016 「新規事業 "auでんき”をクラウドスピードでサービスイン」AWS Summit 2016 「新規事業 "auでんき”をクラウドスピードでサービスイン」
AWS Summit 2016 「新規事業 "auでんき”をクラウドスピードでサービスイン」KDDI
 
OpenStack Summit Austin 2016 参加報告 - OpenStack最新情報セミナー 2016年5月
OpenStack Summit Austin 2016 参加報告 - OpenStack最新情報セミナー 2016年5月OpenStack Summit Austin 2016 参加報告 - OpenStack最新情報セミナー 2016年5月
OpenStack Summit Austin 2016 参加報告 - OpenStack最新情報セミナー 2016年5月VirtualTech Japan Inc.
 
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStackAutomated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStackNTT Communications Technology Development
 
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...VirtualTech Japan Inc.
 
OpenStack Day Taiwan 2016 -Shintaro Mizuno
OpenStack Day Taiwan 2016 -Shintaro MizunoOpenStack Day Taiwan 2016 -Shintaro Mizuno
OpenStack Day Taiwan 2016 -Shintaro Mizunoshintaro mizuno
 
NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT D...
NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT D...NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT D...
NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT D...VirtualTech Japan Inc.
 
Cloud Platform for IoT
Cloud Platform for IoTCloud Platform for IoT
Cloud Platform for IoTNaoto Umemori
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft
 

Viewers also liked (12)

OpenStack at NTT Resonant: Lessons Learned in Web Infrastructure
OpenStack at NTT Resonant: Lessons Learned in Web InfrastructureOpenStack at NTT Resonant: Lessons Learned in Web Infrastructure
OpenStack at NTT Resonant: Lessons Learned in Web Infrastructure
 
NTTs Journey with Openstack-final
NTTs Journey with Openstack-finalNTTs Journey with Openstack-final
NTTs Journey with Openstack-final
 
OpenStack Summit 2016 Austin 参加報告
OpenStack Summit 2016 Austin 参加報告OpenStack Summit 2016 Austin 参加報告
OpenStack Summit 2016 Austin 参加報告
 
NTT i3 at OpenStack Summit - May 20th, 2015
NTT i3 at OpenStack Summit - May 20th, 2015NTT i3 at OpenStack Summit - May 20th, 2015
NTT i3 at OpenStack Summit - May 20th, 2015
 
AWS Summit 2016 「新規事業 "auでんき”をクラウドスピードでサービスイン」
AWS Summit 2016 「新規事業 "auでんき”をクラウドスピードでサービスイン」AWS Summit 2016 「新規事業 "auでんき”をクラウドスピードでサービスイン」
AWS Summit 2016 「新規事業 "auでんき”をクラウドスピードでサービスイン」
 
OpenStack Summit Austin 2016 参加報告 - OpenStack最新情報セミナー 2016年5月
OpenStack Summit Austin 2016 参加報告 - OpenStack最新情報セミナー 2016年5月OpenStack Summit Austin 2016 参加報告 - OpenStack最新情報セミナー 2016年5月
OpenStack Summit Austin 2016 参加報告 - OpenStack最新情報セミナー 2016年5月
 
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStackAutomated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
Automated Deployment & Benchmarking with Chef, Cobbler and Rally for OpenStack
 
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
 
OpenStack Day Taiwan 2016 -Shintaro Mizuno
OpenStack Day Taiwan 2016 -Shintaro MizunoOpenStack Day Taiwan 2016 -Shintaro Mizuno
OpenStack Day Taiwan 2016 -Shintaro Mizuno
 
NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT D...
NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT D...NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT D...
NTTドコモ様 導入事例 OpenStack Summit 2016 Barcelona 講演「Expanding and Deepening NTT D...
 
Cloud Platform for IoT
Cloud Platform for IoTCloud Platform for IoT
Cloud Platform for IoT
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 

Similar to Cloud Resilience with Open Stack

DOST 2016 Cloud Without Failures
DOST 2016 Cloud Without FailuresDOST 2016 Cloud Without Failures
DOST 2016 Cloud Without FailuresJorge Cardoso
 
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...Jorge Cardoso
 
Chaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionChaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionKeet Sugathadasa
 
<iframe src="http://video.yandex.ru/iframe/ya-events/0ro6nfi3fv.5216/" hei...
<iframe src="http://video.yandex.ru/iframe/ya-events/0ro6nfi3fv.5216/" hei...<iframe src="http://video.yandex.ru/iframe/ya-events/0ro6nfi3fv.5216/" hei...
<iframe src="http://video.yandex.ru/iframe/ya-events/0ro6nfi3fv.5216/" hei...Yandex
 
Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programsgreenwop
 
Vulnerability Exploitation in Docker Container Environments
Vulnerability Exploitation in Docker Container EnvironmentsVulnerability Exploitation in Docker Container Environments
Vulnerability Exploitation in Docker Container EnvironmentsFlawCheck
 
A DevOps guide to Kubernetes
A DevOps guide to KubernetesA DevOps guide to Kubernetes
A DevOps guide to KubernetesPaul Czarkowski
 
Resilience Testing
Resilience Testing Resilience Testing
Resilience Testing Ran Levy
 
Weave User Group Talk - DockerCon 2017 Recap
Weave User Group Talk - DockerCon 2017 RecapWeave User Group Talk - DockerCon 2017 Recap
Weave User Group Talk - DockerCon 2017 RecapPatrick Chanezon
 
The Ultimate Guide to Docker & Kubernetes Forensics and Incident Response.pdf
The Ultimate Guide to Docker & Kubernetes Forensics and Incident Response.pdfThe Ultimate Guide to Docker & Kubernetes Forensics and Incident Response.pdf
The Ultimate Guide to Docker & Kubernetes Forensics and Incident Response.pdfChristopher Doman
 
ElasTest - Testing in the large
ElasTest - Testing in the largeElasTest - Testing in the large
ElasTest - Testing in the largeElasTest Project
 
Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesNicola Ferraro
 
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & Packer
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & PackerLAMP Stack (Reloaded) - Infrastructure as Code with Terraform & Packer
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & PackerJan-Christoph Küster
 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos EngineeringSIGHUP
 
Cloud-native .NET-Microservices mit Kubernetes @BASTAcon
Cloud-native .NET-Microservices mit Kubernetes @BASTAconCloud-native .NET-Microservices mit Kubernetes @BASTAcon
Cloud-native .NET-Microservices mit Kubernetes @BASTAconMario-Leander Reimer
 
OpenStack for VMware Administrators
OpenStack for VMware AdministratorsOpenStack for VMware Administrators
OpenStack for VMware AdministratorsTrevor Roberts Jr.
 
Fine line between performance and security
Fine line between performance and securityFine line between performance and security
Fine line between performance and securityAlmudena Vivanco
 

Similar to Cloud Resilience with Open Stack (20)

DOST 2016 Cloud Without Failures
DOST 2016 Cloud Without FailuresDOST 2016 Cloud Without Failures
DOST 2016 Cloud Without Failures
 
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
 
Chaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionChaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in Production
 
<iframe src="http://video.yandex.ru/iframe/ya-events/0ro6nfi3fv.5216/" hei...
<iframe src="http://video.yandex.ru/iframe/ya-events/0ro6nfi3fv.5216/" hei...<iframe src="http://video.yandex.ru/iframe/ya-events/0ro6nfi3fv.5216/" hei...
<iframe src="http://video.yandex.ru/iframe/ya-events/0ro6nfi3fv.5216/" hei...
 
Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programs
 
Vulnerability Exploitation in Docker Container Environments
Vulnerability Exploitation in Docker Container EnvironmentsVulnerability Exploitation in Docker Container Environments
Vulnerability Exploitation in Docker Container Environments
 
A DevOps guide to Kubernetes
A DevOps guide to KubernetesA DevOps guide to Kubernetes
A DevOps guide to Kubernetes
 
Cl306
Cl306Cl306
Cl306
 
Resilience Testing
Resilience Testing Resilience Testing
Resilience Testing
 
Weave User Group Talk - DockerCon 2017 Recap
Weave User Group Talk - DockerCon 2017 RecapWeave User Group Talk - DockerCon 2017 Recap
Weave User Group Talk - DockerCon 2017 Recap
 
The Ultimate Guide to Docker & Kubernetes Forensics and Incident Response.pdf
The Ultimate Guide to Docker & Kubernetes Forensics and Incident Response.pdfThe Ultimate Guide to Docker & Kubernetes Forensics and Incident Response.pdf
The Ultimate Guide to Docker & Kubernetes Forensics and Incident Response.pdf
 
ElasTest - Testing in the large
ElasTest - Testing in the largeElasTest - Testing in the large
ElasTest - Testing in the large
 
Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with Kubernetes
 
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & Packer
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & PackerLAMP Stack (Reloaded) - Infrastructure as Code with Terraform & Packer
LAMP Stack (Reloaded) - Infrastructure as Code with Terraform & Packer
 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos Engineering
 
Cloud-native .NET-Microservices mit Kubernetes @BASTAcon
Cloud-native .NET-Microservices mit Kubernetes @BASTAconCloud-native .NET-Microservices mit Kubernetes @BASTAcon
Cloud-native .NET-Microservices mit Kubernetes @BASTAcon
 
OpenStack for VMware Administrators
OpenStack for VMware AdministratorsOpenStack for VMware Administrators
OpenStack for VMware Administrators
 
ChaosEngineeringITEA.pptx
ChaosEngineeringITEA.pptxChaosEngineeringITEA.pptx
ChaosEngineeringITEA.pptx
 
Linux clustering solution
Linux clustering solutionLinux clustering solution
Linux clustering solution
 
Fine line between performance and security
Fine line between performance and securityFine line between performance and security
Fine line between performance and security
 





Cloud Resilience with Open Stack

  • 1. Cloud Resilience Fault Injection for Increased Resilience Jorge Cardoso (jorge.cardoso@huawei.com) Huawei European Research Center Riesstraße 25, 80992 München The Butterfly Effect Project OpenStack Munich - Cloud Resilience & Experiences with OpenStack Wednesday, April 13, 2016 6:30 PM
  • 4. FAILURES ARE INEVITABLE! THE BEST WE CAN DO IS BE PREPARED FOR THEM AND LEARN FROM THEM. TEST, REPAIR, LEARN & PREDICT!
  • 5. Unplanned downtime is caused by*
      • software bugs: 27%
      • hardware: 23%
      • human error: 18%
      • network failures: 17%
      • natural disasters: 8%
      *Marcus, E., and Stern, H. Blueprints for High Availability: Designing Resilient Distributed Systems. John Wiley & Sons, Inc., 2003.
  • 6. Google's 2007 study found these annualized failure rates (AFRs) for drives:
      • 1 year old: 1.7%
      • 3 years old: >8.6%
      Source: Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso. 2007. Failure trends in a large disk drive population. In Proceedings of the 5th USENIX conference on File and Storage Technologies (FAST '07). USENIX Association, Berkeley, CA, USA, 2-2.
  • 7. Why does using a cloud infrastructure require advanced approaches for resiliency? One reason [Netflix]: it’s the lack of control over the underlying hardware, the inability to configure it to try to ensure 100% uptime.
  • 8. Technology Trends (Google Trends chart; series labels: cloud availability, cloud failure)
  • 9. Netflix: Chaos Monkey
      • Chaos Monkey: randomly terminates instances in a cluster
      • Chaos Gorilla: simulates an Availability Zone becoming unavailable
      • Chaos Kong: simulates an outage of an entire region
      • Latency Monkey: introduces latency to network packets to simulate degradation of the EC2 network
      • Janitor Monkey: cleans up unused resources
      • Security Monkey: analyzes and notifies on security profile changes
      AWS recently recommended that firms using its infrastructure test their resilience by using Chaos Monkey to induce failures.
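The core idea of randomly terminating instances can be illustrated with a small simulation. This is a minimal sketch, not Netflix's implementation: the cluster is just a set of hypothetical instance names, and `run_chaos_round`, `probability`, and `min_survivors` are names invented for this example.

```python
import random


def pick_victims(instances, probability, rng):
    """Return the instances chosen for termination in one chaos round."""
    return [name for name in instances if rng.random() < probability]


def run_chaos_round(cluster, probability, rng, min_survivors=1):
    """Terminate randomly chosen instances, always leaving min_survivors alive.

    `cluster` is a plain set of instance names (a stand-in for a real
    cloud API, which this sketch deliberately does not call).
    """
    victims = pick_victims(sorted(cluster), probability, rng)
    max_kills = max(0, len(cluster) - min_survivors)
    victims = victims[:max_kills]
    for name in victims:
        cluster.discard(name)
    return victims


# Hypothetical six-node cluster; a fixed seed keeps the run repeatable.
cluster = {f"web-{i}" for i in range(6)}
rng = random.Random(42)
killed = run_chaos_round(cluster, probability=0.5, rng=rng)
print("terminated:", killed)
print("still running:", sorted(cluster))
```

A survivor floor of this kind is one simple way to keep an experiment from taking the whole service down; a real tool would also need scheduling, opt-outs, and audit logging.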
  • 10. Netflix: Chaos Monkey
      • Fewer alerts for ops team
      • Amazon EC2 and Amazon RDS Service Disruption in the US East Region (April 29, 2011)
      • September 20th, 2015: Amazon’s DynamoDB service experienced an availability issue in their US-EAST-1
      • Transfer traffic to east region
  • 11. Amazon AWS: GameDay
      • A program designed to increase resilience by purposely injecting major failures
      • Discover flaws and subtle dependencies
      • “That seems totally bizarre on the face of it, but as you dig down, you end up finding some dependency no one knew about previously […] We’ve had situations where we brought down a network in, say, São Paulo, only to find that in doing so we broke our links in Mexico.”
  • 12. Google: DiRT
      • Google DiRT (Disaster Recovery Test)
          • Annual disaster recovery & testing exercise
          • 8 years since inception
          • Multi-day exercise triggering (controlled) failures in systems and process
      • Premise
          • 30-day incapacitation of headquarters following a disaster
          • Other offices and facilities may be affected
      • When
          • “Big disaster”: annually for 3-5 days
          • Continuous testing: year-round
      • Who
          • 100s of engineers (Site Reliability, Network, Hardware, Software, Security, Facilities)
          • Business units (Human Resources, Finance, Safety, Crisis response etc.)
      Source: http://flowcon.org/dl/flowcon-sanfran-2014/slides/KripaKrishnan_LearningContinuouslyFromFailures.pdf
  • 13. Goal: the Butterfly Effect System enables automatic testing and repair of OpenStack and cloud applications (diagram labels: Cloud Application, Huawei FusionSphere; cycle: Failure, Test, Repair). The system works by intentionally injecting different failures, testing the ability to survive them, and learning how to predict and repair failures preemptively.
  • 14. Use Case: OpenStack Resiliency
      • Kill cinder database (simulate update failure)
      • Introduce delay in messages (full-scale traffic shows where the real bottlenecks are)
      • Operation error: OPENSTACK_KEYSTONE_URL = "http://%s:5000/v2.0" % OPENSTACK_HOST
      • Operation error: in /etc/nova/nova.conf, delete auth_strategy=keystone
      • Remove driver to HD, remove access to NFS (simulate hardware failure)
      • Best way to avoid failure: fail constantly
      • The main testing framework of OpenStack is called Tempest, an open-source project with more than 2000 tests: only black-box testing (tests only access the public interfaces)
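The configuration fault listed above (deleting `auth_strategy=keystone` from nova.conf) could be injected along the following lines. This is a hedged sketch: it works on a temporary sample file rather than the real `/etc/nova/nova.conf`, it assumes the option lives in the `[DEFAULT]` section, and `remove_option` is a helper written for this example. Note that rewriting a file with `configparser` drops comments, so a real injector would keep a backup to restore afterwards.

```python
import configparser
import os
import tempfile


def remove_option(path, section, option):
    """Delete one option from an INI-style file; return True if it was present."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    removed = parser.remove_option(section, option)
    with open(path, "w") as handle:
        parser.write(handle)
    return removed


# Hypothetical, heavily shortened stand-in for a nova.conf file.
sample = "[DEFAULT]\nauth_strategy = keystone\ndebug = false\n"

with tempfile.TemporaryDirectory() as workdir:
    conf_path = os.path.join(workdir, "nova.conf")
    with open(conf_path, "w") as handle:
        handle.write(sample)

    # Inject the fault: the service would now start without the option.
    was_present = remove_option(conf_path, "DEFAULT", "auth_strategy")

    # Re-read the file to confirm that only the targeted option is gone.
    check = configparser.ConfigParser(interpolation=None)
    check.read(conf_path)
    still_there = check.has_option("DEFAULT", "auth_strategy")
    debug_value = check.get("DEFAULT", "debug")

print(was_present, still_there, debug_value)
```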
  • 15. Use Case 1: Increasing Reliability (diagram labels: Public Cloud, Butterfly Effect, Fault Type, Damage Pattern, Fix configurations, Fix bugs, Replace hardware, Upgrade memory)
  • 16. Use Case 2: Run Book Automation (RBA) (diagram labels: Public Cloud, Incident Management, Is this really an incident?, Major Incident Procedure, Butterfly Effect, Fault Type, Damage Pattern, Recovery Script)
  • 17. Components of a Solution
      • Monitoring: Nagios, Zabbix, Cacti, StackTach, Synaps, Monasca
      • Configuration automation: Ansible, CFEngine, Chef, Puppet, Salt, Heat
      • Fault-injection engines: DestroyStack, FSaaS, ChaosMonkey, AnarchyApe
      • Fault libraries and plans: pyCallGraph, Intellect, RunDeck, Nose
      • Data visualization: Kibana, Graylog2, Grafana
      • Damage detection: Tempest, Nose
      • Data storage: ElasticSearch, OpenTSDB, Neo4J, Graphite, Cassandra, Redis
      • Data aggregation: Logstash, Collectd, Flume, Fluentd, Heka, Ceilometer
      • Manual repair: Bash, Python, Chef, Puppet
      • Automated repair: jCOLIBRI, myCBR, Puppet, Rundeck, (R)?ex, Chef
      • Data processing: Hadoop, Pig, Hive, Spark, Storm
      • Operations analytics: Statsd, R, Panda, Weka, Machine Learning
      • Alerting: Errbit, Honeybadger, Nagios, Zabbix, OpenPager, Riemann
      • Data source: Log files, Collectd Plg, FlumeNG, OpenStack Tbls, Zabbix Agt, Nagios Plg
      • Data transport: rsyslog, ZeroMQ
      • Process steps: (1) Design & Deploy Test Infrastructure, (2) Design & Execute Fault-Injection Plan, (3) Monitoring Facilities, (4) Identify Damages, (5) Repair & Learn, (6) Predict Future Errors, (7) Automatic Repair
  • 18. Technological Overview
      • (1) Design & Deploy Test Environment
          • Customizable, automated OpenStack deployment
          • FusionServer RH2288 + VirtualBox + Vagrant + RDO
      • (2) Design & Execute Fault-Injection Plan
          • Language = Python (no DSL yet)
          • Fault Engine = based on BPM
          • Fault Plan = Workflow paradigm
      • (3) Monitoring Facilities
          • Monasca (from HP, RackSpace, IBM)
          • Visualization with Grafana
      • (4) Damage Detection
          • OpenStack Tempest
          • 1200 tests (but only API testing :( )
      • (5) Repair & Learn: …
      • (6) Predict Future Errors: …
      • (7) Automated Repair: …
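Since the fault plan is described as a workflow written in Python, a plan can be pictured as an ordered list of steps that act on a shared state. The sketch below is an illustration under that assumption only; `FaultPlan`, `Step`, and the toy `state` dictionary are invented names, not the Butterfly Effect engine, and the steps merely flip flags instead of touching a real deployment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    action: Callable[[Dict], None]


@dataclass
class FaultPlan:
    """A fault plan modelled as a linear workflow of named steps."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def add(self, name, action):
        self.steps.append(Step(name, action))
        return self  # allow chaining

    def run(self, state):
        """Execute the steps in order and return the names that ran."""
        log = []
        for step in self.steps:
            step.action(state)
            log.append(step.name)
        return log


def kill_cinder_db(state):
    # Fault injection: mark the (simulated) database as down.
    state["services"]["cinder-db"] = "down"


def detect_damage(state):
    # Damage detection: list every service that is not up.
    state["damage"] = sorted(
        name for name, status in state["services"].items() if status != "up"
    )


def repair(state):
    # Repair: bring damaged services back and remember what was fixed.
    for name in state["damage"]:
        state["services"][name] = "up"
    state["repaired"] = list(state["damage"])


plan = (
    FaultPlan("kill-cinder-db")
    .add("inject fault", kill_cinder_db)
    .add("detect damage", detect_damage)
    .add("repair", repair)
)
state = {"services": {"cinder-db": "up", "keystone": "up"}, "damage": [], "repaired": []}
log = plan.run(state)
print(log, state["damage"], state["services"])
```

Keeping each step a plain function makes it easy to reuse the same fault or the same detector in several plans.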
  • 19. Deploy Test Environment
      • Design & Deploy Test Environment
          • Customizable, automated OpenStack deployment
          • FusionServer RH2288 + VirtualBox + Vagrant + RDO
      • 2 hours to deploy OpenStack infrastructure with 32 VMs
  • 20. Faults to Inject
      • Disk temporarily unavailable
          • unmount a disk
          • wait for replicas to regenerate
          • remount the disk with the data intact
          • wait for replicas to regenerate; the extra replicas from handoff nodes should get removed
      • Disk replacement
          • unmount a disk
          • wait for replicas to regenerate
          • delete the disk and remount it
          • wait for replicas to regenerate
          • extra replicas from handoff nodes should get removed
      • Expected failure
          • damage three disks at the same time (more if the replica count is higher)
          • check that the replicas didn’t regenerate even after some time period
          • fail if the replicas regenerated
          • this tests if the tests themselves are correct
      • VM failures
          • send VM creation request
          • find compute node where request was scheduled
          • damage the compute server
          • check if the VM creation was re-scheduled to another node
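The "disk temporarily unavailable" and "expected failure" scenarios above can be traced on a toy model. The sketch below is an assumption-laden illustration: `ToyObjectStore` is an invented class that tracks which disks hold a replica of a single object, and its `replicate` method is a simplified stand-in for a real replicator, not how Swift is implemented.

```python
class ToyObjectStore:
    """Toy replica placement for one object across primary and handoff disks."""

    def __init__(self, primaries, handoffs):
        self.primaries = list(primaries)
        self.handoffs = list(handoffs)
        self.mounted = set(primaries) | set(handoffs)
        self.holders = set(primaries)  # disks that currently hold a replica

    def unmount(self, disk):
        self.mounted.discard(disk)  # data stays on the disk, but is unreachable

    def remount(self, disk):
        self.mounted.add(disk)

    def reachable_replicas(self):
        return len(self.holders & self.mounted)

    def replicate(self):
        """One replication pass: restore the replica count, then drop extras."""
        if not (self.holders & self.mounted):
            return  # no reachable copy left, nothing to regenerate from
        # Mounted primaries that lost their copy get it back first.
        for disk in self.primaries:
            if disk in self.mounted:
                self.holders.add(disk)
        # If replicas are still missing, place copies on mounted handoff disks.
        for disk in self.handoffs:
            if self.reachable_replicas() >= len(self.primaries):
                break
            if disk in self.mounted:
                self.holders.add(disk)
        # Once every primary is back, the handoff copies are extras.
        if all(disk in self.mounted for disk in self.primaries):
            self.holders -= set(self.handoffs)


# Scenario: disk temporarily unavailable.
store = ToyObjectStore(["d1", "d2", "d3"], ["h1", "h2"])
store.unmount("d1")
store.replicate()
after_unmount = (store.reachable_replicas(), "h1" in store.holders)
store.remount("d1")
store.replicate()
after_remount = sorted(store.holders)

# Scenario: expected failure, all three primaries damaged at once.
broken = ToyObjectStore(["d1", "d2", "d3"], ["h1", "h2"])
for disk in ("d1", "d2", "d3"):
    broken.unmount(disk)
broken.replicate()
regenerated = broken.reachable_replicas()

print(after_unmount, after_remount, regenerated)
```

The second scenario mirrors the slide's point that a test should fail if replicas appear to regenerate when no source copy is reachable.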
  • 21. Damage Detection
      • The main testing framework of OpenStack is called Tempest, an open-source project with more than 2000 tests: only black-box testing (tests only access the public interfaces)
      • Network tests: create keypairs; create security groups; create networks
      • Compute tests: create a keypair; create a security group; boot an instance
      • Swift tests: create a volume; get the volume; delete the volume
      • Identity tests: …
      • Cinder tests: …
      • Glance tests: …
      • Commands:
          echo "$ tempest init cloud-01"
          echo "$ cp tempest/etc/tempest.conf cloud-01/etc/"
          echo "$ cd cloud-01"
          echo "Next is the full test suite:"
          echo "$ ostestr -c 3 --regex '(?!.*\[.*\bslow\b.*\])(^tempest.(api|scenario))'"
          echo "Next is the minimum basic test:"
          echo "$ ostestr -c 3 --regex '(?!.*\[.*\bslow\b.*\])(^tempest.scenario.test_minimum_basic)'"
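The selection pattern passed to `ostestr` can be checked in isolation with Python's `re` module. The sketch assumes the pattern is meant to read `(?!.*\[.*\bslow\b.*\])(^tempest\.(api|scenario))`, that is, with the backslashes that the slide text appears to have lost restored and the dot escaped, and that the runner applies it with search semantics. The test IDs are made up for illustration.

```python
import re

# Negative lookahead rejects IDs whose bracketed tag list contains "slow";
# the second group keeps only tempest API and scenario tests.
SELECTOR = re.compile(r"(?!.*\[.*\bslow\b.*\])(^tempest\.(api|scenario))")

# Hypothetical test IDs in the "module.path[tag,tag]" form.
test_ids = [
    "tempest.api.compute.servers.test_create_server[id-0001,smoke]",
    "tempest.scenario.test_minimum_basic[id-0002,slow]",
    "tempest.scenario.test_network_basic_ops[id-0003]",
    "other_project.tests.unit.test_misc[id-0004]",
]

selected = [test_id for test_id in test_ids if SELECTOR.search(test_id)]
for test_id in selected:
    print(test_id)
```

Because the lookahead is evaluated at the start of the string, a `slow` tag anywhere in the brackets excludes the test, while IDs outside the `tempest.api` and `tempest.scenario` namespaces never match.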
  • 23. Monasca
      • Overview: Uses the Keystone OpenStack Identity Service for authentication, authorization and multi-tenancy. Monasca integrates with several other OpenStack services such as Heat for auto-scaling and Ceilometer for monitoring OpenStack resources.
      • Apache Kafka: A high-throughput distributed messaging system. Kafka is a central component in Monasca and provides the infrastructure for all internal communications between components.
      • Apache Storm: A free and open source distributed realtime computation system. Apache Storm is used in the Monasca Threshold Engine.
      • InfluxDB: An open-source distributed time series database with no external dependencies. InfluxDB is one of the supported databases for storing metrics and alarm history.
      • MySQL: MySQL is one of the supported databases for the Monasca Config Database.
      • Grafana: An open source, feature rich metrics dashboard and graph editor. Support for Monasca as a data source in Grafana has been added.
      • Anomaly Detection: Engine implements real-time streaming anomaly detection. Two algorithms: Numenta Platform for Intelligent Computing (NuPIC) and Kolmogorov-Smirnov (K-S) Two Sample Test. Uses Stacktach for realtime streaming.
      • Performance: 3 HP Proliant SL390s G7 servers + InfluxDB cluster = 25K-30K metrics/sec; monasca-api > 150K metrics/sec for a 3 node cluster with load balancing; for more performance use HP Vertica database. See https://www.openstack.org/assets/presentation-media/Monasca-Deep-Dive-Paris-Summit.pdf
      • Figures: Grafana (compute_instance_create_time); Anomaly Detection (cpu.user_perc)
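The Kolmogorov-Smirnov two-sample test mentioned for anomaly detection compares the empirical distributions of two samples and reports the largest gap between them. The sketch below is a plain-Python illustration of that statistic, not Monasca's code: the metric values are invented, `is_anomalous` is a hypothetical helper, and the threshold uses the usual large-sample approximation with coefficient 1.36 (roughly the 5% level).

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def is_anomalous(baseline, window, coeff=1.36):
    """Flag the window when D exceeds the approximate critical value."""
    n, m = len(baseline), len(window)
    critical = coeff * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, window) > critical


# Invented cpu.user_perc-style readings: a baseline and two recent windows.
baseline = [10 + (i % 5) for i in range(50)]
normal_window = [10 + (i % 5) for i in range(20)]
spike_window = [60 + (i % 5) for i in range(20)]

normal_flag = is_anomalous(baseline, normal_window)
spike_flag = is_anomalous(baseline, spike_window)
print(normal_flag, spike_flag)
```

The test needs no assumption about the shape of the metric's distribution, which is one reason it suits streaming metrics; the window sizes and the coefficient are tuning choices.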
  • 25. Join the Cause!
      • Internship positions for MSc students
          • Fault injection, fault models, fault libraries, fault plans, break and rebuild systems all day long, …
      • OpenStack Engineers positions
          • Rapid prototyping of cool ideas: propose it today, code it, and show it running in 3 months…
      • Innovative PoCs
          • Solving difficult challenges of real problems using quick and dirty prototyping
  • 26. Copyright©2015 Huawei Technologies Co., Ltd. All Rights Reserved. The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice. HUAWEI ENTERPRISE ICT SOLUTIONS A BETTER WAY