Copyright©2015 NTT DOCOMO, INC. All rights reserved.
After One Year of OpenStack Cloud Operation (NTT DOCOMO)
NTT DOCOMO Inc.
Ken Igarashi
NTT Software
Asako Ishigaki
NEC
Akihiro Motoki
About Us
Ken Igarashi
○ Leading the OpenStack project at NTT DOCOMO
○ One of the first members to propose OpenStack Bare Metal Provisioning (now called "Ironic") - bit.ly/1stuN2E
Asako Ishigaki
○ Engineer, NTT Software
○ Developing OpenStack log collection and analytics tools
Akihiro Motoki
○ Senior Research Engineer, NEC
○ Core developer of Neutron and Horizon
Our Project
[Figure: project timeline with date marks 2014-6, 2014-8, 2014-11, 2015-2, 2015-5, 2015-8, 2015-11. Phases, each with the number shown beside it: System Design (8), Scalable Test using 100 nodes (10), Recovery Tests (12), Racking and Cabling (14), 24/7 support (14), User Support (+x).]
o Team Rules (Culture)
▪ Focus on using OpenStack instead of developing OpenStack
▪ Think about how to use it.
▪ Don't assume that OpenStack can't do XXXX.
▪ Reduce OpEx / promote automation
▪ Operation tools
• "Anything that a human needs to do more than twice must be automated."
▪ Reduce the number of operators through HA and self-healing.
o Tools
▪ Ansible, Python, Shell Script
[Figure: deployment procedure. Stage labels: Spec Writing, CI/CD (PEP 8, ansible-lint, install), Test, Review, Production. Figures shown: 200+ deployments (2015), 2000+ patches (2015), "+5".]
Operation
o OpenStack Configuration (http://bit.ly/1DbJPUO)
▪ Double redundancy for hardware
▪ Triple redundancy for software
[Figure: system configuration. Components shown: VMs, Nova, OpenStack APIs, MySQL (Galera) with DB1 to DB4 and an Arbitrator, RabbitMQ, Neutron Agents, two LBs, Zabbix, MaaS, and PXE/DNS/DHCP.]
o Deployment
▪ CMDB Registration
o Choose playbooks for Ansible Dynamic Inventory
Ansible
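The slides mention CMDB registration followed by an Ansible dynamic inventory, without showing either. As a rough sketch only: the CMDB record layout below (fields `hostname` and `role`) and the `cmdb_role` host variable are assumptions made for illustration. The output shape (groups with a `hosts` list plus `_meta.hostvars`) is the JSON layout Ansible reads from an inventory script.

```python
import json

# Hypothetical CMDB export; the field names "hostname" and "role" are assumptions.
CMDB_RECORDS = [
    {"hostname": "ctl01", "role": "control"},
    {"hostname": "net01", "role": "network"},
    {"hostname": "cmp01", "role": "compute"},
    {"hostname": "cmp02", "role": "compute"},
]


def build_inventory(records):
    """Group hosts by role in the JSON layout an Ansible inventory script returns."""
    inventory = {"_meta": {"hostvars": {}}}
    for rec in records:
        group = inventory.setdefault(rec["role"], {"hosts": []})
        group["hosts"].append(rec["hostname"])
        # "cmdb_role" is an illustrative host variable, not one from the talk.
        inventory["_meta"]["hostvars"][rec["hostname"]] = {"cmdb_role": rec["role"]}
    return inventory


if __name__ == "__main__":
    print(json.dumps(build_inventory(CMDB_RECORDS), indent=2, sort_keys=True))
```

With an inventory like this, a playbook can target a group such as `compute` without a hand-maintained host list.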
o Deployments
▪ Common: network, account, logging, Zabbix agent, drivers/firmware x 37
▪ OpenStack: Nova, Swift, Neutron, ... x 62
▪ HA Configuration
[Figure: "Install HDD Driver" example. Step labels: initial setup, update, compile, kernel, driver, firmware, filesystem, development environment.]
o Operation x 31
▪ Common: process restart, log collection
▪ OpenStack Operation: usage, VM migration/backup, user add/delete/quota change
▪ OpenStack Monitoring: health check tools
▪ Per-host instance check
• Launch instances on the given node(s)
• Check that boot succeeds and inspect the instance log
• Check metadata retrieval, login prompt, and SSH access
• Optionally, test volume attach and its read/write access
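The per-host instance check can be pictured as an ordered list of checks that stops at the first failure. The sketch below is only an illustration of that control flow: the step names and the stand-in lambdas are hypothetical, and a real tool would replace them with calls that launch an instance, read its console log, and so on.

```python
def run_host_check(host, steps, volume_steps=None):
    """Run checks in order on one host and stop at the first failure.

    `steps` and `volume_steps` are lists of (name, callable) pairs. Each
    callable takes the host name and returns True on success.
    """
    for name, check in list(steps) + list(volume_steps or []):
        if not check(host):
            return {"host": host, "ok": False, "failed_step": name}
    return {"host": host, "ok": True, "failed_step": None}


# Stand-in checks so the sketch runs without a cloud (all hypothetical).
STEPS = [
    ("launch_instance", lambda host: True),
    ("boot_succeeded", lambda host: True),
    ("metadata_retrieval", lambda host: True),
    ("login_prompt", lambda host: True),
    ("ssh_access", lambda host: host != "cmp02"),  # simulate one broken node
]
VOLUME_STEPS = [("volume_attach_read_write", lambda host: True)]  # optional part

results = [run_host_check(h, STEPS, VOLUME_STEPS) for h in ("cmp01", "cmp02")]
```

Passing `volume_steps=None` skips the optional volume test.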
o Related session: "What are operators doing behind the Cloud?"
▪ 2015/10/27 4:40pm - 5:20pm, Heian (New Takanawa)
Monitoring System
o Monitoring System
[Figure: monitoring architecture. Monitored components: VMs, Swift, Cinder, Nova, RabbitMQ, Neutron Agents, databases. Monitoring stack: Zabbix, Fluentd, Elasticsearch, Kibana. Coverage labels: "Weekday daytime" and "24h / 365d".]
[Figure: monitored components (VMs, Swift, Cinder, Nova, RabbitMQ, Neutron Agents, databases) and general resources (memory, CPU, network, HDD).]

Category    Monitoring Items   Self Healing
General     1,970              25
OpenStack   3,957              59
o RabbitMQ
▪ Configuration
▪ 3-node cluster
▪ cluster_partition_handling: autoheal
▪ Monitoring
▪ Split-brain check:
• "rabbitmqctl eval '[N||{partitions,N}<-rabbit_mnesia:status()].'"
▪ Port check (5672, 25672)
▪ Process check
• beam.smp
• rabbitmq-server
At least one node running (1/3)
• {Openstack-RabbitMQ:grpsum["HostG-RabbitMQ","net.tcp.service[tcp,,25672]",last,0].count(#3,0,"eq")}=3
• {OpenStack-RabbitMQ:grpsum["HostG-RabbitMQ","proc.num[beam.smp]",last,0].count(#3,0,"eq")}=3
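The two kinds of RabbitMQ checks above can be paraphrased in a few lines. This is a sketch under stated assumptions, not the monitoring code from the talk. It assumes the `rabbitmqctl eval` command prints an Erlang list that contains node names only when a partition exists. It also reads the trigger as "the last three group sums were all 0", meaning no node in the group answered. The function names are made up for illustration.

```python
def has_partition(eval_output):
    """Return True if the printed Erlang term contains anything besides
    brackets, commas, and whitespace (i.e. at least one node name)."""
    leftover = eval_output.translate({ord(c): None for c in "[], \t\r\n"})
    return bool(leftover)


def all_nodes_down(group_sums, samples=3):
    """Mimic count(#3,0,"eq")=3: the last `samples` group sums are all 0."""
    recent = group_sums[-samples:]
    return len(recent) == samples and all(value == 0 for value in recent)


if __name__ == "__main__":
    print(has_partition("[[]]\n"))         # no node names in the output
    print(all_nodes_down([3, 0, 0, 0]))    # three consecutive zero sums
```

Requiring three consecutive zero samples keeps a single missed poll from raising an alert.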
o MySQL
▪ Configuration
▪ 4 nodes + 1 arbitrator
▪ Monitoring
▪ Cluster check
• wsrep_local_recv_queue
• wsrep_local_send_queue
• wsrep_flow_control_paused
• wsrep_local_commits
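As an illustration of how these four status variables might be turned into alerts, here is a minimal sketch. The threshold values and the function name are assumptions chosen for the example, not figures from the talk. The input is a dictionary such as one built from `SHOW GLOBAL STATUS LIKE 'wsrep_%'`, where values typically arrive as strings.

```python
# Threshold values are illustrative assumptions, not figures from the talk.
THRESHOLDS = {
    "wsrep_local_recv_queue": 10,
    "wsrep_local_send_queue": 10,
    "wsrep_flow_control_paused": 0.1,
}


def galera_warnings(status, previous_commits=None):
    """Return human-readable warnings for one node's wsrep status snapshot."""
    warnings = []
    for name, limit in THRESHOLDS.items():
        value = float(status.get(name, 0))
        if value > limit:
            warnings.append(f"{name}={value:g} exceeds {limit:g}")
    commits = int(status.get("wsrep_local_commits", 0))
    # A flat commit counter can also mean an idle cluster, so treat it as a hint.
    if previous_commits is not None and commits == previous_commits:
        warnings.append("wsrep_local_commits did not advance")
    return warnings


snapshot = {
    "wsrep_local_recv_queue": "250",
    "wsrep_local_send_queue": "0",
    "wsrep_flow_control_paused": "0.8",
    "wsrep_local_commits": "1000",
}
found = galera_warnings(snapshot, previous_commits=1000)
```

A growing receive queue together with a high paused fraction and a flat commit counter is the pattern the cluster-freeze slides describe.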
[Figure: LB (R/W) in front of the MySQL nodes, with an Arbitrator.]
o MySQL Cluster
[Figure: Galera replication between a Master and a Slave. Labels: MySQL Client, Commit, send_queue, recv_queue, Replication, Disk (on each node), OK. Note on the figure: "Wait until OK is received from replication."]
o MySQL Cluster Freeze
[Figure: the same Master/Slave replication diagram (MySQL Client, Commit, send_queue, recv_queue, Replication, Disk, OK, "Wait until OK is received from replication"), with a fault marked 👿.]
• Disk failure: 😀 (the node is removed from the cluster)
• Disk speed throttling: 😢
Prohibited Actions During a MySQL Cluster Freeze
○ Prohibit some self-healing actions
▪ Do not restart some OpenStack processes
– neutron-plugin-openvswitch-agent
▪ Do not reboot network nodes
– Network reachability is lost (the network namespace can't be recreated)
All the VMs lose connections
Solved at Liberty?
o Throttling happens during DB backup
▪ Limit the backup node
▪ Backup method
[Figure: LB (R/W) in front of the cluster; the backup is limited to one node.]
LOCK TABLES FOR BACKUP (online)
1. Take the node from the cluster (Donor/Desynced)
2. Lock the DB and do the backup (FLUSH TABLES WITH READ LOCK)
3. Return the node to the cluster (wsrep_desync=OFF)
– wsrep_local_recv_queue
– wsrep_local_commits
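The three numbered backup steps can be sketched as a statement sequence. This is an assumption-laden outline rather than the team's script: it assumes step 1 is done with `SET GLOBAL wsrep_desync = ON` (the counterpart of the `wsrep_desync=OFF` shown in step 3), that `cursor` is a DB-API style cursor on the backup node, and that `run_backup` is a hypothetical callable that performs the actual dump.

```python
def backup_desynced_node(cursor, run_backup):
    """Run a backup on one Galera node while it is desynced from the cluster."""
    cursor.execute("SET GLOBAL wsrep_desync = ON")        # 1. take the node from the cluster
    try:
        cursor.execute("FLUSH TABLES WITH READ LOCK")     # 2. lock the DB ...
        try:
            run_backup()                                  #    ... and do the backup
        finally:
            cursor.execute("UNLOCK TABLES")               # always release the lock
    finally:
        cursor.execute("SET GLOBAL wsrep_desync = OFF")   # 3. return the node to the cluster


class RecordingCursor:
    """Stand-in for a DB-API cursor so the sketch runs without a database."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cursor = RecordingCursor()
backup_desynced_node(cursor, run_backup=lambda: cursor.statements.append("-- backup runs here"))
```

The nested `try`/`finally` blocks are a design choice of this sketch: the node is unlocked and returned to the cluster even if the backup itself fails.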
Log Analytics
Kibana
Purpose of Log Analytics
(1) Detect critical system failures: we have to recover immediately.
(2) Detect malicious access: we need to notify users.
(3) Detect non-critical errors: better to fix them as soon as possible.
(4) Find errors/warnings that have no service impact: we want to filter them out next time.
Treasure Hunt in The Ocean of Logs
○ e.g. logs of one day
Total: 100 GB, 80M lines
Sum of critical, error, and warning logs: 200K lines
The meaningful logs are far fewer:
(1) 0 critical failures (2) 0 malicious accesses
(3) 6 non-critical failures (4) 6 ignorable failures
[Chart: Breakdown of Logs by level. Legend: Critical, Error, Warning, Info, Debug, Other. Values shown: 0%, 0%, 1%, 30%, 39%, 30%.]
[Chart: breakdown by source. Legend: HW, OS, OpenStack backend, OpenStack, Operation tools. Values shown: 0%, 24%, 24%, 49%, 3%.]
Log Analytics Based on White/Black List
○ We analyze logs to enhance our black list and white list.
○ Logs found in our black list are sent to Zabbix.
[Figure: log flow. Labels: Logs, black list, white list, Zabbix, Kibana, trash, analyze…, add, expand, reduce.]
How to Apply the Black/White List Using Fluentd
[Figure: log pipeline. Sources: Network Node, Control Node, Compute Node, LB, UTM (labels: Fluentd, rsyslog). Log Server: fluentd, zabbix_sender, Elasticsearch. Outputs: Zabbix (alerts) and Kibana (graphs), which refers to Elasticsearch.]
• Add an "ignorable" flag according to the white list
• Put metadata on the records to create graphs from the logs
• Notify Zabbix according to the black list
How to Apply the Black/White List Using Fluentd (example)
Incoming logs:
1. syslog: 10:01 crit: hardware failure
2. rsyslog: 10:03 warn: IDS: from x.x.x.x
3. api.log: 10:04 ERROR: invalid request format

Records on the Log Server (fluentd):
path      timestamp   severity   item     source_ip   message
syslog    10:01       crit       -        -           hardware failure
rsyslog   10:03       warn       ids      x.x.x.x     IDS: from x.x.x.x
api.log   10:04       ERROR      ignore   -           invalid request format

[Figure: the "hardware failure" record is sent to Zabbix through zabbix_sender; the records are stored in Elasticsearch, and Kibana refers to them for the IDS graph and the crit graph.]
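The tagging shown in this example can be imitated in a few lines. The sketch below is not the Fluentd configuration from the talk; it only reproduces the three example records. The three patterns are simplified stand-ins invented for the example, and the function name `enrich` is hypothetical.

```python
import re

# Simplified stand-in patterns for this example only.
WHITE_LIST = [re.compile(r"invalid request format")]   # flagged as ignorable
BLACK_LIST = [re.compile(r"hardware failure")]         # reported to Zabbix
IDS_PATTERN = re.compile(r"^IDS: from (?P<ip>\S+)")


def enrich(path, timestamp, severity, message):
    """Return (record, notify_zabbix) for one parsed log line."""
    record = {"path": path, "timestamp": timestamp, "severity": severity,
              "item": "-", "source_ip": "-", "message": message}
    ids = IDS_PATTERN.match(message)
    if ids:
        # Metadata used later to draw the IDS graph.
        record["item"] = "ids"
        record["source_ip"] = ids.group("ip")
    elif any(p.search(message) for p in WHITE_LIST):
        record["item"] = "ignore"
    notify = any(p.search(message) for p in BLACK_LIST)
    return record, notify


LOGS = [
    ("syslog", "10:01", "crit", "hardware failure"),
    ("rsyslog", "10:03", "warn", "IDS: from x.x.x.x"),
    ("api.log", "10:04", "ERROR", "invalid request format"),
]
RESULTS = [enrich(*log) for log in LOGS]
```

Every record would still be stored for graphing; only the black-listed one triggers a notification.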
Example of Our White List (with Juno)
1. ^keystonemiddleware.auth_token \[-\] Unable to find authentication token in headers$
• Counting response codes and understanding the trend is enough.
2. ^nova.api.openstack \[[^\]]*\] Caught error: VolumeSizeExceedsAvailableQuota: Requested volume or snapshot exceeds allowed Gigabytes quota\..*$
• This ERROR means that a user's operation was denied due to quota.
• It has no impact on our system. Should it be an INFO log?
3. ^nova.scheduler.host_manager \[[^\]]+\] Host has more disk space than database expected .*$
• This WARNING is caused by the presence of SHUTOFF instances.
• It is a commonplace condition and should be ignored.
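To show how such a white list filters messages, here is a small sketch. The patterns are the three white-list entries with their bracket and dot escapes restored, which is an assumption about what the slide originally showed. The sample messages are invented for illustration and are not real log lines from the deployment.

```python
import re

# White-list patterns with bracket escapes restored (an assumption).
WHITE_LIST = [
    re.compile(r"^keystonemiddleware.auth_token \[-\] "
               r"Unable to find authentication token in headers$"),
    re.compile(r"^nova.api.openstack \[[^\]]*\] Caught error: "
               r"VolumeSizeExceedsAvailableQuota: Requested volume or "
               r"snapshot exceeds allowed Gigabytes quota\..*$"),
    re.compile(r"^nova.scheduler.host_manager \[[^\]]+\] "
               r"Host has more disk space than database expected .*$"),
]


def is_ignorable(message):
    """True if the message matches any white-list pattern."""
    return any(pattern.match(message) for pattern in WHITE_LIST)


# Invented sample messages, for illustration only.
SAMPLES = [
    "keystonemiddleware.auth_token [-] Unable to find authentication token in headers",
    "nova.scheduler.host_manager [req-1234] Host has more disk space than database expected (10gb > 8gb)",
    "nova.compute.manager [req-5678] Instance failed to spawn",
]
remaining = [m for m in SAMPLES if not is_ignorable(m)]
```

Only the messages left in `remaining` would need a human to look at them.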
Effect of Our White List
○ We succeeded in reducing the logs to be analyzed.
▪ In other words, many meaningless logs have high log levels.
Without white list: 160K
With white list: 37
Reduction: 99.98%
1 year ago: we couldn't analyze all logs in a day.
Today: we can analyze all logs in 2-3 hours a day!
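The 99.98% figure follows directly from the two counts. A quick check, taking "160K" as the round number 160,000 (an assumption, since the exact count is not given):

```python
without_white_list = 160_000   # "160K" on the slide, taken as a round number
with_white_list = 37

reduction = 1 - with_white_list / without_white_list
print(f"{reduction:.2%}")      # prints 99.98%
```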
Example of Our Black List
1. ^kernel: \[[^\]]*\] XXXXX.*hardware failure.$ (Warning alert)
• This message indicates a disk problem on a Compute node.
2. ^pengine: warning: unpack_rsc_op: Processing failed op monitor for .*$ (Information alert)
• Corosync needs to clean up its resources.
3. ^mysql_fullbackup\[\d+\]:\sFailed\sto\sMySQL\sfullbackup.*$ (Information alert)
• A full backup of MySQL failed once.
Demonstration with Kibana
○ 3 dashboards
▪ OpenStack
▪ All Logs
▪ Error Logs
▪ Critical Logs
▪ Warning Logs
▪ IDS
Trademarks
○ Kibana is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
○ Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
○ logstash is a trademark of Elasticsearch BV.
o Presentation - Operation
▪ 2015/10/27 4:40pm - 5:20pm, Heian (New Takanawa)
"What are operators doing behind the Cloud?"
o Exhibition
▪ NEC Booth (H4)
▪ 28 (Wed.) 10:45-13:00, 16:30-18:30; 29 (Thu.) 9:00-14:00
▪ NTT Group Booth (S14)
▪ 28 (Wed.) 13:15-16:15
"Touch and Feel! NTT DOCOMO's Cloud Operation"
contact-cloudpf-ml@nttdocomo.com
Thank you for your attention.

Presented at OpenStack Summit 2015 Tokyo as an NTT DOCOMO case study: "After One Year of OpenStack Cloud Operation (NTT DOCOMO)"

5G時代のアプリケーション開発とは - 5G+MECを活用した低遅延アプリの実現へ
 
Kubernetes雑にまとめてみた 2020年8月版
Kubernetes雑にまとめてみた 2020年8月版Kubernetes雑にまとめてみた 2020年8月版
Kubernetes雑にまとめてみた 2020年8月版
 
MS Teams + OBS Studio (+ OBS Mac Virtual Camera) でのオンラインセミナーのプロトタイプの構築
MS Teams + OBS Studio (+ OBS Mac Virtual Camera) でのオンラインセミナーのプロトタイプの構築MS Teams + OBS Studio (+ OBS Mac Virtual Camera) でのオンラインセミナーのプロトタイプの構築
MS Teams + OBS Studio (+ OBS Mac Virtual Camera) でのオンラインセミナーのプロトタイプの構築
 
5G時代のアプリケーション開発とは
5G時代のアプリケーション開発とは5G時代のアプリケーション開発とは
5G時代のアプリケーション開発とは
 
hbstudy#88 5G+MEC時代のシステム設計
hbstudy#88 5G+MEC時代のシステム設計hbstudy#88 5G+MEC時代のシステム設計
hbstudy#88 5G+MEC時代のシステム設計
 
通信への課題発掘ワークショップ 「5Gイノベーション」の取り組み
通信への課題発掘ワークショップ 「5Gイノベーション」の取り組み通信への課題発掘ワークショップ 「5Gイノベーション」の取り組み
通信への課題発掘ワークショップ 「5Gイノベーション」の取り組み
 
Kubernetes雑にまとめてみた 2019年12月版
Kubernetes雑にまとめてみた 2019年12月版Kubernetes雑にまとめてみた 2019年12月版
Kubernetes雑にまとめてみた 2019年12月版
 
OpenStackを使用したGPU仮想化IaaS環境 事例紹介
OpenStackを使用したGPU仮想化IaaS環境 事例紹介OpenStackを使用したGPU仮想化IaaS環境 事例紹介
OpenStackを使用したGPU仮想化IaaS環境 事例紹介
 
Docker超入門
Docker超入門Docker超入門
Docker超入門
 
5Gにまつわる3つの誤解 - 5G×ライブコンテンツ:5G時代の双方向コンテンツとは
5Gにまつわる3つの誤解 - 5G×ライブコンテンツ:5G時代の双方向コンテンツとは5Gにまつわる3つの誤解 - 5G×ライブコンテンツ:5G時代の双方向コンテンツとは
5Gにまつわる3つの誤解 - 5G×ライブコンテンツ:5G時代の双方向コンテンツとは
 
KubeCon China & MWC Shangai 出張報告
KubeCon China & MWC Shangai 出張報告KubeCon China & MWC Shangai 出張報告
KubeCon China & MWC Shangai 出張報告
 
NTT Docomo's Challenge looking ahead the world pf 5G × OpenStack - OpenStack最...
NTT Docomo's Challenge looking ahead the world pf 5G × OpenStack - OpenStack最...NTT Docomo's Challenge looking ahead the world pf 5G × OpenStack - OpenStack最...
NTT Docomo's Challenge looking ahead the world pf 5G × OpenStack - OpenStack最...
 
Introduction of private cloud in LINE - OpenStack最新情報セミナー(2019年2月)
Introduction of private cloud in LINE - OpenStack最新情報セミナー(2019年2月)Introduction of private cloud in LINE - OpenStack最新情報セミナー(2019年2月)
Introduction of private cloud in LINE - OpenStack最新情報セミナー(2019年2月)
 
Multi-access Edge Computing(MEC)における”Edge”の定義
Multi-access Edge Computing(MEC)における”Edge”の定義Multi-access Edge Computing(MEC)における”Edge”の定義
Multi-access Edge Computing(MEC)における”Edge”の定義
 
Edge Computing Architecture using GPUs and Kubernetes
Edge Computing Architecture using GPUs and KubernetesEdge Computing Architecture using GPUs and Kubernetes
Edge Computing Architecture using GPUs and Kubernetes
 

Recently uploaded

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

NTT DOCOMO case study: OpenStack Summit 2015 Tokyo talk "After One Year of OpenStack Cloud Operation (NTT DOCOMO)"

  • 1. Copyright©2015 NTT DOCOMO, INC. All rights reserved. After One Year of OpenStack Cloud Operation (NTT DOCOMO) NTT DOCOMO Inc. Ken Igarashi NTT Software Asako Ishigaki NEC Akihiro Motoki
  • 2. About Us. Ken Igarashi: leads the OpenStack project at NTT DOCOMO; one of the first to propose OpenStack Bare Metal Provisioning (now called "Ironic") - bit.ly/1stuN2E. Asako Ishigaki: Engineer, NTT Software; develops OpenStack log collection and analytics tools. Akihiro Motoki: Senior Research Engineer, NEC; core developer of Neutron and Horizon.
  • 3. Our Project
  • 4. [Project timeline figure] Phases (numbers in parentheses as shown on the slide): Scalable Test using 100 nodes (10), System Design (8), Recovery Tests (12), Racking and Cabling (14), 24/7 support (14), User Support (+x). Time axis: 2014-6, 2014-8, 2014-11, 2015-2, 2015-5, 2015-8, 2015-11.
  • 5. Team Rules (Culture). Focus on using OpenStack instead of developing OpenStack: think about how to use it; don't assume OpenStack can't do XXXX. Reduce opex and promote automation: operation tools ("Anything that a human needs to do more than twice must be automated."); reduce the number of operators through HA and self-healing.
  • 6. Tools: Ansible, Python, shell script. CI/CD: pep-8, Ansible-lint, install. [Deployment procedure diagram: Spec Writing, Test, Review, Production; label "+5"] 200+ deployments (2015); 2000+ patches (2015).
  • 7. Operation
  • 8. OpenStack Configuration (http://bit.ly/1DbJPUO): double redundancy for hardware; triple redundancy for software. [Architecture diagram: VMs; MySQL (Galera) with DB1, DB2, DB3, DB4 and an Arbitrator; Nova; OpenStack APIs; Zabbix; two LBs; Neutron Agents; PXE, DNS, DHCP; MaaS; RabbitMQ]
  • 9. OpenStack Configuration (http://bit.ly/1DbJPUO): same text and architecture diagram as slide 8.
  • 10. Deployment: CMDB registration
  • 11. Choose playbooks for Ansible. [Diagram labels: Dynamic Inventory, Ansible]
  • 12. Deployments. Common (network, account, logging, Zabbix agent, drivers/firmware): x 37. OpenStack (Nova, Swift, Neutron, ...): x 62. HA configuration. [Diagram labels: compile, Initial, update, setup, kernel, driver, firmware, filesystem, development environment, Install HDD Driver]
  • 13. Operation: x 31. Common: process restart, log correction. OpenStack operation: usage, VM migration/backup, user add/delete/quota change. OpenStack monitoring: health check tools. Per-host instance check: launch instances on given node(s); check that boot succeeds and check the instance log; check metadata retrieval, login prompt, and SSH access; optionally, test volume attach and its read/write access.
  • 14. 2015/10/27 4:40pm - 5:20pm, Heian (New Takanawa): "What are operators doing behind the Cloud?"
  • 15. Monitoring System
  • 16. Monitoring System. [Diagram components: VMs, Swift, Cinder, Nova, RabbitMQ, Neutron Agents, databases, Fluentd, Elasticsearch, Zabbix, Kibana; coverage labels: "Weekday daytime" and "24h / 365d"]
  • 17. [Diagram components: VMs, Swift, Cinder, Nova, RabbitMQ, Neutron Agents, databases; resources: Memory, CPU, Network, HDD. Table labels: General, OpenStack, Monitoring Items, Self Healing; values as extracted: 1,970, 25, 3,957, 59]
  • 18. RabbitMQ. Configuration: 3-node cluster; cluster_partition_handling: autoheal. Monitoring: split-brain check ("rabbitmqctl eval '[N||{partitions,N}<-rabbit_mnesia:status()].'"); port check (5672, 25672); process check (beam.smp, rabbitmq-server). At least one node running (1/3): {Openstack-RabbitMQ:grpsum["HostG-RabbitMQ","net.tcp.service[tcp,,25672]",last,0].count(#3,0,"eq")}=3; {OpenStack-RabbitMQ:grpsum["HostG-RabbitMQ","proc.num[beam.smp]",last,0].count(#3,0,"eq")}=3
  • 19. MySQL. Configuration: 4 nodes + 1 arbitrator. Monitoring: cluster check (wsrep_local_recv_queue, wsrep_local_send_queue, wsrep_flow_control_paused, wsrep_local_commits). [Diagram labels: Arbitrator, LB, R/W]
  • 20. MySQL Cluster. [Diagram: MySQL client; master and slave, each with Galera, recv_queue, send_queue, commit, and disk; replication; OK responses; label "Wait until receive OK from replication"]
  • 21. MySQL Cluster Freeze. [Same diagram as slide 20, with a failure marker 👿] Disk failure: 😀 (removed from cluster). Disk speed throttling: 😢
  • 22.
  • 23. Prohibited Actions During a MySQL Cluster Freeze. Prohibit some self-healing actions. Do not restart some OpenStack processes: neutron-plugin-openvswitch-agent. Do not reboot network nodes: they lose network reachability (can't recreate network namespace). All the VMs lose their connections. Solved in Liberty?
  • 24. Throttling happens during DB backup. Countermeasures: limit the backup node; change the backup method. [Diagram labels: LB, R/W, Limit Backup Node] LOCK TABLES FOR BACKUP (online). Procedure: 1. Take the node out of the cluster (Donor/Desynced). 2. Lock the DB and take the backup (FLUSH TABLES WITH READ LOCK). 3. Return the node to the cluster (wsrep_desync=OFF): wsrep_local_recv_queue, wsrep_local_commits.
  • 25. Log Analytics (Kibana)
  • 26. Purpose of Log Analytics. (1) Detect critical system failures: we have to recover immediately. (2) Detect malicious access: we need to notify users. (3) Detect non-critical errors: better to fix them as soon as possible. (4) Find errors/warnings that have no service impact: we want to filter them out next time.
  • 27. Treasure Hunt in the Ocean of Logs. Example, logs of one day: total 100 GB, 80M lines; critical, error, and warning logs combined: 200K lines. The meaningful logs are far fewer: (1) 0 critical failures, (2) 0 malicious accesses, (3) 6 non-critical failures, (4) 6 ignorable failures. [Pie chart "Breakdown of Logs"; labels: Critical, Error, Warning, Info, Debug, Other; values as extracted: 0%, 0%, 1%, 30%, 39%, 30%] [Pie chart by source; labels as extracted: "HW OS OpenStack backend OpenStack Operation tools"; values as extracted: 0%, 24%, 24%, 49%, 3%]
  • 28. Log Analytics Based on White/Black List. We analyze logs to enhance our black list and white list. Logs found in our black list are sent to Zabbix. [Diagram: logs matching the black list go to Zabbix; logs matching the white list go to trash; the rest go to Kibana for analysis; analysis adds entries to both lists, so the lists expand and the logs left to analyze are reduced]
  • 29. How to Adopt Black/White List Using Fluentd. [Diagram: Fluentd on network, control, and compute nodes, and rsyslog from the LB and UTM, send logs to the log server; Fluentd on the log server adds an "ignorable" flag according to the white list, puts metadata on logs so that graphs can be created from them, and notifies Zabbix through zabbix_sender according to the black list; logs are stored in Elasticsearch; Kibana refers to Elasticsearch to draw graphs; Zabbix raises alerts]
  • 30. How to Adopt Black/White List Using Fluentd (examples). Input logs: 1. syslog, 10:01, crit: hardware failure. 2. rsyslog, 10:03, warn: IDS: from x.x.x.x. 3. api.log, 10:04, ERROR: invalid request format. Records after Fluentd processing (path / timestamp / severity / item / source_ip / message): syslog / 10:01 / crit / - / - / hardware failure; rsyslog / 10:03 / warn / ids / x.x.x.x / IDS: from x.x.x.x; api.log / 10:04 / ERROR / ignore / - / invalid request format. Results: Zabbix alert "hardware failure"; Kibana IDS graph and crit graph.
  • 31. Example of Our White List (with Juno). 1. ^keystonemiddleware.auth_token \[-\] Unable to find authentication token in headers$ : count response codes and understand the trend; that is enough. 2. ^nova.api.openstack \[[^\]]*\] Caught error: VolumeSizeExceedsAvailableQuota: Requested volume or snapshot exceeds allowed Gigabytes quota..*$ : this ERROR means the user's operation was denied due to quota; it has no impact on our system. Should it be an INFO log? 3. ^nova.scheduler.host_manager \[[^\]]+\] Host has more disk space than database expected .*$ : this WARNING is caused by the presence of SHUTOFF instances; it is a commonplace condition and needs to be ignored.
  • 32. Effect of Our White List. We succeeded in reducing the logs to be analyzed. In other words, many meaningless logs have high log levels. Without white list: 160K; with white list: 37 (a 99.98% reduction). 1 year ago: we couldn't analyze all logs in a day. Today: we can analyze all logs in 2-3 hours a day!
  • 33. Example of Our Black List. 1. ^kernel: \[[^\]]*\] XXXXX.*hardware failure.$ (Warning alert): this message indicates a disk problem on a Compute node. 2. ^pengine: warning: unpack_rsc_op: Processing failed op monitor for .*$ (Information alert): Corosync needs to clean up its resources. 3. ^mysql_fullbackup\[\d+\]:\sFailed\sto\sMySQL\sfullbackup.*$ (Information alert): a full backup of MySQL failed once.
  • 34. Demonstration with Kibana. 3 dashboards; listed: OpenStack, All Logs, Error Logs, Critical Logs, Warning Logs, IDS.
  • 35. Trademarks. Kibana is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. logstash is a trademark of Elasticsearch BV.
  • 36. Presentation (Operation): 2015/10/27 4:40pm - 5:20pm, Heian (New Takanawa), "What are operators doing behind the Cloud?". Exhibition: NEC Booth (H4): 28 (Wed.) 10:45-13:00, 16:30-18:30; 29 (Thu.) 9:00-14:00. NTT Group Booth (S14): 28 (Wed.) 13:15-16:15, "Touch and Feel! NTT DOCOMO's Cloud Operation". Contact: contact-cloudpf-ml@nttdocomo.com
  • 37. NEC, NTT
  • 38. Thank you for your attention.
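
Slide 19 names four Galera status counters that the cluster check watches, and slides 20 and 21 show that a throttled disk on one node can stall commits for the whole cluster. A minimal sketch of such a check is given below. The thresholds, the function name `check_galera_node`, and the idea of comparing the commit counter with the previous sample are assumptions made for illustration; the slides give no limits and do not describe how the values are evaluated. On an idle cluster the commit comparison would need a different rule.

```python
# Hypothetical thresholds; the slides name the counters but give no limits.
MAX_RECV_QUEUE = 100
MAX_SEND_QUEUE = 100
MAX_FLOW_CONTROL_PAUSED = 0.1  # fraction of time replication was paused


def check_galera_node(status, previous_commits):
    """Return a list of warnings for one node's wsrep status snapshot.

    `status` maps wsrep variable names to their string values, for example
    as collected from SHOW GLOBAL STATUS LIKE 'wsrep_%'.
    `previous_commits` is wsrep_local_commits from the previous sample.
    """
    warnings = []
    if int(status["wsrep_local_recv_queue"]) > MAX_RECV_QUEUE:
        warnings.append("recv queue backing up")
    if int(status["wsrep_local_send_queue"]) > MAX_SEND_QUEUE:
        warnings.append("send queue backing up")
    if float(status["wsrep_flow_control_paused"]) > MAX_FLOW_CONTROL_PAUSED:
        warnings.append("flow control pausing replication")
    # Assumption: a busy cluster should always commit between two samples.
    if int(status["wsrep_local_commits"]) <= previous_commits:
        warnings.append("no commits since last sample")
    return warnings
```

A monitoring agent could run this per node and raise an alert whenever the returned list is non-empty.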

Editor's Notes

  1. L3 13 L1 11 Openstack 4 Testing 3
  2. L3 13 L1 11 Openstack 4 Testing 3
  3. We have four purposes for log analytics. First, we have to detect critical system failures and recover from them immediately. We would be happy if logs could warn us of system failures beforehand. Second, we need to detect malicious access. We watch attack trends: when users' Floating IPs or global routers are under attack, we need to notify the users, and if our own system is attacked, we need to take countermeasures. Third, we need to find non-critical errors and warnings. Of course, they are better fixed as soon as possible, and bugs might be found through those logs. Fourth, we want to identify errors and warnings that have no service impact. We would like to filter them out next time, so that the first three kinds are easier to detect. We call them "ignorable logs".
  4. Large numbers of logs are emitted by each component of the system, such as hardware, the Linux kernel, OpenStack, and operation tools. It is very difficult to find the rare, important messages in this ocean of logs. As an example, on one day there were about 200,000 lines of critical, error, and warning logs. No urgent logs of type 1 or 2 were found that day. There were only 6 non-critical error logs and 6 logs that can be ignored from now on.
  5. We approached this problem with the log analysis tool Kibana. We analyze logs and add the results to our black list and white list. Logs found in our black list are sent to Zabbix. Ignorable logs are filtered out with our white list. The rest are stored in Elasticsearch and shown in Kibana, where we operators analyze them. We add messages that prove serious to the black list, and logs that prove ignorable to the white list. The Kibana dashboard is very useful for our log analysis, so the white list keeps growing, and the volume of logs to be analyzed has dropped considerably.
  6. Now, let me explain our log processing architecture, which applies the black and white lists. Fluentd on every node sends logs to the log servers. Devices on which we cannot install Fluentd send logs to the log servers using rsyslog. The rules of the black and white lists are contained in the Fluentd configuration. Fluentd sends serious logs to Zabbix according to the black list. Fluentd flags ignorable logs according to the white list. Fluentd also adds metadata to logs so that graphs can be created from them. The logs are then stored in Elasticsearch. Kibana draws graphs by referring to the Elasticsearch records.
  7. You can see some simplified examples in the blue text boxes. The first example, found in syslog, indicates a hardware failure. This message is contained in our black list, so Fluentd sends this log to Zabbix. An alert on Zabbix tells us about the failure immediately. The second example is an IDS log forwarded by the UTM through rsyslog. We prefer to analyze trends rather than inspect each IDS event one by one. Fluentd extracts the source IP address from the IDS message and inserts the value "ids" under the "item" key. Kibana makes graphs from this metadata. The third example, found in an api.log, indicates a user's operation error. Since this error doesn't impact our system, we have already added the message to the white list. Fluentd inserts the value "ignore" under the "item" key. Kibana filters this log out of all graphs.
  8. Let me show you part of our white list. The first message indicates access without any token. Health-check accesses from the load balancers carry no token, so this WARNING occurs all the time. We watch the trend of response codes, which Kibana graphs for us, so we don't need this log itself. The others are related to users' operations or commonplace conditions. (Not planned to be spoken:) The second message indicates that a user's request was denied due to the quota limit. It has no impact on the system, but the log has ERROR level. The response code or a notification on Horizon tells the user the cause of the request failure. I think it should be an INFO log. The third message indicates, literally, that the hypervisor has more disk space than the Nova database expects. It occurs when instances in SHUTOFF status exist. This is a commonplace condition, and no WARNING is needed.
  9. We have enhanced our white list. As a result, we have greatly reduced the logs to be analyzed. In other words, many meaningless ERROR and WARNING logs bother OpenStack operators. As you can see in these two Kibana graphs, our white list is very effective. One year ago, when we were dog-fooding, we couldn't cover all logs. Now 2 or 3 hours are sufficient to analyze all logs.
  10. Next, let me show you part of our black list. The first message indicates a disk problem on a Compute node. Part of the message is masked so that product information is not revealed. Because it is a single-node failure, Fluentd sends this log to Zabbix at Warning level. The second and third ones need only an Information alert. We operators find and fix them during weekday daytime. (Not planned to be spoken:) The second message indicates that corosync needs to clean up its resources. This condition itself does not impact our service, so Fluentd sends this log to Zabbix at Information level instead of Warning level. We operators notice this alert during weekday daytime and clean up the corosync resources. This rule has helped us several times. The third message indicates a failure of a full database backup. We don't need to worry about an individual failure because the backup runs 4 times a day. Fluentd sends this log to Zabbix at Information level. If these alerts continued, we would investigate the cause. Black list messages have not yet notified us of a serious failure, because serious failures have not happened that often!
  11. Now let me demonstrate how we use Kibana. Six dashboards are available, and we will show you three of them. This is the “All logs” dashboard. You can enter queries to filter logs. For example, this query filters out logs that carry the ignorable flag. Let’s select the ‘Toggle’ checkbox to enable it. The logs in the graphs below have been reduced. Raw logs are also available in Kibana, classified by log level. Let’s expand the CRITICAL LOGS panel; here you can find the raw message of a critical log. You can also run a full-text search across all logs. Let’s add a query to find logs containing “create failed”. The results have appeared. This dashboard is very useful for getting an overview. We prepared other dashboards for further analysis. This is the “critical logs” dashboard. Let’s look at one day in September. From around 18:00, critical logs increased and kept coming. This graph tells us that entries in the neutron dhcp-agent.log increased at that time, and this graph also shows that many critical logs came from neutron. I’ll narrow the view down to neutron logs. Now we can see that Neutron was in some kind of failure. The raw logs help us analyze the cause. This dashboard shows an analysis of OpenStack API access. This graph color-codes the API accesses to each service. You can see the details like this. This shows the trend of response codes, classified into normal, authentication failure, invalid request, and system error. I’ll analyze the system errors in a moment. This, importantly, is the list of users who failed to log in to Horizon. This user failed dozens of times, so someone may be trying to take over his account; we had better contact him. Now I’ll analyze the system error. Let’s narrow the logs down to error responses. You can find the details of the access log. By adding a filter on the request ID, you can see the logs related to this access. Oh, I’ve found an ERROR.
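The last step of the demo, filtering on a request ID to collect every log line that belongs to one API call, can be sketched outside Kibana as well. This is only an illustration: the sample lines, the `extract_request_id` and `logs_for_request` helpers, and the assumption that each line carries a `req-<uuid>` token are hypothetical and are not taken from the presenters' environment.

```python
import re

# Assumption: request IDs appear in log lines as "req-" followed by a UUID.
REQUEST_ID = re.compile(r"\breq-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b")

# Hypothetical sample lines, invented for this sketch.
LINES = [
    '2015-10-27 18:00:01.100 2001 INFO nova.api '
    '[req-11111111-2222-3333-4444-555555555555 demo demo] "POST /v2/servers" status: 500',
    '2015-10-27 18:00:01.050 2002 ERROR nova.compute '
    '[req-11111111-2222-3333-4444-555555555555 demo demo] create failed',
    '2015-10-27 18:00:02.000 2001 INFO nova.api '
    '[req-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee demo demo] "GET /v2/servers" status: 200',
]


def extract_request_id(line):
    """Return the first request ID found in a log line, or None."""
    match = REQUEST_ID.search(line)
    return match.group(0) if match else None


def logs_for_request(lines, request_id):
    """Return every line that mentions the given request ID."""
    return [line for line in lines if request_id in line]


if __name__ == "__main__":
    rid = extract_request_id(LINES[0])
    for line in logs_for_request(LINES, rid):
        print(line)
```

Starting from the access-log line with the error response, the same ID leads to the ERROR line written by another service.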
  12. Share with everyone
  13. While building this architecture, we found three inconveniences in OpenStack logs. We hope they can be improved so that OpenStack operators will be happier. I’ll explain them on the following pages.
  14. First, we have been bothered by many meaningless WARNING and ERROR logs, and we hope they can be reduced. We would like the Logging Guidelines to be followed. They define the log levels: WARNING indicates a problem in the system, ERROR indicates a problem that operators should investigate, and CRITICAL indicates that the system is about to break. Unnecessarily high log levels are not wanted. In particular, we do not need WARNINGs or ERRORs caused by users’ operation mistakes; INFO level is enough for them.
  15. This might be an unusual point of view. We hope the end of TRACE logs can be defined. Log analysis tools such as Fluentd and Logstash can treat a multiline log as one block if they know where the block ends. To learn the cause of an ERROR, we need to examine its TRACE block. For example, an ERROR is detected on the first line, and we have to look at the third line to learn its cause. If Fluentd knew that the fourth line is the end of these TRACE logs, it could treat the four lines as one block. Then we could define a white-list rule that ignores log blocks containing HTTPBadRequest. Today Fluentd cannot know that the fourth line is the end. It has to wait for the INFO on the fifth line before it can treat the TRACE logs as a block, and that can take several minutes (eight minutes in this example).
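The grouping problem can be illustrated with a small sketch. It is not Fluentd's implementation. The header pattern, the sample lines, and the `group_blocks` and `is_ignorable` helpers are assumptions made for this example. The point it shows is that, without an end marker, the ERROR block can only be closed once the next non-TRACE line arrives.

```python
import re

# Assumed header layout: "YYYY-MM-DD HH:MM:SS.mmm PID LEVEL ..."
HEADER = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \d+ (?P<level>[A-Z]+) "
)

# Hypothetical sample: ERROR on line 1, cause on line 3, TRACE ends on line 4,
# and the next INFO only arrives eight minutes later.
SAMPLE = [
    "2015-10-27 10:00:00.001 1234 ERROR nova.api [req-1] Caught error",
    "2015-10-27 10:00:00.002 1234 TRACE nova.api Traceback (most recent call last):",
    "2015-10-27 10:00:00.003 1234 TRACE nova.api HTTPBadRequest: invalid parameter",
    "2015-10-27 10:00:00.004 1234 TRACE nova.api end of traceback",
    "2015-10-27 10:08:00.000 1234 INFO nova.api [req-2] request completed",
]


def group_blocks(lines):
    """Attach TRACE lines to the preceding line so they form one block.

    A block is closed only when a non-TRACE line (or the end of input)
    is seen, which is why a live tail has to wait for the next log line.
    """
    blocks = []
    current = []
    for line in lines:
        match = HEADER.match(line)
        level = match.group("level") if match else None
        if level == "TRACE" and current:
            current.append(line)
            continue
        if current:
            blocks.append(current)
        current = [line]
    if current:
        blocks.append(current)
    return blocks


def is_ignorable(block):
    """White-list rule from the talk: ignore blocks caused by HTTPBadRequest."""
    return any("HTTPBadRequest" in line for line in block)


if __name__ == "__main__":
    for block in group_blocks(SAMPLE):
        print(len(block), "line(s), ignorable:", is_ignorable(block))
```

If the last TRACE line carried an explicit end marker, the block could be emitted as soon as that line is read instead of waiting for the INFO line.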
  16. For the same reason, we needed to remove newlines from log messages. (Opinions may differ on this one.) Most OpenStack logs follow the format shown here. But a log message containing a newline is split across several lines, and the second and later lines do not follow the log format. I think such a log should be written on one line. I welcome your opinions after this session, and I would appreciate it if developers could improve OpenStack logs.
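One way to keep each record on a single line is to escape line breaks in the formatter. The sketch below uses only Python's standard `logging` module. It is not how OpenStack's own logging is configured, and the `SingleLineFormatter` class and the `demo` function are hypothetical names introduced for this example.

```python
import io
import logging


class SingleLineFormatter(logging.Formatter):
    """Formatter that escapes line breaks so each record stays on one line."""

    def format(self, record):
        text = super().format(record)
        # Escaping (rather than deleting) keeps the original message recoverable.
        return text.replace("\r", "\\r").replace("\n", "\\n")


def demo():
    """Log a two-line message and return what was actually written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        SingleLineFormatter("%(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("single_line_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.info("first line\nsecond line")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

Because the escaping is applied after the base formatter runs, exception tracebacks attached to a record are folded into the same line too.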
  17. Let me get back to the slides. Our log analytics has helped us detect several problems. This graph revealed a problem on a Swift node. Swift processes fell into infinite recursion several times. Python detected the problem and rapidly wrote a large number of error logs. We realized right away that the object-replicator was the process in trouble. We restarted the object-replicator and recovered before users were affected or the node's disk filled up.
  18. We have detected other problems, but unfortunately we did not keep screenshots, so let me describe some of them without graphs. The first is a problem on a Network node. We found continuous error logs and analyzed them. A namespace problem on the Network node had left users’ virtual routers unable to communicate. We rebooted the node to recover, and later upgraded the kernel to prevent recurrence. This message is still included in our black list, just in case. The other is a problem on a Compute node. A large number of warning logs suddenly appeared on Kibana in the operators’ room. The Compute node was about to die because of a driver problem, and we were able to detect the failure in advance. Fortunately it was early in the service and no user instances were running on the node, so all we had to do to limit the impact was disable it.
  19. One year ago, we intended to use Zabbix as our log analysis interface. To begin with, we tried sending all CRITICAL and ERROR logs to Zabbix. We picked ignorable logs out of the alert list in Zabbix and added them to our white list one by one. We expected the alerts to decrease as the white list grew. You can see the result: there were too many alerts to handle, and the Zabbix DB filled up with alerts and became slow. Our challenge was to make log analysis more efficient.
  20. We tackled this issue with a log analysis tool, Kibana. The left figure shows the approach from the previous page; the right figure shows the approach we took. Zabbix alerts are limited to messages on our black list. Ignorable logs are filtered out with our white list. The rest are stored in Elasticsearch, and we operators analyze them with Kibana. Messages found to be serious are added to the black list, and messages found to be ignorable are added to the white list. The Kibana dashboards are so useful that the white list keeps growing, and the logs left to analyze have been greatly reduced.
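The routing described here can be sketched as a three-way decision. This is a minimal illustration, not the actual Fluentd configuration. The patterns, the `Rule` type, and the `route` function are hypothetical. The order of the checks (black list before white list) and the choice to return exactly one destination per message are assumptions. The talk does not say whether black-listed or ignorable logs are also stored in Elasticsearch.

```python
import re
from collections import namedtuple

# A black-list rule pairs a message pattern with the Zabbix severity to raise.
Rule = namedtuple("Rule", ["pattern", "severity"])

# Hypothetical patterns standing in for the real (partly masked) messages.
BLACK_LIST = [
    Rule(re.compile(r"disk failure"), "warning"),
    Rule(re.compile(r"corosync .* needs cleanup"), "information"),
    Rule(re.compile(r"full backup failed"), "information"),
]

# Hypothetical white-list entry: messages known to be ignorable.
WHITE_LIST = [re.compile(r"HTTPBadRequest")]


def route(message):
    """Return (destination, severity) for one log message.

    Assumed order: black list first (alert via Zabbix), then white list
    (ignore), and everything else is kept for analysis in Kibana.
    """
    for rule in BLACK_LIST:
        if rule.pattern.search(message):
            return ("zabbix", rule.severity)
    for pattern in WHITE_LIST:
        if pattern.search(message):
            return ("ignore", None)
    return ("elasticsearch", None)


if __name__ == "__main__":
    for msg in [
        "kernel: disk failure detected on sdb",
        "HTTPBadRequest: invalid parameter",
        "unexpected response from agent",
    ]:
        print(route(msg), msg)
```

Under this scheme, growing either list shrinks the third category, which is the set of logs operators still have to read in Kibana.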