3. Who are we?
Responsibility
- Develop and maintain common/fundamental functions for the private cloud (IaaS)
- Consider optimization for the whole private cloud
Teams: Network / Service Operation Platform / Storage
Software
- IaaS (OpenStack + α)
- Kubernetes
Knowledge
- Software
- Network, Virtualization, Linux
7. Difficulty of building OpenStack Cloud
[Diagram: datacenter network — Core and Aggregation switches, a ToR switch per rack, racks of hypervisors, plus OpenStack API and OpenStack database servers]
● Knowledge of Networking
○ Design/plan the whole DC network
● Knowledge of Operating a Large Product
○ Build operation tools that are not tied to specific software
○ Consider user support
● Knowledge of Server Kitting
○ Communicate with the procurement department
● Knowledge of OpenStack Software
○ Design the OpenStack deployment
○ Deploy OpenStack
○ Customize OpenStack
○ Troubleshooting
■ OpenStack components
■ Related software
8. Building OpenStack is not completed in one team
Operation team
● Maintain
○ Golden VM image
○ ElasticSearch for logging
○ Prometheus for alerting
● Develop operation tools
● User support
● Buy new servers
Network team
● Design/planning
○ DC network
○ Inter-DC network
● Implement a network orchestrator (outside OpenStack)
Platform team
● Design the OpenStack deployment
● Deploy OpenStack
● Customize OpenStack
● Troubleshooting
Members: 3+ / 4+ / 4+
9. Challenge of OpenStack
Basically, we are trying to make OpenStack (IaaS) stable
What we have done
1. Legacy System Integration
2. Bring New Network Architecture into OpenStack Network
3. Maintain Customization for OSS while keeping up with upstream
What we will do
1. Scale Emulation Environment
2. Internal Communication Visualizing/Tuning
3. Containerize OpenStack
4. Event Hub as a Platform
11. Configuration Management
Challenge 1: Integration with Legacy System
Even before the cloud, we had many company-wide systems, each for its own purpose:
CMDB / Monitoring System / Server Login Authority Management / IPDB
When a developer asked the infrastructure team for a new server, infra would set it up and then:
- Register its spec, OS, location, etc. in the CMDB
- Register its IP address and hostname in the IPDB
- Register the server as a monitoring target
- Register the acceptable users of the server
12. Challenge 1: Integration with Legacy System
After the private cloud, “server creation” completes without any intervention by the
infrastructure department. Thus the private cloud itself should register the new server.
[Diagram: a developer creates a new server; the private cloud's configuration management
registers it with the CMDB, Monitoring System, Server Login Authority Management, and IPDB]
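The registration flow above can be sketched as a post-create hook: once the cloud creates a server, it registers the server with each legacy system. This is a minimal sketch with in-memory stand-ins; the system names come from the slide, but the payload fields and function names are illustrative assumptions, not LINE's actual APIs.

```python
class LegacySystems:
    """In-memory stand-ins for the CMDB, IPDB, monitoring, and login authority systems."""
    def __init__(self):
        self.cmdb = {}          # hostname -> spec/OS/location
        self.ipdb = {}          # hostname -> IP address
        self.monitoring = set() # hostnames registered as monitoring targets
        self.login_users = {}   # hostname -> acceptable users

def register_server(systems, hostname, ip, spec, os_name, location, users):
    """What the infra department used to do by hand, now done by the cloud itself."""
    systems.cmdb[hostname] = {"spec": spec, "os": os_name, "location": location}
    systems.ipdb[hostname] = ip
    systems.monitoring.add(hostname)
    systems.login_users[hostname] = list(users)

systems = LegacySystems()
register_server(systems, "vm-001", "10.0.0.5", "4vCPU/8GB", "CentOS 7",
                "DC1/rack-12", ["alice", "bob"])
```

The point is that every step a human once performed becomes one call the configuration-management component makes automatically after "Create new server".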
13. Challenge 2: New Network Architecture in our DC
For scalability and operability,
we introduced a CLOS network architecture and terminate L3 on the hypervisor.
[Diagram: previous vs. new network architecture]
14. Challenge 2: Support new architecture in OpenStack
Network Controller (Neutron): neutron-server, neutron-dhcp-agent, neutron-metadata-agent,
neutron-linuxbridge-agent
The OSS implementation (neutron-linuxbridge-agent) expects VMs to share an L2 network,
but we want all VMs not to share an L2 network.
So we replaced it with a new neutron-custom-agent.
15. Challenge 3: Improve Customization for OSS
● We have customized many OpenStack components
○ e.g., for performance
● Previously we just customized again and again on top of earlier customizations
[Diagram: OpenStack components — VM (Nova), Image Store (Glance), Network Controller (Neutron),
Identity (Keystone), DNS Controller (Designate)]
[Diagram: the LINE version of Nova, forked from a specific upstream version, with
"customize commit for A/B/C" stacked on top]
It's difficult for us to take a specific patch out of our customized OpenStack.
16. Challenge 3: Improve Customization for OSS
[Diagram: instead of a forked branch with stacked customize commits, we keep only the
upstream base commit ID and patch files for A, B, and C, all maintained in git]
● Don't fork / stop forking
● Maintain only the patch files in git
=> Patches can be taken out much more easily than before
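The "base commit + patch files" workflow above can be sketched as follows. This is a hedged illustration: the commit ID and patch file names are hypothetical, and the helper simply emits the git commands rather than prescribing LINE's actual tooling.

```python
def rebuild_commands(base_commit, patch_files):
    """Return the git commands that reconstruct the customized tree
    from a recorded upstream base commit plus ordered patch files."""
    cmds = [f"git checkout -b line-version {base_commit}"]
    cmds += [f"git am patches/{p}" for p in patch_files]
    return cmds

def drop_patch(patch_files, unwanted):
    """Taking a specific patch out is now just removing one file from the list."""
    return [p for p in patch_files if p != unwanted]

# Hypothetical base commit and patch names, mirroring "patch for A/B/C" on the slide.
patches = ["0001-feature-A.patch", "0002-feature-B.patch", "0003-feature-C.patch"]
cmds = rebuild_commands("a1b2c3d", drop_patch(patches, "0002-feature-B.patch"))
```

Compare this with the forked-branch approach: there, removing "customize commit for B" means interactive rebasing through every later commit; here it is a one-line list edit.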
17. Challenges will be different from Day 1 to Day 2
Day 1 (so far)
● Develop user-facing features
○ Keep the same experience as before (legacy systems)
○ Support the new architecture
● Daily operation
○ Predictable
○ Unpredictable (driven by trouble)
Day 2 (from now)
● Enhance operation
● Optimize development
● Reduce daily operation
○ Predictable
○ Unpredictable
18. Challenge of OpenStack
Basically, we are trying to make OpenStack (IaaS) stable
What we have done
1. Legacy System Integration
2. Bring New Network Architecture into OpenStack Network
3. Maintain Customization for OSS while keeping up with upstream
What we will do
1. Scale Emulation Environment
2. Internal Communication Visualizing/Tuning
3. Containerize OpenStack
4. Event Hub as a Platform
19. Future Challenge 1: Scale Emulation Environment
Introduced: 2016
Version: Mitaka + customization
Number of clusters: 4+1 (WIP: semi-public cloud)
Number of hypervisors: 1100+
● Dev cluster: 400
● Prod cluster: 600 (region 1)
● Prod cluster: 76 (region 2)
● Prod cluster: 80 (region 3)
Number of VMs: 26000+
● Dev cluster: 15503
● Prod cluster: 8870 (region 1)
● Prod cluster: 335 (region 2)
● Prod cluster: 229 (region 3)
The number of hypervisors keeps increasing, and we have faced:
- Timing/scale-related errors
- Some operations taking a long time
20. Future Challenge 1: Scale Emulation Environment
We need an environment that simulates scale from the following points of view,
without preparing the same number of hypervisors:
● Database access
● RPC over RabbitMQ
These are control-plane-specific loads.
We can use this environment to tune the control plane of OpenStack.
21. Future Challenge 1: Scale Emulation Environment
● Implement fake agents (nova-compute, neutron-agent)
● Use containers instead of actual HVs
● Use the same control plane
[Diagram: real environment — the control plane orchestrates/manages 600 hypervisors, each
running nova-compute and neutron-agent; scale environment — the same control plane manages
600 fake HVs, each a docker container running nova-compute and neutron-agent on one server]
22. Future Challenge 1: Scale Emulation Environment
[Same diagram as the previous slide]
It is easy to add a new fake HV
=> We can emulate any scale
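The fake-HV idea above can be sketched in a few lines: an agent that reports capacity and flips VM state without touching libvirt, so hundreds of instances fit on one server. This is a minimal sketch; the class and field names are illustrative assumptions, not the actual fake-agent implementation (Nova does ship a fake virt driver upstream, but the details here are invented for illustration).

```python
class FakeHypervisor:
    """Stands in for a nova-compute/neutron-agent pair on a real hypervisor."""
    def __init__(self, name):
        self.name = name
        self.vms = {}

    def report_resources(self):
        # A real agent reports resources to the control plane over RPC; a fake one
        # can report made-up capacity, which still exercises the DB and RabbitMQ.
        return {"host": self.name, "vcpus": 32, "ram_mb": 131072,
                "running_vms": len(self.vms)}

    def spawn(self, vm_id):
        # No libvirt call: just flip in-memory state, as a fake nova-compute would.
        self.vms[vm_id] = "ACTIVE"

# Spinning up 600 fakes is cheap; in practice each would run in a docker container.
fleet = [FakeHypervisor(f"fake-hv-{i:03d}") for i in range(600)]
fleet[0].spawn("vm-1")
```

Because each fake HV is only a process with in-memory state, the control-plane load (database access, RPC over RabbitMQ) scales with the fleet size while the hardware cost does not.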
23. Future Challenge 2: Communication Visualizing
There are 2 types of communication among the OpenStack services:
● RESTful API (between components)
● RPC over a messaging bus (inside a component)
[Diagram: the Authentication (Keystone), VM (Nova), and Network (Neutron) microservices
talk to each other over RESTful APIs; inside Neutron, neutron-agent and neutron-server
talk over RPC]
24. Future Challenge 2: Communication Visualizing
[Same diagram as the previous slide]
Any of this communication can break at any time:
- Because of scale
- Because of improper config
Errors sometimes propagate from one component to another.
25. Future Challenge 2: Communication Visualizing
[Same diagram as the previous slide]
1. It is very difficult to troubleshoot this kind of issue because
- Errors propagate from one component to another
- Logs do not always carry enough information
- Logs appear only when something has already happened
2. Sometimes problems can be predicted from metrics such as
- How many RPCs were received
- How many RPCs are waiting for a reply
26. Future Challenge 2: Communication Visualizing
[Same diagram as the previous slide, plus a monitoring tool]
A monitoring tool monitors communication-related metrics.
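The two metrics named above (RPCs received, RPCs awaiting a reply) can be sketched as simple counters that a monitoring tool scrapes. This is a minimal sketch under stated assumptions: the metric names and the `RpcMetrics` class are invented for illustration, not oslo.messaging's actual instrumentation.

```python
class RpcMetrics:
    """Counts RPCs received and tracks how many still wait for a reply."""
    def __init__(self):
        self.received_total = 0
        self.pending_replies = 0

    def on_request(self):
        self.received_total += 1
        self.pending_replies += 1

    def on_reply(self):
        self.pending_replies -= 1

    def snapshot(self):
        # The shape a Prometheus-style exporter might expose.
        return {"rpc_received_total": self.received_total,
                "rpc_pending_replies": self.pending_replies}

metrics = RpcMetrics()
for _ in range(5):
    metrics.on_request()   # five RPCs arrive
metrics.on_reply()         # only one has been answered so far
```

A steadily growing `rpc_pending_replies` gauge is exactly the kind of signal that predicts trouble before any error is logged, which is the motivation on this slide.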
27. Future Challenge 3: Containerize OpenStack
Motivation / current pain points
● Complexity of packaging tools like RPM
○ Dependencies between packages
○ Configuration for new files
=> We need to rebuild the RPM every time we change the code
● Impossible to run different versions of OpenStack on the same server
○ Because OpenStack services depend on shared common libraries
=> We actually deployed many more control-plane servers than we really need
● Lack of observability for all software running on the control plane
○ No way to tell which part of the deployment script (Ansible, Chef…) installs dependent
libraries and which part installs our software
○ The deployment script does not take care of the software after it is deployed
○ We cannot notice if some developer runs a temporary script
28. Future Challenge 3: Containerize OpenStack
[Diagram, before: an Ansible playbook per server installs the common library and software
(nova-api, neutron-server) from RPM packages, then starts the software]
[Diagram, after: K8s manifests describe nova-api and neutron-server containers, each
bundling its own common-library; servers pull the images from a Docker registry, and
Kubernetes installs and starts the software]
29. Future Challenge 4: EventHub for All Components
OpenStack: VM (Nova), Image Store (Glance), Network Controller (Neutron), Identity (Keystone), DNS Controller (Designate)
Loadbalancer: L4LB, L7LB
Kubernetes (Rancher)
Storage: Block Storage (Ceph), Object Storage (Ceph)
Database: Search/Analytics Engine (ElasticSearch), RDBMS (MySQL), KVS (Redis)
Messaging (Kafka)
Function (Knative)
Baremetal
Operation Tools
30. Future Challenge 4: EventHub for All Components
[Same component list as the previous slide]
Components depend on each other. Some components and operation scripts want to do something:
- When a user (actually a project) in Keystone is deleted
- When a VM is created
- When a real server is added to the loadbalancer
31. Pub/Sub Concept in Microservice Architecture
Each component publishes the important events of its own component,
and subscribes to just the events it is interested in.
[Diagram: Authentication, VM, and Network components connected through a messaging bus (RabbitMQ)]
A component can do something when an interesting event happens,
and it does not have to know which components it needs to work with.
32. Pub/Sub Concept in Microservice Architecture
[Same diagram as the previous slide]
This mechanism allows us to extend the private cloud (microservices)
in the future without changing existing code.
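The pub/sub concept above can be shown with a minimal in-process sketch: the publisher does not know its consumers, and subscribers register only for the events they care about. In a real deployment RabbitMQ (or Kafka) sits where this dict is; the event names and payload are illustrative assumptions.

```python
class MessagingBus:
    """In-process stand-in for a messaging bus such as RabbitMQ."""
    def __init__(self):
        self.subscribers = {}  # event name -> list of callbacks

    def subscribe(self, event, callback):
        self.subscribers.setdefault(event, []).append(callback)

    def publish(self, event, payload):
        # The publisher never enumerates its consumers: the bus fans out.
        for callback in self.subscribers.get(event, []):
            callback(payload)

bus = MessagingBus()
cleaned_up = []

# Network component: only interested in project-deletion events.
bus.subscribe("project.deleted", lambda p: cleaned_up.append(p["project_id"]))

# Authentication component publishes without knowing who listens.
bus.publish("project.deleted", {"project_id": "p-42"})
```

Adding a new consumer later means one more `subscribe` call; the authentication component's code never changes, which is exactly the extensibility claim on this slide.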
33. Future Challenge 4: EventHub for All Components
This notification logic has already been implemented in OpenStack, but...
[Diagram: the Authentication component (Keystone) and VM component (Nova) publish events to
the messaging bus (RabbitMQ); Operation Script A, Operation Script B, L7LB, and Kubernetes
subscribe. Every publisher and subscriber carries both its business logic and its own logic
for accessing RabbitMQ]
34. Future Challenge 4: EventHub for All Components
[Same diagram as the previous slide]
● Sometimes the RabbitMQ-access code grows bigger than the actual business logic
● Every component/script has to implement that access logic first
35. Future Challenge 4: EventHub for All Components
We are currently developing a new component that lets us register a program together with
the events it is interested in. It will make it much easier to cooperate with other components.
[Diagram: publishers (Keystone, Nova) still publish to RabbitMQ, but subscribers
(Operation Script A/B, L7LB, Kubernetes) now keep only business logic; the RabbitMQ-access
logic moves into a new Function-as-a-Service layer that subscribes to events and invokes
the registered programs]
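The new component described above can be sketched as a hub that owns all messaging-bus access and invokes registered functions, so operation scripts keep only business logic. The decorator-style API below is an assumption about the design, not the actual LINE implementation.

```python
class EventHub:
    """Owns all bus access; scripts register pure business-logic functions."""
    def __init__(self):
        self.handlers = {}  # event name -> list of registered functions

    def on(self, event):
        """Register the decorated function for one event type."""
        def decorator(func):
            self.handlers.setdefault(event, []).append(func)
            return func
        return decorator

    def dispatch(self, event, payload):
        # In the real system this would be driven by RabbitMQ messages and the
        # handlers would run as functions (FaaS); here we call them directly.
        return [h(payload) for h in self.handlers.get(event, [])]

hub = EventHub()

@hub.on("vm.created")
def add_to_loadbalancer(payload):
    # Pure business logic: no RabbitMQ code anywhere in the operation script.
    return f"registered {payload['vm']} to LB"

results = hub.dispatch("vm.created", {"vm": "vm-7"})
```

Compare this with slide 34: the "logic for access rabbitmq" that every script used to duplicate now lives once, inside the hub.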
36. Looking further ahead: IaaS to PaaS, CaaS...
We are currently trying to introduce an additional abstraction layer on top of IaaS
● https://engineering.linecorp.com/ja/blog/japan-container-days-v18-12-report/
● https://www.slideshare.net/linecorp/lines-private-cloud-meet-cloud-native-world
38. Many container-related projects have started in LINE
Published
● https://www.slideshare.net/linecorp/parallel-selenium-test-with-docker
● https://www.slideshare.net/linecorp/test-in-dockerized-system-architecture-of-line-now-line-now-docker
● https://www.slideshare.net/linecorp/local-development-environment-for-micro-services-with-docker
● https://www.slideshare.net/linecorp/clova-92916456 (Japanese Only)
Ongoing projects
39. Currently application engineers maintain it...
[Diagram: Developers A in Japan and Developers B in Taiwan each run their own Kubernetes
clusters full of containers, on top of VMs and bare metal (BM) with their own OS, all on
the private cloud. The responsibility border sits between the private cloud developers
(IaaS) and the application developers (everything above: OS, Kubernetes, containers)]
40. Operating knowledge is distributed
[Same diagram as the previous slide, but each application team holds its own Kubernetes
operating knowledge, separate from the private cloud developers' IaaS knowledge]
Problem
● Lack of a mechanism to share knowledge between teams
● Quality will be uneven
● A new team starts from beginner level
41. Time to extend our responsibility from IaaS to KaaS
[Same diagram: the responsibility border moves up so that the private cloud developers also
provide Kubernetes as a Service (KaaS) and consolidate the operating knowledge, while the
application developers in Japan and Taiwan focus on their containers]