OpenContrail Cloudwatt Feedback

Cloudwatt's feedback on deploying OpenContrail in its production public cloud, with a focus on the live upgrade from release 1.06 to 1.10.

  1. OpenContrail deployment experience at Cloudwatt
  2. About me
     ● Network engineer since 2006
     ● Working on OpenStack since its beginning in 2010
     ● Working on OpenContrail for a year, as a developer and integrator
  3. Cloudwatt IaaS
     ● French public cloud provider
     ● 3 years of experience with OpenStack
     ● 1 year of experience with OpenContrail
        ○ 1 data center
           ■ 200 compute nodes
           ■ 3 PB of raw Swift storage
        ○ OpenStack Icehouse release
  4. Contrail in Cloudwatt
     ● Started with Contrail release 1.06 in June 2014
     ● Runs on a Cisco Nexus FabricPath fabric
     ● L2VPN tunnels terminated on two Juniper MX routers
  5. Contrail in Cloudwatt
  6. Contrail logical view (diagram: Config, Neutron API, Analytics and Control components, IF-MAP, and vrouters)
  7. Contrail in Cloudwatt
     ● 2 Neutron API nodes: neutron-server with the Contrail plugin
     ● 2 config nodes: discovery, API, SVC monitor, schema transformer, IF-MAP server
     ● 2 control nodes
     ● 2 analytics nodes
     ● 2 WebUI nodes
  8. Contrail in Cloudwatt (diagram: redundant pairs of Config, Neutron API, Analytics, Control, IF-MAP and WebUI nodes; vrouters connected to the control nodes over XMPP)
  9. Contrail in Cloudwatt
     ● Load balancing in front of the APIs and the WebUI
     ● 2 Cassandra clusters of 3 nodes each
     ● RabbitMQ cluster of 2 nodes
     ● ZooKeeper cluster of 3 nodes
  10. Contrail in Cloudwatt (diagram: redundant pairs of Config, Neutron API, Analytics, Control, IF-MAP and WebUI nodes; vrouters over XMPP; two Cassandra clusters; AMQP + ZooKeeper cluster)
  11. Issues on 1.06
      ● Difficult to operate, upgrade and maintain without downtime
      ● Stability and compatibility of the Neutron-to-Contrail API translation
      ● Analytics does not work
      ● Some memory leaks on the compute nodes
  12. Upgrade to 1.10
      ● After nine months on 1.06
      ● New version to fix issues and bring new features (SNAT/LBaaS)
      ● Keep following upstream
  13. Upgrade to 1.10
      ● Created a tool to monitor the Contrail cluster status
  14. Upgrade to 1.10
      We decided to do it in 2 steps:
      1. Control plane (in one night)
         ○ Config (slave schema transformer first)
         ○ Control
         ○ Analytics
         ○ WebUI
         ○ Neutron API
  15. Upgrade to 1.10
      2. Data plane (over a few days)
         ○ Upgrade/bootstrap spare compute nodes in 1.10 and add them to the available compute pools
         ○ Remove all running 1.06 compute nodes from the available pool
         ○ Give customers a time window to move their VMs off the 1.06 nodes before upgrading those nodes to 1.10 (no live migration)
         ○ Then open the champagne!
  16. Bugs encountered during the upgrade
      ● vrouter 1.06 cannot coexist with 1.10 over MPLSoUDP encapsulation => switched to MPLSoGRE during the cohabitation
      ● The SNAT/LBaaS code does not take the vrouter version into account
      ● All Contrail APIs slowed down because the Neutron Contrail plugin code moved from neutron-server to the Contrail API
      ● ZooKeeper timeouts
  17. Bugs found after the upgrade
      ● Memory leak in the data path kernel module
      ● Held-flow count leak in the data path kernel module (workaround: restart the vrouter agent)
      ● 13 Cloudwatt patches added to the 1.10 upstream release: https://review.opencontrail.org/#/q/status:open+branch:R1.10,n,z
  18. Bugs still present in 1.10
      ● Schema transformer slave-to-master switchover takes ~20 minutes
      ● Logging configuration
      ● Some 5xx errors still appear on the Contrail API
      ● Live upgrade of a compute node without downtime (do we need it?)
  19. My wishlist for Santa SDN
      ● More people using https://blueprints.launchpad.net/opencontrail
      ● A stable master before cutting a new branch
      ● Use http://semver.org to number releases
      ● A more community-oriented Contrail team
  20. To-do for the second half of 2015
      ● Improve the Neutron Contrail plugin code: https://review.opencontrail.org/10123
      ● Upgrade to the 2.x branch
      ● Build a CI/CD pipeline on master
         ○ build and deploy daily
         ○ run the OpenContrail sanity tests
         ○ run functional non-regression tests
         ○ run performance non-regression tests
      ● OpenStack L3VPN integration
  21. Questions?

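The architecture slide lists a ZooKeeper cluster of 3 nodes and two Cassandra clusters of 3 nodes each. As a rough illustration of why odd sizes like 3 are common, the sketch below computes the strict-majority quorum floor(n/2)+1 of an n-node ensemble and how many node failures it survives. This is not Cloudwatt's sizing method: the majority model fits ZooKeeper directly, but applying it to Cassandra assumes replication factor 3 with QUORUM reads and writes, and the 2-node RabbitMQ cluster is deliberately left out.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of an ensemble with `members` nodes."""
    if members < 1:
        raise ValueError("an ensemble needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Nodes that may fail while a strict majority is still reachable."""
    return members - quorum_size(members)


# Sizes from the slide; treating Cassandra as a majority-quorum system assumes
# replication factor 3 and QUORUM consistency (an assumption, not from the slides).
CLUSTERS = {"ZooKeeper": 3, "Cassandra (each cluster)": 3}

if __name__ == "__main__":
    for name, size in CLUSTERS.items():
        print(f"{name}: {size} nodes, quorum {quorum_size(size)}, "
              f"survives {tolerated_failures(size)} failure(s)")
```

With 3 nodes the quorum is 2, so one node may be lost. A fourth node would raise the quorum to 3 without surviving a second failure.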
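The "Upgrade to 1.10" slides mention a tool Cloudwatt built to monitor the Contrail cluster status, but its implementation is not shown. A minimal sketch of the idea flags every service that is not `active`. It assumes each node's status is available as text with one `service state` pair per line and `==` section headers, which roughly matches the layout printed by `contrail-status` but is treated as an assumption here. Collecting that text from each node (SSH, an agent, etc.) is left out, and the node and service names in the sample are hypothetical.

```python
from typing import Dict, List


def parse_status(text: str) -> Dict[str, str]:
    """Map service name -> state from 'name  state' lines (assumed format)."""
    states: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("=="):
            continue  # skip blank lines and section headers
        parts = line.split()
        if len(parts) < 2:
            continue  # not a 'name state' pair
        states[parts[0].rstrip(":")] = parts[-1]
    return states


def cluster_report(outputs: Dict[str, str]) -> Dict[str, List[str]]:
    """For each node, the sorted non-active services; healthy nodes are omitted."""
    report: Dict[str, List[str]] = {}
    for node, text in outputs.items():
        bad = sorted(name for name, state in parse_status(text).items()
                     if state != "active")
        if bad:
            report[node] = bad
    return report


# Hypothetical status text for two control nodes.
SAMPLE = {
    "ctrl-1": "== Contrail Control ==\ncontrail-control  active\ncontrail-dns  active\n",
    "ctrl-2": "== Contrail Control ==\ncontrail-control  active\ncontrail-dns  inactive\n",
}

if __name__ == "__main__":
    print(cluster_report(SAMPLE))  # {'ctrl-2': ['contrail-dns']}
```

Returning only the unhealthy nodes keeps the report short enough to re-run between every upgrade step.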
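The control-plane step of the upgrade goes through the components in a fixed order within one night. The sketch below encodes that order and stops at the first step whose upgrade action or follow-up health check fails, so the operator knows where to resume. The `upgrade` and `healthy` callables are hypothetical placeholders; nothing here calls a real Contrail or OpenStack command.

```python
from typing import Callable, List, Optional, Tuple

# Order taken from the slide; the schema transformer on the slave config
# node is upgraded before the rest of the config services.
CONTROL_PLANE_ORDER = [
    "config (slave schema transformer first)",
    "control",
    "analytics",
    "webui",
    "neutron-api",
]

# Hypothetical hook: upgrade one component, return True on success.
Upgrade = Callable[[str], bool]


def run_plan(order: List[str], upgrade: Upgrade,
             healthy: Callable[[], bool]) -> Tuple[List[str], Optional[str]]:
    """Upgrade components in order; return (completed steps, failed step or None)."""
    done: List[str] = []
    for step in order:
        # Stop as soon as an upgrade fails or leaves the cluster unhealthy.
        if not upgrade(step) or not healthy():
            return done, step
        done.append(step)
    return done, None


if __name__ == "__main__":
    # Dry run: every upgrade "succeeds" and the cluster always looks healthy.
    print(run_plan(CONTROL_PLANE_ORDER, lambda step: True, lambda: True))
```

Checking health after every component rather than only at the end keeps a broken step from being masked by the ones that follow it.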
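The data-plane step is a drain procedure without live migration. New capacity comes from spare nodes already running 1.10, the 1.06 nodes leave the available pool, and each 1.06 node is upgraded only after its customers have moved their VMs away. The sketch below models that bookkeeping in memory. The `ComputeNode` fields and node names are hypothetical, and the slides do not say how the pool is actually managed, so that part is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComputeNode:
    name: str
    version: str   # vrouter release on the node, e.g. "1.06" or "1.10"
    in_pool: bool  # True if new VMs may be scheduled here
    vm_count: int  # customer VMs still running on the node


def drain(nodes: List[ComputeNode], old: str = "1.06") -> None:
    """Take every node still on the old release out of the available pool."""
    for node in nodes:
        if node.version == old:
            node.in_pool = False


def ready_to_upgrade(nodes: List[ComputeNode], old: str = "1.06") -> List[str]:
    """Old-release nodes that are out of the pool and no longer host any VM."""
    return [n.name for n in nodes
            if n.version == old and not n.in_pool and n.vm_count == 0]


def upgrade(node: ComputeNode, new: str = "1.10") -> None:
    """Mark an empty node as upgraded and return it to the pool."""
    if node.vm_count:
        raise RuntimeError(f"{node.name} still hosts VMs (no live migration)")
    node.version = new
    node.in_pool = True


if __name__ == "__main__":
    nodes = [
        ComputeNode("spare-1", "1.10", True, 0),  # spare bootstrapped in 1.10
        ComputeNode("cmp-1", "1.06", True, 0),    # customers already moved away
        ComputeNode("cmp-2", "1.06", True, 4),    # still waiting on customers
    ]
    drain(nodes)
    for name in ready_to_upgrade(nodes):
        upgrade(next(n for n in nodes if n.name == name))
    print([(n.name, n.version, n.in_pool) for n in nodes])
```

Refusing to upgrade a node that still hosts VMs mirrors the "no live migration" constraint on the slide.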
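During the cohabitation, 1.06 vrouters could not interoperate with 1.10 vrouters over MPLSoUDP, so Cloudwatt switched to MPLSoGRE. The helper below sketches that decision as a pure function. Given the vrouter releases currently deployed, it returns an encapsulation priority list with MPLSoGRE first while any 1.06 node remains. The default preference order is an assumption, and pushing the list to the cluster is not shown.

```python
from typing import Iterable, List

# Assumed steady-state preference, most preferred first (not from the slides).
DEFAULT_PRIORITIES = ("MPLSoUDP", "MPLSoGRE", "VXLAN")


def encapsulation_priorities(vrouter_versions: Iterable[str],
                             legacy: str = "1.06",
                             preferred: Iterable[str] = DEFAULT_PRIORITIES) -> List[str]:
    """Return the priority list, forcing MPLSoGRE first while legacy vrouters remain."""
    order = list(preferred)
    if legacy in set(vrouter_versions):
        # Mixed 1.06/1.10 fabric: avoid MPLSoUDP by preferring MPLSoGRE.
        order = ["MPLSoGRE"] + [e for e in order if e != "MPLSoGRE"]
    return order


if __name__ == "__main__":
    print(encapsulation_priorities(["1.06", "1.10", "1.10"]))  # ['MPLSoGRE', 'MPLSoUDP', 'VXLAN']
    print(encapsulation_priorities(["1.10", "1.10"]))          # ['MPLSoUDP', 'MPLSoGRE', 'VXLAN']
```

Once the last 1.06 node is upgraded, the same function returns the preferred order again.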
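The wishlist asks for release numbers that follow Semantic Versioning (http://semver.org). Under SemVer 2.0.0 a version core is MAJOR.MINOR.PATCH, made of non-negative integers without leading zeros. A label such as "1.06" is therefore not a valid SemVer version, and versions compare numerically field by field rather than as strings. The sketch below checks only the version core and deliberately ignores pre-release and build-metadata suffixes.

```python
import re
from typing import Tuple

# SemVer 2.0.0 version core: three non-negative integers, no leading zeros.
# Pre-release ("-rc.1") and build metadata ("+build.5") are not handled here.
_CORE = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_core(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into an integer tuple, or raise ValueError."""
    match = _CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a SemVer version core: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def is_semver_core(version: str) -> bool:
    """True if `version` is a valid SemVer MAJOR.MINOR.PATCH core."""
    try:
        parse_core(version)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(is_semver_core("1.06"))    # False: only two fields, and a leading zero
    print(is_semver_core("1.10.0"))  # True
    # Numeric ordering differs from plain string ordering:
    print(parse_core("1.9.0") < parse_core("1.10.0"))  # True
    print("1.9.0" < "1.10.0")                          # False
```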
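One of the CI/CD stages planned for master is a performance non-regression run. A minimal sketch of such a gate compares each metric of the daily build against a stored baseline and fails when the metric drops by more than a tolerance. The metric names, the 5% tolerance and the higher-is-better convention are assumptions for illustration, and how the numbers are measured is not covered.

```python
from typing import Dict, List


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                tolerance: float = 0.05) -> List[str]:
    """Metrics (higher is better) that fell more than `tolerance` below baseline.

    A metric missing from the current run is also reported, since the gate
    cannot show that it did not regress.
    """
    failed = []
    for name, reference in baseline.items():
        value = current.get(name)
        if value is None or value < reference * (1.0 - tolerance):
            failed.append(name)
    return sorted(failed)


if __name__ == "__main__":
    # Hypothetical numbers for two hypothetical metrics.
    baseline = {"throughput_gbps": 9.0, "flow_setup_per_s": 100000.0}
    nightly = {"throughput_gbps": 8.9, "flow_setup_per_s": 90000.0}
    failed = regressions(baseline, nightly)
    print("FAIL" if failed else "PASS", failed)  # FAIL ['flow_setup_per_s']
```

A relative tolerance keeps normal run-to-run noise from failing the nightly build while still catching real drops.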