
Contrail at AllegroGroup

How Contrail was implemented at AllegroGroup



  1. Contrail at AllegroGroup: Plan / Prepare / Production (3P)
  2. History
     ➢ What we had:
       ○ Two separate cloud environments (Essex and Havana)
       ○ Floating IPs in Essex and a flat network with VLANs in Havana
       ○ Network complexity in Havana
       ○ Network performance problems in Havana
  3. Goals (Plan)
     ➢ Easy to maintain and grow
     ➢ Network simplicity
     ➢ Network isolation for tenants
     ➢ Floating IPs and flat networks
     ➢ New region in a new DC
  4. Fabric (Prepare - 2 weeks)
     ➢ Easy and fast deployment (a couple of corrections in the fabric scripts); we used version 1.20 at that time (a sketch of such a fab task follows after this slide)
     ➢ Environment ready for testing (adding a new HV from "any" location in the server room)
     ➢ Basic performance tests, LBaaS
     Where to find quick implementation tools: http://www.opencontrail.org/opencontrail-quick-start-guide/
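Fabric-based provisioning boils down to running remote commands over SSH from a fabfile. The task below is only an illustrative sketch of that pattern, not the actual contrail-fabric-utils scripts referenced above; the task name, SSH user, and package name are assumptions.

    # fabfile.py - hypothetical sketch (Fabric 1.x) of preparing a new hypervisor (HV)
    # so it can join the test environment; names below are assumptions.
    from fabric.api import env, run, sudo, task

    env.user = 'deploy'  # assumed SSH user

    @task
    def add_compute(host):
        """Prepare a single compute node from "any" location in the server room."""
        env.host_string = host
        run('uname -r')                                      # reachability / kernel check
        sudo('apt-get update')
        sudo('apt-get -y install contrail-vrouter-agent')    # package name is an assumption

Invoked as, for example, fab add_compute:host=hv-042, this keeps adding a hypervisor down to a single command.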
  5. Our way
     ➢ Own Puppet manifests based on the available ones
     ➢ Reasons:
       ○ Existing infrastructure
       ○ Customized deployment
       ○ More work at the beginning, fewer problems later
       ○ Easy procedure for adding hosts (compute nodes, controllers)
       ○ Building a new region in the near future
  6. Implementation
     ➢ We had everything prepared for version 1.20, and then we got the 2.01 production version (what to do?!)
     ➢ Environment deployment (OpenStack with Contrail, one region: 2 CC, 3 CoC, 50 HV); the number of computes kept increasing during the DC migration, target 250
     ➢ Moving tenants/users/quotas from the old environment to the new one
       ○ we used a Keystone server built from scratch, did the upgrade, then pumped the data into Icehouse/Contrail (issue: missing users); quotas were migrated as SQL tables; exporter.py is a script to merge users/tenants/quotas between regions (target for the future: one Keystone); see the sketch after this slide
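A minimal sketch of the kind of merge exporter.py performs, assuming the Keystone v2.0 API via python-keystoneclient; the endpoints, credentials, and copy-only-missing-tenants logic are illustrative assumptions, not the actual script (quotas were moved separately as SQL tables).

    # exporter.py-style sketch: copy tenants missing in the target Keystone.
    # Endpoints and credentials are placeholders.
    from keystoneclient.v2_0 import client

    def connect(auth_url):
        # assumed admin credentials
        return client.Client(username='admin', password='secret',
                             tenant_name='admin', auth_url=auth_url)

    old_ks = connect('http://old-keystone:35357/v2.0')   # Essex/Havana side
    new_ks = connect('http://new-keystone:35357/v2.0')   # Icehouse/Contrail side

    existing = {t.name for t in new_ks.tenants.list()}
    for tenant in old_ks.tenants.list():
        if tenant.name not in existing:
            new_ks.tenants.create(tenant_name=tenant.name,
                                  description=tenant.description,
                                  enabled=tenant.enabled)

The same pattern extends to users (users.list() / users.create()), which is where the "missing users" issue mentioned above showed up.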
  7. Implementation
     ➢ DNS: we are using Designate with two handlers (one for floating IPs, a second one for fixed routable IPs); see the sketch after this slide
     ➢ Required image modifications (target: Ansible-automated builds)
     ➢ Two days before production we updated to the latest available packages from release 2.01
     ➢ Breaking the environment
     ➢ Clients on the new environment
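To illustrate only the two-handler split (this is not Designate's actual handler interface; the zone names and function are assumptions): each handler publishes records into its own zone depending on whether the address is a floating IP or a fixed routable IP.

    # Hypothetical illustration of the two-zone split; not the Designate API.
    FLOATING_ZONE = 'float.example.net.'   # assumed zone for floating IPs
    FIXED_ZONE = 'fixed.example.net.'      # assumed zone for fixed routable IPs

    def record_for(instance_name, address, is_floating):
        """Build the (fqdn, ip) pair a handler would hand to the DNS backend."""
        zone = FLOATING_ZONE if is_floating else FIXED_ZONE
        return ('%s.%s' % (instance_name, zone), address)

    # Example: a floating IP and a fixed routable IP for the same instance
    print(record_for('web-01', '203.0.113.10', is_floating=True))
    print(record_for('web-01', '10.20.30.40', is_floating=False))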
  8. Results
     ➢ 500 VMs spawned simultaneously
     ➢ Network performance
  9. Problems
     ➢ Cassandra: we increased the number of nodes (3 => 5), configuration tuning (TTLs in contrail-collector.conf), compaction throughput, migration of Cassandra data to RAID0 SSD disks
     ➢ Open files limit issue (user, supervisor, init)
     ➢ Collector was flooded by data from the computes:
       iptables -A OUTPUT -p tcp --dport 8086 -m string --algo bm --string "flowuuid" -j DROP
  10. Problems
     ➢ When 500K flows are not enough (vr_flow_entries, vr_oflow_entries)
     ➢ Flow on Hold issue
     ➢ vRouter CPU consumption too high compared to the VMs (TBB_THREAD_COUNT in /etc/contrail/supervisord_vrouter.conf)
  11. Problems
     ➢ Rebuild instance: the interface was deleted after the VM was respawned
     ➢ Lack of support for Ironic; we will build a separate region for Ironic
     ➢ Disabled tenant: not able to log in to the Contrail UI (Keystone 2.0)
     ➢ Tuning of configuration files required
     ➢ Metadata packets not sent in one session
     ➢ RBAC for the Contrail UI
  12. Environment expansion and further plans
     ➢ 1350 VMs on 150 HV in one DC at this moment
     ➢ Second region on its way
     ➢ 250-300 HV per region
     ➢ Migration from Essex and Havana
     ➢ OpenStack and Contrail upgrades
  13. Q/A?
  14. Thank you!
  15. Check us: allegrotech.io
      Join us: kariera.allegro.pl
      Twitter: allegrotechblog
      e-commerce full of technology
