How Atlassian's Build Engineering Team Has Scaled to 150k Builds Per Month and Beyond – PuppetConf 2015

Continuous integration is the lifeblood of any software house and is especially important in a fast-growing organisation like Atlassian. You'll hear how the build engineering team has scaled its people, infrastructure and Bamboo over a four-year journey of continuous improvement to provide a build platform and services used internally within the organisation. You'll hear how the team has grown from three engineers servicing 300 Atlassians to 12 engineers supporting over 1300 Atlassians, while dealing with challenges such as balancing firefighting and project work. You'll also hear how our infrastructure has gone from a group of pets, to cattle, then to stateless machines, and how we manage our internal Bamboo instances, balancing dogfooding new milestones against providing a critical service to the organisation.

Published in: Technology

  1. 1. PETER LESCHEV • TEAM LEAD • ATLASSIAN • @PETERLESCHEV Build Engineering @ Atlassian: Scaling to 150k builds per month & beyond PuppetConf 2015
  2. 2. TEAM INTRODUCTION INFRASTRUCTURE BAMBOO SERVERS Introduction CONCLUSION
  3. 3. Build platform & services used internally within Atlassian to build, test & deliver software
  4. 4. Developers expect a reliable infrastructure & fast CI feedback
  5. 5. Build Engineering today @ Atlassian • 12 Bamboo Servers • 1200 build agents on EC2 • Build agents include SCM clients, JDKs, JVM build tools, databases, headless browser testing, Python builds, NodeJS, installers & more • Maintain 20 AMIs of various build configurations • maven.atlassian.com / 9 Nexus instances / 9 TB • 7 Nexus proxies for internal traffic • Monitoring: opsview, graphite, statsd, newrelic, datadog
  6. 6. 4 years ago: Builds per month 21k
  7. 7. Last month: Builds per month 186k
  8. 8. Build Engineering @ Atlassian
  9. 9. JIRA alone has Automated tests 49k
  10. 10. 3 stories of gaining maturity to handle Atlassian growth
  11. 11. INTRODUCTION TEAM INFRASTRUCTURE BAMBOO SERVERS Team CONCLUSION
  12. 12. History of team roles
  13. 13. Individual Engineers • Information silos • Fault investigation, requests for advice, unplanned work • Little project work • Very interrupt driven • Duplication of effort • Limited to customer driven changes
  14. 14. Disturbed role • 2 week rotation • More project work • Non-disturbed can focus on larger tasks • Reduction in duplication of effort, promotes collaboration within the team • Context switching • Knowledge transfer when switching between project / disturbed roles is difficult
  15. 15. Team expands Build Engineers
  16. 16. Team expands Build Engineers
  17. 17. Team expands Infra Engineers Developers Build Engineers
  18. 18. Disturbed for Dev & Infra • Too interrupt driven • To encourage knowledge transfer between infra & dev • Staggered changeovers • Minimising disruption due to context switching • Disturbed pairing • Couldn’t handle smaller customer raised requests & interrupt driven work
  19. 19. Supporting Developers team channel
  20. 20. Supporting Developers
  21. 21. Supporting Developers
  22. 22. 1. Measure the pain 2. Continuous Improvement
  23. 23. Technical Debt
  24. 24. Technical Debt
  25. 25. Contact Rate = (Customer JIRA issues + Confluence Questions + Hipchat queries) ÷ Number of Developers
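The contact rate on this slide is a simple ratio, and a minimal sketch of it could look like the following. The function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration; they are not figures from the talk.

```python
def contact_rate(jira_issues: int, confluence_questions: int,
                 hipchat_queries: int, developers: int) -> float:
    """Customer contacts per developer over some reporting period.

    Implements the slide's formula:
    (Customer JIRA issues + Confluence Questions + Hipchat queries)
    divided by Number of Developers.
    """
    if developers <= 0:
        raise ValueError("developers must be a positive number")
    contacts = jira_issues + confluence_questions + hipchat_queries
    return contacts / developers
```

For example, `contact_rate(120, 30, 50, 1000)` treats 200 contacts from 1000 developers as a rate of 0.2 contacts per developer. Tracking the value per period is one way to "measure the pain" before and after a self-service improvement.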
  26. 26. Contact Rate
  27. 27. The Shield http://www.clker.com/cliparts/e/d/c/4/11970889822084687040sinoptik_Medieval_shield.svg.hi.png
  28. 28. Rebranding ‘Disturbed’ as ‘Maintenance’: removing the negative attitude towards the old role within the team
  29. 29. Project work Maintenance The Shield
  30. 30. “How do we avoid this in the future?” PETER LESCHEV
  31. 31. Fix it now, fix it for the future
  32. 32. Self service
  33. 33. Chat bots Self Service
  34. 34. Self Service Maven Self Help Tool
  35. 35. INTRODUCTION INFRASTRUCTURE TEAM BAMBOO SERVERS Infrastructure CONCLUSION
  36. 36. Infrastructure as Code = Puppet + SCM ?
  37. 37. 4 years ago… Started using Puppet Manually maintained snow flakes
  38. 38. Production rollout puppetmaster build agents
  39. 39. Production rollout failure puppetmaster build agents
  40. 40. Low confidence of change
  41. 41. atlassian.com/git
  42. 42. Style in Pull Requests
  43. 43. Automated Style Checking • Puppet Lint https://github.com/rodjek/puppet-lint (Tim Sharpe @rodjek) • Automated Build: runs checks & posts results, fails if there are any warnings or errors
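A build step along these lines could be sketched as below. This assumes `puppet-lint` is installed and on the PATH, and uses its `--fail-on-warnings` option so that warnings as well as errors produce a non-zero exit code. The function names are hypothetical, and posting the results back to the pull request is not shown.

```python
import subprocess
from typing import List, Sequence


def lint_command(paths: Sequence[str]) -> List[str]:
    """Build the puppet-lint invocation for the given manifest paths."""
    # --fail-on-warnings makes puppet-lint exit non-zero on warnings too.
    return ["puppet-lint", "--fail-on-warnings", *paths]


def style_check_passes(paths: Sequence[str]) -> bool:
    """Run puppet-lint and report whether the style check passed."""
    result = subprocess.run(lint_command(paths), capture_output=True, text=True)
    # Echo the lint findings so they show up in the build log.
    print(result.stdout, end="")
    return result.returncode == 0
```

A CI job would call `style_check_passes(["manifests"])` and fail the build when it returns `False`.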
  44. 44. Using Staging for Development • Coding on Puppet Master • Culture of manually modifying production - Configuration Drift • Impact on Builds (diagram: puppetmaster, build agents, staging puppet environment)
  45. 45. Vagrant www.vagrantup.com Mitchell Hashimoto @mitchellh Packer packer.io
  46. 46. Rolling out to staging Rolling out to production Broken build agents Developing locally
  47. 47. Behaviour Driven Development Cucumber https://github.com/cucumber/aruba
  48. 48. “But it works on my machine” EVERY DEVELOPER
  49. 49. Continuous Integration: ‘from scratch’ provisioning gives confidence that you can rebuild in a disaster
  50. 50. “The Pets: you give nice names, you stroke them, and when they get ill, you nurse them back to health, taking a long time over it. The Cattle: you give them numbers. When they get ill, you shoot them.” TIM BELL, CERN
  51. 51. Provisioning from scratch is slow
  52. 52. Profiling Puppet Runs: add “--evaltrace” to puppet apply, then collect and show the longest occurrences of “Evaluated in ([\d.]+) seconds”
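The collection step can be sketched as a small log filter. This is a minimal sketch, assuming the `--evaltrace` output contains lines of the form `Info: /Stage[main]/Some/Resource[title]: Evaluated in 0.52 seconds`; the function name and the sample resources in the usage note are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Resource path (starting with "/") followed by the evaluation time.
EVAL_RE = re.compile(r"(/.*): Evaluated in ([\d.]+) seconds")


def slowest_resources(lines: Iterable[str], top: int = 10) -> List[Tuple[float, str]]:
    """Return the `top` slowest resources as (seconds, resource) pairs."""
    timings = []
    for line in lines:
        match = EVAL_RE.search(line)
        if not match:
            continue
        try:
            seconds = float(match.group(2))
        except ValueError:
            # Skip anything that matched the pattern but is not a number.
            continue
        timings.append((seconds, match.group(1)))
    # Longest evaluations first.
    return sorted(timings, reverse=True)[:top]
```

Feeding it the captured output of a run, for example `slowest_resources(open("puppet.log"))`, gives a shortlist of resources worth optimising when provisioning from scratch is slow.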
  53. 53. Profiling Cucumber runs http://itshouldbeuseful.wordpress.com/2010/11/10/find-your-slowest-running-cucumber-features/
  54. 54. Delta Provisioning • Faster local provisioning • Different class of problems found • Closer to production (diagram: ‘from scratch’ provision: provision VM, then on success export VM box to fileshare; ‘delta’ provision: import VM box, then provision VM)
  55. 55. Broken builds master
  56. 56. Branch builds BUILDENG-5670 BUILDENG-5669 master
  57. 57. Infrequent Releases
  58. 58. Painful Puppet Rollouts • Puppet runs impacted running builds • Disabling all the build agents • Manually performing the roll out • git clone / librarian-puppet / symlink update on puppetmaster • Kick off puppet on all the build agents • Enabling all the build agents • Set of Puppet environments for every Bamboo server
  59. 59. Graceful Service restarts: Bamboo Agent JVM process watches for a touch file & shuts down when idle (written as a Bamboo Plugin)
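The touch-file idea can be illustrated with a short polling loop. This is only a sketch of the technique in Python, not the Bamboo plugin itself (which runs inside the agent JVM); the function names, the polling interval and the `is_idle` callback are all assumptions.

```python
import os
import time
from typing import Callable


def should_shut_down(touch_file: str, is_idle: Callable[[], bool]) -> bool:
    """True only when a restart was requested AND no build is running."""
    return os.path.exists(touch_file) and is_idle()


def wait_for_graceful_shutdown(touch_file: str,
                               is_idle: Callable[[], bool],
                               poll_seconds: float = 5.0,
                               sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until it is safe to stop the agent, then return.

    The caller exits the process afterwards so a service manager
    can start the agent again.
    """
    while not should_shut_down(touch_file, is_idle):
        sleep(poll_seconds)
```

With this shape, a configuration run only has to create the touch file; a busy agent finishes its current build before restarting, so rollouts no longer interrupt running builds.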
  60. 60. Puppet environments reduced • Before: server1_staging, server1_production, server2_staging, server2_production, server3_staging, server3_production, etc • After: staging, production
  61. 61. Bamboo Deployments • build: git clone, librarian-puppet • test: ‘delta’ & ‘from scratch’ vagrant provisions • deploy to specific environments (staging, production): scp to puppet master & symlink update • build & test AMIs: generated using Packer • deploy AMIs: AMIs on Bamboo Servers updated
  62. 62. Puppet Build, Test & Deploy Pipeline
  63. 63. Puppet Build, Test & Deploy Pipeline
  64. 64. Terraform Pipeline Plan & Apply changes of staging & production environments terraform.io
  65. 65. ‘open prs’ Bot
  66. 66. Less human effort through automation = Increased frequency & reliability of releases
  67. 67. Snowflakes Pets Cattle Stateless Machines
  68. 68. Infrastructure consistency is key
  69. 69. Challenges • Lots of packages • Large number of constantly updating package dependencies introduces instability • External dependencies
  70. 70. INTRODUCTION BAMBOO SERVERS TEAM INFRASTRUCTURE Bamboo Servers CONCLUSION
  71. 71. At scale is hard
  72. 72. Bamboo Servers 12
  73. 73. Build Plans 3500
  74. 74. Plan Branches 14k
  75. 75. Bamboo is great, but hard to manage at scale
  76. 76. Build Configuration as code
  77. 77. Plan Templates Bamboo Plugin:
  78. 78. Bamboo Plugin: Plan Templates • Checked into SCM: changes can be code reviewed • Reusable snippets • Bulk changes: update 100s of job requirements with a single commit • Export existing plans: for backup, or to move to another Bamboo instance easily
  79. 79. Pushing Bamboo to its limits
  80. 80. Bamboo Plugin: Agent Smith Wallboard • Trend data sent to Graphite • https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugin.agent-smith-wallboard
  81. 81. Add metrics, then alert on them
  82. 82. Bamboo Plugin: Bamboo Monitoring Plugin • Metrics to graphite • Bamboo Health: ActiveMQ, database connections, Tomcat, JVM memory usage • Background thread workers • Number of plans / plan branches, plans / plan branches for deletion
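Sending a metric to Graphite can be sketched with Carbon's plaintext protocol, where each metric is one `path value timestamp` line sent over TCP (port 2003 by default). This is a minimal sketch and not the plugin's implementation; the metric path in the usage note and the function names are hypothetical.

```python
import socket
import time
from typing import Optional, Union


def format_metric(path: str, value: Union[int, float],
                  timestamp: Optional[int] = None) -> str:
    """Format one metric as a Carbon plaintext line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metric(line: str, host: str, port: int = 2003) -> None:
    """Send a preformatted metric line to a Carbon plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `format_metric("buildeng.bamboo.server1.plan_branches", 14000, 1443657600)` yields the line `buildeng.bamboo.server1.plan_branches 14000 1443657600`, which `send_metric` would deliver to a Graphite host. Once a value such as plan branch count is graphed, an alert threshold can be set on it.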
  83. 83. When a Bamboo Server starts misbehaving…
  84. 84. Infrastructure differences? Is it Bamboo Configuration? Is it a Bamboo Plugin? Is it Bamboo the product? How is it being used?
  85. 85. Infrastructure consistency of Bamboo Servers is key
  86. 86. Bamboo Puppet provider + REST API for Administration: the Bamboo Puppet provider makes REST calls https://forge.puppetlabs.com/atlassian/bamboo_rest
  87. 87. Bamboo Puppet provider https://forge.puppetlabs.com/atlassian/bamboo_rest Hipchat Notification Managed via Puppet
  88. 88. 1-click upgrades of Bamboo Plugins • ‘Continuous Plugin Deployment’ Task • Build, then deploy to all Bamboo Servers • https://marketplace.atlassian.com/plugins/com.atlassian.bamboo.plugins.deploy.continuous-plugin-deployment
  89. 89. 1-click upgrades of Bamboo Servers • Using scp / ssh & puppet • Build Bamboo, then Upgrade Bamboo (jira-bamboo, servicedesk-bamboo)
  90. 90. Infrastructure differences? Is it Bamboo Configuration? Is it a Bamboo Plugin? Is it Bamboo the product? How is it being used?
  91. 91. TEAM INFRASTRUCTURE BAMBOO SERVERS Conclusion CONCLUSION INTRODUCTION
  92. 92. Constant improvement
  93. 93. We’ve matured to handle the growth of Atlassian
  94. 94. Come join us!
  95. 95. Thank you! PETER LESCHEV • TEAM LEAD • ATLASSIAN • @PETERLESCHEV
