Continuous Integration
on Steroids
Akbashev Alexander
Highload++ | November 07, 2016
Continuous Integration on Steroids / Alexander Akbashev (HERE)


A talk about one of the largest monolithic Jenkins instances in the world: a single master handles more than 100 thousand builds per day and manages ~2,500 executors at peak.

The talk covers the following questions:
* How do you deploy CI in the cloud?
* How can memcache save on hardware for CI?
* Is Jenkins slow in itself?
* Life hacks for using various plugins.
* Do you need to monitor GC in Jenkins?
* Which plugins make life better, and which make it much worse?
* How to build Jenkins monitoring ~out of duct tape and a ruler~ with FluentD + InfluxDB + Grafana.

And surely something else...

Published in: Engineering


  1. Continuous Integration on Steroids. Akbashev Alexander. Highload++ | November 07, 2016
  2. Agenda: 01. CI in HERE; 02. Monitoring; 03. Scalability; 04. Jenkins; 05. Nightmare Plugins; 06. Moral; 07. Q&A
  3. 01 Continuous Integration in HERE
  4. Every change goes through the validation pipeline [diagram: Gerrit, Gerrit Plugin, Pre-submit Triggers, Builds, Tests]
  5. Feedback goes from tests back to Gerrit [diagram: Gerrit, Gerrit Plugin, Pre-submit Triggers, Builds, Tests]
  6. Feedback comes from every pipeline [diagram: Gerrit, Gerrit Plugin, Pre-submit Triggers, Builds, Tests]
  7. Numbers: 100k+ builds per day; ~1.5k concurrent builds; 1.3-2.5k executors • Each “build” is the execution of one build/test job • The total number correlates with the number of commits • The number of builds is not as important as the number of commits • High throughput is extremely important • Morning commit • Before lunch • “Last attempt for today” • Raised on demand • Health checks • The Jenkins strategy is not optimized for the cloud
  8. 02 Monitoring
  9. Collects information about every build in the system [diagram: Jenkins build, Groovy Event Listener Plugin, Fluentd, InfluxDB, Grafana]
  10. Collects information about every build in the system [diagram: Jenkins build, Groovy Event Listener Plugin, Fluentd, InfluxDB, Grafana]
  11. JVM stats are the best “canary” [diagram: Jenkins build, Groovy Event Listener Plugin, Fluentd, InfluxDB, Grafana, Jenkins JVM]
  12. 03 Scalability
  13. What do we want to achieve?
  14. What do we want to achieve? Keep feedback time (< 20 min.)
  15. What do we want to achieve? Keep feedback time (< 20 min.); Test as much as possible
  16. What do we want to achieve? Keep feedback time (< 20 min.); Test as much as possible; … with debug symbols
  17. What do we want to achieve? Keep feedback time (< 20 min.); Test as much as possible; … with debug symbols; … and code coverage information
  18. What do we want to achieve? Keep feedback time (< 20 min.); Test as much as possible; … with debug symbols; … and code coverage information; and on physical devices
  19. How to scale: Increase number of executors; Minimize job execution time; Smart testing
  20. How to increase number of executors? EC2 Plugin; TestDroid
  21. How to minimize job execution time
  22. How to minimize job execution time: Split tests by type
  23. How to minimize job execution time: Split tests by type; Parallel execution
  24. How to minimize job execution time: Split tests by type; Parallel execution; Node as cache storage
  25. How to minimize job execution time: Split tests by type; Parallel execution; Node as cache storage; Shared compiler cache
  26. How to minimize job execution time: Split tests by type; Parallel execution; Node as cache storage; Shared compiler cache; Profiling!
  27. 04 Jenkins
  28. Is Jenkins really that slow, or are we doing something wrong?
  29. Is Jenkins really that slow, or are we doing something wrong? Jenkins is ok.
  30. Is Jenkins really that slow, or are we doing something wrong? Jenkins is ok. But…
  31. Surprise #1: Rotation costs a lot
  32. Surprise #2: It works much better with nginx. Command: less jenkins.access.log | tail -n1000 | grep urt="-" | wc -l (output: 407)
  33. Surprise #3: Some buttons are very dangerous
  34. Surprise #3: Some buttons are very dangerous
  35. One fundamental issue [diagram: one Master, many Slaves, Users]
  36. What can you find in the heap dump of an OOM-killed Jenkins?
  37. What can you find in the heap dump of an OOM-killed Jenkins? Console logs
  38. Console logs: Should be less than X MB; Verbose output goes to file; “>” and “tee” are amazing!
  39. What can you find in the heap dump of an OOM-killed Jenkins? Console logs
  40. What can you find in the heap dump of an OOM-killed Jenkins? Console logs; Build history
  41. Build history: 2000 entities or 3 days; Efficient rotator
  42. What can you find in the heap dump of an OOM-killed Jenkins? Console logs; Build history
  43. What can you find in the heap dump of an OOM-killed Jenkins? Console logs; Build history; Build artifacts
  44. Build artifacts: Push to S3 directly from slaves; Don’t store anything on master
  45. 05 Nightmare Jenkins Plugins
  46. Limit of number of builds: 20K
  47. Groovy Event Listener Plugin: all events; synchronized; groovy compilation; fixed since 1.010 (Mar 10, 2016)
  48. Limit of number of builds: 40K
  49. Warnings Plugin: Just another parser of console log; parseConsole is “deprecated”; parseFile is allowed; 0 warnings are much appreciated :)
  50. Limit of number of builds: 60K
  51. Timestamper Plugin: Tail needs not only “tail”; fixed since 1.8.5 (Aug 31, 2016)
  52. Limit of number of builds: 60K
  53. EC2 Plugin: Full list of all images in AWS; fixed since 1.35 (Jun 30, 2016)
  54. Limit of number of builds: 90K
  55. Robot Framework Plugin: Green chart costs 100 times more; Replaced by xUnit Plugin
  56. Limit of number of builds: 120K
  57. Build Failure Analyzer Plugin: One regexp; One stream; One thread; PR-57 is not accepted yet
  58. Limit of number of builds: 140K
  59. Cleanup Workspace Plugin: `ü` breaks everything; PR-29 is not accepted yet
  60. 06 Moral
  61. Final recommendations
  62. Final recommendations: Think about scalability in the first place
  63. Final recommendations: Think about scalability in the first place; Flakiness could be a huge problem
  64. Final recommendations: Think about scalability in the first place; Flakiness could be a huge problem; Reduce memory allocations
  65. Final recommendations: Think about scalability in the first place; Flakiness could be a huge problem; Reduce memory allocations; Cache as much as possible
  66. Final recommendations: Think about scalability in the first place; Flakiness could be a huge problem; Reduce memory allocations; Cache as much as possible; Failing builds can be expensive
  67. Workflow: Slowness? Profile! Fix! Contribute!
  68. Open source collaboration: Let’s make our life better ;)
  69. Full list of our contributions related to this talk • Jenkins • ccache • clcache • EC2 Plugin • S3 Plugin • FluentD Plugin • BuildRotator Plugin • Groovy Event Listener Plugin • Timestamper Plugin • Robot Framework Plugin • Build Failure Analyzer Plugin • JVM GC Log Plugin for FluentD
  70. Thank you. Contact: Akbashev Alexander; GitHub: Jimilian; E-mail: alexander.akbashev@here.com
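
A back-of-the-envelope reading of the figures on slide 7 ("Numbers"), offered as a rough sketch rather than anything stated in the talk. It assumes builds arrive evenly over 24 hours and that ~1.5k concurrent builds is an average; the slide itself lists morning, pre-lunch and end-of-day peaks, so real load is burstier. Under those assumptions Little's law (average concurrency = arrival rate × average duration) gives an implied average build duration. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def arrival_rate_per_second(builds_per_day: float) -> float:
    """Average number of builds started per second, assuming even arrival."""
    return builds_per_day / SECONDS_PER_DAY


def implied_avg_duration_seconds(concurrent_builds: float,
                                 builds_per_day: float) -> float:
    """Little's law: L = lambda * W, hence W = L / lambda."""
    return concurrent_builds / arrival_rate_per_second(builds_per_day)


if __name__ == "__main__":
    # Figures from the slide: 100k builds per day, ~1.5k concurrent builds.
    rate = arrival_rate_per_second(100_000)
    duration = implied_avg_duration_seconds(1_500, 100_000)
    print(f"{rate:.2f} builds/s, ~{duration / 60:.1f} min per build on average")
```

The result is of the same order as the 20-minute feedback target on slide 14, which is all this estimate is meant to show.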
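
Slides 22 and 23 ("Split tests by type", "Parallel execution") do not say how tests are distributed across executors. One common approach, shown here purely as an assumption, is greedy longest-first packing: sort tests by their last known duration and always hand the next test to the least-loaded shard. The test names and durations are hypothetical.

```python
import heapq
from typing import Dict, List


def shard_tests(durations: Dict[str, float], shards: int) -> List[List[str]]:
    """Greedy longest-processing-time packing of tests into `shards` buckets."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    # Heap of (accumulated seconds, shard index); the least-loaded shard pops first.
    heap = [(0.0, index) for index in range(shards)]
    buckets: List[List[str]] = [[] for _ in range(shards)]
    # Longest tests first; the name breaks ties so the result is deterministic.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return buckets


if __name__ == "__main__":
    # Hypothetical timings in seconds.
    timings = {"a": 10, "b": 8, "c": 5, "d": 3, "e": 2}
    print(shard_tests(timings, shards=2))
```

The packing is a heuristic, not an optimal schedule, but it keeps the slowest shard close to the average, which is what bounds feedback time.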
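
The shell pipeline on slide 32 counts, among the last 1000 access-log lines, those containing a `urt` field equal to `-`. The sketch below does the same count in Python. It assumes the log writes that field as the literal text `urt="-"`; the talk does not define the log format, so the marker is a parameter. Note that in a shell an unquoted `grep urt="-"` searches for `urt=-` because the shell strips the quotes, so the pattern may need single quotes depending on the actual format.

```python
from collections import deque
from typing import Iterable


def count_marker_in_tail(lines: Iterable[str],
                         last_n: int = 1000,
                         marker: str = 'urt="-"') -> int:
    """Count lines containing `marker` among the last `last_n` lines."""
    # A bounded deque keeps only the tail, so the whole log never sits in memory.
    tail = deque(lines, maxlen=last_n)
    return sum(1 for line in tail if marker in line)


# Usage against a real file (path taken from the slide):
#   with open("jenkins.access.log", errors="replace") as log:
#       print(count_marker_in_tail(log))
```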
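
Slide 41 gives the build history limit as "2000 entities or 3 days" without spelling out how the two bounds combine. The sketch below assumes a build is discarded as soon as it falls outside the newest 2000 or is older than 3 days, whichever comes first. It illustrates that rule only and is not the rotator the talk refers to; the data layout (build number, finish time) is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

MAX_BUILDS = 2000              # from the slide
MAX_AGE = timedelta(days=3)    # from the slide


def builds_to_delete(builds: List[Tuple[int, datetime]],
                     now: datetime,
                     max_builds: int = MAX_BUILDS,
                     max_age: timedelta = MAX_AGE) -> List[int]:
    """Return build numbers that exceed either the count or the age limit."""
    newest_first = sorted(builds, key=lambda build: build[1], reverse=True)
    doomed = []
    for position, (number, finished_at) in enumerate(newest_first):
        too_many = position >= max_builds
        too_old = now - finished_at > max_age
        if too_many or too_old:
            doomed.append(number)
    return sorted(doomed)
```

Sorting the full history on every run is the simple way to express the rule; slide 31 warns that rotation is expensive at this scale, so a production rotator would want to avoid loading every build record.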
