Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix Conference 2015



How BlaBlaCar designed and operates a Zabbix-based monitoring platform: optimizing Zabbix configuration, and developing and using python-protobix and jmx-zabbix for more scalability.


  1. How we monitor 1 billion km of monthly ride sharing. Jean Baptiste Favre, Ops Lead, @jbfavre
  2. Who are we?
  3. 5 million members in December 2013
  4. 20 million members monitoring
  5. 7 million members in April 2014
  6. 50 million members monitoring
  7. 20 million members in April 2015
  8. 2015
  9. 100 million members monitoring
  10. How we monitor 1 billion km of monthly ride sharing
  11. KEEP CALM AND MONITOR ALL THE THINGS: Zabbix
  12. How many items?
  13. How many new VPS?
  14. Load? What load? :)
  15. How?
  16. Standardization
  17. Standardization
      – The server triggers probe execution via a zabbix-agent active item
      – Probes collect, format and send information using the Zabbix sender protocol
      – The probe's exit code is sent back to the server as a feedback loop
  18. Standard exit codes
      – 0 => OK
      – 1 => failure during init
      – 2 => failure while getting information
      – 3 => failure during Container update
      – 4 => failure during Send phase
  19. Client side probes
      – Python or Java
      – LLD wherever possible
      – Trappers always
      – Only 2 zabbix-agent (active) items per template
  20. python-protobix: KEEP CALM AND USE TRAPPERS & LLD EVERYWHERE (almost)
  21. python-protobix: Actually no, but could have been. https://github.com/jbfavre/python-protobix (also on pypi.python.org)
  22. LLD example. PUT YOUR OWN LOGIC HERE :)

          #!/usr/bin/env python
          import protobix

          # Create a DataContainer, providing data_type, Zabbix server and port
          zbx_container = protobix.DataContainer('lld', 'localhost', 10051)

          hostname = 'myhost'
          item = 'hardware.power_supply'
          # One dictionary of LLD macros per discovered power supply slot
          value = [
              {'{#SLOT}': 0, '{#PLUGGED}': 1},
              {'{#SLOT}': 1, '{#PLUGGED}': 0},
          ]
          zbx_container.add_item(hostname, item, value)

          try:
              zbx_response = zbx_container.send()
          except protobix.SenderException:
              print('Oups...')
  23. Item example. PUT YOUR OWN LOGIC HERE :)

          #!/usr/bin/env python
          import protobix

          # Create a DataContainer, providing data_type, Zabbix server and port
          zbx_container = protobix.DataContainer('items', 'localhost', 10051)

          hostname = 'myhost'
          # Item key for slot 0, as discovered by the LLD rule
          item = 'hardware.power_supply[0,status]'
          value = 1
          zbx_container.add_item(hostname, item, value)

          try:
              zbx_response = zbx_container.send()
          except protobix.SenderException:
              print('Oups...')
  24. RabbitMQ example
      – Low Level Discovery: vhosts & queues, thresholds
      – Update values: message number, in/out ratio, who is master of this queue
  25. MariaDB example
      – Low Level Discovery: Galera, storage engines, multi-replication
      – Update values: pretty much everything :)
  26. Protobix probes
      – 16 probes available
      – And more to come: redis/dynomite, zookeeper, …
      – https://github.com/jbfavre/python-zabbix
  27. jmx-zabbix: KEEP CALM AND MONITOR ALL THE JAVA THINGS
  28. jmx-zabbix: Because python is not (always) enough :) https://github.com/n0rad/jmx-zabbix
  29. jmx-zabbix
      – Embedded inside a Java process: internal Java daemons
      – Alongside any Java process (separate service): Cassandra, Elasticsearch, …
  30. Configuration

          serverName: <hostname in Zabbix>
          pushIntervalSecond: 60
          inMemoryMaxQueueSize: 10
          zabbix:
            host: <Zabbix server hostname or IP>
            port: 10051
          jmx:
            url: service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi
            username: zabbix
            password: zabbix
            timeoutSecond: 30
          [...]
  31. JMX to ZBX mapping

          [...]
          metrics:
            cassandra.status.failure:org.apache.cassandra.net:type=FailureDetector
            cassandra.status.timeouts:org.apache.cassandra.net:type=MessagingService
            cassandra.db.storage: org.apache.cassandra.db:type=StorageProxy
          valuesCaptured:
            org.apache.cassandra.gms.FailureDetector: ["DownEndpointCount"]
            org.apache.cassandra.net.MessagingService: ["RecentTotalTimouts"]
            org.apache.cassandra.service.StorageProxy: ["RecentRangeLatencyMicros",
                                                         "RecentReadLatencyMicros",
                                                         "RecentWriteLatencyMicros"]
  32. Zabbix visualization: KEEP CALM AND LOOK AT THE GRAPHS
  33. Grafana
  34. Grafana: Grafana + Zabbix datasource = 10 dashboards in 2 days. https://github.com/grafana/grafana https://github.com/alexanderzobnin/grafana-zabbix
  35. Dashing: https://gist.github.com/chojayr/7401426 https://github.com/tolleiv/dashing-zabbix
  36. Caveats: KEEP CALM AND FIX THINGS BEFORE CTO NOTICES
  37. Beware of
      – Plugins & templates synchronization
      – Zabbix configuration automation
      – Use same hostname everywhere
  38. What next? KEEP CALM AND WAIT FOR ZABBIX 3.0
  39. What I miss in Zabbix
      Announced
      – Trends predictions
      – More scalable backend
      – SSL communications
      Not announced (as far as I know)
      – Trends from
      – Implicit dependency against proxy
      – Detailed web scenario
      – Per item maintenance
      – Anomaly detection
  40. 3 takeaways. Now you can wake up :)
      1. Define & use standards
      2. Use LLD & Trappers
      3. Visualization is critical
      Let's discuss all that!
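
The exit code standard from slides 17 and 18 can be illustrated with a small probe skeleton. This is only a sketch under assumptions: `run_probe` and its four hook arguments are hypothetical names, not part of python-protobix, and the deck does not show how the real probes are structured internally. Only the numeric codes come from the slides.

```python
# Exit codes as listed on the "Standard exit codes" slide.
EXIT_OK = 0
EXIT_INIT_FAILED = 1
EXIT_GET_FAILED = 2
EXIT_UPDATE_FAILED = 3
EXIT_SEND_FAILED = 4


def run_probe(init, get_metrics, update_container, send):
    """Run the four probe phases in order and return the matching exit code.

    All four arguments are hypothetical callables supplied by the probe
    author; any exception raised by a phase maps to that phase's code.
    """
    try:
        context = init()
    except Exception:
        return EXIT_INIT_FAILED
    try:
        metrics = get_metrics(context)
    except Exception:
        return EXIT_GET_FAILED
    try:
        container = update_container(context, metrics)
    except Exception:
        return EXIT_UPDATE_FAILED
    try:
        send(container)
    except Exception:
        return EXIT_SEND_FAILED
    return EXIT_OK


def failing_get(context):
    # Stand-in for a probe that cannot reach the monitored service.
    raise RuntimeError("service unreachable")


# A probe whose collection phase fails reports code 2.
code = run_probe(
    init=lambda: {"host": "myhost"},
    get_metrics=failing_get,
    update_container=lambda context, metrics: [metrics],
    send=lambda container: None,
)
print(code)
```

A real probe would pass the returned code to `sys.exit` so that the zabbix-agent active item which launched it can report the value back to the server.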

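For readers curious about what "send information using the Zabbix sender protocol" involves on the wire, the sketch below frames one LLD value as a trapper request using only the standard library. It is an illustration rather than python-protobix's actual implementation: `build_lld_value` and `build_sender_packet` are hypothetical helper names, and the host and key are taken from the LLD example slide. The framing assumed here is the `ZBXD` magic, a flags byte of `0x01`, and the payload length as 8 little-endian bytes, followed by a JSON `sender data` request.

```python
import json
import struct

# Protocol magic followed by the flags byte used for plain JSON payloads.
ZBX_HEADER = b"ZBXD\x01"


def build_lld_value(rows):
    """Serialize discovered entities in the {"data": [...]} LLD layout."""
    return json.dumps({"data": rows})


def build_sender_packet(host, key, value):
    """Frame a single item value as a 'sender data' request."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": value}],
    }).encode("utf-8")
    # Payload length, packed as an 8-byte little-endian unsigned integer.
    return ZBX_HEADER + struct.pack("<Q", len(payload)) + payload


rows = [
    {"{#SLOT}": "0", "{#PLUGGED}": "1"},
    {"{#SLOT}": "1", "{#PLUGGED}": "0"},
]
packet = build_sender_packet("myhost", "hardware.power_supply", build_lld_value(rows))
print(len(packet))
```

The resulting bytes would be written to a TCP connection on the server's trapper port (10051 in the slides). The LLD value travels as a JSON string nested inside the request, which is why it is serialized separately before framing.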