Intro to Open Source Telemetry, LinuxCon 2016

Abstract
As part of the team delivering Snap, an open telemetry framework, I've worked through dozens of use cases in which disparate metrics gathered from services roll up into useful diagrams for operations engineers and developers alike. We will use Snap's plugin model to collect, process and publish these measurements, then graph them with open source tools. By joining this session, you can follow along: install industry-standard open source projects, deploy them, and then use Snap to collect, process and visualize their metrics.
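Snap's plugin model chains three kinds of plugins in a workflow: collectors gather metrics, processors transform them, and publishers send them on. The sketch below is a minimal, hypothetical Python model of that chain, not Snap's actual plugin API (real Snap plugins run as separate processes managed by the Snap daemon); the Metric type, the plugin signatures and run_workflow are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Metric:
    """One measurement, addressed by a Snap-style namespace string."""
    namespace: str                      # e.g. "/intel/psutil/load/load1"
    value: float
    tags: Dict[str, str] = field(default_factory=dict)


# Hypothetical plugin signatures for this sketch (not Snap's real interfaces).
Collector = Callable[[], List[Metric]]
Processor = Callable[[List[Metric]], List[Metric]]
Publisher = Callable[[List[Metric]], None]


def run_workflow(collect: Collector,
                 processors: List[Processor],
                 publishers: List[Publisher]) -> List[Metric]:
    """Run one collect -> process -> publish pass and return what was published."""
    metrics = collect()
    for process in processors:          # processors run in order; each sees the previous output
        metrics = process(metrics)
    for publish in publishers:          # every publisher receives the same processed batch
        publish(metrics)
    return metrics


if __name__ == "__main__":
    sink: List[Metric] = []
    run_workflow(
        collect=lambda: [Metric("/intel/psutil/load/load1", 0.42)],
        processors=[lambda ms: [Metric(m.namespace, m.value, {**m.tags, "host": "prod-1"}) for m in ms]],
        publishers=[sink.extend, print],  # two endpoints: an in-memory list and stdout
    )
```

Passing several publishers mirrors the idea of sending one processed batch to more than one endpoint.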

Audience
Anyone with an operations background (or an operations future ahead of them) who wants to see the breadth of open source tooling available for telemetry. This proposal is designed for hands-on users who are comfortable running containers or virtual machines locally.

Experience Level
Intermediate

Benefits to the Ecosystem
Attendees can follow along to install and deploy industry-standard open source projects, then use Snap to collect, process and visualize their metrics. This helps users in the Linux ecosystem see the value of their system-level knowledge when it is visualized alongside the other layers of the datacenter.

  1. LinuxCon 2016: An introduction to datacenter telemetry using open source tools. Matt Brender (@mjbrender)
  2. Briefly, About Me. Am: @mjbrender (everywhere); Developer Advocate, Orchestration Engineering; pretty good at open source practices. Was: storage array performance; VMware; NoSQL.
  3. Loose Agenda: (1) wishful thinking of the lab config; (2) what is telemetry; (3) one opinion on the state of open source tooling.
  4. Let's Test the Network: linuxcon.snap-telemetry.io, then git clone. I encourage you to keep downloading stuff until you're ready to go.
  5. Lab Hopes
  6. High Level View (diagram): Grafana + InfluxDB; Snap; Snap; hosts labeled "Admin" and "Production".
  7. Less High Level View (diagram): Your Laptop; Ubuntu 16.04; Vagrant; Ubuntu 16.04; Ubuntu 16.04.
  8. Less High Level View (diagram): Your Laptop; Ubuntu 16.04; Vagrant; Ansible; Ubuntu 16.04; Ubuntu 16.04; Snap; Docker; Snap.
  9. Less High Level View (diagram): Your Laptop; Ubuntu 16.04; Vagrant; Ansible; Ubuntu 16.04; Ubuntu 16.04; Snap; Docker; Snap; Compose; InfluxDB; Grafana.
  10. Why???
  11. Telemetry
  12. (Tool logos) Snap, collectd, StatsD, Telegraf, Beats, Logstash, Diamond, InfluxDB, OpenTSDB, KairosDB, Graphite, Prometheus, Elasticsearch, Bosun, Grafana, Sensu, Ganglia, RRDtool, Nagios, Facette, Vector (Netflix)
  13. (Meme) What my friends think telemetry is; what my parents think telemetry is; what society thinks telemetry is; what my boss thinks telemetry is; what I think telemetry is; what telemetry actually is.
  14. What Is Telemetry? Telemetry is the stuff you can measure and the process of capturing it: from the heat generated on a CPU core to the throughput of Nginx running in a Docker container on a Kubernetes cluster. It's all measurable, and it's all summarized in that one word.
      • Telemetry: the process of using equipment to take measurements of something and send them to another place
      • Metrics: measurements of facts throughout the datacenter
      • Analytics: the method of logical analysis that determines the consequences of information
  15. What Is Telemetry?
      What                         | How
      Application availability     | ping
      Operating system performance | psutil
      Hardware utilization         | Intel Performance Counter Monitor (PCM)
  16. What Is Telemetry?
      What                         | How                                     | Why
      Application availability     | ping                                    | SLA compliance
      Operating system performance | psutil                                  | System performance
      Hardware utilization         | Intel Performance Counter Monitor (PCM) | Scaling capacity
  17. What snap is and what it isn't (diagram): Telemetry | Analytics
  18. What snap is and what it isn't: Telemetry | Analytics. snap is a framework for metrics. snap is NOT an analytics alternative.
  19. What snap is and what it isn't (diagram): Telemetry; Analytics; Automation; Scheduling; IRO
  20. The Watcher Workflow: collect → process → publish
  21. Collectors in snap
  22. Processors in snap
  23. Publishers in snap
  24. Collectors in snap. Collect telemetry data once via plugins for:
      • Bare metal, including Intel-specific platform metrics (CPU, NIC, BMC, SMART)
      • Operating environments and existing telemetry (Docker, libvirt, psutil)
      • Application services and adjacencies (Ceph, HAProxy, etcd, Facter, MySQL, Apache)
      Populate a dynamically generated, single-namespace telemetry catalog.
  25. Processors in snap. Filter, alter or append metadata via plugins for:
      • Filtering (moving averages)
      • Normalization
      • Encryption of all or part of the data set
      • Injection of metadata, such as tokens and tenant IDs
      Fork to one or more endpoints.
  26. Publishers in snap. Publish data via plugins for:
      • Dashboard tools (Graphite, Grafana, Riemann)
      • Queues and logs (RabbitMQ, Kafka, File)
      • Databases (PostgreSQL, InfluxDB, OpenTSDB, SAP HANA)
      Publish to one or more endpoints.
  27. Visibility at all layers (diagram): question marks between the App, OS and HW layers and the analytics pipeline and dashboards.
  28. Visibility at all layers (diagram): a single "?" between the App, OS and HW layers and the analytics pipeline and dashboards.
  29. Visibility at all layers (diagram): Snap in place of the "?" between the App, OS and HW layers and the analytics pipeline and dashboards.
  30. Visibility at all layers (diagram): a virtualized stack (App, OS, Virtualization, HW) alongside an App, OS and HW stack, with Snap, the analytics pipeline and dashboards.
  31. Visibility at all layers (diagram): same as the previous slide.
  32. Visibility at all layers (diagram): App, OS and HW stacks with Snap and Kubernetes, the analytics pipeline and dashboards.
  33. Visibility at all layers (diagram): many App, OS and HW stacks with Snap and Kubernetes.
  34. (Feature overview) REST & CLI; flexible scheduling; caching; security; plugin lifecycle management; worker queues; metric catalog; Tribe
  35. Thought Leadership Ahead. Warning:
  36. Monitoring is…
  37. Monitoring is: Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications
  38. Monitoring is: Telemetry
  39. Monitoring is: Telemetry (Collect, Process, Publish, Schedule, Automate)
  40. Monitoring is: Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications
  41. Monitoring is: Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications (labeled: Snap)
  42. Monitoring is: Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications (labeled: Grafana)
  43. Better Thought Leadership: by @obscurify and by @caskey. https://github.com/mjbrender/what-we-talk-about-when-we-talk-about-telemetry
  44. Q&A
  45. FAQ: Do I need telemetry?
  46. FAQ: "I don't need telemetry, I have ____________."
  47. FAQ: "I don't need telemetry, I have Graphite."
  48. Monitoring is: Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications (labeled: Graphite)
  49. FAQ: Do I need monitoring?
  50. FAQ: "We run Nagios for monitoring."
  51. Monitoring is: Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications (labeled: Nagios)
  52. What Is Telemetry? (revisited)
      What                         | How
      Application availability     | ping
      Operating system performance | psutil
      Hardware utilization         | Intel Performance Counter Monitor (PCM)
  53. What Is Telemetry? (revisited), with the "How" column expanded:
      What                         | Query  | Collect | Process | Publish | Visualize
      Application availability     | ping   | ?       | ?       | ?       | ?
      Operating system performance | psutil | ?       | ?       | ?       | ?
      Hardware utilization         | PCM    | ?       | ?       | ?       | ?
  54. What Is Telemetry? (revisited): the same expanded table, with Snap and Grafana labels added over the "How" columns.
  55. (image only)
  56. Next Up. Start using Snap!
      • snap-telemetry.io
      • github.com/intelsdi-x
      Find me:
      • on The Geek Whisperers
      • @mjbrender
  57. Additional information
  58. Everything Is Challenging at Scale
  59. Add new task (diagram)
  60. Add new task (diagram, continued)
  61. Scaling with Tribe (diagram): define as a tribe
  62. Scaling with Tribe (diagram): add new task
  63. snap | What's next? (diagram): a single Physical/Virtual Host with Scheduler, Processing, Publishing and Collection.
  64. snap | What's next? (diagram): six Physical/VM Hosts and three Collection blocks, plus Scheduler, Processing and Publishing.
  65. snap | Plugin Lifecycle
      • Plugin load
        • Dynamic; does not require a restart
        • The plugin automatically informs Snap of its features, metrics and configuration details
        • Dynamically extends the metric catalog when loaded
      • Plugin unload
        • Removes the plugin's metrics from the catalog automatically
      • Loading a new plugin automatically upgrades running workflows in tasks
        • Optionally, collection can be pinned to a version (e.g., get /intel/server/cpu/load/v1)
      • Each scheduled workflow automatically uses the most mature plugin for that step
        • Coupled with dynamic plugin loading, this results in immediate updates to existing workflows
        • Helpful for bug fixes, security patching and improving accuracy
  66. snap | Overview: Example Workflows. Customizable definition of a task and its related workflow (diagram: several example workflows built from Collect, Process and Publish steps in different combinations).
  67. The Catalog (diagram): plugins for Intel PCM, psutil, HAProxy, Docker and Intel Health feed one metric catalog, as listed by snapctl metric list:
      /intel/psutil/load/load1
      /intel/psutil/load/load5
      /intel/psutil/vm/available
      /intel/pcm/EXEC
      /intel/pcm/FREQ
      /intel/linux/docker/cpu_stats/throttling_data/periods
      /intel/server/health/score
      /intel/haproxy/info/MaxConnRate
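The What Is Telemetry? slides pair application availability with ping and tie it to SLA compliance. As a hedged sketch, the Python below swaps ICMP ping for a TCP connect check (ICMP needs raw-socket privileges) and turns a series of up/down samples into an availability percentage; check_availability and availability_percent are hypothetical helpers, not part of Snap.

```python
import socket
import time
from typing import Dict, List


def check_availability(host: str, port: int, timeout: float = 1.0) -> dict:
    """Probe host:port with a TCP handshake (a stand-in for ICMP ping).

    Returns {"up": bool, "latency_ms": float or None}.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass                         # handshake succeeded; close right away
    except OSError:                      # refused, unreachable or timed out
        return {"up": False, "latency_ms": None}
    return {"up": True, "latency_ms": (time.monotonic() - start) * 1000.0}


def availability_percent(samples: List[Dict]) -> float:
    """Share of samples that were up, as a percentage (the SLA-compliance view)."""
    if not samples:
        raise ValueError("need at least one sample")
    return 100.0 * sum(1 for s in samples if s["up"]) / len(samples)
```

A TCP check measures whether a service port accepts connections, which is closer to application availability than host reachability, but it is a different probe from ping.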
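The Collectors in snap slide lists psutil among the operating-environment sources, and the Catalog slide shows load namespaces such as /intel/psutil/load/load1. As an illustration only (Snap's psutil plugin is its own implementation), this Python sketch reads Linux's /proc/loadavg and emits the load averages under those namespace names; the load15 namespace is an assumption by analogy.

```python
from typing import Dict

# Namespaces follow the Catalog slide; ".../load15" is assumed by analogy.
LOAD_NAMESPACES = (
    "/intel/psutil/load/load1",
    "/intel/psutil/load/load5",
    "/intel/psutil/load/load15",
)


def parse_loadavg(text: str) -> Dict[str, float]:
    """Parse /proc/loadavg content, e.g. "0.20 0.18 0.12 1/80 11206".

    The first three fields are the 1-, 5- and 15-minute load averages.
    """
    fields = text.split()
    if len(fields) < 3:
        raise ValueError(f"unexpected /proc/loadavg format: {text!r}")
    return {ns: float(v) for ns, v in zip(LOAD_NAMESPACES, fields[:3])}


def collect_load(path: str = "/proc/loadavg") -> Dict[str, float]:
    """Collect one sample of load metrics (Linux only)."""
    with open(path, encoding="ascii") as f:
        return parse_loadavg(f.read())
```

Keeping parsing separate from file access makes the collector testable on any platform.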
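The Processors in snap slide names moving-average filtering and metadata injection (tokens, tenant IDs). A minimal Python sketch of both, assuming a per-namespace simple moving average over the last window samples and a hypothetical tenant_id tag key; Snap's own processor plugins may work differently.

```python
from collections import deque
from typing import Deque, Dict


class MovingAverage:
    """Simple moving average, kept separately for each metric namespace."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.window = window
        self._buffers: Dict[str, Deque[float]] = {}

    def process(self, namespace: str, value: float) -> float:
        buf = self._buffers.setdefault(namespace, deque(maxlen=self.window))
        buf.append(value)               # the deque drops its oldest sample once full
        return sum(buf) / len(buf)


def inject_tenant(tags: Dict[str, str], tenant_id: str) -> Dict[str, str]:
    """Return a copy of tags with a (hypothetical) tenant_id key added."""
    return {**tags, "tenant_id": tenant_id}
```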
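The Publishers in snap slide lists InfluxDB among the database targets, and the lab stores metrics in InfluxDB for Grafana. InfluxDB accepts writes in its line protocol: a measurement name, optional comma-separated tags, one or more fields, and an optional timestamp (nanoseconds by default). The Python sketch below formats one metric that way; using the Snap namespace as the measurement name and a single field named value are assumptions for illustration, not necessarily what Snap's InfluxDB publisher does.

```python
from typing import Dict


def _escape(text: str, specials: str) -> str:
    """Backslash-escape each character in `specials`, as line protocol requires."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(namespace: str, value: float,
                     tags: Dict[str, str], timestamp_ns: int) -> str:
    """Format one metric as an InfluxDB line-protocol point."""
    measurement = _escape(namespace, ", ")           # measurements escape commas and spaces
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(v, ',= ')}"  # tag keys/values also escape '='
        for k, v in sorted(tags.items())             # sorted for deterministic output
    )
    return f"{measurement}{tag_part} value={float(value)!r} {timestamp_ns}"
```

Each returned string is one point; a batch is simply several lines joined with newlines, which is what InfluxDB 1.x's HTTP /write endpoint accepts.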
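The feature-overview slide lists flexible scheduling. A hedged sketch of the simplest case: a task that runs a workflow at a fixed interval and disables itself after too many consecutive failures. run_interval_task, its parameters and the default of three failures are hypothetical choices for illustration; the sleep function is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Tuple


def run_interval_task(job: Callable[[], None],
                      interval_s: float,
                      runs: int,
                      max_failures: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[str, int]:
    """Run `job` up to `runs` times, `interval_s` seconds apart.

    Returns (state, successful_runs); state is "disabled" if `max_failures`
    consecutive runs raised, otherwise "ended".
    """
    consecutive_failures = 0
    successes = 0
    for i in range(runs):
        try:
            job()
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= max_failures:
                return "disabled", successes
        else:
            consecutive_failures = 0    # any success resets the failure streak
            successes += 1
        if i < runs - 1:
            sleep(interval_s)           # fixed delay; a real scheduler would track deadlines to avoid drift
    return "ended", successes
```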
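The Plugin Lifecycle slide describes loading and unloading plugins without a restart, the catalog growing and shrinking with them, optional version pinning, and workflows picking up the most mature plugin automatically. The Python registry below models those rules under one stated assumption: "most mature" means the highest loaded version number. PluginRegistry and its methods are hypothetical, not Snap's API; in Snap itself these operations go through the CLI or REST API.

```python
from typing import Dict, Iterable, List, Optional, Set


class PluginRegistry:
    """Track loaded plugin versions and the metric catalog they contribute."""

    def __init__(self) -> None:
        # plugin name -> {version: namespaces that version exposes}
        self._plugins: Dict[str, Dict[int, Set[str]]] = {}

    def load(self, name: str, version: int, namespaces: Iterable[str]) -> None:
        """Register a plugin version; its metrics join the catalog immediately."""
        self._plugins.setdefault(name, {})[version] = set(namespaces)

    def unload(self, name: str, version: int) -> None:
        """Remove a plugin version; metrics no other version provides leave the catalog."""
        versions = self._plugins.get(name, {})
        versions.pop(version, None)
        if not versions:
            self._plugins.pop(name, None)

    def resolve(self, name: str, pinned: Optional[int] = None) -> int:
        """Pick the version a workflow step should use: the pin if given, else the newest."""
        versions = self._plugins.get(name)
        if not versions:
            raise LookupError(f"no plugin named {name!r} is loaded")
        if pinned is not None:
            if pinned not in versions:
                raise LookupError(f"{name!r} version {pinned} is not loaded")
            return pinned
        return max(versions)

    def catalog(self) -> List[str]:
        """All namespaces exposed by any loaded plugin version, sorted."""
        return sorted({ns for versions in self._plugins.values()
                       for namespaces in versions.values()
                       for ns in namespaces})
```

Because resolve is called per workflow step, loading a newer version changes what unpinned steps use on their next run, while pinned steps keep their version.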
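The Catalog slide shows every plugin's metrics merged into one namespace tree, as listed by snapctl metric list. A small Python sketch of querying such a catalog, using the namespaces from the slide; treating * as matching exactly one namespace segment is an assumption for this example, and matches and query_catalog are hypothetical helpers.

```python
from typing import Iterable, List

# Namespaces copied from the Catalog slide.
CATALOG = [
    "/intel/psutil/load/load1",
    "/intel/psutil/load/load5",
    "/intel/psutil/vm/available",
    "/intel/pcm/EXEC",
    "/intel/pcm/FREQ",
    "/intel/linux/docker/cpu_stats/throttling_data/periods",
    "/intel/server/health/score",
    "/intel/haproxy/info/MaxConnRate",
]


def matches(pattern: str, namespace: str) -> bool:
    """True if namespace matches pattern segment by segment; '*' matches one segment."""
    p_parts = pattern.strip("/").split("/")
    n_parts = namespace.strip("/").split("/")
    return len(p_parts) == len(n_parts) and all(
        p == "*" or p == n for p, n in zip(p_parts, n_parts)
    )


def query_catalog(catalog: Iterable[str], pattern: str) -> List[str]:
    """Return catalog entries matching pattern, in catalog order."""
    return [ns for ns in catalog if matches(pattern, ns)]
```

A single namespace tree is what makes a cross-plugin query like "/intel/*/info/*" possible at all.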
