SlideShare a Scribd company logo
1 of 22
Download to read offline
Ticketmaster confidential. Do not distribute.
Ticketmaster confidential. Do not distribute.
Prometheus +
Thanos
Thanos long term storage for Prometheus
Ticketmaster confidential. Do not distribute.
About me...
• Rocking it at Ticketmaster
• Student in Master of Business Administration (MBA) Strategic project management
• Bachelor of Applied Science (B.A.Sc.) in Computer Science
• Devops, Cloud, Kubernetes, Kafka, etc.
• Sport lover (Hockey, Dek Hockey, Ultimate Frisbee, Football…)
2
SECTION
Ticketmaster confidential. Do not distribute.
Where Prometheus & Thanos are in the CNCF landscape...
3
Prometheus
Ticketmaster confidential. Do not distribute.
Insert photography and crop to size of grey box.
(24.25 in. x 7.21 in.)
What is Prometheus?
4
Prometheus
Prometheus was the second project to be graduated of the Cloud Native Computing Foundation.
Ticketmaster confidential. Do not distribute.
Prometheus Architecture
Ticketmaster confidential. Do not distribute.
How we are using Prometheus at Ticketmaster?
• We are using the Prometheus-Operator made by CoreOS (RedHat (IBM))
• We have created a Helm chart that created common pulling jobs, settings and exporters
• Scrape EC2 instances based on Tags, Kubernetes
• Exporters like cloudwatch-metrics, kafka, blackbox
• Ingresses
• Thanos
• Federations
6
Prometheus
Ticketmaster confidential. Do not distribute.
Why moving away from the federation?
• Calculate the right ingestion rate of our scrape job is not really easy
• When the disk is full
• Prometheus stop working
• We needed to delete the PVC to recreated it bigger.
• Long term storage is costly SSD == $$$$
• Single point of failure
• Availability
• Operator error
• Hardware failure
• Rollout
7
Prometheus
Ticketmaster confidential. Do not distribute.
Solution
Ticketmaster confidential. Do not distribute.
Goals
9
Thanos
- Easy Deployment model
- Minimal number of dependencies
- Minimal baseline cost
Have a global view Seamless
integration with
Prometheus
Increase retentionHave a HA in place
Ticketmaster confidential. Do not distribute.
Global view
10
Thanos
Ticketmaster confidential. Do not distribute.
Global view + HA
11
Thanos
Ticketmaster confidential. Do not distribute.
Increase retention (Persist data)
12
Thanos
Ticketmaster confidential. Do not distribute.
Increase retention (Querying)
• A series is made up of one or more “chunks”
• A chunk contains ~120 samples each
• Chunks can be retrieved through HTTP byte range queries
Example:
• 1000 series @ 30s scrape interval
• Query 1 year
• 8.7 million chunks/range queries
• Chunks of the same series are aligned
• Similar series are aligned due to same metric name
This reduce request count by 4=6 orders of magnitude.
8.7 million requests turned into O(20) requests
•
13
Thanos
Ticketmaster confidential. Do not distribute.
Increase retention (Querying)
14
Thanos
group chunk same series group by same metric name
Ticketmaster confidential. Do not distribute.
Compaction
15
Thanos
Ticketmaster confidential. Do not distribute.
Full Architecture
16
Thanos
Ticketmaster confidential. Do not distribute.
Deployment Model( Example)
• Federation through Store API
17
Thanos
Ticketmaster confidential. Do not distribute.
Cost
• Store + Query node + Compaction ~ Savings on Prometheus side (+/- 0)
• Fewer SSD space on Prometheus side (Savings)
• Basically we are only paying for your data stored in S3/GCS/etc + requests
18
Thanos
Ticketmaster confidential. Do not distribute.
Cortex
19
Alternatives
Ticketmaster confidential. Do not distribute.
Cortex - Reason that we didn’t choose Cortex
• Documentation is not really good
• Need to maintain another dataset (NoSQL DB)
• Interfere with the datapath
20
Alternatives
Ticketmaster confidential. Do not distribute.
Question?
Ticketmaster confidential. Do not distribute.

More Related Content

What's hot

What's hot (20)

Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 
OSMC 2022 | VictoriaMetrics: scaling to 100 million metrics per second by Ali...
OSMC 2022 | VictoriaMetrics: scaling to 100 million metrics per second by Ali...OSMC 2022 | VictoriaMetrics: scaling to 100 million metrics per second by Ali...
OSMC 2022 | VictoriaMetrics: scaling to 100 million metrics per second by Ali...
 
Monitoring with prometheus
Monitoring with prometheusMonitoring with prometheus
Monitoring with prometheus
 
Prometheus
PrometheusPrometheus
Prometheus
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss tools
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
 
Prometheus 101
Prometheus 101Prometheus 101
Prometheus 101
 
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.ioTHE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
THE STATE OF OPENTELEMETRY, DOTAN HOROVITS, Logz.io
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQL
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks
 
Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Custom DevOps Monitoring System in MelOn (with InfluxDB + Telegraf + Grafana)
Custom DevOps Monitoring System in MelOn (with InfluxDB + Telegraf + Grafana)Custom DevOps Monitoring System in MelOn (with InfluxDB + Telegraf + Grafana)
Custom DevOps Monitoring System in MelOn (with InfluxDB + Telegraf + Grafana)
 

Similar to Prometheus and Thanos

Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
DataWorks Summit
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData
 

Similar to Prometheus and Thanos (20)

Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop Meetup
 
Implementing a canonical IoT backend in Azure with Azure Stream Analytics
Implementing a canonical IoT backend in Azure with Azure Stream AnalyticsImplementing a canonical IoT backend in Azure with Azure Stream Analytics
Implementing a canonical IoT backend in Azure with Azure Stream Analytics
 
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real time
 
Token Design as Optimization Design
Token Design as Optimization DesignToken Design as Optimization Design
Token Design as Optimization Design
 
Practical Data Science Workshop - Recommendation Systems - Collaborative Filt...
Practical Data Science Workshop - Recommendation Systems - Collaborative Filt...Practical Data Science Workshop - Recommendation Systems - Collaborative Filt...
Practical Data Science Workshop - Recommendation Systems - Collaborative Filt...
 
Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)Monitoring your Python with Prometheus (Python Ireland April 2015)
Monitoring your Python with Prometheus (Python Ireland April 2015)
 
Starboard Solutions - 3PL
Starboard Solutions - 3PLStarboard Solutions - 3PL
Starboard Solutions - 3PL
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
 
AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence
AWS Roadshow Herbst 2013: Datenanalyse und Business IntelligenceAWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence
AWS Roadshow Herbst 2013: Datenanalyse und Business Intelligence
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
 
Break out of The Box - Part 2
Break out of The Box - Part 2Break out of The Box - Part 2
Break out of The Box - Part 2
 
Jan 2012 HUG: Storm
Jan 2012 HUG: StormJan 2012 HUG: Storm
Jan 2012 HUG: Storm
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Database
 
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 

More from CloudOps2005

More from CloudOps2005 (20)

Defense in Depth: Securing your new Kubernetes cluster from the challenges th...
Defense in Depth: Securing your new Kubernetes cluster from the challenges th...Defense in Depth: Securing your new Kubernetes cluster from the challenges th...
Defense in Depth: Securing your new Kubernetes cluster from the challenges th...
 
Human No, Machine Yes: Welcome to the CDF with Incremental Confidence
Human No, Machine Yes: Welcome to the CDF with Incremental ConfidenceHuman No, Machine Yes: Welcome to the CDF with Incremental Confidence
Human No, Machine Yes: Welcome to the CDF with Incremental Confidence
 
The Salmon Algorithm Spawning with Kubernetes
The Salmon Algorithm Spawning with KubernetesThe Salmon Algorithm Spawning with Kubernetes
The Salmon Algorithm Spawning with Kubernetes
 
Own your Destiny in the Cloud - Ian Rae - Cloud Native Day Montreal 2019
Own your Destiny in the Cloud - Ian Rae - Cloud Native Day Montreal 2019Own your Destiny in the Cloud - Ian Rae - Cloud Native Day Montreal 2019
Own your Destiny in the Cloud - Ian Rae - Cloud Native Day Montreal 2019
 
Plateformes et infrastructure infonuagique natif de ville de Montréall
Plateformes et infrastructure infonuagique natif de ville de MontréallPlateformes et infrastructure infonuagique natif de ville de Montréall
Plateformes et infrastructure infonuagique natif de ville de Montréall
 
Using Rook to Manage Kubernetes Storage with Ceph
Using Rook to Manage Kubernetes Storage with CephUsing Rook to Manage Kubernetes Storage with Ceph
Using Rook to Manage Kubernetes Storage with Ceph
 
Kafka on Kubernetes
Kafka on KubernetesKafka on Kubernetes
Kafka on Kubernetes
 
Kubernetes: Crossing the Chasm
Kubernetes: Crossing the ChasmKubernetes: Crossing the Chasm
Kubernetes: Crossing the Chasm
 
Distributed Logging with Kubernetes
Distributed Logging with KubernetesDistributed Logging with Kubernetes
Distributed Logging with Kubernetes
 
Kubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy AgentKubernetes Security with Calico and Open Policy Agent
Kubernetes Security with Calico and Open Policy Agent
 
Advanced Deployment Strategies with Kubernetes and Istio
Advanced Deployment Strategies with Kubernetes and IstioAdvanced Deployment Strategies with Kubernetes and Istio
Advanced Deployment Strategies with Kubernetes and Istio
 
GitOps with ArgoCD
GitOps with ArgoCDGitOps with ArgoCD
GitOps with ArgoCD
 
Kubernetes Services are sooo Yesterday!
Kubernetes Services are sooo Yesterday!Kubernetes Services are sooo Yesterday!
Kubernetes Services are sooo Yesterday!
 
Amazon EKS: the good, the bad, and the ugly
Amazon EKS: the good, the bad, and the uglyAmazon EKS: the good, the bad, and the ugly
Amazon EKS: the good, the bad, and the ugly
 
Kubernetes, Terraform, Vault, and Consul
Kubernetes, Terraform, Vault, and ConsulKubernetes, Terraform, Vault, and Consul
Kubernetes, Terraform, Vault, and Consul
 
SIG Multicluster and the Path to Federation
SIG Multicluster and the Path to FederationSIG Multicluster and the Path to Federation
SIG Multicluster and the Path to Federation
 
To Russia with Love: Deploying Kubernetes in Exotic Locations On Prem
To Russia with Love: Deploying Kubernetes in Exotic Locations On PremTo Russia with Love: Deploying Kubernetes in Exotic Locations On Prem
To Russia with Love: Deploying Kubernetes in Exotic Locations On Prem
 
Operator SDK for K8s using Go
Operator SDK for K8s using GoOperator SDK for K8s using Go
Operator SDK for K8s using Go
 
How to Handle your Kubernetes Upgrades
How to Handle your Kubernetes UpgradesHow to Handle your Kubernetes Upgrades
How to Handle your Kubernetes Upgrades
 
Kubernetes and Cloud Native Meetup - March, 2019
Kubernetes and Cloud Native Meetup - March, 2019Kubernetes and Cloud Native Meetup - March, 2019
Kubernetes and Cloud Native Meetup - March, 2019
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 

Prometheus and Thanos

  • 1. Ticketmaster confidential. Do not distribute. Ticketmaster confidential. Do not distribute. Prometheus + Thanos Thanos long term storage for Prometheus
  • 2. Ticketmaster confidential. Do not distribute. About me... • Rocking it at Ticketmaster • Student in Master of Business Administration (MBA) Strategic project management • Bachelor of Applied Science (B.A.Sc.) in Computer Science • Devops, Cloud, Kubernetes, Kafka, etc. • Sport lover (Hockey, Dek Hockey, Ultimate Frisbee, Football…) 2 SECTION
  • 3. Ticketmaster confidential. Do not distribute. Where Prometheus & Thanos are in the CNCF landscape... 3 Prometheus
  • 4. Ticketmaster confidential. Do not distribute. Insert photography and crop to size of grey box. (24.25 in. x 7.21 in.) What is Prometheus? 4 Prometheus Prometheus was the second project to be graduated of the Cloud Native Computing Foundation.
  • 5. Ticketmaster confidential. Do not distribute. Prometheus Architecture
  • 6. Ticketmaster confidential. Do not distribute. How we are using Prometheus at Ticketmaster? • We are using the Prometheus-Operator made by CoreOS (RedHat (IBM)) • We have created a Helm chart that created common pulling jobs, settings and exporters • Scrape EC2 instances based on Tags, Kubernetes • Exporters like cloudwatch-metrics, kafka, blackbox • Ingresses • Thanos • Federations 6 Prometheus
  • 7. Ticketmaster confidential. Do not distribute. Why moving away from the federation? • Calculate the right ingestion rate of our scrape job is not really easy • When the disk is full • Prometheus stop working • We needed to delete the PVC to recreated it bigger. • Long term storage is costly SSD == $$$$ • Single point of failure • Availability • Operator error • Hardware failure • Rollout 7 Prometheus
  • 8. Ticketmaster confidential. Do not distribute. Solution
  • 9. Ticketmaster confidential. Do not distribute. Goals 9 Thanos - Easy Deployment model - Minimal number of dependencies - Minimal baseline cost Have a global view Seamless integration with Prometheus Increase retentionHave a HA in place
  • 10. Ticketmaster confidential. Do not distribute. Global view 10 Thanos
  • 11. Ticketmaster confidential. Do not distribute. Global view + HA 11 Thanos
  • 12. Ticketmaster confidential. Do not distribute. Increase retention (Persist data) 12 Thanos
  • 13. Ticketmaster confidential. Do not distribute. Increase retention (Querying) • A series is made up of one or more “chunks” • A chunk contains ~120 samples each • Chunks can be retrieved through HTTP byte range queries Example: • 1000 series @ 30s scrape interval • Query 1 year • 8.7 million chunks/range queries • Chunks of the same series are aligned • Similar series are aligned due to same metric name This reduce request count by 4=6 orders of magnitude. 8.7 million requests turned into O(20) requests • 13 Thanos
  • 14. Ticketmaster confidential. Do not distribute. Increase retention (Querying) 14 Thanos group chunk same series group by same metric name
  • 15. Ticketmaster confidential. Do not distribute. Compaction 15 Thanos
  • 16. Ticketmaster confidential. Do not distribute. Full Architecture 16 Thanos
  • 17. Ticketmaster confidential. Do not distribute. Deployment Model( Example) • Federation through Store API 17 Thanos
  • 18. Ticketmaster confidential. Do not distribute. Cost • Store + Query node + Compaction ~ Savings on Prometheus side (+/- 0) • Fewer SSD space on Prometheus side (Savings) • Basically we are only paying for your data stored in S3/GCS/etc + requests 18 Thanos
  • 19. Ticketmaster confidential. Do not distribute. Cortex 19 Alternatives
  • 20. Ticketmaster confidential. Do not distribute. Cortex - Reason that we didn’t choose Cortex • Documentation is not really good • Need to maintain another dataset (NoSQL DB) • Interfere with the datapath 20 Alternatives
  • 21. Ticketmaster confidential. Do not distribute. Question?
  • 22. Ticketmaster confidential. Do not distribute.