GO Monitoring at Lazada
Rinat Takhautdinov & Ninh Dang
Senior Software Engineers at PDP Vertical Team – Lazada
Golang meetup # 11
About Us – PDP Vertical Department
- We do product development at Lazada:
  - Product Detail Page
  - Home Page
  - Login/Signup
- We work mostly with Golang and PHP
Agenda
- Overview of time-series metrics monitoring
  - Big picture
  - Prometheus
  - Grafana
- How monitoring helps us at Lazada
  - Use case 1: Product Ratings & Reviews performance optimization
  - Use case 2: Shaping the product for Buyer-Seller Communication
- Conclusions
Monitoring: General Architecture
[Architecture diagram: each service exposes its metrics at http://service:port/metrics]
What does Prometheus do?
• Time series database
• Written in Go, with no external dependencies
• Powerful functional query language (PromQL)
• Pull, not push
• Thousands of targets
• Hundreds of thousands of samples per second
• Millions of time series
• Distributed (or not)
  a) can be run as a single node
  b) can be run as several servers (e.g. identical HA pairs or federation) for redundancy & performance
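Pull means Prometheus issues an HTTP GET to each target and parses the text it gets back. As a rough illustration of that step, the sketch below turns exposition lines into samples. It is a simplified, hypothetical parser (`parseSampleLine` and `parseExposition` are made-up names, and label-value escaping is ignored); the real parser lives in the `github.com/prometheus/common/expfmt` package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sample is one parsed line of a scrape: metric name, labels and value.
type sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseSampleLine parses a line such as
//
//	http_responses_total{code="200",method="get"} 2
//
// Simplification: label values must not contain ',', '"' or '}'.
func parseSampleLine(line string) (sample, error) {
	s := sample{Labels: map[string]string{}}
	line = strings.TrimSpace(line)
	var rest string
	if open := strings.IndexByte(line, '{'); open >= 0 {
		closing := strings.IndexByte(line, '}')
		if closing < open {
			return s, fmt.Errorf("unbalanced braces in %q", line)
		}
		s.Name = line[:open]
		for _, pair := range strings.Split(line[open+1:closing], ",") {
			if pair == "" {
				continue // empty label set or trailing comma
			}
			k, v, ok := strings.Cut(pair, "=")
			if !ok {
				return s, fmt.Errorf("bad label pair %q", pair)
			}
			s.Labels[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(v), `"`)
		}
		rest = line[closing+1:]
	} else {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			return s, fmt.Errorf("missing value in %q", line)
		}
		s.Name, rest = fields[0], strings.Join(fields[1:], " ")
	}
	fields := strings.Fields(rest) // value, optionally followed by a timestamp
	if len(fields) == 0 {
		return s, fmt.Errorf("missing value in %q", line)
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return s, err
	}
	s.Value = v
	return s, nil
}

// parseExposition parses a whole scrape body, skipping blank and comment lines.
func parseExposition(body string) ([]sample, error) {
	var out []sample
	for _, line := range strings.Split(body, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			continue
		}
		s, err := parseSampleLine(trimmed)
		if err != nil {
			return nil, err
		}
		out = append(out, s)
	}
	return out, nil
}
```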
Exporters and direct instrumentation
Client libraries: https://prometheus.io/docs/instrumenting/clientlibs/
- Official: Go (https://github.com/prometheus/client_golang), Java (JVM), Ruby, Python
- Unofficial: .NET / C#, Node.js, Bash
Exporters: https://prometheus.io/docs/instrumenting/exporters/
- Official: Node/system metrics exporter, Graphite exporter, Collectd exporter, JMX exporter, HAProxy exporter, StatsD bridge, AWS CloudWatch exporter, Hystrix metrics publisher, Mesos task exporter, Consul exporter
Direct instrumentation:
- Kubernetes, Kubernetes-Mesos, Etcd, gokit, go-metrics instrumentation library
Metrics and labels
Metric types:
- counter
- gauge
- histogram
- summary
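import "github.com/prometheus/client_golang/prometheus"

// httpResponsesTotal counts HTTP responses, partitioned by status code and method.
var httpResponsesTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "my_super_puper_production",
		Subsystem: "http_server",
		Name:      "http_responses_total",
		Help:      "The count of http responses issued, classified by code and method.",
	},
	[]string{"code", "method"},
)

func init() {
	// Register with the default registry so the metric appears on /metrics.
	prometheus.MustRegister(httpResponsesTotal)
}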
Listing 1
CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
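To make the caution concrete: the worst-case number of series for one metric is the product of the number of distinct values of each label. The sketch below (hypothetical helper names, illustrative numbers) shows how 5 status codes and 4 methods allow 20 series, while adding a `user_id` label with a million users would allow 20 million.

```go
package main

import "strings"

// maxSeries is the upper bound on time series for one metric name: the
// product of the number of distinct values each label can take.
func maxSeries(distinctValues map[string]int) int {
	total := 1
	for _, n := range distinctValues {
		total *= n
	}
	return total
}

// observedSeries counts the label-value combinations actually seen, which is
// roughly how many series end up stored for that metric.
func observedSeries(labelSets [][]string) int {
	seen := make(map[string]struct{}, len(labelSets))
	for _, ls := range labelSets {
		seen[strings.Join(ls, "\x00")] = struct{}{}
	}
	return len(seen)
}
```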
Powerful query language
topk(3, sum(rate(bazooka_instance_cpu_time_ns[5m])) by (app, proc))
sort_desc(sum(bazooka_instance_memory_limit_bytes - bazooka_instance_memory_usage_bytes) by (app, proc))
Expression browser
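Roughly, the first query computes a per-second rate for each CPU-time counter over 5 minutes, sums those rates per `(app, proc)` and keeps the top 3. The sketch below shows the two building blocks in plain Go under simplifying assumptions: `simpleRate` handles counter resets but, unlike PromQL's `rate()`, does not extrapolate to the window edges, and `topK` orders by value with a name tie-break. Both are hypothetical helpers, not Prometheus code.

```go
package main

import "sort"

// point is one counter sample.
type point struct {
	T float64 // timestamp in seconds
	V float64 // counter value
}

// simpleRate returns the per-second increase over the samples, treating any
// decrease as a counter reset (the process restarted and counted from zero).
func simpleRate(samples []point) float64 {
	if len(samples) < 2 {
		return 0
	}
	var increase float64
	for i := 1; i < len(samples); i++ {
		d := samples[i].V - samples[i-1].V
		if d < 0 {
			d = samples[i].V // reset: count only what accrued since zero
		}
		increase += d
	}
	elapsed := samples[len(samples)-1].T - samples[0].T
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

// ranked is one (group, value) pair, e.g. an "app/proc" key and its summed rate.
type ranked struct {
	Key   string
	Value float64
}

// topK returns the k largest values, highest first, like topk(k, ...).
func topK(k int, values map[string]float64) []ranked {
	out := make([]ranked, 0, len(values))
	for key, v := range values {
		out = append(out, ranked{Key: key, Value: v})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Value != out[j].Value {
			return out[i].Value > out[j].Value
		}
		return out[i].Key < out[j].Key // deterministic tie-break
	})
	if k < len(out) {
		out = out[:k]
	}
	return out
}
```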
Grafana
Product Ratings & Reviews
Problem – High level
- Lazada has many campaigns throughout the year
- How do we prepare for the campaigns?
  - Stress tests + monitoring
  - Find & fix bottlenecks
  - Verify results
- What happened to us?
  - Site was broken, checkout not working
  - Backend service alert (Review API)
Review API architecture
What do we monitor for our Review API?
- CPU / memory
- Goroutines
- Average response time (RT), RT per handler
- Error rates
- Throughput (per client)
- DB connections
- Cache size (number of items)
- Cache hits / misses (utilization)
- …
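Most of these signals boil down to a few counters and runtime readings. The sketch below shows two of them: a hit/miss counter pair for the cache panel and a snapshot of goroutine count and heap usage. The names are hypothetical; client_golang's default registry already exports `go_goroutines` and `go_memstats_*` metrics, so in practice only application-specific counters such as cache hits need custom code.

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// cacheStats backs a "cache hits / misses" panel. Hypothetical type; the
// uint64 fields come first so atomic access stays 64-bit aligned on 32-bit platforms.
type cacheStats struct {
	hits   uint64
	misses uint64
}

// Record counts one cache lookup; safe for concurrent use.
func (s *cacheStats) Record(hit bool) {
	if hit {
		atomic.AddUint64(&s.hits, 1)
	} else {
		atomic.AddUint64(&s.misses, 1)
	}
}

// HitRatio returns hits / (hits + misses), or 0 before the first lookup.
func (s *cacheStats) HitRatio() float64 {
	h, m := atomic.LoadUint64(&s.hits), atomic.LoadUint64(&s.misses)
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

// runtimeSnapshot holds the Go runtime values behind goroutine and memory panels.
type runtimeSnapshot struct {
	Goroutines int
	HeapAlloc  uint64 // bytes of allocated heap objects
}

func readRuntime() runtimeSnapshot {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return runtimeSnapshot{Goroutines: runtime.NumGoroutine(), HeapAlloc: ms.HeapAlloc}
}
```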
What did we find?
- The number of cache items grew very high
- Review API version fragmentation: clients were spread across many API versions
- Many calls still went to old-version API endpoints
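Spotting version fragmentation requires calls to be labeled by API version. Assuming versioned paths of the form `/v<N>/...` (our assumption, not the Review API's documented URL scheme), the sketch below maps a path to a small, bounded `version` label and computes each version's share of traffic. `apiVersion` and `versionShares` are hypothetical names.

```go
package main

import "regexp"

// versionPattern matches a leading "/v<N>" path segment (assumed URL scheme).
var versionPattern = regexp.MustCompile(`^/v(\d+)(?:/|$)`)

// apiVersion maps a request path to a bounded label value such as "v2".
func apiVersion(path string) string {
	m := versionPattern.FindStringSubmatch(path)
	if m == nil {
		return "unknown"
	}
	return "v" + m[1]
}

// versionShares returns each version's share of the given calls.
func versionShares(paths []string) map[string]float64 {
	counts := make(map[string]int)
	for _, p := range paths {
		counts[apiVersion(p)]++
	}
	shares := make(map[string]float64, len(counts))
	for v, n := range counts {
		shares[v] = float64(n) / float64(len(paths))
	}
	return shares
}
```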
Optimization in action
1. Reduce the number of calls from clients.
2. Force all clients to move to the latest versions.
3. Deprecate outdated handlers (keeping multiple API versions alive is expensive).
Feel rock?
Business metrics monitoring for product development
- We are a team focused on delivering products
- Many kinds of KPIs need to be tracked to check product health
- Lots of ideas on how to make our products better
- Limited capacity to implement them
- Roll out smoothly and keep looking for better ways
Business metrics monitoring for product development
- Helps us validate and evolve our MVP
- Transparent for everyone (e.g. dev team, country PMs)
- Real-time tracking of product health
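Business KPIs can be tracked with the same machinery as technical metrics: count product events with a small, bounded set of labels and compare the counters. The sketch below is hypothetical throughout: the event names, the `businessEvents` type and the country allowlist are illustrative assumptions, not the team's actual metrics. Unknown countries collapse into `other`, in line with the label-cardinality caution earlier.

```go
package main

import "sync"

// knownCountries bounds the "country" label (assumed list, for illustration only).
var knownCountries = map[string]bool{"SG": true, "MY": true, "TH": true, "ID": true, "PH": true, "VN": true}

// eventKey identifies one series: a product event in one country.
type eventKey struct{ Event, Country string }

// businessEvents counts product events per (event, country). Hypothetical type.
type businessEvents struct {
	mu     sync.Mutex
	counts map[eventKey]uint64
}

func newBusinessEvents() *businessEvents {
	return &businessEvents{counts: make(map[eventKey]uint64)}
}

// Inc records one event; unexpected countries collapse into "other" so a
// misbehaving client cannot create unbounded label values.
func (b *businessEvents) Inc(event, country string) {
	if !knownCountries[country] {
		country = "other"
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	b.counts[eventKey{event, country}]++
}

// Ratio returns count(num)/count(den) for one country, e.g. replies per
// message sent, or 0 when the denominator is zero.
func (b *businessEvents) Ratio(num, den, country string) float64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	d := b.counts[eventKey{den, country}]
	if d == 0 {
		return 0
	}
	return float64(b.counts[eventKey{num, country}]) / float64(d)
}
```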
Conclusions
- Monitoring is critical for any business
- If you have many teams, a shared monitoring standard is important
- Define the most important metrics for each microservice
- Put some TV screens with key graphs near your team corner!
Q&A
- WE ARE HIRING!
- 4 x Senior Backend Developers (GO/PHP)
- Contact us for more information:
  - ninh.dang@lazada.com
  - rinat.takhautdinov@lazada.com

Editor's Notes

  1. -rps:1k