Observability and its application

- Discuss the role of observability (logging, tracing, and metrics) in modern architecture.
- How to implement observability in Golang using OpenCensus.
- The 4 golden signals when designing the metrics.
- How to apply observability into the process.

  1. 1. Observability Huynh Quang Thao Trusting Social
  2. 2. Observability
  3. 3. What is logging?
  4. 4. What is distributed tracing? - Tracing involves a specialized use of logging to record information about a program's execution.
  5. 5. What is a metric? - A metric is a numeric measurement of a system's behavior, recorded and aggregated over time. - Example metrics: - Average time to query the products table.
  6. 6. Logging: ELK / EFK
  7. 7. Tracing [Diagram: Microservice 1, Microservice 2 and Microservice 3 each send trace data through an exporter to the backend.]
  10. 10. Metric [Diagram: Microservice 1, Microservice 2 and Microservice 3 each send metric data through an exporter to the backend.]
  12. 12. /metrics API for Prometheus
  13. 13. Language & Exporter Matrix
  15. 15. Why use OpenCensus / OpenTracing
      - Standardized data format across backends (Jaeger, Zipkin, …).
      - Abstracts instrumentation code away from any specific backend.

          _, span := trace.StartSpan(r.Context(), "child")
          defer span.End()
          span.Annotate([]trace.Attribute{trace.StringAttribute("key", "value")}, "querying")
          span.AddAttributes(trace.StringAttribute("hello", "world"))

          je, err := jaeger.NewExporter(jaeger.Options{
              AgentEndpoint:     agentEndpoint,
              CollectorEndpoint: collectorEndpoint,
              ServiceName:       service,
          })
          trace.RegisterExporter(je)
  16. 16. OpenCensus Architecture
  17. 17. Export directly to the backend [Diagram: each microservice sends data through its own exporter directly to the backend.]
  18. 18. Export directly to the backend OpenCensus local z-pages
  19. 19. Export directly to the backend
      - Couples every microservice to the backend.
      - To change the backend, we must update the code of every service.
      - To change a configuration, we must update the code of the service.
      - Exporters must scale across languages (e.g. a Jaeger exporter must be written for every supported language: Golang, Java, Python, …).
      - Ports must be managed for some backends, such as Prometheus.
  20. 20. OpenCensus Service
  23. 23. Jaeger Architecture
  24. 24. OpenCensus Service
      - Decouples services from the tracing and metrics backends.
      - The OpenCensus collector supports intelligent sampling (tail-based approach).
      - Preprocesses data (annotating spans, updating tags, …) before it reaches the backends.
      - Documentation is sparse for now, but the Jaeger documentation can serve as a reference for a similar deployment.
  25. 25. OpenCensus Concepts
  26. 26. Tracing
  27. 27. Trace
      - A trace is a tree of spans.
      - Every request sent from a client generates a TraceID.
      - A trace shows the path of the work through the system.
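
The "tree of spans" idea can be sketched with plain Go data structures. This is an illustration only: the `Span` struct, `buildTree` and `depth` are hypothetical names, not OpenCensus types, and the sketch assumes each span records the ID of its parent (empty for the root).

```go
package main

import "fmt"

// Span is a simplified, hypothetical span record (not the OpenCensus type).
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string // empty for the root span
	Name     string
}

// buildTree finds the root span and indexes all other spans by their parent ID.
func buildTree(spans []Span) (root *Span, children map[string][]Span) {
	children = make(map[string][]Span)
	for i := range spans {
		s := spans[i]
		if s.ParentID == "" {
			root = &spans[i]
			continue
		}
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	return root, children
}

// depth returns the number of levels in the subtree rooted at the given span ID.
func depth(id string, children map[string][]Span) int {
	deepest := 0
	for _, c := range children[id] {
		if d := depth(c.SpanID, children); d > deepest {
			deepest = d
		}
	}
	return deepest + 1
}

func main() {
	spans := []Span{
		{TraceID: "t1", SpanID: "a", Name: "GET /first"},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Name: "db.query"},
		{TraceID: "t1", SpanID: "c", ParentID: "a", Name: "redis.get"},
		{TraceID: "t1", SpanID: "d", ParentID: "b", Name: "db.scan"},
	}
	root, children := buildTree(spans)
	fmt.Println(root.Name, len(children[root.SpanID]), depth(root.SpanID, children))
}
```

All spans share one TraceID, and the parent links alone are enough to recover the path of the work through the system.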
  29. 29. Span
      - A span represents a single operation in a trace.
      - A span can represent an HTTP request, an RPC call or a database query.
      - The user defines the code path a span covers: where it starts and where it ends.
  30. 30. Span

          doSomeWork() // sleep 3s

          // Keep the returned context so the next span becomes a child of this one.
          ctx, span := trace.StartSpan(r.Context(), "parent span")
          defer span.End()
          doSomeWork()

          _, childrenSpan := trace.StartSpan(ctx, "children span")
          defer childrenSpan.End()
          doSomeWork()
  31. 31. Tag
      - A tag is a key-value pair of data associated with a trace.
      - Helpful for reporting, searching, filtering, …
  33. 33. Tag

          _, childrenSpan := trace.StartSpan(r.Context(), "children span")
          defer childrenSpan.End()
          childrenSpan.AddAttributes(trace.StringAttribute("purpose", "test"))
  35. 35. Trace Sampling
      There are 4 levels:
      - Always
      - Never
      - Probabilistic
      - Rate limiting
      Recommendations:
      - Use Probabilistic or Rate limiting in general.
      - Use Never for requests that should not be sampled.

      Global configuration:

          trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})
          trace.ApplyConfig(trace.Config{DefaultSampler: trace.ProbabilitySampler(0.7)})

      Via span:

          _, span := trace.StartSpan(r.Context(), "child", func(options *trace.StartOptions) {
              options.Sampler = trace.AlwaysSample()
          })
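
To make the Probabilistic level concrete, here is a minimal sketch of one common way to implement it: derive the decision from the trace ID, so that every service seeing the same trace reaches the same decision. The names `TraceID` and `probabilitySampler` are illustrative assumptions; this is not the OpenCensus implementation itself.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// TraceID is a 16-byte trace identifier (hypothetical type for this sketch).
type TraceID [16]byte

// probabilitySampler returns a function that keeps roughly `fraction` of all
// traces, assuming trace IDs are uniformly random.
func probabilitySampler(fraction float64) func(TraceID) bool {
	if fraction <= 0 {
		return func(TraceID) bool { return false }
	}
	if fraction >= 1 {
		return func(TraceID) bool { return true }
	}
	upperBound := uint64(fraction * (1 << 63))
	return func(id TraceID) bool {
		// Treat the top 63 bits of the first 8 bytes as a uniform random number.
		x := binary.BigEndian.Uint64(id[0:8]) >> 1
		return x < upperBound
	}
}

func main() {
	sample := probabilitySampler(0.7)
	var low, high TraceID // low stays all zeros
	for i := range high {
		high[i] = 0xFF
	}
	fmt.Println(sample(low), sample(high))
}
```

Because the decision depends only on the trace ID, no coordination between services is needed to keep a trace either fully sampled or fully dropped.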
  37. 37. OpenCensus sample rules
      OpenCensus uses head-based sampling with the following rules:
      1. If the span is a root span:
         • If a "span-scoped" Sampler is provided, use it to determine the sampling decision.
         • Else use the global default Sampler to determine the sampling decision.
      2. If the span is a child of a remote span:
         • If a "span-scoped" Sampler is provided, use it to determine the sampling decision.
         • Else use the global default Sampler to determine the sampling decision.
      3. If the span is a child of a local span:
         • If a "span-scoped" Sampler is provided, use it to determine the sampling decision.
         • Else keep the sampling decision from the parent.
      Disadvantages:
      - Some useful data might be lost.
      - This can be fixed by using the tail-based approach on the OpenCensus collector.
      References:
      - https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/Sampling.md
      - https://sflanders.net/2019/04/17/intelligent-sampling-with-opencensus/
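
The three rules above collapse into a small decision function. The sketch below is an illustration under assumed names (`Sampler`, `ParentKind`, `shouldSample`); it mirrors the rules as listed on the slide and is not taken from the OpenCensus source.

```go
package main

import "fmt"

// Sampler is a hypothetical sampling function for this sketch.
type Sampler func() bool

// ParentKind describes where the parent of a new span lives.
type ParentKind int

const (
	NoParent     ParentKind = iota // root span
	RemoteParent                   // parent arrived over the wire
	LocalParent                    // parent was created in this process
)

func alwaysSample() bool { return true }
func neverSample() bool  { return false }

// shouldSample applies the head-based rules: a span-scoped sampler always wins;
// otherwise local children inherit the parent's decision, and root spans and
// children of remote spans fall back to the global default sampler.
func shouldSample(kind ParentKind, parentSampled bool, spanScoped, global Sampler) bool {
	if spanScoped != nil {
		return spanScoped()
	}
	if kind == LocalParent {
		return parentSampled
	}
	return global()
}

func main() {
	fmt.Println(shouldSample(NoParent, false, nil, alwaysSample))        // root: global default
	fmt.Println(shouldSample(RemoteParent, true, nil, neverSample))      // remote child: global default
	fmt.Println(shouldSample(LocalParent, false, nil, alwaysSample))     // local child: inherits parent
	fmt.Println(shouldSample(LocalParent, false, alwaysSample, neverSample)) // span-scoped sampler wins
}
```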
  38. 38. Metrics
  39. 39. Measure
      - A measure represents a metric type to be recorded.
      - For example, request latency is in µs and request size is in KBs.
      - A measure includes 3 fields: Name, Description, Unit.
      - Measures support 2 types: float and int.

          GormQueryCount = stats.Int64( // Type: Integer
              GormQueryCountName,          // name
              "Number of queries started", // description
              stats.UnitDimensionless,     // unit
          )
  40. 40. Measurement
      - A measurement is a data point produced by recording a quantity against a measure.
      - A measurement is just a raw statistic.

          measurement := GormQueryCount.M(1)

          // M creates a new int64 measurement.
          // Use Record to record measurements.
          func (m *Int64Measure) M(v int64) Measurement {
              return Measurement{
                  m:    m,
                  desc: m.desc,
                  v:    float64(v),
              }
          }

          stats.Record(wrappedCtx, GormQueryCount.M(1))
  41. 41. View
      - A view couples an Aggregation with a Measure and, optionally, Tags.
      - Supported aggregation functions: Count / Distribution / Sum / LastValue.
      - Multiple views can use the same measure, but only with different aggregations.
      - Tags are used to group and filter the collected metrics later on.

          GormQueryCountView = &view.View{
              Name:        GormQueryCountName,
              Description: "Count of database queries based on Table and Operator",
              TagKeys:     []tag.Key{GormOperatorTag, GormTableTag},
              Measure:     GormQueryCount,
              Aggregation: view.Count(),
          }
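
What a view does with raw measurements can be sketched without any library: group the measurements by a tag value and apply one aggregation. The types and the `aggregate` function below are hypothetical and cover only Count, Sum and LastValue; Distribution is left out to keep the sketch short.

```go
package main

import "fmt"

// Measurement is a raw data point with its tags (hypothetical type).
type Measurement struct {
	Value float64
	Tags  map[string]string
}

// Aggregation selects how measurements in one group are combined.
type Aggregation int

const (
	Count Aggregation = iota
	Sum
	LastValue
)

// aggregate groups measurements by the value of tagKey and applies agg to each group.
func aggregate(ms []Measurement, tagKey string, agg Aggregation) map[string]float64 {
	out := make(map[string]float64)
	for _, m := range ms {
		group := m.Tags[tagKey]
		switch agg {
		case Count:
			out[group]++
		case Sum:
			out[group] += m.Value
		case LastValue:
			out[group] = m.Value
		}
	}
	return out
}

func main() {
	ms := []Measurement{
		{Value: 2, Tags: map[string]string{"table": "products"}},
		{Value: 5, Tags: map[string]string{"table": "products"}},
		{Value: 7, Tags: map[string]string{"table": "orders"}},
	}
	fmt.Println("count:", aggregate(ms, "table", Count))
	fmt.Println("sum:  ", aggregate(ms, "table", Sum))
}
```

The same list of measurements yields different series depending on the aggregation, which is why one measure can back several views.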
  42. 42. Metric Sampling - Stats are NOT sampled, so that uncommon cases can still be represented. Hence, stats are ALWAYS recorded unless dropped.
  43. 43. Context Propagation
  44. 44. Context Propagation: B3 Standard
      Header data:

          X-B3-Sampled:[1]
          X-B3-Spanid:[dacdb2208f874447]
          X-B3-Traceid:[9ca4a513af5f299a856dec51336a051b]

      Configuration:

          var requestOption = comm.RequestOption{
              Transport: &ochttp.Transport{
                  Propagation: &b3.HTTPFormat{},
                  Base: &http.Transport{
                      TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
                  },
              },
          }
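
The three B3 headers shown above are all a service needs to continue a trace. The following standard-library sketch injects and extracts them by hand to show what a propagation format does; `SpanContext`, `injectB3`, `extractB3` and `roundTrip` are hypothetical helpers, not part of ochttp or the b3 package.

```go
package main

import (
	"fmt"
	"net/http"
)

// SpanContext is the part of a span that crosses process boundaries (hypothetical type).
type SpanContext struct {
	TraceID string // 16 or 32 lower-hex characters
	SpanID  string // 16 lower-hex characters
	Sampled bool
}

// injectB3 writes the span context into outgoing request headers.
func injectB3(sc SpanContext, h http.Header) {
	h.Set("X-B3-TraceId", sc.TraceID)
	h.Set("X-B3-SpanId", sc.SpanID)
	if sc.Sampled {
		h.Set("X-B3-Sampled", "1")
	} else {
		h.Set("X-B3-Sampled", "0")
	}
}

// extractB3 reads the span context from incoming request headers.
// It reports false when the IDs are missing or have an unexpected length.
func extractB3(h http.Header) (SpanContext, bool) {
	sc := SpanContext{
		TraceID: h.Get("X-B3-TraceId"),
		SpanID:  h.Get("X-B3-SpanId"),
		Sampled: h.Get("X-B3-Sampled") == "1",
	}
	if len(sc.TraceID) != 16 && len(sc.TraceID) != 32 {
		return SpanContext{}, false
	}
	if len(sc.SpanID) != 16 {
		return SpanContext{}, false
	}
	return sc, true
}

// roundTrip simulates a client injecting headers and a server extracting them.
func roundTrip(sc SpanContext) (SpanContext, bool) {
	h := http.Header{}
	injectB3(sc, h)
	return extractB3(h)
}

func main() {
	in := SpanContext{TraceID: "9ca4a513af5f299a856dec51336a051b", SpanID: "dacdb2208f874447", Sampled: true}
	out, ok := roundTrip(in)
	fmt.Println(ok, out == in)
}
```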
  45. 45. Context Propagation: W3C Trace Context Standard
      Header data:

          Traceparent:[00-a9f4dc05b7a78f6f2f717d7396d9450f-187065dac4cd685c-01]

      Configuration:

          var requestOption = comm.RequestOption{
              Transport: &ochttp.Transport{
                  Propagation: &tracecontext.HTTPFormat{},
                  Base: &http.Transport{
                      TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
                  },
              },
          }
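
The Traceparent value packs four dash-separated fields: version, trace ID, parent span ID and flags, where the lowest flag bit marks the trace as sampled. A minimal parser for version `00` might look like the sketch below; `parseTraceparent` is a hypothetical helper, and later versions of the header are deliberately rejected.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// parseTraceparent splits a version-00 traceparent header into its fields.
func parseTraceparent(v string) (traceID, spanID string, sampled bool, err error) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", false, errors.New("unsupported traceparent version or field count")
	}
	if len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return "", "", false, errors.New("traceparent field has the wrong length")
	}
	for _, p := range parts[1:] {
		if _, err := hex.DecodeString(p); err != nil {
			return "", "", false, fmt.Errorf("traceparent field is not hex: %w", err)
		}
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return parts[1], parts[2], flags[0]&0x01 == 1, nil
}

func main() {
	traceID, spanID, sampled, err := parseTraceparent(
		"00-a9f4dc05b7a78f6f2f717d7396d9450f-187065dac4cd685c-01")
	fmt.Println(traceID, spanID, sampled, err)
}
```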
  46. 46. Implementation
  47. 47. 1. HTTP Handler

          mux := http.NewServeMux()
          mux.HandleFunc("/first", firstAPI)

          // wrap handler inside OpenCensus handler for tracing request
          och := &ochttp.Handler{
              Handler: mux,
          }

          // start
          if err := http.ListenAndServe(address, och); err != nil {
              panic(err)
          }
  48. 48. 1. HTTP Handler
      Wrap a normal http.Handler with ochttp.Handler:

          // vite/tracing

          // WrapHandlerWithTracing wraps handler inside OpenCensus handler for tracing
          func WrapHandlerWithTracing(handler http.Handler, option OptionTracing) (http.Handler, error) {
              // processing option here
              // ...
              handler = &ochttp.Handler{
                  Propagation:      propagationFormat,
                  IsPublicEndpoint: option.IsPublicEndpoint,
                  StartOptions:     startOptions,
                  Handler:          handler,
              }
              return handler, nil
          }
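
The wrapping pattern itself is plain Go: a function takes an http.Handler and returns another one that does extra work around each request. The sketch below uses only the standard library and records request count and total latency instead of spans; `requestStats`, `withTiming`, `newDemoHandler` and `serveOnce` are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// requestStats accumulates simple per-handler statistics.
type requestStats struct {
	mu    sync.Mutex
	count int
	total time.Duration
}

func (s *requestStats) record(d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.count++
	s.total += d
}

// withTiming wraps next so that every request is timed and counted.
func withTiming(next http.Handler, stats *requestStats) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		stats.record(time.Since(start))
	})
}

// newDemoHandler builds a mux with one route and wraps it, like the slide does with ochttp.
func newDemoHandler(stats *requestStats) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/first", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	return withTiming(mux, stats)
}

// serveOnce sends one in-memory GET request to h and returns the status code.
func serveOnce(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	stats := &requestStats{}
	h := newDemoHandler(stats)
	fmt.Println(serveOnce(h, "/first"), serveOnce(h, "/missing"), stats.count)
}
```

Wrapping the whole mux rather than individual routes means unmatched paths are observed too, which matters when counting errors.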
  49. 49. 2. HTTP Transport Layer
      Before:

          var DefaultTransport = http.Transport{
              TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
          }

      After:

          var DefaultTransport = &ochttp.Transport{
              Propagation: &tracecontext.HTTPFormat{},
              Base: &http.Transport{
                  TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
              },
          }
  50. 50. 3. Callback: GORM
      Register all necessary callbacks for GORM:

          func RegisterGormCallbacksWithConfig(db *gorm.DB, cfg *GormTracingCfg) {
              db.Callback().Create().Before("gorm:create").
                  Register("instrumentation:before_create", cfg.beforeCallback(CreateOperator))
              db.Callback().Create().After("gorm:create").
                  Register("instrumentation:after_create", cfg.afterCallback())
              // more callbacks here
          }

          func MigrateDB() {
              testDB = createDBConnection()
              RegisterGormCallbacks(testDB)
          }
  51. 51. 3. Callback: GORM
      Wrap the GORM object with the context before calling a database operator:

          func GormWithContext(ctx context.Context, origGorm *gorm.DB) *gorm.DB {
              return origGorm.Set(ScopeContextKey, ctx)
          }

          orm := tracing.GormWithContext(r.Context(), testDB)
          product, _ := GetFirstProductWithContext(orm)

          func GetFirstProductWithContext(db *gorm.DB) (*Product, error) {
              r := &Product{}
              if err := db.First(r, 1).Error; err != nil {
                  if gorm.IsRecordNotFoundError(err) {
                      return nil, nil
                  }
                  log.Println(vite.MarkError, err)
                  return nil, err
              }
              return r, nil
          }
  52. 52. 4. Callback: Redis

          // takes a vanilla redis.Client and returns trace instrumented version
          func RedisWithContext(ctx context.Context, origClient *redis.Client) *redis.Client {
              client := origClient.WithContext(ctx)
              client.WrapProcess(perCommandTracer(ctx, &redisDefaultCfg))
              return client
          }

          // perCommandTracer provides the instrumented function
          func perCommandTracer(ctx context.Context, cfg *RedisTracingCfg,
          ) func(oldProcess func(cmd redis.Cmder) error) func(redis.Cmder) error {
              return func(fn func(cmd redis.Cmder) error) func(redis.Cmder) error {
                  return func(cmd redis.Cmder) error {
                      span := cfg.startTrace(ctx, cmd)
                      defer cfg.endTrace(span, cmd)
                      err := fn(cmd)
                      return err
                  }
              }
          }
  53. 53. 4. Callback: Redis
      Client side: wrap the Redis client with the context again before calling a Redis operator.

          // wrap redis object before calling redis operator
          wrapRedis := tracing.RedisWithContext(r.Context(), Redis.Client)
          readKeyWithContext(wrapRedis, "service", "StackOverFlow")

          func readKeyWithContext(client *redis.Client, key string) string {
              return client.Get(key).String()
          }
  54. 54. 5. Exporter
      Export to Jaeger:

          func RunJaegerExporter(service string, agentEndpoint string, collectorEndpoint string) (*jaeger.Exporter, error) {
              je, err := jaeger.NewExporter(jaeger.Options{
                  AgentEndpoint:     agentEndpoint,
                  CollectorEndpoint: collectorEndpoint,
                  ServiceName:       service,
              })
              if err != nil {
                  return nil, err
              }
              trace.RegisterExporter(je)
              trace.ApplyConfig(trace.Config{DefaultSampler: trace.ProbabilitySampler(0.2)})
              return je, nil
          }

          _, err := tracing.RunJaegerExporter(
              "trusting_social_demo",
              "localhost:6831",
              "http://localhost:14268/api/traces",
          )
  55. 55. 5. Exporter
      Export to console:

          _, err = tracing.RunConsoleExporter()
          if err != nil {
              panic(err)
          }

          // Start starts the metric and span data exporter.
          func (exporter *LogExporter) Start() error {
              exporter.traceExporter.Start()
              exporter.viewExporter.Start()
              err := exporter.metricExporter.Start()
              if err != nil {
                  return err
              }
              return nil
          }
  56. 56. 5. Exporter
      Export to Prometheus:

          func RunPrometheusExporter(namespace string) (*prometheus.Exporter, error) {
              pe, err := prometheus.NewExporter(prometheus.Options{
                  Namespace: namespace,
              })
              if err != nil {
                  return nil, err
              }
              view.RegisterExporter(pe)
              return pe, nil
          }

      Create the /metrics entry point for the Prometheus service to call:

          // add api endpoint for prometheus
          app.Mux.Handle("/metrics", pe)

      Sample Prometheus configuration:

          scrape_configs:
            - job_name: 'trustingsocial_ocmetrics'
              scrape_interval: 5s
              static_configs:
                - targets: ['host.docker.internal:3000']
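
Prometheus pulls metrics by scraping an HTTP endpoint that returns plain text. To show roughly what the /metrics response contains, here is a hand-rolled sketch of the text format for counters. In practice the exporter generates this output; `renderMetrics` and `metricsHandler` are hypothetical helpers and the metric names are made up for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// renderMetrics formats counters in the Prometheus text exposition format,
// prefixing every metric name with the namespace.
func renderMetrics(namespace string, counters map[string]int64) string {
	names := make([]string, 0, len(counters))
	for name := range counters {
		names = append(names, name)
	}
	sort.Strings(names) // stable output regardless of map iteration order

	var b strings.Builder
	for _, name := range names {
		full := namespace + "_" + name
		fmt.Fprintf(&b, "# TYPE %s counter\n%s %d\n", full, full, counters[name])
	}
	return b.String()
}

// metricsHandler serves the current counter snapshot; it could be mounted at /metrics.
func metricsHandler(namespace string, snapshot func() map[string]int64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderMetrics(namespace, snapshot()))
	})
}

func main() {
	counters := map[string]int64{"gorm_query_count": 3, "redis_query_count": 5}
	fmt.Print(renderMetrics("demo", counters))
}
```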
  57. 57. 6. Register views
      Register all database views:

          err := tracing.RegisterAllDatabaseViews()
          if err != nil {
              panic(err)
          }
          defer tracing.UnregisterAllDatabaseViews()

          // RegisterAllDatabaseViews registers all database views
          func RegisterAllDatabaseViews() error {
              return view.Register(GormQueryCountView)
          }

      Register all Redis views:

          err = tracing.RegisterAllRedisViews()
          if err != nil {
              panic(err)
          }
          defer tracing.UnregisterAllRedisViews()
  58. 58. Write custom exporter
  59. 59. Export trace
      1. Implement the ExportSpan function.
      2. Call trace.RegisterExporter.

          func (exporter *TraceExporter) ExportSpan(sd *trace.SpanData) {
              var (
                  traceID      = hex.EncodeToString(sd.SpanContext.TraceID[:])
                  spanID       = hex.EncodeToString(sd.SpanContext.SpanID[:])
                  parentSpanID = hex.EncodeToString(sd.ParentSpanID[:])
              )
              // RunJaegerExporter exports trace to Jaeger
          }

          func (exporter *TraceExporter) Start() {
              trace.RegisterExporter(exporter)
          }
  60. 60. Export view
      1. Implement the ExportView function.
      2. Call view.RegisterExporter.

          // ExportView implements view.Exporter's interface
          func (exporter *ViewExporter) ExportView(vd *view.Data) {
              for _, row := range vd.Rows {
              }
          }

          // Start starts printing log
          func (exporter *ViewExporter) Start() {
              view.RegisterExporter(exporter)
          }
  61. 61. Export metric
      1. Implement the ExportMetrics function.
      2. Poll at an interval to get the latest metric data.

          // ExportMetrics implements metricexport.Exporter's interface.
          func (exporter *MetricExporter) ExportMetrics(ctx context.Context, metrics []*metricdata.Metric) error {
              for _, metric := range metrics {
                  // process each metric
              }
              return nil
          }

          // Start starts printing log
          func (exporter *MetricExporter) Start() error {
              exporter.initReaderOnce.Do(func() {
                  exporter.intervalReader, _ = metricexport.NewIntervalReader(
                      exporter.reader,
                      exporter,
                  )
              })
              exporter.intervalReader.ReportingInterval = exporter.reportingInterval
              return exporter.intervalReader.Start()
          }
  62. 62. How to define useful metrics
  63. 63. The four golden signals 1. Latency 2. Traffic 3. Errors 4. Saturation
  64. 64. Latency
      - The time it takes to service a request.
      - It is important to distinguish between the latency of successful requests and the latency of failed requests.
      - It is important to track error latency, as opposed to just filtering out errors.
      Example:
      - Database: time to query the database server.
      - HTTP request: time from the beginning to the end of the request.
  65. 65. Traffic
      - A measure of how much demand is being placed on your system, measured in a high-level system-specific metric.
      Example:
      - HTTP request: HTTP requests per second.
      - Database: successful / failed queries per second.
      - Redis: successful / failed queries per second (not-found results excluded).
  66. 66. Error
      - The rate of requests that fail, either explicitly, implicitly or by policy.
      - Explicit: a request with HTTP status code 500.
      - Implicit: an HTTP 200 success response, but coupled with the wrong content.
      - Policy: if you committed to one-second response times, any request over one second is an error.
      Example:
      - HTTP request: requests with a status code other than 200.
      - Redis: queries that return an error code (not-found results excluded).
      - Database: queries that return an error code (not-found results excluded).
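
The three error categories can be turned into a small classifier that feeds an error counter. This is a sketch under assumptions: the `classify` function, its parameters and the idea of passing a precomputed "content is valid" flag are illustrative, and latency is given in milliseconds to keep the example short.

```go
package main

import "fmt"

// ErrorKind names the category a request falls into.
type ErrorKind int

const (
	NoError  ErrorKind = iota
	Explicit           // the server reported a failure (5xx)
	Implicit           // success status, but the content is wrong
	Policy             // correct answer, but slower than the committed response time
)

// classify applies the checks in order of severity: explicit, then implicit, then policy.
func classify(status int, contentOK bool, latencyMs, committedMs int) ErrorKind {
	switch {
	case status >= 500:
		return Explicit
	case !contentOK:
		return Implicit
	case latencyMs > committedMs:
		return Policy
	}
	return NoError
}

func main() {
	fmt.Println(classify(500, false, 20, 1000) == Explicit)
	fmt.Println(classify(200, false, 20, 1000) == Implicit)
	fmt.Println(classify(200, true, 1500, 1000) == Policy)
	fmt.Println(classify(200, true, 20, 1000) == NoError)
}
```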
  67. 67. Saturation
      - How "full" your service is.
      - Latency increases are often a leading indicator of saturation.
      - Measuring your 99th percentile response time over some small window can give a very early signal of saturation.
      Example:
      - HTTP request: system load such as CPU, RAM, …
      - Redis: idle / active / inactive connections in the connection pool.
      - Database: idle / active / inactive connections in the connection pool.
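
The 99th percentile over a small window can be computed directly from the latency samples collected in that window. The sketch below uses the nearest-rank method, which is one of several percentile definitions; `percentile` is a hypothetical helper, and production systems usually rely on histograms rather than keeping raw samples.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of samples using the
// nearest-rank method. It returns 0 for an empty window.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // leave the caller's slice untouched
	sort.Float64s(sorted)

	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Latencies in milliseconds for one window: mostly fast, with one slow outlier.
	window := []float64{12, 15, 11, 14, 13, 950, 12, 16, 13, 14}
	fmt.Println("p50:", percentile(window, 50))
	fmt.Println("p99:", percentile(window, 99))
}
```

The average of such a window hides the outlier, while the high percentile exposes it, which is why it works as an early saturation signal.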
  68. 68. The four golden signals
      Already implemented in the tracing repository, in 4 packages: redis / gorm / http
      Must read:
      - How to measure in a production environment
      - A systematic way to resolve production issues …
  69. 69. Tracing Repository
      Repository: https://github.com/tsocial/tracing
      • Implemented callbacks for Redis, Gorm and HTTP Handler.
      • Defined and implemented observability for each package.
      • Implemented some exporters (e.g. Jaeger, Prometheus, …).
      • Implemented a console exporter and a simple exporter for testing.
      • Example project to demonstrate the usage.
      • Decoupled from the Telco platform. Open source?
  70. 70. Sample project
      1. Repository: https://github.com/tsocial/distributed_tracing_demo
         - Test with Gorm / Redis
         - Test tracing with console exporter
         - Test with Jaeger / Prometheus
         - Call external service
         - Call internal service
         - TODO: test with OpenCensus service
      2. Repository: https://github.com/census-instrumentation/opencensus-service/blob/master/demos/trace/docker-compose.yaml
         - Test with OpenCensus service
         - Multiple internal services
         - Jaeger / Prometheus / Zipkin …
  71. 71. References
      - Documentation: https://opencensus.io
      - Examples: https://github.com/census-instrumentation/opencensus-go/tree/master/examples
      - How not to measure latency: https://www.youtube.com/watch?v=lJ8ydIuPFeU
      - Specification for B3 format: https://github.com/apache/incubator-zipkin-b3-propagation
      - Specification for W3C Trace Context format:
        • https://www.w3.org/TR/trace-context/#dfn-distributed-traces
        • https://github.com/opentracing/specification/issues/86
      - Logging architecture: https://kubernetes.io/docs/concepts/cluster-administration/logging/
      - Nice post about OpenCensus vs OpenTracing: https://github.com/gomods/athens/issues/392
      - OpenCensus service design: https://github.com/census-instrumentation/opencensus-service/blob/master/DESIGN.md
      - Distributed tracing at Uber: https://eng.uber.com/distributed-tracing/
      - Tracing HTTP request latency: https://medium.com/opentracing/tracing-http-request-latency-in-go-with-opentracing-7cc1282a100a
      - Context propagation: https://medium.com/jaegertracing/embracing-context-propagation-7100b9b6029a
      - Only book about distributed tracing: https://www.amazon.com/Mastering-Distributed-Tracing-performance-microservices/dp/1788628462
      - The four golden signals (Google SRE book): https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/#xref_monitoring_golden-signals
  72. 72. Q&A
  • hqt

    Dec. 28, 2019