2. Introduction
- Containers, serverless, and microservice architectures have changed the way software is built
- Systems are more distributed and more ephemeral
- No complex system is ever fully healthy
- Better resilience and fault tolerance is the goal
- Ease of debugging is a cornerstone of maintaining and evolving robust systems
3. Observability
- The internal state of a system should be inferable from its external outputs
- Reduce MTTD (mean time to detect) and MTTR (mean time to repair)
- Verify the health of the service proactively
- Know what is broken, and why
- Provides the all-important feedback that drives future iterations
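As a rough illustration of the two metrics named above, the sketch below computes MTTD and MTTR from a couple of made-up incident records. The timestamps are hypothetical, and measuring MTTR from the start of the fault (rather than from detection) is an assumption; teams define it both ways.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (fault started, fault detected, fault resolved).
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5), datetime(2024, 1, 1, 10, 35)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 15), datetime(2024, 1, 2, 15, 0)),
]


def mean_minutes(deltas):
    """Average a series of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def mttd_minutes(records):
    """Mean time from fault start to detection."""
    return mean_minutes(detected - started for started, detected, _ in records)


def mttr_minutes(records):
    """Mean time from fault start to resolution (assumed definition)."""
    return mean_minutes(resolved - started for started, _, resolved in records)
```

Better observability should shorten the gap between the first and second timestamps (detection) and between the second and third (diagnosis and repair).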
5. Our Business Case
- Collect logs, traces, and metrics from mobile apps and web browsers
- Gain insight into the application
- Understand user behavior patterns
- Monitor application performance
6. Front-end Logging Service
- Exposes a REST endpoint
- Spring Boot application that accepts compressed log messages
- Decompresses and validates the payload
- Forwards it to the application's log destination (Splunk)
Requirements:
- 20,000 transactions per second
- 1-second latency
[Diagram labels: Internet; Logging Service; AWS Account; Compressed Batched Logs]
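The decompress-and-validate step can be sketched as follows. The slides do not state the compression format or the payload schema, so gzip, a JSON array of objects, the required "message" field, and the MAX_BATCH_SIZE limit are all assumptions made for illustration; the real service is a Spring Boot application, and this is only a language-neutral outline written in Python.

```python
import gzip
import json
import zlib

MAX_BATCH_SIZE = 500  # hypothetical upper bound on entries per batch


def decode_log_batch(body: bytes) -> list:
    """Decompress a gzip body and check that it is a non-empty JSON array
    of objects, each carrying a 'message' field (assumed schema)."""
    try:
        raw = gzip.decompress(body)
    except (OSError, EOFError, zlib.error) as exc:
        raise ValueError("payload is not valid gzip") from exc

    try:
        batch = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise ValueError("payload is not valid JSON") from exc

    if not isinstance(batch, list) or not batch:
        raise ValueError("payload must be a non-empty JSON array")
    if len(batch) > MAX_BATCH_SIZE:
        raise ValueError("batch exceeds the maximum size")
    for entry in batch:
        if not isinstance(entry, dict) or "message" not in entry:
            raise ValueError("each log entry must be an object with a 'message' field")
    return batch
```

Rejecting malformed input with a single exception type keeps the endpoint's error handling simple: anything that raises ValueError can be mapped to a client error response.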
7. Latency Improvement
We split the service into two microservices.
Producer:
- Receives the request and validates the sender
- Accepts the payload
- Puts the data on the queue
Consumer:
- Polls the data from the queue
- Extracts the payload and validates the data
- Sends it to the log destination
[Diagram labels: Logging Service - Producer; SQS; Logging Service - Consumer]
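The producer/consumer split can be sketched with an in-memory queue standing in for SQS. The sender allow-list, the function names, and the trivial validation are hypothetical; the point is only that the producer returns as soon as the payload is queued, while the slower validation and forwarding happen in the consumer.

```python
import queue

ALLOWED_SENDERS = {"mobile-app", "web-app"}  # hypothetical sender allow-list


def produce(sender: str, payload: bytes, q: queue.Queue) -> bool:
    """Producer: validate the sender, accept the payload, enqueue it."""
    if sender not in ALLOWED_SENDERS:
        return False
    q.put(payload)  # stand-in for sending a message to SQS
    return True


def consume(q: queue.Queue, sink: list) -> int:
    """Consumer: drain the queue, validate each payload, forward it.

    Returns the number of payloads forwarded to the sink, which stands in
    for the log destination.
    """
    forwarded = 0
    while True:
        try:
            payload = q.get_nowait()  # stand-in for polling SQS
        except queue.Empty:
            return forwarded
        if payload:  # placeholder validation: drop empty payloads
            sink.append(payload)
            forwarded += 1
```

Because the request path no longer waits for decompression, validation, or the log destination, the producer's response time depends mainly on the enqueue call.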
9. Well Architected Framework
Five pillars:
- Operational excellence
- Security
- Reliability
- Performance efficiency
- Cost optimization
10. EC2 Setup
Producer:
- Compute-intensive (c5.2xlarge)
- Number of instances: 3 to 20
Consumer:
- Memory-intensive (m5.2xlarge)
- Number of instances: 3 to 20
Alarms:
- Based on JVM metrics sent to CloudWatch
12. Route 53
- Expose the producer ELB through Route 53
- The Route 53 endpoint is hosted behind the Intuit API gateway
- Disaster recovery through multiple CNAME records across regions
17. Target Groups
- With auto scaling and load balancers involved, target groups route requests to EC2 instances and microservices
- Requests are sent to a new target as soon as its registration is complete and it passes the initial health check
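The routing rule above can be modelled in a few lines: only targets that are both registered and healthy receive traffic. The Target class and the round-robin choice are assumptions for illustration; the actual routing algorithm is configured on the load balancer and is not described in the slides.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Target:
    instance_id: str
    registered: bool = False  # registration with the target group is complete
    healthy: bool = False     # initial health check has passed


def eligible(targets):
    """Targets that may receive requests: registered and healthy."""
    return [t for t in targets if t.registered and t.healthy]


def route(requests, targets):
    """Assign requests to eligible targets in round-robin order (assumed policy)."""
    pool = eligible(targets)
    if not pool:
        raise RuntimeError("no registered, healthy targets available")
    return [(req, t.instance_id) for req, t in zip(requests, cycle(pool))]
```

An instance launched by auto scaling simply joins the pool once both flags are true, so scaling out needs no change on the request path.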
20. AMI Restack
Background:
- The Intuit compliance team applies security patches, and new baseline images are released every 2 weeks
- App teams must either use these AMIs or derive their own AMIs from the baseline images
- We automated this entire process using a CloudWatch (CW) rule and CodeBuild
22. CodeBuild logs - Baking the Logging service AMI
- Launch a new EC2 instance from the baseline AMI
- Copy the Chef recipes required to install software such as Java, plus the configuration required for the Splunk forwarder and log rotation
- Bake the logging service AMI
- Publish a CloudWatch event with the AMI ID
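The final step, publishing an event that carries the AMI ID, might look like the sketch below. It only builds the entry; the Source and DetailType strings and the "amiId" key are hypothetical names chosen for illustration, not values taken from the actual build.

```python
import json


def build_ami_event(ami_id: str) -> dict:
    """Build one entry for the CloudWatch Events PutEvents API.

    The source, detail type, and detail key are illustrative assumptions.
    """
    return {
        "Source": "custom.logging-service.ami",      # hypothetical event source
        "DetailType": "Logging Service AMI Baked",   # hypothetical detail type
        "Detail": json.dumps({"amiId": ami_id}),     # Detail must be a JSON string
    }
```

With boto3 installed and credentials configured, the entry could be published with `boto3.client("events").put_events(Entries=[build_ami_event(ami_id)])`.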
24. CW rule on Baked AMI
- CloudWatch rule configured to trigger on the baked logging service AMI
- We have 2 targets configured on this CW rule:
- Lambda function: creates a new launch configuration with the new AMI and updates the ASG
- CodePipeline: CD service that automates the steps to release the logging service
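A minimal sketch of what the Lambda target might do is shown below. The event shape (an "amiId" key under "detail"), the ASG name, and the naming scheme for the launch configuration are assumptions; the two client methods are the boto3 Auto Scaling calls `create_launch_configuration` and `update_auto_scaling_group`. The client is passed in so the logic can be exercised without AWS access.

```python
def handle_baked_ami(event, autoscaling,
                     asg_name="logging-service-producer",  # hypothetical ASG name
                     instance_type="c5.2xlarge"):
    """Create a launch configuration for the new AMI and point the ASG at it.

    `autoscaling` is expected to behave like boto3.client("autoscaling").
    Returns the name of the launch configuration that was created.
    """
    ami_id = event["detail"]["amiId"]   # assumed event shape
    lc_name = f"{asg_name}-{ami_id}"    # hypothetical naming scheme

    autoscaling.create_launch_configuration(
        LaunchConfigurationName=lc_name,
        ImageId=ami_id,
        InstanceType=instance_type,
    )
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        LaunchConfigurationName=lc_name,
    )
    return lc_name
```

In a real Lambda, a thin handler such as `lambda_handler(event, context)` would call this function with `boto3.client("autoscaling")`. Instances already running keep the old AMI until they are replaced, which is presumably where the CodePipeline target takes over.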