We’ll share an overview of leveraging serverless architectures to support high-performance, data-intensive applications. Fulfillment by Amazon (FBA) built the Seller Inventory Authority Platform (IAP) using Amazon DynamoDB Streams, AWS Lambda functions, Amazon Elasticsearch Service, and Amazon Redshift to improve results and reduce costs. Scopely will share how they used a flexible logging system built on Kinesis, Lambda, and Amazon Elasticsearch Service to provide high-fidelity reporting on hot keys in Memcached and DynamoDB, and drastically reduce the incidence of hot keys. Both customers use managed services and a serverless architecture to build scalable systems that can meet projected business growth without a corresponding increase in operational costs.
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Results and Reduced Costs with a Serverless Architecture (DAT309)
2. What to Expect from the Session
• Overview of serverless architectures
• Serverless patterns in high performance data use cases
• Fulfillment by Amazon: Serverless stream processing
• Scopely: Serverless logging & hot key detection
3. What is a Serverless Architecture?
Build applications and services
… without managing infrastructure
• Function as a unit of scale
• Decrease complexity, abstracted language runtime
• Run code when it’s needed
• Enable increased agility
9. Data Enablement
• Data is always part of the architecture
• Data as a trigger for serverless processing
• Data as the object of serverless processing
• Data as the result of serverless processing
10. Data Processing Abstraction
• Lambda functions as event handlers: one function per event type
• Lambda functions as serverless back ends: one function per API / path
• Lambda functions for stream/data processing: one function per record or data type
Helps maintainability: hard to end up with monolithic/spaghetti code
11. Serverless High Performance Data Patterns
Active Role
• Processing data at scale
• Event-based invocation
• ETL orchestration
• On-demand processing
Advantages:
• Reduce operational complexities
• Operational cost optimization
Support Role
• Optimization of pre-existing workloads and database utilization
• Management, monitoring
• Improved anomaly detection and reaction
Advantages:
• Improved resilience
• Operational cost optimization
12. Serverless High Performance Data Patterns
FBA Seller Inventory Authority Platform
Serverless Logging & Hot Key Detection
15. How Does FBA Work?
1 Send Inventory
2 Receive & Store
3 Customer Orders Product
4 Pick, Pack & Ship
5 Customer Service
6 Customer Returns
17. Data Platform Goals
• Single source of truth for seller’s inventory
• Reconciled view of inventory
• Surface and track discrepancies
18. Design Requirements
• Should handle a high volume of input messages (> 10,000 TPS)
• Should handle hot keys in input messages
  • E.g., a received-inventory message for a very large shipment
• Should handle duplicate and out-of-order input messages
• Must maintain an audit trail for every inventory quantity change
20. Launch Results
• Savings of 22 dev weeks in operational costs across 11 fleets
• Design to launch < 4 months
• Improved the accuracy of inbound quantities by 5% to 10%
• Reduced cost of business operations (for example, seller contacts)
• Developers in the team enjoyed using AWS managed services
21. Best Practices that helped
• Container reuse
• Instrument, measure and monitor
• Abstract Launch scaffolding from entity processing
• Canary and Dashboards
23. Metrics and Instrumentation
if (adjustmentManager.doesAdjustmentAlreadyExist(metrics, adjustment)) {
    // This adjustment already exists in our store. Nothing further needs to be done.
    metrics.addCount(METRIC_IDEMPOTENT_CHECK_FAILED, 1, Unit.ONE);
    log.debug("Adjustment already exists in DB. " + adjustment);
    return null;
}
// Count every new adjustment by type, and by source#type.
metrics.addCount(adjustment.getType().name(), 1, Unit.ONE);
metrics.addCount(String.join("#", adjustment.getSource().name(),
        adjustment.getType().name()), 1, Unit.ONE);
To mitigate the lack of debug/tracing capabilities with Lambda, we placed significant emphasis on instrumenting our code to help understand how it was performing at runtime.
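The idempotent check and counters on this slide can be tried out in isolation. The following is a minimal sketch, not the FBA implementation: the class name, the in-memory set standing in for the adjustment store, and the counter map standing in for the metrics instance are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an in-memory set stands in for the adjustment store,
// and a plain map stands in for the metrics instance.
class IdempotentAdjustmentSketch {
    private final Set<String> seenAdjustmentIds = new HashSet<>();
    final Map<String, Long> counters = new HashMap<>();

    // Returns true if the adjustment was new and counted, false if it was a duplicate.
    boolean apply(String adjustmentId, String source, String type) {
        if (!seenAdjustmentIds.add(adjustmentId)) {
            // Duplicate delivery: count it and do nothing further.
            counters.merge("IdempotentCheckFailed", 1L, Long::sum);
            return false;
        }
        // New adjustment: count by type, and by source#type.
        counters.merge(type, 1L, Long::sum);
        counters.merge(String.join("#", source, type), 1L, Long::sum);
        return true;
    }
}
```

Counting duplicates separately from first-time adjustments makes redelivery rates visible on a dashboard without a debugger.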
25. LambdaLaunchHelper
Invokes the appropriate event handler with the metrics instance and the input.
The event handler deserializes the items in the request payload and invokes the entity processor with the metrics instance and each deserialized entity.
Common error/fault handling: write to S3 and skip the individual entity at fault, or fail the entire batch.
Common metrics
• Batch Size
• Total Time to process the batch
• Time to process an individual entity
• Entity Count in the batch
• Fault Count
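The split between launch scaffolding and entity processing might look roughly like the sketch below. It is not the actual LambdaLaunchHelper: the class name, the metric names, and the in-memory fault list (standing in for the S3 write) are assumptions, and "Batch Size" is interpreted here as the number of raw items while "Entity Count" is the number processed successfully.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical scaffolding: deserializes each item, calls the entity processor,
// and records common metrics so processors contain only business logic.
class BatchLaunchSketch<T> {
    private final Function<String, T> deserializer;
    private final Consumer<T> entityProcessor;
    private final boolean failBatchOnFault;
    // Stands in for writing the faulted item to S3.
    final List<String> faultedItems = new ArrayList<>();

    BatchLaunchSketch(Function<String, T> deserializer,
                      Consumer<T> entityProcessor,
                      boolean failBatchOnFault) {
        this.deserializer = deserializer;
        this.entityProcessor = entityProcessor;
        this.failBatchOnFault = failBatchOnFault;
    }

    Map<String, Long> handle(List<String> rawItems) {
        Map<String, Long> metrics = new LinkedHashMap<>();
        long batchStart = System.nanoTime();
        long entities = 0;
        long faults = 0;
        for (String raw : rawItems) {
            long start = System.nanoTime();
            try {
                entityProcessor.accept(deserializer.apply(raw));
                entities++;
            } catch (RuntimeException e) {
                faults++;
                faultedItems.add(raw);
                if (failBatchOnFault) {
                    throw e; // fail the entire batch
                }
                // otherwise skip only the entity at fault
            }
            metrics.merge("EntityTimeNanos", System.nanoTime() - start, Long::sum);
        }
        metrics.put("BatchSize", (long) rawItems.size());
        metrics.put("EntityCount", entities);
        metrics.put("FaultCount", faults);
        metrics.put("BatchTimeNanos", System.nanoTime() - batchStart);
        return metrics;
    }
}
```

A real handler would publish these values to a metrics service rather than return them; returning a map simply keeps the sketch free of SDK dependencies.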
27. Canary and Availability of Pipeline
• Regular submission of synthetic transactions and events to the pipeline to monitor the health of the pipeline.
• Use different keys to ensure that we monitor the various shards at random
• Also helps us monitor to ensure we are within SLA for end-to-end processing time
• Use Kinesis shard-level metrics to identify any potential backlog or issues in the stream processing pipeline.
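To illustrate why varying the canary key spreads synthetic events across shards: Kinesis MD5-hashes each partition key into a 128-bit hash key space, and each shard owns a range of that space. The sketch below assumes shards that split the space evenly (which may not hold after resharding); the class and method names are hypothetical, and no event is actually sent.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper for picking canary partition keys.
class CanaryKeys {
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // A fresh random key per canary event, so successive events land on different shards.
    static String randomPartitionKey() {
        return "canary-" + UUID.randomUUID();
    }

    // Index of the shard that would receive the key, assuming shardCount
    // shards that divide the 128-bit hash key space evenly.
    static int shardIndexFor(String partitionKey, int shardCount) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            BigInteger hash = new BigInteger(1, digest); // unsigned 128-bit value
            return hash.multiply(BigInteger.valueOf(shardCount))
                    .divide(HASH_SPACE)
                    .intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

With a fixed canary key, every synthetic event would exercise the same shard, so a backlog on any other shard would go unnoticed.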
33. Our Hot Key Problem
What are hot keys?
• Heavily requested keys/objects localized on a single partition/shard/node
• Subject to capacity of that single partition/shard/node
Amazon DynamoDB table symptoms:
• Total consumed capacity is less than provisioned capacity
• And experiencing throttle events
34. Our Hot Key Detection Process
Application uses Count Min Sketch to track the top 10 keys, then logs them periodically (once per minute)
Count Min Sketch
• Probabilistic algorithm
• Fixed size of memory to track unlimited operations
• Small chance of an error
memcached-get-count topkey[1]: key=3.user.41401542 count=354 totalCount=68912 frequency=.00514
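For readers unfamiliar with the data structure, a minimal Count Min Sketch can be sketched as follows. This is not Scopely's code: the class name, the depth/width/seed parameters, and the simple seeded hash are assumptions chosen for brevity, and the top-10 tracking and periodic logging are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Minimal Count Min Sketch: a fixed depth x width table of counters.
// Estimates can overcount on hash collisions but never undercount.
class CountMinSketch {
    private final int depth;
    private final int width;
    private final long[][] table;
    private final int[] seeds;
    private long totalCount;

    CountMinSketch(int depth, int width, long seed) {
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random random = new Random(seed);
        for (int i = 0; i < depth; i++) {
            seeds[i] = random.nextInt();
        }
    }

    // One seeded hash per row; a production sketch would use a stronger hash.
    private int bucket(String key, int row) {
        int h = seeds[row];
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h = h * 31 + (b & 0xff);
        }
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    // Adds a non-negative amount to the key's counter in every row.
    void add(String key, long amount) {
        for (int row = 0; row < depth; row++) {
            table[row][bucket(key, row)] += amount;
        }
        totalCount += amount;
    }

    // The minimum across rows is the least-collided estimate.
    long estimate(String key) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, table[row][bucket(key, row)]);
        }
        return min;
    }

    long totalCount() {
        return totalCount;
    }
}
```

Dividing `estimate(key)` by `totalCount()` gives a frequency like the one in the log line above; memory stays at depth × width counters no matter how many distinct keys are seen.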
39. We Had More Bugs Than We Thought We Did!
Special case user ids
Key: “user.-1”
Configuration objects with a single key retrieved on every request
Key: “dice-game-settings”
Didn’t notice because it wasn’t causing problems (yet...).
ElastiCache Memcached Nodes
One node is sending out much more data than the rest
40. Large Keys Are Also a Problem
We can look for unusually large keys if we increment by the size of objects rather than 1
You can get the total bandwidth consumed by a specific key by summing the events over a time period.
memcached-get-count topkey[1]: key=3.Dice.inv.21645428 size=20354 totalCount=1008912 frequency=.0203
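Summing the events per key over a time period could be done along these lines. This is a sketch under assumptions: it parses raw log lines shaped like the example above (the talk's pipeline used Amazon Elasticsearch for this), it treats the `size` field as the bytes attributed to the key in that logging interval, and the class name and regular expression are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical aggregation of top-key log lines into bytes per key.
class TopKeyBandwidth {
    // Matches the "key=... size=..." portion of a top-key log line.
    private static final Pattern LINE = Pattern.compile("key=(\\S+)\\s+size=(\\d+)");

    static Map<String, Long> bytesByKey(List<String> logLines) {
        Map<String, Long> totals = new HashMap<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                totals.merge(m.group(1), Long.parseLong(m.group(2)), Long::sum);
            }
            // Lines without a size field (for example count-only events) are skipped.
        }
        return totals;
    }
}
```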
41. We Caught Some Bots and Spammers
Initially shut bad users down by deleting accounts.
Later implemented per-user request rate limiting via Memcached counters
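One common way to build per-user rate limiting on counters is a fixed-window limit, sketched below. The talk does not describe Scopely's exact scheme, so the windowing, the key format, and the `CounterStore` interface (a stand-in for a Memcached client's atomic increment) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Memcached client's atomic increment.
interface CounterStore {
    long increment(String key);
}

// In-memory implementation so the sketch runs without a cache cluster.
class InMemoryCounterStore implements CounterStore {
    private final Map<String, Long> counters = new HashMap<>();

    @Override
    public synchronized long increment(String key) {
        return counters.merge(key, 1L, Long::sum);
    }
}

// Fixed-window limiter: one counter per user per time window.
class RequestRateLimiter {
    private final CounterStore store;
    private final long limitPerWindow;
    private final long windowMillis;

    RequestRateLimiter(CounterStore store, long limitPerWindow, long windowMillis) {
        this.store = store;
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request is within the user's limit for the current window.
    boolean allow(String userId, long nowMillis) {
        long window = nowMillis / windowMillis;
        // With a real cache, this key would be created with an expiry of one window.
        String key = "rate." + userId + "." + window;
        return store.increment(key) <= limitPerWindow;
    }
}
```

Because the window number is part of the key, old counters are never read again and can simply expire, which keeps the limiter stateless on the application side.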
42. Discovered Poor Client Behavior
• Hot keys were discovered for several users
• Filtering by client platform revealed that they were all from Android
• The Android client made a request for every game in the game list, which was huge for a small number of users (making their user ids hot keys)
46. Related Sessions
ARC402 - Serverless Architectural Patterns and Best Practices
- Drew Dennis & Maitreya Ranganath, AWS Solutions Architects; Ajoy Kumar, Architect, BMC Software
DAT304 - Deep Dive on Amazon DynamoDB
- Rick Houlihan, AWS Principal TPM, DBS NoSQL
DAT306 - ElastiCache Deep Dive: Best Practices and Usage Patterns
- Michael Labib, AWS In-Memory NoSQL Solutions Architect; Brian Kaiser, CTO, Hudl