Amazon DynamoDB is a fully managed NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. This talk explores DynamoDB capabilities and benefits in detail and discusses how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We also explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, Streams, and more.
17. Hot key issues manifest after you scale
[Diagram: clients, tables, and partitions]
18. A bad choice for a partition key
Table: SummitSessionAttendance
Partition key: “07-07-2016”, Range key: “Session Attendee X”
Partition key: “07-07-2016”, Range key: “Session Attendee Y”
f(x) maps each partition key to one of Partition 1, Partition 2, Partition 3, Partition 4
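A minimal sketch of why a date makes a poor partition key: every item written on the same day hashes to the same partition, whatever its range key. MD5 and the `partition_for` helper are assumptions used for illustration only; DynamoDB's internal hash function is not public.

```python
import hashlib

NUM_PARTITIONS = 4  # matches the four partitions drawn on the slide


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a 1-based partition number.

    MD5 is only a stand-in for DynamoDB's internal hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions + 1


# Both items share the partition key, so the range key never influences placement.
items = [
    ("07-07-2016", "Session Attendee X"),
    ("07-07-2016", "Session Attendee Y"),
]
partitions = {partition_for(pk) for pk, _range_key in items}
print(partitions)  # a set with a single partition number
```

All writes for the day land on one partition while the other three stay idle.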
19. But I have random partition keys!
The number of keys per partition is important, but so are other outliers:
- Frequency (Hot keys)
- Size (Large objects or collections)
- Table history (partitions are not merged)
20. Partition key value Uniformity
User ID, where the application has many users and each user has similar activity levels: Good
Status code, where there are only a few possible status codes: Bad
Device ID, where each device accesses data at relatively similar intervals: Good
Device ID, where one device generates a lot more traffic than any other device: Bad
21. What a hot partition problem looks like
[Charts: Read Capacity (provisioned vs. consumed) and Throttled read requests]
22. Troubleshooting hot partitions
- CloudWatch
- AWS Support
- Access logs
- ReturnConsumedCapacity
- Sampling works well
- GSIs
  - must also have enough write capacity
  - uniformity requirement also applies
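One way to combine ReturnConsumedCapacity with sampling is to tally the capacity each partition key consumes across a sample of requests. The sketch below is an assumption-laden illustration: the `Devices` table, the key values, and the `fake_response` helper are hypothetical, and the responses are simulated rather than fetched from DynamoDB. The `ReturnConsumedCapacity` request parameter and the `ConsumedCapacity.CapacityUnits` response field are part of the DynamoDB API.

```python
from collections import Counter


def get_item_params(table, key):
    """GetItem parameters that ask DynamoDB to report consumed capacity."""
    return {"TableName": table, "Key": key, "ReturnConsumedCapacity": "TOTAL"}


def capacity_share_by_key(samples):
    """Return (key, share of sampled capacity) pairs, hottest key first.

    samples: iterable of (partition_key_value, response) pairs, where each
    response carries the ConsumedCapacity block returned by DynamoDB.
    """
    usage = Counter()
    for key_value, response in samples:
        usage[key_value] += response["ConsumedCapacity"]["CapacityUnits"]
    total = sum(usage.values())
    if total == 0:
        return []
    return [(key, units / total) for key, units in usage.most_common()]


def fake_response(units):
    """Hypothetical stand-in for a real DynamoDB response."""
    return {"ConsumedCapacity": {"TableName": "Devices", "CapacityUnits": units}}


# Simulated sample: device-1 accounts for 8 of 10 sampled reads.
samples = [("device-1", fake_response(1.0))] * 8 + [("device-2", fake_response(1.0))] * 2
shares = capacity_share_by_key(samples)
print(shares[0])
```

A key that takes a large share of the sampled capacity is a candidate hot key.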
25. Query rather than scan
Query
- Specify partition key name
- Condition on sort key
- Cheap with high cardinality keys
Scan
- Reads all data
- Conditions available through filters
- Expensive for large tables
Partition Sort Attribute1 … Attribute N
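The difference shows up in the request parameters. The sketch below only builds the parameter dictionaries; the `Orders` table and its attribute names are hypothetical. A Query carries a `KeyConditionExpression` that names the partition key, while a Scan can only narrow results with a `FilterExpression`, which is applied after the items have been read.

```python
def query_params(table, pk_name, pk_value, sk_name, sk_prefix):
    """Query: equality on the partition key plus a condition on the sort key."""
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :sk)",
        "ExpressionAttributeNames": {"#pk": pk_name, "#sk": sk_name},
        "ExpressionAttributeValues": {
            ":pk": {"S": pk_value},
            ":sk": {"S": sk_prefix},
        },
    }


def scan_params(table, attr_name, attr_value):
    """Scan: reads every item; the filter only trims what is returned."""
    return {
        "TableName": table,
        "FilterExpression": "#a = :v",
        "ExpressionAttributeNames": {"#a": attr_name},
        "ExpressionAttributeValues": {":v": {"S": attr_value}},
    }


# Hypothetical table and attribute names, for illustration only.
q = query_params("Orders", "CustomerId", "42", "OrderDate", "2016-07")
s = scan_params("Orders", "OrderStatus", "OPEN")
print(q["KeyConditionExpression"])
print(s["FilterExpression"])
```

With boto3 installed, these dictionaries could be passed as `client.query(**q)` and `client.scan(**s)` on a `boto3.client("dynamodb")` client.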
26. When you have to scan a table
• Scans constrained by single partition throughput
• Use parallel Scans if table > 20 GB
• Avoid sudden bursts vs. provisioned capacity
• Offload to S3, HDFS, Redshift, Elasticsearch, or a second table
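A parallel Scan splits the table into segments using the `Segment` and `TotalSegments` request parameters, one per worker. The sketch below only builds the per-worker parameters; the table name, segment count, and page size are assumptions. Keeping `Limit` small is one way to avoid sudden bursts against provisioned capacity.

```python
def parallel_scan_params(table, total_segments, page_limit=100):
    """Build one Scan request per segment.

    Each worker would still need to follow LastEvaluatedKey (passed back as
    ExclusiveStartKey) to page through its own segment.
    """
    return [
        {
            "TableName": table,
            "Segment": segment,
            "TotalSegments": total_segments,
            "Limit": page_limit,  # small pages keep each request's cost bounded
        }
        for segment in range(total_segments)
    ]


# Hypothetical table name and segment count.
requests = parallel_scan_params("Events_table", total_segments=4)
for request in requests:
    print(request["Segment"], "of", request["TotalSegments"])
```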
29. Scaling bottlenecks
ProductCatalog Table
Partition 1: 2000 RCUs
Partition K: 2000 RCUs
Partition M: 2000 RCUs
Partition 50: 2000 RCUs
Shoppers: Product A, Product B
SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id="POPULAR_PRODUCT"
30. ProductCatalog Table
[Diagram: users, DynamoDB (Partition 1, Partition 2), and a cache for popular items]
SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id="POPULAR_PRODUCT"
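A read-through cache in front of the table is one way to absorb reads for a popular item. This is a minimal in-process sketch: `ReadThroughCache`, `fetch_product`, and the 60-second TTL are assumptions, and a real deployment would more likely use a shared cache service.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; fall through to the table on a miss."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no read capacity consumed
        value = self._fetch(key)  # cache miss: one read against the table
        self._entries[key] = (now + self._ttl, value)
        return value


table_reads = []


def fetch_product(product_id):
    """Hypothetical stand-in for a GetItem call on ProductCatalog."""
    table_reads.append(product_id)
    return {"Id": product_id, "Description": "..."}


cache = ReadThroughCache(fetch_product)
for _ in range(1000):
    cache.get("POPULAR_PRODUCT")
print(len(table_reads))  # reads that actually reached the table
```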
36. Trade off read cost for write scalability
Consider throughput per partition key
Shard write-heavy partition keys
Your write workload is not horizontally scalable
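Sharding a write-heavy partition key can be sketched as follows: each write goes to a randomly chosen suffixed key, and a read fans out across all suffixes and aggregates. The shard count, the `#` separator, and the in-memory `Counter` standing in for the table are all assumptions.

```python
import random
from collections import Counter

NUM_SHARDS = 10  # assumption: sized to the write rate of the hottest key


def sharded_key(base_key, rng=random, num_shards=NUM_SHARDS):
    """Pick one of the shard keys at random for a write."""
    return f"{base_key}#{rng.randrange(num_shards)}"


def all_shard_keys(base_key, num_shards=NUM_SHARDS):
    """Every key a reader must fetch to reassemble the logical item."""
    return [f"{base_key}#{n}" for n in range(num_shards)]


table = Counter()  # in-memory stand-in for the DynamoDB table
rng = random.Random(42)
for _ in range(1000):
    table[sharded_key("candidate_A", rng)] += 1  # writes spread across shards

# The read now costs NUM_SHARDS lookups instead of one.
total = sum(table[key] for key in all_shard_keys("candidate_A"))
print(total, len(table))
```

This is the trade-off named on the slide: reads become more expensive so that writes can spread across partitions.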
38. Auto Scaling
• Cost saving technique
• Open Source solutions
• Set minimums and maximums
• Scale up proactively, scale down conservatively
• Scale up time can be from minutes to hours
• Implement a circuit-breaker
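The "scale up proactively, scale down conservatively" rule with minimums and maximums could look like the sketch below. All thresholds and step sizes are assumptions chosen for illustration, not recommended values, and the function only computes a target; it does not call any AWS API.

```python
def next_capacity(provisioned, consumed, minimum, maximum,
                  scale_up_at=0.7, scale_down_at=0.3, down_step_percent=20):
    """Return the next provisioned capacity for a table.

    Scales up aggressively (doubles) and scales down in small steps,
    then clamps the result to the configured minimum and maximum.
    """
    utilization = consumed / provisioned
    if utilization >= scale_up_at:
        target = provisioned * 2
    elif utilization <= scale_down_at:
        target = provisioned - provisioned * down_step_percent // 100
    else:
        target = provisioned
    return max(minimum, min(maximum, target))


print(next_capacity(1000, 800, minimum=100, maximum=5000))   # busy table
print(next_capacity(1000, 100, minimum=100, maximum=5000))   # idle table
print(next_capacity(4000, 3900, minimum=100, maximum=5000))  # capped by maximum
```

The maximum acts as a hard ceiling on spend if a runaway workload keeps triggering scale-ups.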
40. A mix of hot and cold data
Events_table (current table)
Event_id (Partition), Timestamp (Sort), Attribute1 … Attribute N
RCUs = 10000
WCUs = 10000
Antipattern:
• Mix of hot and cold data
• Old data rarely accessed
• Unbounded data (partition) growth
• Partition dilution
• Scan costs increase with table size
• Deletes of old data not trivial or cheap
41. Time series tables
Current table:
Events_table_2015_April: RCUs = 10000, WCUs = 10000
Older tables:
Events_table_2015_March: RCUs = 1000, WCUs = 1
Events_table_2015_February: RCUs = 100, WCUs = 1
Events_table_2015_January: RCUs = 10, WCUs = 1
Each table: Event_id (Partition), Timestamp (Sort), Attribute1 … Attribute N
Tables are ordered from hot data (top) to cold data (bottom).
Don’t mix hot and cold data; archive cold data to Amazon S3
42. Use a table per time period
Precreate daily, weekly, monthly tables
Provision required throughput for current table
Writes go to the current table
Turn off (or reduce) throughput for older tables
Cheaper scans – free deletes
Dealing with time series data
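Routing writes to the current table comes down to deriving the table name from the event's date. The sketch below assumes monthly tables named like the `Events_table_2015_April` example; `table_for` and `tables_to_precreate` are hypothetical helper names.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def table_for(day, prefix="Events_table"):
    """Name of the monthly table that holds events for the given day."""
    return f"{prefix}_{day.year}_{MONTHS[day.month - 1]}"


def tables_to_precreate(start, count, prefix="Events_table"):
    """Names of `count` consecutive monthly tables, starting at start's month."""
    names = []
    year, month = start.year, start.month
    for _ in range(count):
        names.append(table_for(date(year, month, 1), prefix))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


print(table_for(date(2015, 4, 12)))
print(tables_to_precreate(date(2015, 11, 20), 3))
```

Dropping a whole month of old data then becomes a single table delete rather than many item deletes.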
44. Games table
GameId  Date        Host   Opponent  Status
d9bl3   2014-10-02  David  Alice     DONE
72f49   2014-09-30  Alice  Bob       PENDING
o2pnb   2014-10-08  Bob    Carol     IN_PROGRESS
b932s   2014-10-03  Carol  Bob       PENDING
ef9ca   2014-10-03  David  Bob       IN_PROGRESS
Hierarchical data structures
45. Query for incoming game requests
DynamoDB indexes provide partition and sort
What about queries for two equalities and a sort?
SELECT * FROM Game
WHERE Opponent='Bob'    (hash)
AND Status='PENDING'    (?)
ORDER BY Date DESC      (range)
46. Secondary index
Approach 1: Query filter
Opponent (Partition key)  Date (Sort key)  GameId  Status       Host
Alice                     2014-10-02       d9bl3   DONE         David
Carol                     2014-10-08       o2pnb   IN_PROGRESS  Bob
Bob                       2014-09-30       72f49   PENDING      Alice
Bob                       2014-10-03       b932s   PENDING      Carol
Bob                       2014-10-03       ef9ca   IN_PROGRESS  David
Bob
47. Secondary index
Approach 1: Query filter
Bob
SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'
Opponent  Date        GameId  Status       Host
Alice     2014-10-02  d9bl3   DONE         David
Carol     2014-10-08  o2pnb   IN_PROGRESS  Bob
Bob       2014-09-30  72f49   PENDING      Alice
Bob       2014-10-03  b932s   PENDING      Carol
Bob       2014-10-03  ef9ca   IN_PROGRESS  David  (filtered out)
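The cost of the query-filter approach can be seen in a small in-memory simulation of the Opponent/Date index. This is only a sketch of the behaviour, not DynamoDB code: the key condition selects every item for Bob, and the filter then discards non-matching items that have already been read.

```python
# Rows of the secondary index from the slide (Opponent = partition key, Date = sort key).
index = [
    {"Opponent": "Alice", "Date": "2014-10-02", "GameId": "d9bl3", "Status": "DONE", "Host": "David"},
    {"Opponent": "Carol", "Date": "2014-10-08", "GameId": "o2pnb", "Status": "IN_PROGRESS", "Host": "Bob"},
    {"Opponent": "Bob", "Date": "2014-09-30", "GameId": "72f49", "Status": "PENDING", "Host": "Alice"},
    {"Opponent": "Bob", "Date": "2014-10-03", "GameId": "b932s", "Status": "PENDING", "Host": "Carol"},
    {"Opponent": "Bob", "Date": "2014-10-03", "GameId": "ef9ca", "Status": "IN_PROGRESS", "Host": "David"},
]


def query_with_filter(rows, opponent, status):
    """Return (items returned, number of items evaluated)."""
    # Key condition: partition key equality, newest first.
    evaluated = sorted(
        (row for row in rows if row["Opponent"] == opponent),
        key=lambda row: row["Date"],
        reverse=True,
    )
    # Filter: applied after the items were read, so it does not reduce the read cost.
    returned = [row for row in evaluated if row["Status"] == status]
    return returned, len(evaluated)


games, evaluated_count = query_with_filter(index, "Bob", "PENDING")
print([game["GameId"] for game in games], evaluated_count)
```

Three items are evaluated for Bob but only two are returned; the third is read and then filtered out.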
49. Send back less data “on the wire”
Simplify application code
Simple SQL-like expressions
• AND, OR, NOT, ()
Use query filter
Your index isn’t entirely selective
51. Secondary index
Approach 2: Composite key
Opponent (Partition key)  StatusDate (Sort key)   GameId  Host
Alice                     DONE_2014-10-02         d9bl3   David
Carol                     IN_PROGRESS_2014-10-08  o2pnb   Bob
Bob                       IN_PROGRESS_2014-10-03  ef9ca   David
Bob                       PENDING_2014-09-30      72f49   Alice
Bob                       PENDING_2014-10-03      b932s   Carol
52. Secondary index
Approach 2: Composite key
Opponent  StatusDate              GameId  Host
Alice     DONE_2014-10-02         d9bl3   David
Carol     IN_PROGRESS_2014-10-08  o2pnb   Bob
Bob       IN_PROGRESS_2014-10-03  ef9ca   David
Bob       PENDING_2014-09-30      72f49   Alice
Bob       PENDING_2014-10-03      b932s   Carol
Bob
SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'
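The composite-key approach can be sketched the same way: concatenate Status and Date into one sort key when writing, then match on a prefix when querying. The simulation below is in-memory and illustrative only; `composite_key` and `query_begins_with` are hypothetical helpers. The prefix includes the `_` separator so that a status name which is a prefix of another cannot match by accident.

```python
def composite_key(status, date_string):
    """Build the StatusDate sort key, e.g. PENDING_2014-10-03."""
    return f"{status}_{date_string}"


games = [
    ("d9bl3", "2014-10-02", "David", "Alice", "DONE"),
    ("72f49", "2014-09-30", "Alice", "Bob", "PENDING"),
    ("o2pnb", "2014-10-08", "Bob", "Carol", "IN_PROGRESS"),
    ("b932s", "2014-10-03", "Carol", "Bob", "PENDING"),
    ("ef9ca", "2014-10-03", "David", "Bob", "IN_PROGRESS"),
]

# Index items: Opponent is the partition key, StatusDate the sort key.
index = [
    {"Opponent": opponent, "StatusDate": composite_key(status, day),
     "GameId": game_id, "Host": host}
    for game_id, day, host, opponent, status in games
]


def query_begins_with(rows, opponent, status):
    """Simulate Opponent = :o AND begins_with(StatusDate, :prefix)."""
    prefix = status + "_"
    matches = [row for row in rows
               if row["Opponent"] == opponent and row["StatusDate"].startswith(prefix)]
    return sorted(matches, key=lambda row: row["StatusDate"])


pending = query_begins_with(index, "Bob", "PENDING")
print([row["GameId"] for row in pending])
```

Only the two matching items are evaluated, so nothing is read and then discarded.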
54. Sparse indexes
Customer Orders
CustomerId (Partition)  OrderId (Sort)  Total  Date        Open
1                       234234          $100   2016-07-01
1                       526346          $10    2016-07-02
2                       746346          $200   2016-07-02
1                       23462           $300   2016-07-05  X
3                       635245          $150   2016-07-05
4                       245362          $80    2016-07-07
OpenOrders-GSI
CustomerId (Partition)  Open (Sort)  Total  OrderId  Date
1                       X            $300   23462    2016-07-05
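A sparse index contains only the items that carry the index key attribute. The in-memory sketch below mimics that behaviour for the table above: orders without an `Open` attribute never appear in the OpenOrders-GSI. The `project_sparse_index` helper is hypothetical; in DynamoDB this projection happens automatically.

```python
orders = [
    {"CustomerId": 1, "OrderId": 234234, "Total": "$100", "Date": "2016-07-01"},
    {"CustomerId": 1, "OrderId": 526346, "Total": "$10", "Date": "2016-07-02"},
    {"CustomerId": 2, "OrderId": 746346, "Total": "$200", "Date": "2016-07-02"},
    {"CustomerId": 1, "OrderId": 23462, "Total": "$300", "Date": "2016-07-05", "Open": "X"},
    {"CustomerId": 3, "OrderId": 635245, "Total": "$150", "Date": "2016-07-05"},
    {"CustomerId": 4, "OrderId": 245362, "Total": "$80", "Date": "2016-07-07"},
]


def project_sparse_index(items, key_attribute):
    """Keep only items that have the index key attribute set."""
    return [item for item in items if key_attribute in item]


open_orders_gsi = project_sparse_index(orders, "Open")
print(len(orders), len(open_orders_gsi))
print(open_orders_gsi[0]["OrderId"])
```

Closing an order would mean removing the `Open` attribute, which drops the item from the index.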
55. Concatenate attributes to form useful secondary index keys
Take advantage of sparse indexes
Replace filter with indexes
You want to optimize a query as much as possible
Status + Date
57. Messages app
David
Messages table
Inbox
SELECT *
FROM Messages
WHERE Recipient='David'
ORDER BY Date DESC
LIMIT 50
Outbox
SELECT *
FROM Messages
WHERE Sender='David'
ORDER BY Date DESC
LIMIT 50
58. Large and small attributes mixed
Messages table
Recipient (Partition key)  Date (Sort key)  Sender  Message
David                      2014-10-02       Bob     …
… 48 more messages for David …
David                      2014-10-03       Alice   …
Alice                      2014-09-28       Bob     …
Alice                      2014-10-01       Carol   …
(Many more messages)
Message attribute: large message bodies, attachments
David: 50 items × 256 KB each
Inbox
SELECT *
FROM Messages
WHERE Recipient='David'
ORDER BY Date DESC
LIMIT 50
59. Computing inbox query cost
50 * 256KB * (1 RCU / 4KB) * (1 / 2) = 1600 RCU
- 50: items evaluated by query
- 256KB: average item size
- 1 RCU / 4KB: conversion ratio
- 1 / 2: eventually consistent reads
All those RCUs against one partition key
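The arithmetic above can be checked with a short function. This is a sketch under the slide's assumptions (1 RCU per 4 KB read, halved for eventually consistent reads, 1 KB = 1024 bytes); `query_rcus` is a hypothetical helper name.

```python
RCU_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB


def query_rcus(item_count, item_size_bytes, eventually_consistent=True):
    """Read capacity consumed by a Query that evaluates item_count items."""
    total_bytes = item_count * item_size_bytes
    units = -(-total_bytes // RCU_BYTES)  # round up to whole 4 KB units
    return units / 2 if eventually_consistent else float(units)


inbox_cost = query_rcus(item_count=50, item_size_bytes=256 * 1024)
print(inbox_cost)
```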
60. Separate the bulk data
Inbox-GSI
Recipient  Date        Sender  Subject   MsgId
David      2014-10-02  Bob     Hi!…      afed
David      2014-10-03  Alice   RE: The…  3kf8
Alice      2014-09-28  Bob     FW: Ok…   9d2b
Alice      2014-10-01  Carol   Hi!...    ct7r
Messages table
MsgId  Body
9d2b   …
3kf8   …
ct7r   …
afed   …
David
1. Query Inbox-GSI: 1 RCU (50 sequential items at 128 bytes)
2. BatchGetItem Messages: 1600 RCU (50 separate items at 256 KB)
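The second step can be sketched as building a BatchGetItem request from the MsgIds that the index query returned. Only the request parameters are constructed here; the `batch_get_params` helper is hypothetical, and the table and attribute names follow the slide. BatchGetItem accepts at most 100 keys per call, so larger result sets would need to be split into several calls.

```python
MAX_BATCH_KEYS = 100  # BatchGetItem limit on keys per request


def batch_get_params(table, msg_ids):
    """Build BatchGetItem parameters that fetch message bodies by MsgId."""
    if len(msg_ids) > MAX_BATCH_KEYS:
        raise ValueError("split the MsgIds into batches of at most 100")
    return {
        "RequestItems": {
            table: {"Keys": [{"MsgId": {"S": msg_id}} for msg_id in msg_ids]}
        }
    }


# Step 1 (not shown): the Inbox-GSI query returns the small index items for David.
inbox_rows = [
    {"Recipient": "David", "Date": "2014-10-03", "Subject": "RE: The…", "MsgId": "3kf8"},
    {"Recipient": "David", "Date": "2014-10-02", "Subject": "Hi!…", "MsgId": "afed"},
]

# Step 2: fetch the bodies only for the messages the user actually opens.
params = batch_get_params("Messages", [row["MsgId"] for row in inbox_rows])
print(params["RequestItems"]["Messages"]["Keys"])
```

The inbox listing itself needs only the index query; the expensive body reads can be deferred until a message is opened.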
64. Reduce one-to-many item sizes
Configure secondary index projections
Use GSIs to model M:N relationship between sender and recipient
Distribute large items
Querying many large items at once
Inbox, Messages, Outbox
68. DynamoDB Streams
Open Source Cross-Region Replication Library
Asia Pacific (Sydney) EU (Ireland) Replica
US East (N. Virginia)
Cross-region replication