This webinar provides an overview of Amazon DynamoDB, a fast, flexible, and fully managed NoSQL database service for mobile, web, ad tech, IoT, and gaming applications that need consistent, single-digit millisecond latency at any scale. The webinar will cover key topics around the general architecture of DynamoDB, data types, throughput provisioning, querying and indexing, and recent features.
The webinar includes a live demo of the basic operations used to read and write data to a DynamoDB table, and how the concept of provisioned IO affects the throughput of these operations.
Learning Objectives:
Enable users to understand how DynamoDB works so that they can evaluate and use DynamoDB as the data store for their application
5. What is NoSQL?
NoSQL is a term to describe data stores that trade full ACID
compliance for high availability and scale.
• Atomicity: single row/single item only
• Consistency: eventual consistency
• Isolation: dirty reads
• Durability: data replication on commodity storage
6. Why NoSQL?
• Dirty Reads?
• Eventual Consistency?
• Single row transactions only?
• Why would anybody trade ACID compliance for this?
7. NoSQL – Availability and Scale
[Figure: Traditional SQL scales up (a primary DB with a secondary); NoSQL scales out (many DB nodes).]
9. The CAP Theorem
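Network partitions will happen in distributed systems, so when one occurs a system must choose between consistency (CP) and availability (AP).
[Figure: a cluster of DB nodes, and a Venn diagram of Consistency (C), Availability (A), and Partition Tolerance (P) with the pairwise overlaps CA, AP, and CP]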
10. Why NoSQL?
• Horizontal scaling allows for virtually unlimited scalability
• Cheaper to scale out than to scale up
• Full consistency or full availability that can survive a network partition
• Full ACID compliance is often not needed
12. What is a Managed Service?
• A managed service is a web service in which consumers
of the service never need to interact directly with the
underlying compute, storage, and network resources.
14. DynamoDB is a Managed Service
• AWS runs all the database infrastructure for you!
• All the benefits and none of the operational overhead of running a
distributed system:
• Virtually unlimited read and write I/O
• High availability within a region
• Data durably stored in 3 availability zones
• Cross-region replication
• Easily export data to S3
• Triggers using Lambda functions
• Tight integration with Kinesis, Lambda, EMR, and Redshift
• Pay only for what you use, when you need it
17. Data types
• Scalar: String (S), Number (N), Binary (B), Boolean (BOOL), Null (NULL)
• Set: String Set (SS), Number Set (NS), Binary Set (BS)
• Document: List (L) and Map (M), used for storing nested JSON documents
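Each attribute value sent to DynamoDB carries one of the type descriptors listed above. The sketch below is a minimal, hypothetical serializer (`to_attribute_value` is an illustrative name, not an AWS API) that maps Python values to that typed form; for real use, boto3 ships a fuller `boto3.dynamodb.types.TypeSerializer`. Binary values are left as raw bytes here, whereas the JSON wire format base64-encodes them.

```python
from decimal import Decimal


def _is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float, Decimal)) and not isinstance(v, bool)


def to_attribute_value(value):
    """Map a Python value to a DynamoDB-style typed attribute value (sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):
        return {"BOOL": value}
    if _is_number(value):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (bytes, bytearray)):
        return {"B": bytes(value)}  # raw bytes; the JSON wire format would base64-encode
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}  # sorted only for deterministic output
        if all(_is_number(v) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, (bytes, bytearray)) for v in value):
            return {"BS": sorted(bytes(v) for v in value)}
        raise TypeError("set members must all be strings, numbers, or binary")
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_attribute_value({"Id": 2, "Tags": ["x"]}))
# {'M': {'Id': {'N': '2'}, 'Tags': {'L': [{'S': 'x'}]}}}
```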
18. Hash table
• Hash key uniquely identifies an item
• Hash key is used for building an unordered hash index
• Table can be partitioned for scale
[Figure: the key space 00 to FF is split into three partitions (00-54, 55-A9, AA-FF). Each item is placed by the hash of its hash key: Id = 1 (Name = Jim), Hash(1) = 7B; Id = 2 (Name = Andy, Dept = Engg), Hash(2) = 48; Id = 3 (Name = Kim, Dept = Ops), Hash(3) = CD.]
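A toy model of the hash table slide, assuming three partitions with the key ranges shown in the figure. The hash function here (first byte of MD5) is a stand-in chosen for illustration; it is not DynamoDB's internal hash and will not reproduce the slide's example values 7B, 48, and CD, so the usage below checks the slide's hash values directly. The function names are hypothetical.

```python
import hashlib

# Partition key ranges from the slide, as inclusive upper bounds: 00-54, 55-A9, AA-FF.
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def key_hash(hash_key) -> int:
    """Hypothetical one-byte hash (first byte of MD5); NOT DynamoDB's real hash."""
    return hashlib.md5(str(hash_key).encode("utf-8")).digest()[0]


def partition_for_hash(h: int) -> int:
    """Return the 0-based index of the partition whose key range contains h."""
    for index, upper in enumerate(PARTITION_UPPER_BOUNDS):
        if h <= upper:
            return index
    raise ValueError(f"hash {h:#04x} is outside the 00-FF key space")


def place(items, hash_key_name="Id"):
    """Group items by partition using the hash of their hash key attribute."""
    partitions = {i: [] for i in range(len(PARTITION_UPPER_BOUNDS))}
    for item in items:
        partitions[partition_for_hash(key_hash(item[hash_key_name]))].append(item)
    return partitions


# The slide's hash values: Hash(2) = 48, Hash(1) = 7B, Hash(3) = CD
print(partition_for_hash(0x48), partition_for_hash(0x7B), partition_for_hash(0xCD))  # 0 1 2
```

Because placement depends only on the hash of the hash key, lookups by hash key go straight to one partition, while there is no useful ordering across hash keys.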
19. Partitions are three-way replicated
[Figure: each partition (Partition 1, Partition 2, through Partition N) is stored as three replicas (Replica 1, Replica 2, Replica 3), and every replica holds the same copy of the items, e.g. Id = 1 (Name = Jim), Id = 2 (Name = Andy, Dept = Engg), Id = 3 (Name = Kim, Dept = Ops).]
20. Hash-range table
• Hash key and range key together uniquely identify an item.
• Within the unordered hash index, items that share a hash key are sorted by the range key.
• No limit on the number of items (∞) per hash key.
• Unless the table has local secondary indexes (10 GB max per hash key)
[Figure: the key space 00:0 to FF:∞ is split into Partition 1 (00:0 to 54:∞), Partition 2 (55 to A9:∞), and Partition 3 (AA to FF:∞). Hash(2) = 48 holds Customer# = 2 with Order# 10 (Pen) and 11 (Shoes); Hash(1) = 7B holds Customer# = 1 with Order# 10 (Toy) and 11 (Boots); Hash(3) = CD holds Customer# = 3 with Order# 10 (Book) and 11 (Paper).]
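A minimal in-memory sketch of the hash-range layout (the class `HashRangeTable` and its methods are hypothetical, not a DynamoDB API): items are grouped by hash key with no order among hash keys, and kept sorted by range key within each group, so a range condition on one hash key becomes a binary search over that group.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Toy hash-range table: unordered by hash key, sorted by range key."""

    def __init__(self):
        # hash key -> parallel lists, both kept in range-key order
        self._range_keys = defaultdict(list)
        self._items = defaultdict(list)

    def put(self, hash_key, range_key, item):
        keys = self._range_keys[hash_key]
        items = self._items[hash_key]
        i = bisect.bisect_left(keys, range_key)
        if i < len(keys) and keys[i] == range_key:
            items[i] = item  # same (hash, range) pair is the same item: overwrite
        else:
            keys.insert(i, range_key)
            items.insert(i, item)

    def query(self, hash_key, low=None, high=None):
        """Items for one hash key with low <= range key <= high, in range-key order."""
        keys = self._range_keys.get(hash_key, [])
        items = self._items.get(hash_key, [])
        start = 0 if low is None else bisect.bisect_left(keys, low)
        end = len(keys) if high is None else bisect.bisect_right(keys, high)
        return items[start:end]


table = HashRangeTable()
for customer, order, name in [(2, 11, "Shoes"), (2, 10, "Pen"), (1, 10, "Toy"), (1, 11, "Boots")]:
    table.put(customer, order, {"Customer#": customer, "Order#": order, "Item": name})

print([i["Item"] for i in table.query(2)])          # ['Pen', 'Shoes']
print([i["Item"] for i in table.query(1, low=11)])  # ['Boots']
```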
21. Local Secondary Index (LSI)
• Alternate range key + same hash key
• Index and table data are co-located (same partition)
• 10 GB max per hash key (item collection), i.e. LSIs limit the number of items per hash key!
22. Global Secondary Index
• Any attribute can be indexed as a new hash and/or range key
• RCUs/WCUs are provisioned separately for GSIs
• Online indexing: GSIs can be added to an existing table
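To contrast the two index types, the sketch below builds both as in-memory views over the same items. It is a simplification under stated assumptions: `build_index` is a hypothetical helper, the `Date` and `Status` attributes are made up for illustration, and co-location and separately provisioned capacity are not modeled. An LSI keeps the table's hash key and sorts by an alternate range key; a GSI regroups items under a different attribute entirely. Items missing an index key attribute are left out, as in DynamoDB's sparse indexes.

```python
from collections import defaultdict

# Base-table items (hash key: Customer#, range key: Order#).
# "Date" and "Status" are hypothetical attributes for illustration.
ITEMS = [
    {"Customer#": 1, "Order#": 10, "Item": "Toy", "Date": "2015-03-02", "Status": "Shipped"},
    {"Customer#": 1, "Order#": 11, "Item": "Boots", "Date": "2015-01-15", "Status": "Pending"},
    {"Customer#": 2, "Order#": 10, "Item": "Pen", "Date": "2015-02-20", "Status": "Pending"},
]


def build_index(items, hash_attr, range_attr):
    """Group items by hash_attr and sort each group by range_attr (sparse)."""
    index = defaultdict(list)
    for item in items:
        if hash_attr in item and range_attr in item:
            index[item[hash_attr]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[range_attr])
    return dict(index)


# LSI: same hash key (Customer#), alternate range key (Date).
lsi = build_index(ITEMS, "Customer#", "Date")
# GSI: a different attribute (Status) as the new hash key, Date as its range key.
gsi = build_index(ITEMS, "Status", "Date")

print([i["Item"] for i in lsi[1]])          # ['Boots', 'Toy']
print([i["Item"] for i in gsi["Pending"]])  # ['Boots', 'Pen']
```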
23. LSI or GSI?
• An LSI can be modeled as a GSI
• If data size in an item collection > 10 GB, use a GSI
• If eventual consistency is okay for your scenario, use a GSI!
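The rules of thumb above can be written as a small decision helper. This is a sketch, not an AWS API: `choose_index_type` is a hypothetical name, and it encodes only the two criteria on this slide. The consistency rule follows from the fact that GSIs serve only eventually consistent reads, while LSIs can also serve strongly consistent reads.

```python
ITEM_COLLECTION_LIMIT_GB = 10  # per-hash-key limit when a table has LSIs


def choose_index_type(max_item_collection_gb: float, eventual_consistency_ok: bool) -> str:
    """Apply the slide's rules of thumb for picking a secondary index type."""
    if max_item_collection_gb > ITEM_COLLECTION_LIMIT_GB:
        return "GSI"  # an LSI would cap each hash key's item collection at 10 GB
    if eventual_consistency_ok:
        return "GSI"  # GSI reads are eventually consistent, which is acceptable here
    return "LSI"  # strongly consistent index reads require an LSI


print(choose_index_type(12, False))  # GSI
print(choose_index_type(2, False))   # LSI
```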
27. Throughput
Provisioned at the table level
• Write capacity units (WCUs): one WCU is one write per second of an item up to 1 KB
• Read capacity units (RCUs): one RCU is one strongly consistent read per second of an item up to 4 KB
• Eventually consistent reads cost half as much as strongly consistent reads
Read and write throughput limits are independent
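The capacity-unit definitions translate into simple arithmetic: item sizes are rounded up to the next whole unit (1 KB for writes, 4 KB for reads) and multiplied by the request rate. The sketch below assumes a fixed item size and a steady request rate; the function names are hypothetical.

```python
import math

WCU_ITEM_SIZE_KB = 1  # one WCU: one write per second of an item up to 1 KB
RCU_ITEM_SIZE_KB = 4  # one RCU: one strongly consistent read per second of up to 4 KB


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    # Each write consumes one WCU per started 1 KB of item size.
    return math.ceil(item_size_kb / WCU_ITEM_SIZE_KB) * writes_per_second


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    # Each read consumes one RCU per started 4 KB of item size.
    units = math.ceil(item_size_kb / RCU_ITEM_SIZE_KB) * reads_per_second
    # Eventually consistent reads cost half; round up to whole provisioned units.
    return units if strongly_consistent else math.ceil(units / 2)


print(write_capacity_units(3, 10))                          # 30
print(read_capacity_units(6, 10))                           # 20
print(read_capacity_units(6, 10, strongly_consistent=False))  # 10
```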
28. Partitioning example
Table size = 8 GB, RCUs = 5000, WCUs = 500
Per-partition limits used here: 3000 RCUs, 1000 WCUs, 10 GB of data
# of partitions (IO capacity) = 5000/3000 RCU + 500/1000 WCU = 2.17
# of partitions (storage) = 8/10 GB = 0.8
# of partitions = ceiling(max(2.17, 0.8)) = 3
RCUs and WCUs are uniformly spread across partitions:
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data/partition = 8/3 = 2.67 GB
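The partition arithmetic above can be reproduced in a few lines. The per-partition limits (3000 RCUs, 1000 WCUs, 10 GB) are taken from the slide's own formula and describe DynamoDB's behavior at the time of the webinar; treat them as assumptions rather than current guarantees. The function names are hypothetical.

```python
import math

# Assumed per-partition limits, as used in the slide's formula.
RCU_PER_PARTITION = 3000
WCU_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def partition_count(table_size_gb: float, rcus: int, wcus: int) -> int:
    by_throughput = rcus / RCU_PER_PARTITION + wcus / WCU_PER_PARTITION
    by_size = table_size_gb / GB_PER_PARTITION
    return math.ceil(max(by_throughput, by_size))


def per_partition(table_size_gb: float, rcus: int, wcus: int) -> dict:
    # Throughput and data are assumed to spread uniformly across partitions.
    n = partition_count(table_size_gb, rcus, wcus)
    return {
        "partitions": n,
        "rcus": round(rcus / n, 2),
        "wcus": round(wcus / n, 2),
        "data_gb": round(table_size_gb / n, 2),
    }


print(per_partition(8, 5000, 500))
# {'partitions': 3, 'rcus': 1666.67, 'wcus': 166.67, 'data_gb': 2.67}
```

The uniform split is why a skewed access pattern matters: a single hot hash key can only use the throughput of the one partition it lives on.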