AWS July Webinar Series - Getting Started with Amazon DynamoDB

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Nate Slater, AWS Solutions Architect
July 30, 2015
Introduction to DynamoDB

Agenda
• What is DynamoDB?
• DynamoDB Fundamentals
• Typical Workloads and Use Cases
• Demo

What is DynamoDB?
DynamoDB is a fully managed, NoSQL document and key-value data store.

What is NoSQL?
NoSQL is a term used to describe data stores that trade full ACID
compliance for high availability and scale. How NoSQL stores typically
handle each ACID property:
• Atomicity: single row/single item only
• Consistency: eventual consistency
• Isolation: dirty reads
• Durability: data replication on commodity storage

Why NoSQL?
• Dirty Reads?
• Eventual Consistency?
• Single row transactions only?
• Why would anybody trade ACID compliance for this?

NoSQL – Availability and Scale
• Traditional SQL: a primary database plus a secondary; scale up (larger servers)
• NoSQL: many database nodes working together; scale out (more servers)

The CAP Theorem
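Network partitions will happen in distributed systems.
• The three properties: Consistency (C), Availability (A), Partition tolerance (P)
• A system can guarantee at most two of the three: CA, AP, or CP
• Because partitions are unavoidable, a distributed data store in practice chooses between AP and CP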

Why NoSQL?
• Horizontal scaling allows for virtually unlimited scalability
• Scaling out is cheaper than scaling up
• Choice of full consistency or full availability in the presence of a network
partition
• Full ACID compliance is often not needed

What is a Managed Service?
• A managed service is a web service in which consumers
of the service never need to interact directly with the
underlying compute, storage, and network resources.

DynamoDB is a Managed Service
• AWS runs all the database infrastructure for you!
• All the benefits and none of the operational overhead of running a
distributed system:
  • Virtually unlimited read and write I/O scalability
  • High availability within a region
  • Data durably stored across 3 Availability Zones
  • Cross-region replication
  • Easily export data to S3
  • Triggers using Lambda functions
  • Tight integration with Kinesis, Lambda, EMR, and Redshift
• Pay only for what you use, when you need it

DynamoDB Table
• A table contains items, and each item contains attributes.
• Hash key (mandatory): key-value access pattern; determines data distribution
• Range key (optional): models 1:N relationships and enables rich query capabilities:
  • All items for a hash key
  • ==, <, >, >=, <=
  • “begins with”
  • “between”
  • Sorted results
  • Counts
  • Top/bottom N values
  • Paged responses
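To make the range-key operators concrete, here is a minimal in-memory sketch in Scala. It models a single item collection (all items that share one hash key) as a list of string range keys, such as ISO dates. The names `RangeKeyQuery`, `KeyCondition`, `query`, and `count` are hypothetical; this is an illustration of the semantics, not the DynamoDB API or any AWS SDK.

```scala
// Hypothetical in-memory model of range-key conditions on one item collection
// (all items that share a hash key). Illustration only, not the DynamoDB API.
object RangeKeyQuery {
  sealed trait KeyCondition { def matches(k: String): Boolean }

  // No condition: every item for the hash key
  case object All extends KeyCondition { def matches(k: String): Boolean = true }
  final case class Eq(v: String) extends KeyCondition { def matches(k: String): Boolean = k == v }
  final case class Lt(v: String) extends KeyCondition { def matches(k: String): Boolean = k.compareTo(v) < 0 }
  final case class Ge(v: String) extends KeyCondition { def matches(k: String): Boolean = k.compareTo(v) >= 0 }
  final case class BeginsWith(prefix: String) extends KeyCondition {
    def matches(k: String): Boolean = k.startsWith(prefix)
  }
  // Inclusive on both ends
  final case class Between(lo: String, hi: String) extends KeyCondition {
    def matches(k: String): Boolean = k.compareTo(lo) >= 0 && k.compareTo(hi) <= 0
  }

  // Filter by the condition, sort by range key, then keep the first `limit` results.
  // ascending = false together with a limit gives "top N".
  def query(rangeKeys: Seq[String], cond: KeyCondition,
            ascending: Boolean = true, limit: Int = Int.MaxValue): Seq[String] = {
    val sorted = rangeKeys.filter(cond.matches).sorted
    (if (ascending) sorted else sorted.reverse).take(limit)
  }

  // Number of items matching the condition
  def count(rangeKeys: Seq[String], cond: KeyCondition): Int = rangeKeys.count(cond.matches)
}
```

For example, `RangeKeyQuery.query(dates, RangeKeyQuery.All, ascending = false, limit = 2)` returns the two most recent dates. The remaining comparison operators (`<=`, `>`) and paged responses would follow the same pattern and are left out to keep the sketch short.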

Data types
• Scalar: String (S), Number (N), Binary (B), Boolean (BOOL), Null (NULL)
• Set: String Set (SS), Number Set (NS), Binary Set (BS)
• Document: List (L), Map (M), used for storing nested JSON documents

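One way to picture the type system is as a Scala algebraic data type whose class names mirror the type descriptors listed above. This is a hypothetical model for illustration, not the AWS SDK's `AttributeValue` class; `AttributeTypes`, `depth`, and the `camera` item are invented names. It shows how List and Map nest to hold a JSON-like document inside an item.

```scala
// Hypothetical model of DynamoDB's attribute data types, using the slide's
// type descriptors as class names. Not the AWS SDK's AttributeValue class.
object AttributeTypes {
  sealed trait AttrValue { def typeTag: String }

  // Scalar types
  final case class S(value: String) extends AttrValue { val typeTag = "S" }
  final case class N(value: BigDecimal) extends AttrValue { val typeTag = "N" }
  final case class B(value: Vector[Byte]) extends AttrValue { val typeTag = "B" }
  final case class BOOL(value: Boolean) extends AttrValue { val typeTag = "BOOL" }
  case object NULL extends AttrValue { val typeTag = "NULL" }

  // Set types
  final case class SS(values: Set[String]) extends AttrValue { val typeTag = "SS" }
  final case class NS(values: Set[BigDecimal]) extends AttrValue { val typeTag = "NS" }
  final case class BS(values: Set[Vector[Byte]]) extends AttrValue { val typeTag = "BS" }

  // Document types: arbitrarily nested, like JSON arrays and objects
  final case class L(values: List[AttrValue]) extends AttrValue { val typeTag = "L" }
  final case class M(fields: Map[String, AttrValue]) extends AttrValue { val typeTag = "M" }

  // Nesting depth: scalars and sets are 0; each List or Map level adds 1
  def depth(v: AttrValue): Int = v match {
    case L(vs) => 1 + (if (vs.isEmpty) 0 else vs.map(depth).max)
    case M(fs) => 1 + (if (fs.isEmpty) 0 else fs.values.map(depth).max)
    case _     => 0
  }

  // Hypothetical camera item holding a nested document under "settings"
  val camera: Map[String, AttrValue] = Map(
    "cameraId"    -> N(BigDecimal(42)),
    "subscribers" -> NS(Set(BigDecimal(7), BigDecimal(9))),
    "settings"    -> M(Map("nightMode" -> BOOL(true), "zones" -> L(List(S("door"), S("yard"))))),
    "notes"       -> NULL
  )
}
```

Using a sealed trait means the compiler can check that a `match` over attribute values handles every type.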
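Hash table
• Hash key uniquely identifies an item
• Hash key is used for building an unordered hash index
• Table can be partitioned for scale
Example: the key space 00-FF is split into three partitions (00-54, 55-A9, AA-FF):
• Id = 2, Name = Andy, Dept = Engg; Hash(2) = 48, stored in Partition 1 (00-54)
• Id = 1, Name = Jim; Hash(1) = 7B, stored in Partition 2 (55-A9)
• Id = 3, Name = Kim, Dept = Ops; Hash(3) = CD, stored in Partition 3 (AA-FF)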
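The idea of hashing a key into the 00-FF key space and assigning contiguous ranges to partitions can be sketched as follows. This is a toy model under stated assumptions: MD5 stands in for DynamoDB's internal hash function (which is not public), the key space is reduced to one byte, and ranges are equal-width. `HashPartitioning` and its methods are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Toy model of hash partitioning. Assumptions: MD5 stands in for DynamoDB's
// internal hash function (not public), and the key space is split into
// equal-width ranges.
object HashPartitioning {
  // Position of a hash key in a one-byte key space, 0x00-0xFF
  def keySpacePosition(hashKey: String): Int = {
    val digest = MessageDigest.getInstance("MD5").digest(hashKey.getBytes(StandardCharsets.UTF_8))
    digest(0) & 0xFF
  }

  // 0-based index of the contiguous range that owns `position`
  // when 0x00-0xFF is divided among `partitions` partitions
  def partitionFor(position: Int, partitions: Int): Int = position * partitions / 256

  def partitionOf(hashKey: String, partitions: Int): Int =
    partitionFor(keySpacePosition(hashKey), partitions)
}
```

With three partitions this sketch's boundaries (00-55, 56-AA, AB-FF) differ slightly from the slide's, but the slide's example hashes 48, 7B, and CD still land in the first, second, and third partitions (indices 0, 1, 2). Because the hash scatters keys, items with adjacent Ids need not share a partition.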

Partitions are three-way replicated
Each partition (Partition 1, Partition 2, ..., Partition N) is stored as three
replicas (Replica 1, Replica 2, Replica 3), and every replica holds a full copy
of that partition's items. The example items (Id = 1, Name = Jim; Id = 2,
Name = Andy, Dept = Engg; Id = 3, Name = Kim, Dept = Ops) therefore each exist
in three copies.

Hash-range table
• Hash key and range key together uniquely identify an item.
• Within the unordered hash index, data is sorted by the range key.
• There is no limit on the number of items (∞) per hash key, unless the table has local secondary indexes.
Example: Customer# is the hash key and Order# is the range key; the key space 00:0 to FF:∞ is split into three partitions:
• Partition 1 (00:0 to 54:∞): Hash(2) = 48, Customer# = 2: Order# = 10, Item = Pen; Order# = 11, Item = Shoes
• Partition 2 (55 to A9:∞): Hash(1) = 7B, Customer# = 1: Order# = 10, Item = Toy; Order# = 11, Item = Boots
• Partition 3 (AA to FF:∞): Hash(3) = CD, Customer# = 3: Order# = 10, Item = Book; Order# = 11, Item = Paper
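The customer/order example can be modeled in memory as a map from hash key to an item collection sorted by range key. This is a hypothetical sketch (names `HashRangeTable`, `Order`, `build`, `itemsFor` are invented, and it assumes Scala 2.13 or later for `TreeMap.rangeFrom`); it shows why a query for one customer returns that customer's orders in Order# order even when they were written out of order.

```scala
import scala.collection.immutable.TreeMap

// Toy model of a hash-range table: each hash key (Customer#) owns an item
// collection kept sorted by the range key (Order#). Names are hypothetical.
object HashRangeTable {
  final case class Order(customer: Int, order: Int, item: String)

  type Table = Map[Int, TreeMap[Int, Order]]

  // Group items by hash key; within each group, order them by range key
  def build(orders: Seq[Order]): Table =
    orders.groupBy(_.customer).map { case (customer, os) =>
      customer -> os.foldLeft(TreeMap.empty[Int, Order])((m, o) => m.updated(o.order, o))
    }

  // Items for one hash key with Order# >= fromOrder, in range-key order
  def itemsFor(table: Table, customer: Int, fromOrder: Int = Int.MinValue): List[String] =
    table.get(customer) match {
      case Some(items) => items.rangeFrom(fromOrder).values.map(_.item).toList
      case None        => Nil
    }

  // The slide's example data, deliberately listed out of order
  val orders: Seq[Order] = Seq(
    Order(2, 11, "Shoes"), Order(2, 10, "Pen"),
    Order(1, 11, "Boots"), Order(1, 10, "Toy"),
    Order(3, 11, "Paper"), Order(3, 10, "Book")
  )
}
```

A sorted map per hash key mirrors the slide's point: lookup by hash key is unordered, but everything under one hash key can be scanned in range-key order.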

Local Secondary Index (LSI)
• Alternate range key + same hash key
• Index and table data are co-located (same partition)
• 10 GB max per hash key, i.e., LSIs limit the number of range keys!

Global Secondary Index (GSI)
• Any attribute can be indexed as a new hash and/or range key
• RCUs/WCUs are provisioned separately for GSIs
• Online indexing

LSI or GSI?
• An LSI can be modeled as a GSI
• If the data size in an item collection is > 10 GB, use a GSI
• If eventual consistency is okay for your scenario, use a GSI!
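The LSI-or-GSI rules of thumb above can be encoded as a small decision function. This is a hypothetical helper (`IndexChoice` and `choose` are invented names) that follows the slide's rules: since any LSI can be modeled as a GSI, GSI is the default, and an LSI is chosen only when strongly consistent index reads are required (GSIs support only eventually consistent reads) and every item collection stays within the 10 GB limit.

```scala
// Hypothetical encoding of the slide's LSI-vs-GSI rules of thumb.
object IndexChoice {
  sealed trait IndexKind
  case object LSI extends IndexKind
  case object GSI extends IndexKind

  // Item-collection size limit for tables with LSIs, per the slide
  val ItemCollectionLimitGB = 10.0

  def choose(maxItemCollectionGB: Double, eventualConsistencyOk: Boolean): IndexKind =
    if (maxItemCollectionGB > ItemCollectionLimitGB) GSI // LSI would hit the 10 GB cap
    else if (eventualConsistencyOk) GSI                  // GSI is the more flexible default
    else LSI                                             // strongly consistent index reads needed
}
```

Note that when a collection exceeds 10 GB the function returns GSI even if strong consistency was wanted; the slide's rules imply accepting eventual consistency in that case.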

DynamoDB API
• Table API: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
• Item API: PutItem, UpdateItem, DeleteItem, BatchWriteItem, GetItem, Query, Scan, BatchGetItem
• Stream API (new): ListStreams, DescribeStream, GetShardIterator, GetRecords

DynamoDB Streams and AWS Lambda

Throughput
Provisioned at the table level
• One write capacity unit (WCU) = one write per second of an item up to 1 KB
• One read capacity unit (RCU) = one strongly consistent read per second of an item up to 4 KB
• Eventually consistent reads cost half as much as strongly consistent reads
Read and write throughput limits are independent.
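The capacity-unit definitions translate into simple arithmetic. The sketch below is a hypothetical helper (`Capacity`, `wcus`, `rcus` are invented names) that assumes the slide's unit sizes and that each item's size is rounded up to the next whole unit, 1 KB for writes and 4 KB for reads.

```scala
// Sketch of capacity-unit arithmetic, assuming the slide's unit sizes and
// rounding each item up to the next whole unit (1 KB for writes, 4 KB for reads).
object Capacity {
  val KB = 1024L

  private def ceilDiv(a: Long, b: Long): Long = (a + b - 1) / b

  // WCUs needed to write `writesPerSec` items of `itemBytes` each per second
  def wcus(itemBytes: Long, writesPerSec: Long): Long =
    ceilDiv(itemBytes, KB) * writesPerSec

  // RCUs needed for reads; eventually consistent reads cost half
  def rcus(itemBytes: Long, readsPerSec: Long, stronglyConsistent: Boolean): Double = {
    val strong = (ceilDiv(itemBytes, 4 * KB) * readsPerSec).toDouble
    if (stronglyConsistent) strong else strong / 2
  }
}
```

For example, writing ten 3 KB items per second needs 30 WCUs, while reading ten 6 KB items per second needs 20 RCUs with strong consistency (each read rounds up to 8 KB, i.e. 2 units) or 10 RCUs with eventual consistency.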

Partitioning example: table size = 8 GB, RCUs = 5000, WCUs = 500
# of partitions (I/O capacity) = 5000/3000 RCUs + 500/1000 WCUs = 2.17
# of partitions (storage) = 8 GB / 10 GB = 0.8
# of partitions = ceiling(max(2.17, 0.8)) = 3
RCUs and WCUs are spread uniformly across partitions:
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data per partition = 8/3 = 2.67 GB
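The partition-count formula from the example can be written as a function. The per-partition constants (3000 RCUs, 1000 WCUs, 10 GB) are taken from the slide's formula and describe DynamoDB as presented in 2015; treat them as illustrative. `PartitionEstimate`, `partitions`, and `perPartition` are hypothetical names.

```scala
// Sketch of the slide's partition-count heuristic. The constants come from the
// slide's formula (2015) and are illustrative, not a current service guarantee.
object PartitionEstimate {
  val RcusPerPartition = 3000.0
  val WcusPerPartition = 1000.0
  val GbPerPartition = 10.0

  def partitions(tableSizeGB: Double, rcus: Double, wcus: Double): Int = {
    val forThroughput = rcus / RcusPerPartition + wcus / WcusPerPartition
    val forStorage = tableSizeGB / GbPerPartition
    math.ceil(math.max(forThroughput, forStorage)).toInt
  }

  // Provisioned capacity (or data) is spread uniformly across partitions
  def perPartition(total: Double, partitions: Int): Double = total / partitions
}
```

`PartitionEstimate.partitions(8, 5000, 500)` reproduces the example's 3 partitions. A storage-heavy table such as 25 GB with only 1000 RCUs and 100 WCUs also needs 3 partitions, this time driven by size rather than throughput.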

Typical Workloads and Use Cases

DynamoDB table examples
// One item per camera
case class CameraRecord(
  cameraId: Int,          // hash key
  ownerId: Int,
  subscribers: Set[Int],
  hoursOfRecording: Int
  // ...
)
// One item per (cameraId, timestamp) cuepoint
case class Cuepoint(
  cameraId: Int,          // hash key
  timestamp: Long,        // range key
  `type`: String          // backticks required: type is a reserved word in Scala
  // ...
)

HashKey   RangeKey   Value
Key       Segment    1234554343254
Key       Segment1   1231231433235

Typical Workloads
• Ad-tech
• IoT
• Gaming
• Web Analytics
• Mobile Applications
• Large Scale Websites
…And much more!
