8. What is DynamoDB?
• Based on the Dynamo model, first
published by Amazon in 2007
• Key-Value NoSQL Database as a
Service
• Low latency performance
• Almost infinite capacity
• No need to worry about underlying
hardware
• Seamless scalability
• High Durability & Availability
• Easy Administration
• Easy Planning (via throughput
parameters)
• Available via API
9. AdRoll use case
• AdRoll uses AWS to grow by more
than 15,000% a year
• Needed high-performance, flexible
platform to swiftly sync data for
worldwide audience
• Processes 50 TB of data a day
• Serves 50 billion impressions a day
• Stores 1.5 PB of data
• Worldwide deployment minimizes
latency
11. Amazon DynamoDB: delivering on customer needs
• Time to Live (TTL): February 2017
• VPC Endpoints: April 2017
• DynamoDB Accelerator (DAX): April 2017
• Auto Scaling: June 2017
• Global Tables: November 2017 (new)
• Backup and Restore: November 2017 (new)
• Encryption at rest: February 2018 (new)
15. Local version of DynamoDB
The downloadable version of DynamoDB is provided as an executable .jar file. The
application runs on Windows, Linux, macOS, and other platforms that support
Java.
To use DynamoDB in your application as a dependency, add to your POM:
<!--Dependency:-->
<dependencies>
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>DynamoDBLocal</artifactId>
    <version>[1.11,2.0)</version>
  </dependency>
</dependencies>
<!--Custom repository:-->
<repositories>
  <repository>
    <id>dynamodb-local-oregon</id>
    <name>DynamoDB Local Release Repository</name>
    <url>https://s3-us-west-2.amazonaws.com/dynamodb-local/release</url>
  </repository>
</repositories>
Main options for the local version of DynamoDB:
• -cors value (you must provide a comma-separated "allow" list of specific domains)
• -dbPath value
• -delayTransientStatuses
• -help
• -inMemory
• -optimizeDbBeforeStartup
• -port value (8000 by default)
• -sharedDb
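Once the archive is downloaded and unpacked, a typical launch looks like the following (assuming DynamoDBLocal.jar and its native-library directory sit in the current working directory):

```shell
# Start DynamoDB Local on the default port with one shared database file,
# so every client sees the same tables regardless of credentials/region
java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb -port 8000
```

With -sharedDb, the local endpoint (http://localhost:8000) behaves the same for all access keys, which is convenient for testing.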
20. Why do we need indexes?
• Functions & predicates
support
• ==, >, <, <=, >=
• “between”
• “in”
• “contains”
• sorted results
• Counts
• top / bottom values
• Queries on values other than the
table's Partition & Sort Key
• Faster reads, LSI only (no need
to scan the entire table and
go through all partitions)
23. GSI or LSI?
Global Secondary Index
• No limits on index size
• Separate allocation of Read &
Write Capacity Units
• Eventual consistency only
Local Secondary Index
• Index is stored together with the
partition, so its size is limited to
10 GB
• Uses the RCUs & WCUs allocated to
the table itself
• Strong consistency available
26. Write & Read Capacity Units
Provisioned at the table level / at
the GSI level
• Write capacity units (WCUs): one
write of up to 1 KB per second
• Read capacity units (RCUs): one
read of up to 4 KB per second
• RCUs are measured in strongly
consistent reads
• Eventually consistent reads cost
½ of strongly consistent reads
• Read and Write throughput are
independent
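As a rough sketch of how these unit sizes translate into consumed capacity (rounding up to the nearest 4 KB for reads and 1 KB for writes; the helper names are illustrative, not an AWS API):

```python
import math

def rcus_per_read(item_size_kb, eventually_consistent=False):
    """One RCU covers a strongly consistent read of up to 4 KB per second;
    an eventually consistent read costs half as much."""
    units = math.ceil(item_size_kb / 4)
    return units / 2 if eventually_consistent else units

def wcus_per_write(item_size_kb):
    """One WCU covers a write of up to 1 KB per second."""
    return math.ceil(item_size_kb / 1)

# A strongly consistent read of a 6 KB item costs 2 RCUs;
# the same read done eventually consistently costs 1 RCU.
```

Note that a 1.5 KB write rounds up to 2 WCUs: capacity is always consumed in whole-unit increments per operation.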
27. Partitioning math
Number of partitions
By Capacity: (Total RCU / 3000) + (Total WCU / 1000)
By Size: Total Size / 10 GB
Total Partitions: CEILING(MAX(By Capacity, By Size))
28. Partitioning math: Example
Table size = 8 GB, RCUs = 5000, WCUs = 500
Number of partitions
By Capacity: (5000 / 3000) + (500 / 1000) = 2.17
By Size: 8 / 10 = 0.8
Total Partitions: CEILING(MAX(2.17, 0.8)) = 3
As RCUs and WCUs are uniformly spread across partitions:
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data per partition = 8/3 = 2.67 GB
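The partitioning formula can be written out as a small helper (a sketch; the 3,000-RCU, 1,000-WCU, and 10-GB per-partition figures are the ones used on these slides):

```python
import math

def partition_count(total_rcu, total_wcu, size_gb):
    """Estimate the number of partitions DynamoDB allocates for a table,
    taking the larger of the capacity-driven and size-driven counts."""
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# The slide's example: 8 GB, 5000 RCUs, 500 WCUs -> 3 partitions
```

Because capacity is spread uniformly, each of those 3 partitions gets 5000/3 ≈ 1666.67 RCUs and 500/3 ≈ 166.67 WCUs.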
30. Primary key selection can affect performance
A heat map shows that the data is
evenly distributed, but some
partitions can still be SLOW
because of the usage pattern.
Source: http://segment.com/blog
32. Bursting
• DynamoDB retains up to 300 seconds of unused read and write
capacity of a partition's throughput to be able to burst the
throughput
• Bursting occurs automatically, and during an occasional burst the
extra capacity can be consumed very quickly
33. Throttling
Throttling occurs if sustained
throughput goes beyond
provisioned throughput per
partition. The main reasons
for throttling are:
• Non-uniform workloads
• Hot keys/hot partitions
• Very large items
Table size = 8 GB, RCUs = 5000, WCUs = 500
RCUs per partition = 5000/3 = 1666.67
WCUs per partition = 500/3 = 166.67
Data per partition = 8/3 = 2.67 GB
If the load on a single partition goes above
1666 RCUs / 166 WCUs, throttling will occur
The most obvious solution is to
increase the throughput
34. Design For Uniform Data Access
• Two main factors:
• The primary key selection
• The workload patterns for individual
items
• Analyzing the data access pattern (a DIY
ELK solution is an option)
• Track & analyze hot keys
• Track & analyze partitions size
• Track index utilization
• Choosing the right partition key
• DeviceID (well defined time series)
• UserID ?
• Mitigate
• Block hot keys
• Throttle requests at the API level
• Add salt to the key
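Adding salt to a key is usually done via write sharding: append a bounded random suffix to the partition key on write, and fan reads out across all suffixes. A minimal sketch (the shard count of 10 and the "#" separator are arbitrary choices):

```python
import random

NUM_SHARDS = 10  # tune to the write skew you actually observe

def salted_key(base_key):
    """Spread writes for a hot key across NUM_SHARDS partition keys."""
    return f"{base_key}#{random.randrange(NUM_SHARDS)}"

def all_shard_keys(base_key):
    """Reads must fan out: query every shard key and merge the results."""
    return [f"{base_key}#{i}" for i in range(NUM_SHARDS)]
```

The trade-off is that point reads become NUM_SHARDS queries, so sharding only pays off for keys whose write rate actually exceeds one partition's throughput.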
35. Calculating capacity for Tables & Indexes
1. Determine whether you are
calculating RCUs or WCUs
2. Calculate the number of items
per second, as reads &
writes are provisioned per
second
3. Calculate the number of
capacity units per item
4. Multiply items per second
by units per item
5. For reads, decide whether
eventual consistency is acceptable
! Do not forget to include any
LSIs in these calculations & do
separate calculations for GSIs
Example
900 reads per minute, the size of
the item is 7 KB, and eventual
consistency for reads is acceptable
1. Read (4 KB per operation)
2. 900 / 60 = 15 (items per second)
3. Each item needs 2 operations
4. 2 operations per item * 15 items
per second = 30
5. 30 / 2 (as we are using eventual
consistency) = 15 RCU
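The five steps above, applied to the worked example, can be sketched as a single helper (function name is illustrative):

```python
import math

def required_rcus(reads_per_minute, item_size_kb, eventually_consistent=False):
    """Follow the slide's recipe: items per second, times read units
    per item, halved if eventual consistency is acceptable."""
    items_per_second = reads_per_minute / 60
    units_per_item = math.ceil(item_size_kb / 4)  # one RCU covers 4 KB
    rcus = items_per_second * units_per_item
    return rcus / 2 if eventually_consistent else rcus

# 900 reads/minute of 7 KB items with eventual consistency:
# 15 items/sec * 2 units/item = 30, halved -> 15 RCUs
```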
36. Basic Limits for DynamoDB
Limit                         | Per Table    | Per Account
Max WCU (default / max)       | 10000 / none | 20000 / none
Max RCU (default / max)       | 10000 / none | 20000 / none
Number of tables per region   | -            | 256
Max Item Size                 | 400 KB       | -
Number of Secondary Indexes   | 5            | -
Size of Partition Key         | 2048 bytes   | -
Size of Sort Key              | 1024 bytes   | -
Default limits for us-east-1 (N. Virginia) are different:
40000 RCU & WCU per table and 80000 RCU & WCU per account
37. Auto-Scaling for DynamoDB
• DynamoDB can scale provisioned
capacity up and down in
response to the traffic
pattern
• To use Auto-Scaling you
need to define a scaling
policy (target utilization &
min / max provisioned
capacity)
• You can define auto-scaling
policy for reads and writes
separately. In addition, you
can auto-scale GSI
38. Approach for Scaling
• You can increase capacity as many times as you want
• You can decrease capacity 4 times per day (UTC), plus 1
additional decrease for each hour without a decrease, so at
maximum you can decrease 27 times per day
39. Monitoring DynamoDB performance
• Monitor Retries on App side
• Capture keys & metrics for request
with particular keys
Metrics to alert on:
• SuccessfulRequestLatency (root
cause: network issues / table design)
• ConsumedReadCapacityUnits &
ConsumedWriteCapacityUnits (alert
when consumption reaches about
80% of provisioned capacity)
• ReadThrottleEvents &
WriteThrottleEvents (should always
be equal to zero)
41. 1:1
• Simplest Case
• Just use a table
with a single Partition
Key
• Examples:
• Users
• Partition Key =
UserId
• Games
• Partition Key =
GameId
• Retrieve data by Id
or create an Index
42. 1:N
• Most often case
• Use a table with Partition Key & Sort Key
• Examples (One User can play multiple games):
• Partition Key = UserId
• Sort Key = GameId
• Advanced queries available with the use of Sort Key
• Index can be a good option as well
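A Query against such a Partition Key + Sort Key table might carry parameters like the following (shown as a plain low-level request dictionary; the table and attribute names are illustrative):

```python
# All games for one user whose GameId begins with "chess" --
# the sort key is what enables range-style conditions like begins_with.
query_request = {
    "TableName": "UserGames",
    "KeyConditionExpression": "UserId = :u AND begins_with(GameId, :g)",
    "ExpressionAttributeValues": {
        ":u": {"S": "user-123"},
        ":g": {"S": "chess"},
    },
}
```

The partition key must always be an equality condition; only the sort key supports range operators such as begins_with, BETWEEN, or <.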
43. N:M
• Two tables with inverted Partition
& Sort Keys
• The application / "stored
procedures" are responsible for
data consistency
• Use GSIs to query the data
44. Multi-tenancy
• Use the tenant id as the
partition (hash) key
• The application / "stored
procedures" are
responsible for data
consistency
• Use GSIs to query
the data
45. Working with large items
• Use One-to-Many Tables Instead Of
Large Set Attributes
• Compress Large Attribute Values
• For instance, you can compress these
items using DynamoDB SDK for Java or .NET
• Store Large Attribute Values in Amazon
S3
• Use S3 Object metadata & tags to store
relevant data
• Utilize Lambda & Triggers to manage
references between Items in DynamoDB
and objects in S3
• Break Up Large Attributes Across
Multiple Items
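Compressing a large attribute before writing it is straightforward with any binary-safe codec; the result can be stored as a DynamoDB Binary (B) value. A minimal sketch using gzip from the standard library (helper names are illustrative, not an SDK API):

```python
import gzip
import json

def compress_attribute(value):
    """Serialize and gzip a large attribute so it fits comfortably
    inside DynamoDB's 400 KB item limit as a Binary value."""
    return gzip.compress(json.dumps(value).encode("utf-8"))

def decompress_attribute(blob):
    """Reverse of compress_attribute: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))

large_value = {"history": ["move"] * 5000}
blob = compress_attribute(large_value)
# Repetitive payloads like this compress dramatically
```

The trade-off: compressed attributes are opaque to DynamoDB, so they cannot be filtered on or indexed; queries must target other attributes.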
47. What is DAX?
DAX is a DynamoDB-compatible caching
service.
• It reduces the response times of
eventually-consistent read workloads
• DAX reduces operational and
application complexity
• DAX provides increased throughput
and potential operational cost
savings by reducing the need to
over-provision read capacity units
• DAX provides automatic failover for
both the primary node & read replicas
49. Consistency Models in DAX
• Read
• Eventual consistency by default
• Operation which require strong
consistency are served by
DynamoDB
• Consistency in this case depends
on how the DynamoDB tables
are used by different apps
• The cache TTL is very important and
should be adapted to the use case
• Write
• Eventual consistency for writes &
possibility of deviations
• Write-Through
• Write-Around
50. When not to use DAX?
• Applications that require strongly
consistent reads.
• Applications that do not require
microsecond response times for
reads.
• Applications that are write-
intensive, or that do not perform
much read activity.
• Applications that are already
using a different caching solution
with DynamoDB, and are using
their own client-side logic for
working with that caching solution.
51. DAX provisioning and management
• Create a Subnet Group
• Create IAM service role
• Define DynamoDB tables by the
role permissions
• Create DAX cluster
• Define Subnet Group
• Define Instance types
• Define number of Read Replicas
• Configure Security Groups
• Open port 8111
• Adjust additional parameters
52. Configuring additional parameters with DAX
• Parameter Groups
• Security Groups
• Cluster ARN
• arn:aws:dax:region:accountID:cache/clusterName
• Cluster Endpoint
• myDAXcluster.2cmrwl.clustercfg.dax.use1.cache.amazonaws.com:8111
• Node Endpoint
• myDAXcluster-a.2cmrwl.clustercfg.dax.use1.cache.amazonaws.com:8111
• Subnet Groups
• Events
• Maintenance Window
56. DynamoDB Streams
• DynamoDB Streams captures a time-ordered sequence of
item-level modifications (stored up to 24 hours)
• A DynamoDB stream is an ordered flow of information about
changes to items in an Amazon DynamoDB table
DynamoDB Streams guarantees the following:
• Each stream record appears exactly once in the stream.
• For each item that is modified in a DynamoDB table, the
stream records appear in the same sequence as the actual
modifications to the item.
57. Enabling a Stream
• StreamEnabled—specifies whether a
stream is enabled (true) or disabled
(false) for the table.
• StreamViewType—specifies the
information that will be written to the
stream whenever data in the table is
modified:
• KEYS_ONLY—only the key attributes of the
modified item.
• NEW_IMAGE—the entire item, as it
appears after it was modified.
• OLD_IMAGE—the entire item, as it
appeared before it was modified.
• NEW_AND_OLD_IMAGES—both the new
and the old images of the item.
Each stream has its own
unique ARN
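The two parameters travel together in an UpdateTable (or CreateTable) request; a sketch of the relevant fragment as a plain request dictionary (the table name is illustrative):

```python
# StreamSpecification fragment as it would appear in an UpdateTable call
stream_spec = {
    "TableName": "GameScores",  # illustrative table name
    "StreamSpecification": {
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",  # both images per change
    },
}
```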
58. How is a Stream organized?
• Parent / child shards
• Shards scale automatically
• A shard is wiped out
after 24 hours
• Apps work with
shards via the SDK
59. Working with the Stream
Connect to a Stream using its
endpoint and then:
• Use DynamoDB Streams SDK to
work with Streams in your
application
• or use DynamoDB Streams Kinesis
Adapter to process Streams data
with Kinesis
• or trigger Lambda function to
process data produced by
DynamoDB Stream
• you can also catch TTL events and
process the items deleted by TTL
60. Triggers & AWS Lambda: Short Intro
Event source (anything) → Function → Services
• Event sources: changes in
data state, requests to
endpoints, changes in a
resource
• Function runtimes:
• Java
• Python
• Node.js
• C# (.NET Core)
• More
coming …
61. Amazon DynamoDB and AWS Lambda
integration
Stream-based model – AWS Lambda polls the stream 4 times per
second and, when it detects new records, invokes your Lambda
function, passing the update event as a parameter.
You maintain the event source mapping, which describes which
stream maps to which Lambda function.
Synchronous invocation – AWS Lambda invokes a Lambda
function using the RequestResponse invocation type (synchronous
invocation).
Event structure – The event your Lambda function receives is the
table update information AWS Lambda reads from your stream.
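A Lambda function wired to a stream receives a batch of records in the event parameter. A minimal handler sketch (the record layout follows the DynamoDB Streams event format; the returned summary shape is illustrative):

```python
def handler(event, context):
    """Count item-level inserts and modifications in a batch of
    DynamoDB Stream records delivered by AWS Lambda."""
    changed = 0
    for record in event.get("Records", []):
        # eventName is INSERT, MODIFY, or REMOVE for each stream record
        if record["eventName"] in ("INSERT", "MODIFY"):
            changed += 1
    return {"changed": changed}

# A synthetic two-record batch in the Streams event format
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"UserId": {"S": "u1"}},
                      "NewImage": {"UserId": {"S": "u1"}, "Score": {"N": "42"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"UserId": {"S": "u2"}}}},
    ]
}
```

Note that NewImage / OldImage are present only when the stream's view type includes them, so a robust handler should use .get() on those fields.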
64. Security & Control
IAM for access
management
• IAM Users
• IAM Roles
• Conditions
• STS for applications
VPC
• Subnet Groups & Security
Groups for DAX
• VPC Endpoints for
DynamoDB
65. Using Conditions
You can specify conditions that
determine how a permissions policy
takes effect. You can:
• Grant permissions on a table, but
restrict access to specific items in
that table based on certain
primary key values.
• Hide information so that only a
subset of attributes are visible to
the user.
• You use the IAM Condition
element to implement a fine-
grained access control policy.
{
  "Sid": "AllowAccessToOnlyItemsMatchingUserID",
  "Effect": "Allow",
  "Action": [
    "dynamodb:GetItem",
    "dynamodb:BatchGetItem",
    "dynamodb:Query",
    "dynamodb:PutItem",
    "dynamodb:UpdateItem",
    "dynamodb:DeleteItem",
    "dynamodb:BatchWriteItem"
  ],
  "Resource": [
    "arn:aws:dynamodb:us-west-2:123456789012:table/GameScores"
  ],
  "Condition": {
    "ForAllValues:StringEquals": {
      "dynamodb:LeadingKeys": [ "${www.amazon.com:user_id}" ],
      "dynamodb:Attributes": [
        "UserId", "GameTitle", "Wins", "Losses",
        "TopScore", "TopScoreDateTime"
      ]
    },
    "StringEqualsIfExists": {
      "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
    }
  }
}
67. Cost optimization with DynamoDB
• Proper key selection /
monitoring and avoiding hot
keys
• Avoiding storing & processing
large attributes (store
references to S3 only)
• DAX for Read-Heavy
workloads
• Auto-Scaling with small
increments for spiky workloads