Cloud databases in Amazon
Web Services
Roman Gomolko
roman@userreport.com
October 2015
Ciklum Speakers Corner
Let’s get acquainted
UserReport
Developing products that help you learn about your audience
Started using AWS more than 5 years ago
Fully migrated to AWS more than 1.5 years ago
Processing 3 billion requests monthly
Generating batched reports based on 8 billion requests
Online reports on 300 million records
Used ~50% of the services provided by AWS
Totally happy with AWS
A database is an organized collection of data
RDS
Relational Databases hosted and maintained by Amazon
Different Engines & Editions & Versions
Captain Obvious’s notes
● RDS doesn’t host a particular database; it hosts an RDBMS instance
● Create your root user, then create separate users for each database/application
● Your instance is firewalled with security groups
● Advanced configuration is available through parameter groups
Multi-AZ deployments for production workloads
● SLA 99.95% monthly uptime
● Doubles the price
● Lets you maintain your database without downtime
○ Minor updates
○ Major updates
○ Disk resize
○ EC2 upgrade
● No support for MS SQL Web, Express, Standard
Pricing
RDS price = EC2 + EBS + license
On-Demand or Reserved purchases with up-front payment
Backups
● Automated, with automated rotation
● Restore to a point in time
● A restore creates a new instance and deploys the desired version. It takes a while
● Manual backups via snapshots
Advanced optimizations
● Read replicas
○ You can create highly available read-only copies of your data on the fly
● Using ElastiCache for a performance boost
○ Caching results in Memcached can massively speed up repeated queries
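The caching idea above is usually applied as a cache-aside read: look in the cache first and query the database only on a miss. The sketch below is a minimal illustration, not the speaker's setup. `TTLCache` is a hypothetical in-process stand-in for a Memcached client, and `run_query` stands for whatever function performs the real RDS query.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Memcached client (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def cached_query(cache: TTLCache, key: str, run_query: Callable[[], object]) -> object:
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = run_query()  # the expensive call to the database
        cache.set(key, value)
    return value
```

The time-to-live bounds how stale a cached result can get; choosing it is a trade-off between freshness and database load.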
Downsides
● No control over EC2 for very advanced optimizations
● Backups work at the instance level
○ One RDS per DB
○ Or custom backups
● No Active Directory integration
● No Cross-region replication
Aurora
MySQL-compatible database by Amazon, built with the cloud in mind
Aurora
Available and Durable
Amazon Aurora is designed to offer greater than 99.99% availability,
replicating 6 copies of data across 3 Availability Zones and backing up
data continuously to Amazon S3. Recovery from physical storage failures
is transparent and instance restarts typically require less than a minute.
Aurora
Highly Scalable
You can use Amazon RDS to scale your Amazon Aurora database
instance up to 32 vCPUs and 244GiB Memory. You can also add up to 15
Amazon Aurora Replicas across three availability zones to further scale
read capacity. Amazon Aurora automatically grows storage as needed,
from 10GB up to 64TB.
DynamoDB
Document database with biscuits by Amazon
DynamoDB overview
● Operates with tables
● Table definition consists of
○ key (required)
○ sort (range) key (optional)
○ indexes (optional)
● Table contains items
● An item is described by
○ key
DynamoDB item overview
● Max 400 KB
● Unlimited number of attributes
● Attribute types
○ string
○ string set
○ number
○ number set
○ binary
DynamoDB operations
● Put - insert or update
● Get
● Delete
● Scan
● Query
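To make the difference between these operations concrete, here is a toy in-memory model of a table with a hash key and an optional sort key. It is only a sketch of the semantics; `ToyTable` and its method names are hypothetical and are not the DynamoDB API.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

Item = Dict[str, Any]


class ToyTable:
    """In-memory model of a table with a hash key and an optional sort key."""

    def __init__(self, hash_key: str, sort_key: Optional[str] = None):
        self.hash_key = hash_key
        self.sort_key = sort_key
        self._items: Dict[Tuple[Any, Any], Item] = {}

    def _key(self, item: Item) -> Tuple[Any, Any]:
        sort_value = item[self.sort_key] if self.sort_key else None
        return (item[self.hash_key], sort_value)

    def put(self, item: Item) -> None:
        # Put inserts a new item or replaces the one with the same key.
        self._items[self._key(item)] = dict(item)

    def get(self, hash_value: Any, sort_value: Any = None) -> Optional[Item]:
        return self._items.get((hash_value, sort_value))

    def delete(self, hash_value: Any, sort_value: Any = None) -> None:
        self._items.pop((hash_value, sort_value), None)

    def query(self, hash_value: Any) -> List[Item]:
        # Query returns only the items that share one hash key, ordered by
        # sort key. (This toy filters a dict; the real service locates
        # the items by key.)
        matches = [i for (h, _), i in self._items.items() if h == hash_value]
        if self.sort_key:
            matches.sort(key=lambda i: i[self.sort_key])
        return matches

    def scan(self) -> Iterator[Item]:
        # Scan walks every item in the table, so its cost grows with table size.
        return iter(self._items.values())
```

The practical takeaway is that Query is the everyday read path and Scan is a full pass over the table, so the key design should follow the queries the application needs.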
Demo time
DynamoDB show-case
DynamoDB performance
● You provision read and write capacity
● DynamoDB tables are split into partitions (shards). Each partition has the following limits:
○ 10 GB of data
○ 3000 Read Capacity Units
○ 1000 Write Capacity Units
● Your requests can be throttled (the AWS SDKs handle retry logic in most cases)
● You can set up auto scaling for DynamoDB
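When a request is throttled, the usual client-side answer is to retry with exponential backoff and jitter. The sketch below shows that pattern in isolation. `ThrottledError` is a hypothetical stand-in for the service's throughput-exceeded error, and the delay values are illustrative assumptions rather than SDK defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a 'throughput exceeded' error."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a throttled call, doubling the delay ceiling after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise RuntimeError("unreachable")
```

Randomizing the wait keeps many throttled clients from retrying at the same instant and hitting the limit again together.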
DynamoDB Streams
● Triggers on data changes
● Cross-region replication
● Elasticsearch integration that lets you search your data
https://aws.amazon.com/blogs/aws/new-logstash-plugin-search-dynamodb-content-using-elasticsearch/
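All three uses rest on the same mechanism: a consumer reads an ordered log of item changes and reacts to each one. The sketch below replays such a log onto a replica. The record shape (`event`, `key`, `new_image`) is a simplified assumption for illustration; real stream records carry more fields and a different layout.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def apply_stream_records(
    replica: Dict[str, Dict[str, Any]], records: Iterable[Record]
) -> None:
    """Replay ordered change records onto a replica keyed by item key."""
    for record in records:
        key = record["key"]
        if record["event"] in ("INSERT", "MODIFY"):
            replica[key] = record["new_image"]  # upsert the latest version
        elif record["event"] == "REMOVE":
            replica.pop(key, None)  # item was deleted upstream
        else:
            raise ValueError(f"unknown event type: {record['event']}")
```

A trigger or a search indexer follows the same loop, with the replica update swapped for a function call or an index write.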
Backups and maintenance
● All data is replicated on three nodes, so no backup is required
● Changing provisioned throughput does not degrade performance
● You can set up auto scaling for DynamoDB
https://github.com/sebdah/dynamic-dynamodb
*hit happens
DynamoDB had a massive outage (high error rate on API requests) in N. Virginia
that affected:
● SQS
● CloudWatch
● AutoScale Groups
● SNS
https://aws.amazon.com/message/5467D2/
Application design best practices
ElastiCache
A key-value store is also a database
Redis
● Extremely fast in-memory database
● Different data structures
○ Sets
○ Lists
○ Sorted sets
○ HyperLogLog
○ Hashes
○ Geo data
Redis hosted in AWS
● Different versions supported
● Multi AZ master/slave configuration maintained by Amazon
● Automated backups
● Monitoring with CloudWatch
● No chance to patch Redis for your needs (geeks like custom operations)
Example 1. Calculating unique visitors
PFADD visitors.20151001 xxx
PFCOUNT visitors.20151001
INCR pageviews.20151001
GET pageviews.20151001
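PFADD and PFCOUNT rely on HyperLogLog, which estimates the number of distinct values in a small, fixed amount of memory instead of storing every visitor id. The sketch below is a minimal textbook version of the algorithm, written to show the idea. It is not Redis's implementation, and the SHA-1 based hash is an arbitrary choice made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 14):
        # The alpha constant used in count() is valid for p >= 7 (m >= 128).
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # 64-bit hash: the top p bits pick a register, the rest feed the rank.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the first 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias-correction constant
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)
```

Adding the same visitor twice never changes a register, which is why repeated PFADD calls for one id do not inflate the count. The result is an estimate, so it should not be used where exact numbers are required.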
Example 2. Working with sets
# users 1 and 2 add an item to the basket
SADD added_item_to_cart id1
SADD added_item_to_cart id2
SADD begin_checkout id1
# users who haven’t begun checkout
SDIFFSTORE no_checkout added_item_to_cart begin_checkout
# users with a known email who haven’t started checkout
SINTER known_email no_checkout
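For readers more at home with ordinary set types, the same logic maps directly onto Python sets. This is only a restatement of the commands above. The contents of `known_email` are an assumption, since the slide never populates that set.

```python
from typing import Set


def users_without_checkout(added_to_cart: Set[str], began_checkout: Set[str]) -> Set[str]:
    # Same result as SDIFF / SDIFFSTORE: in the first set but not the second.
    return added_to_cart - began_checkout


def users_to_remind(known_email: Set[str], no_checkout: Set[str]) -> Set[str]:
    # Same result as SINTER: members present in both sets.
    return known_email & no_checkout


added_item_to_cart = {"id1", "id2"}
begin_checkout = {"id1"}
known_email = {"id2", "id3"}  # assumed contents, not from the slide

no_checkout = users_without_checkout(added_item_to_cart, begin_checkout)
print(no_checkout)                                # {'id2'}
print(users_to_remind(known_email, no_checkout))  # {'id2'}
```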
Example 3. Top scored users
ZADD gamescore 1 user1
ZADD gamescore 4 user2
ZADD gamescore 2 user3
ZREVRANGE gamescore 0 9
user2
user3
user1
Learn more
Redshift
It’s like PostgreSQL, but for petabytes
Redshift
● Multiple-node cluster deployment that scales up to petabytes
● $1000/TB/year
● Good for data mining
● Queries run for minutes or hours
Table design
● Distribution key (DISTKEY) - how data is distributed across nodes
● Sort key (SORTKEY) - how data is sorted within a node
● Primary keys, foreign keys and constraints are not enforced; they are only hints to the query optimizer
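The effect of the two keys can be sketched in a few lines: one key decides which node a row lands on, the other decides the order of rows inside that node. The code below is a toy model under stated assumptions. `place_rows` is a hypothetical helper, and `zlib.crc32` merely stands in for whatever hash function Redshift uses internally.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def place_rows(
    rows: List[dict], dist_key: str, sort_key: str, node_count: int
) -> Dict[int, List[dict]]:
    """Assign each row to a node by hashing dist_key, then sort each node."""
    nodes: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        # crc32 is a stand-in hash chosen for this example only.
        digest = zlib.crc32(str(row[dist_key]).encode("utf-8"))
        nodes[digest % node_count].append(row)
    for node_rows in nodes.values():
        node_rows.sort(key=lambda r: r[sort_key])
    return dict(nodes)
```

Rows with the same distribution key value always land on the same node, so joins and aggregations on that key can run locally. A key with few distinct values will leave some nodes much fuller than others.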
Uploading data
● From CSV
● From DynamoDB
● From EMR
● Bulk insert
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html
Loading data from S3
copy mytable
from 's3://mybucket/data/table.txt'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
csv [gzip] [delimiter '|'];
Query Execution
● PostgreSQL compatible syntax with many disabled features
● No views
● No stored procedures
● Scalar user-defined functions were added recently
● 10 parallel queries
Getting query results
unload ('select * from mytable')
to 's3://mybucket/unload/result/'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';
S3 + EMR
Why not query the files directly?
S3 as storage
● CSV
● JSON
● XML
● Parquet
EMR
EMR can launch an Elastic MapReduce cluster running
● Hadoop
● Spark
● Hive
● Presto
Distributed SQL Query Engine for Big Data
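At toy scale, querying files in place looks like the sketch below: each file is aggregated on its own and the partial results are merged, which is the map and reduce shape these engines run across a cluster. `count_by_column` is a hypothetical helper, and the CSV files are passed in as plain strings instead of being read from S3.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def count_by_column(files: Iterable[str], column: str) -> Dict[str, int]:
    """Rough equivalent of SELECT column, COUNT(*) GROUP BY column over CSV files."""
    totals: Counter = Counter()
    for text in files:
        # Map step: each file is aggregated independently of the others.
        partial = Counter(row[column] for row in csv.DictReader(io.StringIO(text)))
        # Reduce step: merge the per-file partial counts.
        totals.update(partial)
    return dict(totals)
```

Because the per-file step needs no shared state, it can be spread over as many workers as there are files.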
Demo time
The one-size-fits-all principle does not work here
"Cloud databases amazon web services" by Roman Gomolko
