Deep Dive on Amazon Aurora

Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Aurora is a disruptive technology in the database space: it brings a new architectural model and distributed systems techniques that provide far higher performance, availability, and durability than conventional monolithic database designs. In this session, we take a deep dive into some of the key innovations behind Amazon Aurora, discuss best practices and configurations, and share early customer experience from the field.

  1. 1. Amazon Aurora Deep Dive. KD Singh, AWS Solution Architect; Scott Ward, AWS Solution Architect. April 19, 2016. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. 2. What is Amazon Aurora?
      • MySQL-compatible relational database
      • Performance and availability of commercial databases
      • Simplicity and cost-effectiveness of open-source databases
  3. 3. Common customer use cases (fastest growing service in AWS history)
      • Business applications
      • Web and mobile
      • Content management
      • E-commerce, retail
      • Internet of Things
      • Search, advertising
      • Business intelligence, analytics
      • Games, media
  4. 4. Expedia: Online travel marketplace
      World’s leading online travel company, with a portfolio that includes 150+ travel sites in 70 countries.
      • Real-time business intelligence and analytics on a growing corpus of online travel marketplace data.
      • Current Microsoft SQL Server–based architecture is too expensive. Performance degrades as data volume grows.
      • Cassandra with Solr index requires large memory footprint and hundreds of nodes, adding cost.
      Aurora benefits:
      • Aurora meets scale and performance requirements with much lower cost.
      • 25,000 inserts/sec with peak up to 70,000. 30 millisecond average response time for write and 17 millisecond for read, with 1 month of data.
  5. 5. Thomas Publishing: Connecting buyers and suppliers
      In business for over a century, connecting buyers and suppliers across all industrial sectors, evolving from an industrial trade print publisher into industry’s most respected group of digital-friendly businesses.
      • Have been using Oracle for production database.
      • Rapidly growing volumes of data, need to increase efficiency and deliver results on shorter timelines.
      • Required significant upfront investment in both infrastructure and Oracle license expense.
      Aurora benefits:
      • Migrated their production database from Oracle to Amazon Aurora using the AWS Database Migration Service and Schema Conversion Tool.
      • Entire migration process was completed in less than 4 weeks.
  6. 6. ISCS: Insurance claims processing
      Provides policy management, claim, billing solutions for casualty and property insurance organizations.
      • Have been using Oracle and SQL Server for operational and warehouse data.
      • Cost and maintenance of traditional commercial database has become the biggest expenditure and maintenance headache.
      Aurora benefits:
      • The cost of a “more capable” deployment on Aurora has proven to be about 70% less than ISCS’s SQL Server deployments.
      • Eliminated backup window with Aurora’s continuous backup; exploiting linear scaling with number of connections; continuous upload to Amazon Redshift using Aurora Replicas.
  7. 7. Alfresco: Enterprise content management
      Leading the convergence of enterprise content management and business process management. More than 1,800 organizations in 195 countries rely on Alfresco, including leaders in financial services, healthcare, and the public sector.
      • Scaling Alfresco document repositories to billions of documents.
      • Support user applications that require sub-second response times.
      Aurora benefits:
      • Scaled to 1 billion documents with a throughput of 3 million per hour, which is 10 times faster than their current environment.
      • Moving from large data centers to cost-effective management with AWS and Aurora.
  8. 8. Relational databases were not designed for cloud. Multiple layers of functionality all in a monolithic stack: SQL, transactions, caching, logging.
  9. 9. Not much has changed in the last 30 years. Even when you scale it out, you’re still replicating the same stack. [Diagram: the SQL / Transactions / Caching / Logging stack repeated under each Application, with Storage.]
  10. 10. Reimagining the relational database
      What if you were inventing the database today? You wouldn’t design it the way we did in 1970. You’d build something
      • that can scale out …
      • that is self-healing …
      • that leverages existing AWS services …
  11. 11. A service-oriented architecture applied to databases
      1. Moved the logging and storage layer into a multitenant, scale-out database-optimized storage service.
      2. Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations.
      3. Integrated with Amazon S3 for continuous backup with 99.999999999% durability.
      [Diagram: data plane (SQL, Transactions, Caching; Logging + Storage) and control plane (Amazon DynamoDB, Amazon SWF, Amazon Route 53), with Amazon S3.]
  12. 12. SQL benchmark results
      Using MySQL SysBench with Amazon Aurora R3.8XL with 32 cores and 244 GB RAM.
      • Write performance: 4 client machines with 1,000 connections each.
      • Read performance: single client machine with 1,600 connections.
  13. 13. Reproducing these results
      1. Create an Amazon VPC (or use an existing one).
      2. Create 4 EC2 R3.8XL client instances to run the SysBench client. All 4 should be in the same Availability Zone (AZ).
      3. Enable enhanced networking on your clients.
      4. Tune Linux settings (see whitepaper referenced below).
      5. Install SysBench version 0.5.
      6. Launch a r3.8xlarge Amazon Aurora DB instance in the same VPC and AZ as your clients.
      7. Start your benchmark!
      Whitepaper: https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf
      [Diagram: four R3.8XLARGE clients and one Amazon Aurora R3.8XLARGE instance.]
  14. 14. Beyond benchmarks
      If only real-world applications saw benchmark performance. Possible distortions:
      • Real-world requests contend with each other.
      • Real-world metadata rarely fits in the data dictionary cache.
      • Real-world data rarely fits in the buffer cache.
      • Real-world production databases need to run at high availability.
  15. 15. Scaling user connections
      SysBench OLTP workload, 250 tables. Up to 8x faster.
      Connections | Amazon Aurora | Amazon RDS MySQL, 30 K IOPS (single AZ)
      50          | 40,000        | 10,000
      500         | 71,000        | 21,000
      5,000       | 110,000       | 13,000
  16. 16. Scaling table count
      SysBench write-only workload, 1,000 connections, default settings. Number of write operations per second. Up to 11x faster.
      Tables | Amazon Aurora | MySQL I2.8XL local SSD | MySQL I2.8XL RAM disk | RDS MySQL, 30 K IOPS (single AZ)
      10     | 60,000        | 18,000                 | 22,000                | 25,000
      100    | 66,000        | 19,000                 | 24,000                | 23,000
      1,000  | 64,000        | 7,000                  | 18,000                | 8,000
      10,000 | 54,000        | 4,000                  | 8,000                 | 5,000
  17. 17. Scaling dataset size
      SysBench write-only. Up to 67x faster.
      DB size | Amazon Aurora | RDS MySQL, 30 K IOPS (single AZ)
      1 GB    | 107,000       | 8,400
      10 GB   | 107,000       | 2,400
      100 GB  | 101,000       | 1,500
      1 TB    | 26,000        | 1,200
      CloudHarmony TPC-C. Up to 136x faster.
      DB size | Amazon Aurora | RDS MySQL, 30 K IOPS (single AZ)
      80 GB   | 12,582        | 585
      800 GB  | 9,406         | 69
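The "up to" multipliers on the dataset-size slide can be rechecked from the tabulated figures. The following is a minimal Python sketch; the numbers are copied from the slide, while the dictionary layout and the helper name `max_speedup` are illustrative choices, not part of the original deck.

```python
# Throughput pairs copied from the dataset-size slide: (Amazon Aurora, RDS MySQL).
# The grouping into dictionaries and the helper below are illustrative only.
SYSBENCH_WRITE_ONLY = {
    "1GB": (107_000, 8_400),
    "10GB": (107_000, 2_400),
    "100GB": (101_000, 1_500),
    "1TB": (26_000, 1_200),
}
CLOUDHARMONY_TPCC = {
    "80GB": (12_582, 585),
    "800GB": (9_406, 69),
}


def max_speedup(rows):
    """Return (db_size, ratio) for the row with the largest Aurora/RDS ratio."""
    size = max(rows, key=lambda k: rows[k][0] / rows[k][1])
    aurora, rds = rows[size]
    return size, aurora / rds


for name, rows in [("SysBench write-only", SYSBENCH_WRITE_ONLY),
                   ("CloudHarmony TPC-C", CLOUDHARMONY_TPCC)]:
    size, ratio = max_speedup(rows)
    print(f"{name}: largest ratio {ratio:.1f}x at {size}")
```

Under this reading, the 67x figure comes from the 100 GB SysBench row and the 136x figure from the 800 GB TPC-C row.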
  18. 18. Running with read replicas
      SysBench write-only workload, 250 tables. Up to 500x lower lag.
      Updates per second | Amazon Aurora (lag) | RDS MySQL, 30 K IOPS (single AZ) (lag)
      1,000              | 2.62 ms             | 0 s
      2,000              | 3.42 ms             | 1 s
      5,000              | 3.94 ms             | 60 s
      10,000             | 5.38 ms             | 300 s
  19. 19. How do we achieve these results?
      Databases are all about I/O. Network-attached storage is all about packets/second. High-throughput processing does not allow context switches.
      Do less work:
      • Do fewer I/Os
      • Minimize network packets
      • Cache prior results
      • Offload the database engine
      Be more efficient:
      • Process asynchronously
      • Reduce latency path
      • Use lock-free data structures
      • Batch operations together
  20. 20. Aurora cluster. [Diagram: Aurora primary instance; cluster volume spans 3 AZs (AZ 1, AZ 2, AZ 3); Amazon S3.]
  21. 21. Aurora cluster with replicas. [Diagram: Aurora primary instance and two Aurora Replicas; cluster volume spans 3 AZs (AZ 1, AZ 2, AZ 3); Amazon S3.]
  22. 22. I/O traffic in RDS MySQL (MySQL with standby)
      Legend (type of write): binlog, data, double-write, log, FRM files.
      I/O flow:
      • Issue write to Amazon EBS; EBS issues it to the mirror and acknowledges when both are done.
      • Stage write to standby instance using storage-level replication.
      • Issue write to EBS on standby instance.
      Observations:
      • Steps 1, 3, 5 are sequential and synchronous. This amplifies both latency and jitter.
      • Many types of write operations for each user operation.
      • Have to write data blocks twice to avoid torn write operations.
      Performance (30 minute SysBench write-only workload, 100 GB dataset, RDS Single AZ, 30 K PIOPS):
      • 780 K transactions.
      • 7,388 K I/Os per million transactions (excludes mirroring, standby).
      • Average 7.4 I/Os per transaction.
      [Diagram: primary instance in AZ 1 and standby instance in AZ 2, each on Amazon Elastic Block Store (EBS) with an EBS mirror; Amazon S3; steps numbered 1 to 5.]
  23. 23. I/O traffic in Aurora (database)
      Amazon Aurora: async 4/6 quorum, distributed writes.
      Legend (type of writes): binlog, data, double-write, log, FRM files.
      I/O flow:
      • Boxcar redo log records, fully ordered by LSN.
      • Shuffle to appropriate segments, partially ordered.
      • Boxcar to storage nodes and issue write operations.
      Observations:
      • Only write redo log records; all steps asynchronous.
      • No data block writes (checkpoint, cache replacement).
      • 6x more log writes, but 9x less network traffic.
      • Tolerant of network and storage outlier latency.
      Performance (30 minute SysBench write-only workload, 100 GB dataset):
      • 27,378 K transactions (35x more).
      • 950 K I/Os per 1M transactions, 6x amplification (7.7x less).
      [Diagram: primary instance and replica instance across AZ 1, AZ 2, AZ 3; Amazon S3.]
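The headline ratios on the two I/O traffic slides follow from the raw counts. The sketch below reworks that arithmetic in Python; the figures are taken from the slides, the variable and function names are illustrative, and dividing by the 6x amplification to get a per-node figure is an assumed reading of "unamplified, per node" on the storage-node slide.

```python
# Figures copied from the RDS MySQL and Aurora I/O traffic slides
# (30-minute SysBench write-only run, 100 GB dataset). Names are illustrative.
MYSQL_TXNS_K = 780               # thousands of transactions, RDS MySQL
AURORA_TXNS_K = 27_378           # thousands of transactions, Aurora
MYSQL_IOS_K_PER_M_TXN = 7_388    # thousands of I/Os per million transactions
AURORA_IOS_K_PER_M_TXN = 950     # same unit, includes 6x write amplification
AURORA_AMPLIFICATION = 6         # each log write goes to six storage nodes


def ios_per_txn(ios_k_per_million_txn):
    """Convert 'thousands of I/Os per million transactions' to I/Os per transaction."""
    return ios_k_per_million_txn * 1_000 / 1_000_000


mysql_io = ios_per_txn(MYSQL_IOS_K_PER_M_TXN)
aurora_io = ios_per_txn(AURORA_IOS_K_PER_M_TXN)
# Assumption: "unamplified, per node" means dividing out the 6x amplification.
aurora_io_per_node = aurora_io / AURORA_AMPLIFICATION

print(f"MySQL I/Os per transaction:        {mysql_io:.2f}")
print(f"Aurora I/Os per transaction:       {aurora_io:.2f}")
print(f"Transaction ratio (Aurora/MySQL):  {AURORA_TXNS_K / MYSQL_TXNS_K:.1f}x")
print(f"I/O ratio (MySQL/Aurora):          {mysql_io / aurora_io:.2f}x")
print(f"I/O ratio per storage node:        {mysql_io / aurora_io_per_node:.1f}x")
```

Read this way, the per-node ratio lands close to the "46x less" input-queue figure quoted on the storage-node slide, which suggests the slides truncate rather than round their multipliers.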
  24. 24. I/O traffic in Aurora (storage node)
      I/O flow:
      1. Receive record and add to in-memory queue.
      2. Persist record and acknowledge.
      3. Organize records and identify gaps in log.
      4. Gossip with peers to fill in holes.
      5. Coalesce log records into new data block versions.
      6. Periodically stage log and new block versions to S3.
      7. Periodically garbage-collect old versions.
      8. Periodically validate CRC codes on blocks.
      Observations:
      • All steps are asynchronous.
      • Only steps 1 and 2 are in the foreground latency path.
      • Input queue is 46x less than MySQL (unamplified, per node).
      • Favors latency-sensitive operations.
      • Use disk space to buffer against spikes in activity.
      [Diagram labels: log records, primary instance, incoming queue, storage node, update queue, ACK, hot log, data blocks, point-in-time snapshot, sort/group, coalesce, GC, scrub, peer-to-peer gossip, peer storage nodes, S3 backup.]
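Steps 3 and 4 of the storage-node flow (identify gaps, then gossip with peers to fill holes) can be illustrated with a toy model. This is only a sketch under a stated assumption: each record is assumed to carry a back-pointer to the previous LSN in its segment, which the slides do not spell out, and the names `find_gaps` and `fill_from_peer` are hypothetical.

```python
# Toy model: a node's log is a dict mapping lsn -> prev_lsn.
# ASSUMPTION (not stated on the slides): every record links back to the
# previous record of the same segment, so a missing predecessor is a gap.

def find_gaps(log):
    """Return LSNs that held records point back to but that are not held locally."""
    return sorted({prev for prev in log.values()
                   if prev is not None and prev not in log})


def fill_from_peer(log, peer_log):
    """Copy missing records from one peer until no fillable gap remains.

    Returns the gaps this peer could not fill (empty list if the log is whole).
    """
    while True:
        fetched = [lsn for lsn in find_gaps(log) if lsn in peer_log]
        if not fetched:
            return find_gaps(log)
        for lsn in fetched:
            log[lsn] = peer_log[lsn]


# Local node missed LSNs 20 and 22; the peer holds the full chain.
local = {10: None, 12: 10, 30: 22}
peer = {10: None, 12: 10, 20: 12, 22: 20, 30: 22}
print("gaps before gossip:", find_gaps(local))
print("gaps after gossip: ", fill_from_peer(local, peer))
```

Because a record only reveals its immediate predecessor, the loop has to repeat: filling LSN 22 exposes the next hole at LSN 20.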
  25. 25. Asynchronous group commits
      Traditional approach:
      • Maintain a buffer of log records to write out to disk.
      • Issue write operations when buffer is full, or time out waiting for write operations.
      • First writer has latency penalty when write rate is low.
      Amazon Aurora:
      • Request I/O with first write, fill buffer till write picked up.
      • Individual write durable when 4 of 6 storage nodes acknowledge.
      • Advance DB durable point up to earliest pending acknowledgement.
      [Diagram: transactions T1 to Tn (read, write, commit) over time; group commit; a commit queue holds pending commits (T1 to T8) in LSN order while the durable LSN at the head node grows. LSNs shown: 10, 12, 20, 22, 30, 34, 41, 47, 49, 50.]
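The Aurora side of this slide (a write is durable at 4 of 6 acknowledgements, and the durable point advances only up to the earliest write still pending) can be sketched as a small tracker. This is a simplified illustration, not Aurora's implementation; the class `CommitTracker` and its method names are hypothetical.

```python
import heapq

WRITE_QUORUM = 4  # of 6 storage nodes, per the slide


class CommitTracker:
    """Tracks per-LSN acknowledgements and releases commits in LSN order."""

    def __init__(self):
        self.acks = {}          # lsn -> set of node ids that acknowledged
        self.pending = []       # min-heap of LSNs not yet known durable
        self.commit_queue = []  # min-heap of (commit_lsn, txn_name)
        self.durable_lsn = 0    # every write with LSN <= this is durable

    def write(self, lsn):
        """Register a log write that has been sent to the storage nodes."""
        self.acks[lsn] = set()
        heapq.heappush(self.pending, lsn)

    def request_commit(self, txn, commit_lsn):
        """Queue a commit; the worker thread is free to do other work."""
        heapq.heappush(self.commit_queue, (commit_lsn, txn))

    def ack(self, lsn, node):
        """Record one acknowledgement; return transactions that can now commit."""
        self.acks[lsn].add(node)
        # Advance the durable point past every leading write that has quorum.
        while self.pending and len(self.acks[self.pending[0]]) >= WRITE_QUORUM:
            self.durable_lsn = heapq.heappop(self.pending)
        # Release queued commits whose LSN is at or below the durable point.
        released = []
        while self.commit_queue and self.commit_queue[0][0] <= self.durable_lsn:
            released.append(heapq.heappop(self.commit_queue)[1])
        return released


tracker = CommitTracker()
tracker.write(10)
tracker.write(12)
tracker.request_commit("T1", 10)
tracker.request_commit("T2", 12)
for node in range(4):            # LSN 12 reaches quorum first ...
    tracker.ack(12, node)
print(tracker.durable_lsn)       # ... but LSN 10 is still pending
for node in range(4):
    done = tracker.ack(10, node)
print(tracker.durable_lsn, done)
```

The example shows why out-of-order acknowledgements are harmless: LSN 12 reaching quorum early does not move the durable point until LSN 10 catches up, at which point both commits are released together.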
  26. 26. Adaptive thread pool
      MySQL thread model:
      • Standard MySQL: one thread per connection. Doesn’t scale with connection count.
      • MySQL EE: connections assigned to thread group. Requires careful stall threshold tuning.
      Aurora thread model:
      • Re-entrant connections multiplexed to active threads.
      • Kernel-space epoll() inserts into latch-free event queue.
      • Dynamically sized thread pool.
      • Gracefully handles 5000+ concurrent client sessions on r3.8xl.
      [Diagram: client connections, epoll(), latch-free task queue.]
  27. 27. I/O traffic in Aurora (read replica)
      MySQL read scaling:
      • Logical: ship SQL statements to replica.
      • Write workload similar on both instances.
      • Independent storage.
      • Can result in data drift between master and replica.
      Amazon Aurora read scaling:
      • Physical: ship redo from master to replica.
      • Replica shares storage. No writes performed.
      • Cached pages have redo applied.
      • Advance read view when all commits seen.
      [Diagram: MySQL master (30% read, 70% write) and MySQL replica (30% new reads, 70% write) with single-threaded binlog apply and separate data volumes; Aurora master (30% read, 70% write) and Aurora replica (100% new reads) with page cache update over shared Multi-AZ storage.]
  28. 28. Availability “Performance only matters if your database is up”
  29. 29. Storage node availability
      • Quorum system for read/write; latency tolerant.
      • Peer-to-peer gossip replication to fill in holes.
      • Continuous backup to S3 (designed for 11 9s durability).
      • Continuous scrubbing of data blocks.
      • Continuous monitoring of nodes and disks for repair.
      • 10 GB segments as unit of repair or hotspot rebalance to quickly rebalance load.
      • Quorum membership changes do not stall write operations.
      [Diagram: AZ 1, AZ 2, AZ 3, Amazon S3.]
  30. 30. Instant crash recovery
      Traditional databases:
      • Have to replay logs since the last checkpoint.
      • Typically 5 minutes between checkpoints.
      • Single-threaded in MySQL; requires a large number of disk accesses.
      • Crash at T0 requires a re-application of the SQL in the redo log since last checkpoint.
      Amazon Aurora:
      • Underlying storage replays redo records on demand as part of a disk read.
      • Parallel, distributed, asynchronous.
      • No replay for startup.
      • Crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously.
      [Diagram: checkpointed data and redo log, with a crash at T0.]
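The idea of replaying redo "on demand as part of a disk read" can be shown with a toy storage segment. This is a deliberately simplified sketch: pages are modeled as integers and redo records as increments, and the class `Segment` with its methods is hypothetical rather than anything exposed by Aurora.

```python
from collections import defaultdict


class Segment:
    """Toy storage segment that applies redo lazily, only when a page is read."""

    def __init__(self):
        self.pages = {}                 # page_id -> (value, last_applied_lsn)
        self.redo = defaultdict(list)   # page_id -> [(lsn, delta)] not yet applied

    def append_redo(self, page_id, lsn, delta):
        """Accept a redo record without touching the page (cheap write path)."""
        self.redo[page_id].append((lsn, delta))

    def read(self, page_id):
        """Bring one page up to date by applying its pending redo in LSN order."""
        value, applied = self.pages.get(page_id, (0, 0))
        for lsn, delta in sorted(self.redo.pop(page_id, [])):
            if lsn > applied:           # skip records already reflected in the page
                value, applied = value + delta, lsn
        self.pages[page_id] = (value, applied)
        return value


seg = Segment()
seg.append_redo("p1", 5, 3)
seg.append_redo("p2", 6, 10)
seg.append_redo("p1", 7, 4)
# After a "crash" nothing is replayed up front; reading p1 replays only p1's redo.
print(seg.read("p1"))
print(sorted(seg.redo))   # p2's redo is still pending until p2 is read
```

Startup cost is independent of the amount of pending redo in this model, since each page pays for its own replay at first read; that is the property the slide contrasts with checkpoint-based recovery.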
  31. 31. Survivable caches
      • We moved the cache out of the database process.
      • Cache remains warm in the event of a database restart.
      • Lets you resume fully loaded operations much faster.
      • Instant crash recovery + survivable cache = quick and easy recovery from DB failures.
      [Diagram: SQL, Transactions, Caching stacks; the caching process is outside the DB process and remains warm across a database restart.]
  32. 32. Faster, more predictable failover
      • MySQL: 15–20 sec. Timeline stages: app running, failure detection, DNS propagation, recovery, recovery (DB failure marked).
      • Aurora with MariaDB driver: 3–20 sec. Timeline stages: app running, failure detection, DNS propagation, recovery (DB failure marked).
  33. 33. High availability with Read Replicas. [Diagram: Aurora primary instance (db.r3.8xlarge) and two Aurora Replicas, db.r3.2xlarge (priority: tier-1) and db.r3.8xlarge (priority: tier-0); cluster volume spans 3 AZs (AZ 1, AZ 2, AZ 3); Amazon S3.]
  34. 34. High availability with Read Replicas. [Diagram: the same cluster with one Aurora Replica now labeled Aurora Primary Instance; instances db.r3.8xlarge, db.r3.2xlarge (priority: tier-1), and db.r3.8xlarge; cluster volume spans 3 AZs (AZ 1, AZ 2, AZ 3); Amazon S3.]
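The two diagrams suggest that the replica with priority tier-0 takes over as primary. A minimal sketch of that selection rule follows; the lowest tier number wins, and breaking ties by larger memory is an assumption added here for illustration. The `Replica` type, its fields, and `pick_failover_target` are hypothetical names, and the memory sizes are only there to make the tie-break concrete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    instance_class: str
    tier: int            # promotion priority; lower number is preferred
    memory_gib: float    # ASSUMPTION: used only to break ties between equal tiers


def pick_failover_target(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the replica to promote: lowest tier first, then the larger instance."""
    if not replicas:
        return None
    return min(replicas, key=lambda r: (r.tier, -r.memory_gib))


replicas = [
    Replica("replica-a", "db.r3.2xlarge", tier=1, memory_gib=61),
    Replica("replica-b", "db.r3.8xlarge", tier=0, memory_gib=244),
]
target = pick_failover_target(replicas)
print(target.name, target.instance_class)
```

Putting the replica that matches the primary's instance class in the preferred tier, as the slide does, keeps capacity unchanged after a failover.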
  35. 35. Simulate failures using SQL
      To cause the failure of a component at the database node:
          ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]
      To simulate the failure of disks:
          ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval
      To simulate the failure of networking:
          ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type [TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval
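In a failover drill these statements are typically generated by a test script rather than typed by hand. The sketch below builds only the crash statement, following the grammar shown on the slide; the helper name `crash_statement` is hypothetical, and the disk and network forms are left out because the slide does not give concrete values for their placeholders.

```python
# Targets listed in the slide's grammar for ALTER SYSTEM CRASH.
CRASH_TARGETS = ("INSTANCE", "DISPATCHER", "NODE")


def crash_statement(target=None):
    """Build an ALTER SYSTEM CRASH statement; the target is optional per the slide."""
    if target is None:
        return "ALTER SYSTEM CRASH"
    normalized = target.upper()
    if normalized not in CRASH_TARGETS:
        raise ValueError(f"unknown crash target: {target!r}")
    return f"ALTER SYSTEM CRASH {normalized}"


for t in (None, "instance", "dispatcher", "node"):
    print(crash_statement(t))
```

The resulting string would be passed to whatever MySQL-compatible client the test harness already uses; validating the target against a fixed list avoids interpolating arbitrary input into SQL.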
  36. 36. Thank you!