AWS re:Invent 2016: Migrating a Highly Available and Scalable Database from Oracle to Amazon DynamoDB (ARC404)

1. Migrating a Highly Available and Scalable Service from Oracle to Amazon DynamoDB (ARC404)
Shreekant Mandke, Software Development Manager, Amazon Marketplace
December 1, 2016
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

2. Audience Poll
• Using relational only
• Using a mix of relational and non-relational
• Considering moving from relational to non-relational

3. What to Expect from This Session
• Case study of an actual product migration done successfully
• This talk is meant to be a template, not a prescription
What it is not
• Not a primer on Oracle or DynamoDB
• Not about picking sides between technologies or products
• Not a talk on how to do non-relational schema design

4. Agenda 1. Application Overview 2. Preparation for Migration 3. Schema Design 4. Migration Strategies 5. API Refactoring 6. Data Migration 7. Results

5. Application Overview
Why we migrated
• Availability
• Scalability
• Ops overhead
• Cost
Architecture (diagram): Document Ingestion Service, Document Distribution Service, Document Manager Service; Oracle (document metadata); DynamoDB (document metadata); S3 (encrypted documents); flows labeled Data and Metadata.

7. Preparation: Data and Traffic Pattern • Variances in persisted data

8. Preparation: Data and Traffic Pattern • Variances in persisted data • Database interactions

9. Preparation: Database Interactions (Sample)
CreateDocumentVersion API: Documents Table, DocumentAttributes Table, UserAttributes Table

10. Preparation: Data and Traffic Pattern • Variances in persisted data • Database interactions • Traffic patterns

11. Preparation: Analyze Traffic Pattern
API Distribution
• CreateDocument: A%
• GetDocumentVersion: B%
• CreateToken: C%
• ResolveToken: D%
• GetMetadata: E%
• SetMetadata: F%
• Search: G%
• Other APIs
Sessions Distribution
• Internal document upload: P%
• External document upload: Q%
• Automated document upload: R%
• Update document metadata: S%
• Search metadata: T%
• Search and internal download: U%
• Search and external download: V%
• Automated document download: W%
• Delete document: Y%
• Other sessions: Z%

12. Preparation • Variances in persisted data • Database interactions • Traffic patterns • Froze schema - for duration of migration

13. Preparation • Variances in persisted data • Database interactions • Traffic patterns • Froze schema – for duration of migration • Communication – clients informed of changes and effects

15. Schema: Design Tenets
• Optimize for scalability and latency, not storage efficiency
• Eventually consistent reads only; no read-after-write
• Idempotent: operations designed to work on unreliable systems
• Immutability: insert-only pattern; maintain every version for audit
• Single table write per operation (aspirational)
• First write has the data to repair subsequent writes

16. Schema: Oracle Schema (Partial)
• Document: Document_UUID (PK), DocumentClass_UUID (FK1), Desc
• DocumentClass: DocumentClass_UUID (PK), Desc
• DocumentVersion: DocumentVersion_UUID (PK), Document_UUID (FK1), Locator_UUID, Metadata_A, Metadata_B, Metadata_C, Metadata_D, Metadata_E
• Locator: Locator_UUID (PK), DocumentVersion_UUID (FK1), Type
• DocumentVersion_Metadata: DocumentVersion_UUID (PK, FK1), Name (PK), Value
• LocatorMetadata: Locator_UUID (PK, FK1), Name (PK), Value

17. Schema: DynamoDB Schema (Partial)
Documents
• DocumentVersionId (UUID, partition key)
• DocumentId (UUID, GSI)
• DocumentLifetime data
• DocumentClassId (UUID)
• DocumentClass table data
• IndexedMetadata_A, IndexedMetadata_B, IndexedMetadata_C, IndexedMetadata_D, IndexedMetadata_E
• UnindexedMetadata
• LocatorID (UUID)
• Locator table data
• Other fields
The Oracle schema (as on slide 16) is shown alongside for comparison.

22. Schema: DynamoDB Schema (Partial)
Documents
• DocumentVersionId (UUID, partition key)
• DocumentId (UUID, GSI)
• DocumentLifetime data
• DocumentClassId (UUID)
• DocumentClass table data
• IndexedMetadata_A, IndexedMetadata_B, IndexedMetadata_C, IndexedMetadata_D, IndexedMetadata_E
• UnindexedMetadata
• LocatorID (UUID)
• Locator table data
• Other fields
UserAttributes
• DocumentVersionId (UUID, partition key)
• HeadVersion (sort key): HEAD
• CurrentVersion: UUID
• PreviousVersion: UUID
• IndexedMetadata_A (GSI), IndexedMetadata_B (GSI), IndexedMetadata_C (GSI), IndexedMetadata_D (GSI), IndexedMetadata_E (GSI)
• UnindexedMetadata
• Other fields
DocumentAttributes
• DocumentVersionId (UUID, partition key)
• HeadVersion (sort key): HEAD
• CurrentVersion: UUID
• PreviousVersion: UUID
• DocumentLifetime data
• Other fields

25. Schema: Multi-Table Write
CreateDocumentVersion API: parallel calls to Documents Table, UserAttributes Table, DocumentAttributes Table

26. Schema: Multi-Table Write (Failure)
CreateDocumentVersion API: parallel calls to Documents Table, UserAttributes Table, DocumentAttributes Table

27. Schema: Read Repair Pattern
GetDocumentVersionAttributes API reads the UserAttributes Table; on failure, ReadRepair from the Documents Table.
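The read repair idea can be sketched roughly as follows. This is a minimal illustration, not the service's code: the two dictionaries stand in for the Documents and UserAttributes tables, and the function names, the `metadata` field, and the `fail_second_write` switch are all hypothetical. It relies on the design tenet that the first write carries enough data to rebuild the later writes.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for the two DynamoDB tables.
documents: Dict[str, dict] = {}
user_attributes: Dict[str, dict] = {}


def create_document_version(version_id: str, metadata: dict,
                            fail_second_write: bool = False) -> None:
    """Write to Documents first, then to UserAttributes."""
    # The first write keeps a full copy of the data needed for repair.
    documents[version_id] = {"DocumentVersionId": version_id,
                             "metadata": dict(metadata)}
    if fail_second_write:
        return  # simulate a partial failure of the multi-table write
    user_attributes[version_id] = {"DocumentVersionId": version_id, **metadata}


def get_document_version_attributes(version_id: str) -> Optional[dict]:
    """Read UserAttributes; if the row is missing, rebuild it from Documents."""
    item = user_attributes.get(version_id)
    if item is not None:
        return item
    source = documents.get(version_id)
    if source is None:
        return None  # the document version was never created
    repaired = {"DocumentVersionId": version_id, **source["metadata"]}
    user_attributes[version_id] = repaired  # read repair
    return repaired
```

Because the repair only copies data that the first write already stored, repeating it is harmless, which fits the idempotency tenet on slide 15.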

28. Schema: DynamoDB Schema (Partial)
Documents
• DocumentVersionId (UUID, partition key)
• DocumentId (UUID, GSI)
• Documents table data
• DocumentClassId (UUID)
• DocumentClass table data
• IndexedMetadata_A, IndexedMetadata_B, IndexedMetadata_C, IndexedMetadata_D, IndexedMetadata_E
• UnindexedMetadata
• Locator table data
• Other fields
UserAttributes
• DocumentVersionId (UUID, partition key)
• HeadVersion (sort key): HEAD
• CurrentVersion: UUID
• PreviousVersion: UUID
• IndexedMetadata_A (GSI), IndexedMetadata_B (GSI), IndexedMetadata_C (GSI), IndexedMetadata_D (GSI), IndexedMetadata_E (GSI)
• UnindexedMetadata
• Other fields

29. Schema: Update-only Pattern
Columns: Document Version ID (Primary Key), Head Version (Sort Key), Row Version, Previous Version, Metadata A (GSI), Versioned Metadata A
State 1: HEAD points to V2
UUID1 HEAD V2 V1 bbbb
UUID1 V2 V1 bbbb
UUID1 V1 NULL aaaa
State 2: row V3 inserted; HEAD still points to V2
UUID1 HEAD V2 V1 bbbb
UUID1 V3 V2 cccc
UUID1 V2 V1 bbbb
UUID1 V1 NULL aaaa
State 3: HEAD updated to V3
UUID1 HEAD V3 V2 cccc
UUID1 V3 V2 cccc
UUID1 V2 V1 bbbb
UUID1 V1 NULL aaaa
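The three table states above can be reproduced with a small sketch. It assumes one reading of the slide: each version is an immutable row keyed by (document version ID, version), and a single HEAD row per document is rewritten to point at the newest version. The dictionary-based table, the function names, and the field names are hypothetical stand-ins, not the service's actual schema.

```python
from typing import Dict, List, Tuple

HEAD = "HEAD"
# (partition key, sort key) -> item; a stand-in for a DynamoDB table
Table = Dict[Tuple[str, str], dict]


def put_version(table: Table, doc_id: str, version: str, metadata_a: str) -> None:
    """Insert a new immutable version row, then repoint HEAD at it."""
    head = table.get((doc_id, HEAD))
    previous = head["RowVersion"] if head else None
    # Step 1: insert-only write of the version row (never modified afterwards).
    table[(doc_id, version)] = {"PreviousVersion": previous,
                                "VersionedMetadataA": metadata_a}
    # Step 2: overwrite HEAD; only HEAD carries the GSI-indexed attribute.
    table[(doc_id, HEAD)] = {"RowVersion": version,
                             "PreviousVersion": previous,
                             "MetadataA": metadata_a}


def history(table: Table, doc_id: str) -> List[str]:
    """Walk from HEAD back through PreviousVersion links, newest first."""
    versions: List[str] = []
    head = table.get((doc_id, HEAD))
    version = head["RowVersion"] if head else None
    while version is not None:
        versions.append(version)
        version = table[(doc_id, version)]["PreviousVersion"]
    return versions
```

If the process stops between step 1 and step 2, the table is in the middle state shown on the slide: the new row exists but HEAD still points at the previous version, so readers of HEAD keep seeing consistent data.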

31. Migration Phases
• Phase 1: Data only to Oracle
• Phase 2: Oracle is primary; DynamoDB is secondary
• Phase 3: DynamoDB is primary; Oracle is secondary
• Phase 4: Data only to DynamoDB

32. Migration Considerations
• Application consistency: interfaces and expected responses remain unchanged
• Correctness: the entity in both data stores matches
• Completeness: all entities have been migrated
• Restartability: fall back to last-known good state and restart the process

33. Design-1: Workflow Model ( Did Not Implement )

34. Workflow Model
Diagram: Workflow Engine, Queue, Workers (W-1, W-2, ... W-n), Oracle, DynamoDB; operations: Write, Compare.

35. Problems with Workflow Model
Application consistency
• Reads might reflect old data if the workflow is blocked.
Application correctness: workflow system not strictly FIFO
• E.g., Upd-1, Upd-2, Upd-3: order might not be preserved
• Validation checks for Upd-1 fail while Upd-2 or Upd-3 is in progress
Scorecard
• Complexity: Low
• Completeness: True
• App consistency: Does not meet
• Potential data loss: Due to correctness
• Correctness: Does not meet
• Restartability: True
• Latency: Low

36. Design-2: Single Master Model ( Not Implemented )

37. Single Master Model
Improvements over workflow model
• Synchronous changes: all changes done synchronously on client request
• FIFO order: order preserved using conditional puts; on error the client has to redo the operation
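A conditional put only succeeds when the stored item is still in the state the writer expects, which is how it can keep updates in order. DynamoDB supports this through a condition expression on the write; the sketch below avoids any AWS calls and uses a hypothetical in-memory `Store` with a version counter to show the idea. The class, exception, and field names are illustrative assumptions.

```python
from typing import Dict, Optional


class ConditionalCheckFailed(Exception):
    """Raised when the stored version is not the one the writer expected."""


class Store:
    """Hypothetical in-memory store with a per-item version counter."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._items.get(key)

    def put_if_version(self, key: str, value: str, expected_version: int) -> int:
        """Write only if the item is still at expected_version (0 = absent)."""
        current = self._items.get(key)
        current_version = current["version"] if current else 0
        if current_version != expected_version:
            # A concurrent or out-of-order update got in first.
            raise ConditionalCheckFailed(
                f"{key}: expected v{expected_version}, found v{current_version}")
        new_version = expected_version + 1
        self._items[key] = {"version": new_version, "value": value}
        return new_version
```

A writer that receives `ConditionalCheckFailed` re-reads the item and retries, which corresponds to the slide's note that the client has to redo the operation on error.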

38. Single Master Model
Diagram: Service, Oracle, DynamoDB; Entity X; update for Entity X.

39. Problems with Single Master Model
Correctness: there is no baseline to check whether data added to DynamoDB is correct, which can result in potential data loss.
Scorecard
• Complexity: High due to search
• Completeness: Does not meet
• App consistency: Due to potential data loss
• Potential data loss: Does not meet
• Correctness: Does not meet
• Restartability: Due to potential data loss
• Latency: Low

40. Design-3: Two Tracking Flags Model (Not Implemented)

41. Two Tracking Flags Model
Improvements from Single Master model
• Baseline for data comparison: write to both stores
• Flags to track which data needs to be migrated
How it works
• Maintain extra flags in both Oracle and DynamoDB for tracking
  - IsMaster: at any time either Oracle or DynamoDB will be master
  - IsBackfilled: true if the row is backfilled to the other store
• Request
  - Clients write to master; IsMaster flag set
  - Secondary data store updated
  - IsBackfilled flag set after comparing data in both stores
  - Client gets success

42. Two Tracking Flags Model: Problems
• Global Secondary Index (GSI) on low-cardinality field: IsBackfilled
  - Hotspots and limitation
• Application-specific problem due to the limit of 5 GSIs per table in DynamoDB
• There was no deterministic way we could do a perfect migration using this technique.
Scorecard: Complexity High; Completeness; App consistency; Potential data loss Does not meet; Correctness; Restartability Due to potential data loss; Latency 4x

43. Design-4: Implemented Model

44. Implemented Model
Improvements to Two Tracking Flags model
• No tracking flag in DynamoDB: no GSI, no hotspots
How it works
• Phase determines who is master
• Managed in code
• Phase 2: master is Oracle; DynamoDB is secondary
• Phase 3: master is DynamoDB; Oracle is secondary

45. Implemented Model
How it works: minimize chances of rollback
• Read and write operations work with both data stores
• Verify data in both data stores before marking the operation successful
Migration in one direction only: Oracle → DynamoDB
• Maintain an extra flag only in Oracle: IsDDBBackfilled
Limitations
• Small chance of inaccessible data
• Small chance that search might be inconsistent

46. Implemented Model: Write Operation
• Phase 1: Oracle.Write
• Phase 2: Oracle.Write + Oracle.IsDDBBackfilled = No; DynamoDB.Write + Oracle.IsDDBBackfilled = Yes; compare Oracle and DynamoDB entities
• BACKFILL (between Phase 2 and Phase 3)
• Phase 3: DynamoDB.Write; Oracle.Write + Oracle.IsDDBBackfilled = Yes; compare DynamoDB and Oracle entities
• Phase 4: DynamoDB.Write
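The per-phase write behavior can be sketched as a single function that switches on the current phase. This is an illustration under assumptions, not the team's code: both stores are plain dictionaries, the Oracle row keeps the `IsDDBBackfilled` flag next to the entity, and the `Phase` names, `write`, and `MismatchError` are hypothetical.

```python
from enum import Enum
from typing import Dict


class Phase(Enum):
    ORACLE_ONLY = 1
    ORACLE_PRIMARY = 2
    DYNAMODB_PRIMARY = 3
    DYNAMODB_ONLY = 4


class MismatchError(Exception):
    """Raised when the two stores disagree after a dual write."""


def _compare(key: str, oracle: Dict[str, dict], dynamodb: Dict[str, dict]) -> None:
    if oracle[key]["entity"] != dynamodb[key]:
        raise MismatchError(key)


def write(phase: Phase, key: str, entity: dict,
          oracle: Dict[str, dict], dynamodb: Dict[str, dict]) -> None:
    """Apply one write according to the current migration phase."""
    if phase is Phase.ORACLE_ONLY:
        oracle[key] = {"entity": dict(entity), "IsDDBBackfilled": False}
    elif phase is Phase.ORACLE_PRIMARY:
        # The primary write marks the row as not yet backfilled.
        oracle[key] = {"entity": dict(entity), "IsDDBBackfilled": False}
        dynamodb[key] = dict(entity)
        oracle[key]["IsDDBBackfilled"] = True
        _compare(key, oracle, dynamodb)
    elif phase is Phase.DYNAMODB_PRIMARY:
        dynamodb[key] = dict(entity)
        oracle[key] = {"entity": dict(entity), "IsDDBBackfilled": True}
        _compare(key, oracle, dynamodb)
    else:  # Phase.DYNAMODB_ONLY
        dynamodb[key] = dict(entity)
```

Keeping the phase as an explicit argument mirrors the slide's point that the phase, managed in code, decides which store is the master.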

47. Implemented Model: Read Operation
• Phase 1: Oracle.Read
• Phase 2: Oracle.Read; if (Oracle.IsDDBBackfilled == No) { DynamoDB.Write; Oracle.IsDDBBackfilled = Yes; DynamoDB.Read; compare Oracle with DynamoDB entities }
• BACKFILL (between Phase 2 and Phase 3)
• Phase 3: DynamoDB.Read; if (entry not found) { Oracle.Read; DynamoDB.Write; Oracle.IsDDBBackfilled = Yes; DynamoDB.Read; compare Oracle with DynamoDB entities }
• Phase 4: DynamoDB.Read

48. Implemented Model
Scorecard
• Complexity: High
• Completeness: True
• App consistency: Yes
• Potential data loss: No
• Correctness: Yes
• Restartability: Yes
• Latency: reads 4x in Phase 2, 1x in Phase 3; writes 3x in Phase 2, 2x in Phase 3
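The read path with backfill-on-read can be sketched in the same style. As before, this is a hedged illustration: the stores are dictionaries, the Oracle row holds an `entity` plus the `IsDDBBackfilled` flag, and `Phase`, `read`, `_backfill`, and `MismatchError` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional


class Phase(Enum):
    ORACLE_ONLY = 1
    ORACLE_PRIMARY = 2
    DYNAMODB_PRIMARY = 3
    DYNAMODB_ONLY = 4


class MismatchError(Exception):
    """Raised when the backfilled copy differs from the Oracle entity."""


def _backfill(key: str, oracle: Dict[str, dict], dynamodb: Dict[str, dict]) -> None:
    """Copy one Oracle row to DynamoDB, set the flag, then read back and compare."""
    row = oracle[key]
    dynamodb[key] = dict(row["entity"])
    row["IsDDBBackfilled"] = True
    if dynamodb[key] != row["entity"]:
        raise MismatchError(key)


def read(phase: Phase, key: str,
         oracle: Dict[str, dict], dynamodb: Dict[str, dict]) -> Optional[dict]:
    """Return the entity for key according to the current migration phase."""
    if phase is Phase.ORACLE_ONLY:
        row = oracle.get(key)
        return row["entity"] if row else None
    if phase is Phase.ORACLE_PRIMARY:
        row = oracle.get(key)
        if row is None:
            return None
        if not row["IsDDBBackfilled"]:
            _backfill(key, oracle, dynamodb)  # migrate lazily on read
        return row["entity"]
    if phase is Phase.DYNAMODB_PRIMARY:
        item = dynamodb.get(key)
        if item is None and key in oracle:
            _backfill(key, oracle, dynamodb)  # row missed by the bulk backfill
            item = dynamodb[key]
        return item
    return dynamodb.get(key)  # Phase.DYNAMODB_ONLY
```

In this sketch every read of an unmigrated row moves it to DynamoDB, so rows touched by live traffic get migrated even before the bulk backfill reaches them.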

50. API Refactoring: Adapter Pattern for Migration

52. Pre-migration Checklist • You have successfully run data migration on sample data

53. Smart Data Migration Tool • Migration traffic throttled down if latencies increase due to increased production traffic • Migration traffic stopped if latency thresholds crossed • Migration traffic scaled up as production traffic decreases
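One way to picture this throttling is a controller that recomputes the migration rate from the latest observed production latency. The sketch below is an assumption about how such a tool might behave, not a description of the actual one: the function name, the two latency thresholds, the halving factor, and the additive ramp-up are all hypothetical choices.

```python
def next_migration_rate(current_rate: float,
                        observed_latency_ms: float,
                        target_latency_ms: float,
                        stop_latency_ms: float,
                        max_rate: float,
                        increment: float = 10.0) -> float:
    """Return the migration rate (rows/sec) to use for the next interval."""
    if observed_latency_ms >= stop_latency_ms:
        # Latency threshold crossed: stop migration traffic entirely.
        return 0.0
    if observed_latency_ms > target_latency_ms:
        # Latencies rising under production load: throttle down.
        return current_rate * 0.5
    # Production traffic is light: scale migration back up, capped at max_rate.
    return min(max_rate, current_rate + increment)
```

The additive ramp-up lets the rate recover from zero once latencies drop again, while the multiplicative cut backs off quickly when production traffic picks up.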

54. Pre-migration Checklist • You have successfully run data migration on sample data • You have tested out rollback scenarios in case you find errors in your migration

55. Migration and Rollback Testing
2 rounds of migration done in pre-prod environment
• Happy case: P1 → P2 → Backfill → P3 → P4
• Rollback case: P1 → P2 → P1
• Rollback case: P1 → P2 → Backfill → P3 → P2
Checked that there was no loss in data
• Controlled test data in Oracle and clean tables in DynamoDB
• 5 million rows migrated in backfill

56. Pre-migration Checklist • You have successfully run data migration on sample data • You have tested out rollback scenarios in case you find errors in your migration • You have run stress tests in each phase

57. Stress Testing Tool
• Generates data for each session and then uses these values for the entire session
• Simulates the same amount of traffic for each API as expected
API Distribution
• CreateDocument: A%
• GetDocumentVersion: B%
• CreateToken: C%
• ResolveToken: D%
• GetMetadata: E%
• SetMetadata: F%
• Search: G%
• Other APIs

58. Pre-migration Checklist • You have successfully run data migration on sample data • You have tested out rollback scenarios in case you find errors in your migration • You have run stress tests in each phase • Theoretical and practical throughput numbers for tables and GSI are matching for each phase

59. Sample: Phase 2 IOPS Calculations
Write IOPS
API | TPS | Documents: Size Multiplier | Documents: Ops Multiplier | Documents: Table | Documents: GSI | UserAttributes: Size Multiplier | UserAttributes: Ops Multiplier | UserAttributes: Table | GSI 1 | GSI 2 | GSI 3
CreateDocument | 400 | 8 | 1 | 3200 | 400 | 1 | 2 | 800 | 400 | 400 | 400
CreateDocumentVersion | 100 | 4 | 1 | 400 | 100 | 1 | 2 | 200 | 200 | 200 | 200
Other API | - | - | -
Total | | | | 3600 | 500 | | | 1000 | 600 | 600 | 600
Read IOPS
API | TPS | Documents: Size Multiplier | Documents: Ops Multiplier | Documents: Table | Documents: GSI | UserAttributes: Size Multiplier | UserAttributes: Ops Multiplier | UserAttributes: Table | GSI 1 | GSI 2 | GSI 3
GetDocument | 100 | 1 | 1 | 100 | 100 | 1 | 1 | 100 | | |
Search | 100 | | | | | 1 | 1 | 100 | 100 | 100 | 100
Other API | - | - | -
Total | | | | 1000 | 100 | | | 200 | 100 | 100 | 100
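The table-level write figures in this sample are consistent with multiplying TPS by the size multiplier and the ops multiplier. That formula is inferred from the sample rows rather than stated on the slide, and the function and variable names below are illustrative. A short check of the two write rows:

```python
def table_ops(tps: int, size_multiplier: int, ops_multiplier: int) -> int:
    """Estimated table IOPS for one API: TPS x size multiplier x ops multiplier."""
    return tps * size_multiplier * ops_multiplier


# (TPS, size multiplier, ops multiplier) taken from the write rows of the sample.
documents_writes = {
    "CreateDocument": (400, 8, 1),
    "CreateDocumentVersion": (100, 4, 1),
}
user_attributes_writes = {
    "CreateDocument": (400, 1, 2),
    "CreateDocumentVersion": (100, 1, 2),
}

# Sum the per-API estimates to get the table totals for the listed APIs.
documents_total = sum(table_ops(*row) for row in documents_writes.values())
user_attributes_total = sum(table_ops(*row) for row in user_attributes_writes.values())

print(documents_total)        # 3600, matching the Documents table total
print(user_attributes_total)  # 1000, matching the UserAttributes table total
```

The read totals in the sample also include the unlisted "Other API" rows, so they cannot be rebuilt from the two listed read APIs alone.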

63. Pre-migration Checklist
• Successful data migration on sample data
• Tested out rollback scenarios
• Run stress tests in each phase
• Theoretical and practical throughput numbers for tables and GSIs match for each phase
• Operation alarms and dashboards for each phase
• Clients informed about latencies so they can adjust their alarms

64. Timeline
• Migration takes time.
  - Duration 9 months. Schema frozen for 6 months.
• Keep time buffer for data migration.
  - Took 2 months. 2x higher due to higher production traffic.
• Code increased by 3x. Code verification took 3 weeks.
  - Each API had 4 behaviors: 1 for each phase.
• Application maintained for 2 weeks in Phase 3.
  - Precaution before moving to Phase 4.

66. Results
• Migration completed in Happy Path: no reset or rollbacks
• Data migrated without a single error or loss of data
• 500 M entities migrated without a single client issue

67. Results (Reason to Migrate)
• Availability: maintained 100% availability for over 1 year
• Scalability: application managing 10x more documents today; ample room to keep growing
• Operations overhead: no dedicated DBA; application dev team managing DynamoDB

68. Suggestions
• Run migration from a different Oracle server
  - Prevent locking or overloading the primary
• Avoid GSI on low-cardinality attribute
  - Consider moving search to Amazon CloudSearch / Amazon Elasticsearch
• Do not over-scale throughput for migration
  - DynamoDB does not reduce partitions when you descale your throughput

69. Q & A

70. Thank you!

71. Remember to complete your evaluations!
