Highlights of AWS re:Invent 2023 in Las Vegas: new announcements, deep dives into existing services, best practices, and recommended design patterns.
2. Amazon S3
• Data stored in S3 is sharded into multiple chunks, with multiple parity chunks
(erasure coding) created and spread across a variety of hard drives, AZs,
racks, and facilities. GetObject reconstructs the object from these shards.
• When a hard drive fails, the entire object is not lost because enough shards
remain to restore it.
• Because customer data is spread across hard drives in multiple facilities, S3
can exercise huge parallelism across different resources to serve requests.
• Previously a client connected to a single IP address (load balancer) resolved
from the endpoint DNS name; now a single DNS name returns multiple IP
addresses, one primary and the others secondary. These IP addresses represent
a list of load balancers, allowing requests to be parallelized across many
hosts and endpoints.
• S3 supports multipart uploads, which allow objects to be uploaded in chunks.
5. S3 Object design for scaling
• S3 is designed as a blob store and it scales based on prefixes.
• A prefix is like a directory: characters added at the beginning of the key.
• The entire key space of a region is chopped up into individual scaling
blocks called prefixes. Each prefix supports 5,500 TPS on reads and 3,500 TPS
on writes.
• Entropy, a salting or hashing mechanism that inserts random characters, is
recommended at the start of the key prefix. The keys then end up in different
scaling prefixes in S3, supporting higher aggregate TPS.
• Aim for large objects (2–16 MB+), which reduces TPS and data round trips.
• Prefer widely adopted object formats, such as columnar formats, which
provide compact sizes, better performance, and predicate pushdown.
• Prefer open table formats, e.g. Apache Iceberg, Apache Hudi, Delta Lake,
instead of traditional directory-based hierarchical layouts.
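The entropy/salting idea can be sketched in a few lines of Python; the 4-character hash prefix and the key layout are illustrative choices, not an AWS requirement:

```python
import hashlib

def salted_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short, deterministic hash so keys spread across
    many S3 scaling prefixes instead of one hot (e.g. date-based) prefix."""
    salt = hashlib.md5(key.encode()).hexdigest()[:prefix_len]
    return f"{salt}/{key}"

# Keys that would otherwise share one date prefix now fan out:
print(salted_key("2023/11/27/event-001.json"))
print(salted_key("2023/11/27/event-002.json"))
```

Because the salt is derived from the key itself, the mapping is deterministic: a reader can recompute the full key without any lookup table.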
6. S3 Express One Zone (New)
• Scale to millions of requests per minute instantly without
throttling.
• Introduces S3 directory buckets, which come pre-scaled (100K TPS) and are
scaled per bucket all at once (enabling high-transaction workloads).
• One Zone architecture co-locates the storage with the
compute.
• Provides session based access for faster authorization.
• Ideal for request-intensive applications and applications sensitive to
latency and tail latency.
9. Apache Iceberg
• Iceberg is an open standard for tables with SQL behavior.
• It has ACID semantics.
• High-performance design for S3.
• Iceberg provides a table level interface to the S3 storage.
• It has MERGE commands, transactional data lake queries, time travel, hidden
partitioning, compaction, and optimization.
• Ability to add entropy at the start of object key prefixes at the table
level.
• FileIO abstraction (in place of the Hadoop FileSystem API) for a more
seamless match with blob-store semantics.
10. Amazon DynamoDB
• Simple primary key (partition key) & composite primary key (partition key +
sort key).
• When a table is created in DynamoDB, it segments data across machines into
partitions (10 GB or smaller), and the partition key is hashed to pick the
partition.
• A partition has a Replica group with three storage nodes (1 leader, 2 replicas).
All writes originate at leader while reads can go to any node.
• To access items by the sort key alone, create a secondary index (a secondary
primary key), where data from the main table is copied into the index. You can
specify a new partition key + sort key and choose which attributes are
projected into the secondary index (ALL, KEYS_ONLY, INCLUDE).
• Writes cannot be performed using secondary index.
• DynamoDB infrastructure is shared across an entire Region with multiple
tenants.
• Billing - (Write) 1 WCU per 1KB written and (Read) 1 RCU per 4KB read.
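A rough sketch of the capacity-unit arithmetic implied by the billing bullet (rounding up per item; eventually consistent reads cost half an RCU under DynamoDB's documented model):

```python
import math

def wcu_per_item(item_size_bytes: int) -> int:
    """1 WCU per 1 KB written, rounded up per item."""
    return max(1, math.ceil(item_size_bytes / 1024))

def rcu_per_item(item_size_bytes: int, eventually_consistent: bool = False) -> float:
    """1 RCU per 4 KB read (strongly consistent); half for eventually consistent."""
    units = max(1, math.ceil(item_size_bytes / 4096))
    return units / 2 if eventually_consistent else units

print(wcu_per_item(3500))        # a 3.5 KB item costs 4 WCUs
print(rcu_per_item(3500))        # but only 1 RCU to read
print(rcu_per_item(9000, True))  # 1.5 RCUs, eventually consistent
```

This is the kind of "do the math" check the data-modeling guidance below recommends before committing to an item shape.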
12. DynamoDB – Data Modeling
• Decide on your access patterns - write/read, conditions, frequency.
• Know your domain – constraints, data distribution, item sizes.
• Know the DynamoDB API – single-item actions, Query (find many),
Batch/Transaction operations.
• Try to keep static data outside DynamoDB.
• Not all problems break down neatly into database queries.
• Do the Math on consumption of capacity units.
• All write operations allow ConditionExpressions; when one evaluates to
false, the write operation is rejected.
• Structure the items to allow ConditionExpressions to maintain invariants.
• Avoid the read-modify-write cycle.
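One way to avoid the read-modify-write cycle is a condition-guarded write. This sketch only builds the `put_item` arguments (the table and attribute names are made up) and does not call AWS:

```python
def build_conditional_put(table: str, item: dict) -> dict:
    """Arguments for a DynamoDB PutItem that only succeeds if the
    item does not already exist (an idempotent create)."""
    return {
        "TableName": table,
        "Item": item,
        "ConditionExpression": "attribute_not_exists(pk)",
    }

req = build_conditional_put(
    "Orders", {"pk": {"S": "order#123"}, "status": {"S": "NEW"}}
)
# Would be passed as: boto3.client("dynamodb").put_item(**req)
print(req["ConditionExpression"])
```

If the condition fails, DynamoDB rejects the write server-side, so no prior read is needed.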
14. DynamoDB – Data Modeling
• Process multiple operations in single, atomic request.
• Structure your items to allow for direct operations.
• Each operation could have a condition expression.
• Single-request transactions (not long-running) are supported
(TransactWriteItems).
• For long-running transactions, use client-side transactions
/ Step Functions.
• Use Amazon EventBridge or Amazon DynamoDB Streams
for asynchronous updates.
15. DynamoDB – Complex Filtering
• How to filter on 2+ attributes, each of which is optional?
• Fetch all / client-side filtering when the target dataset is small.
• Reduced projection into a secondary index when items are larger but have a
small number of filterable attributes.
• Reduce the search space where possible by requiring an attribute (in the
search) that is useful for filtering.
• Integrate with an external system if you must.
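For the small-dataset case, client-side filtering on optional attributes is plain code; the item shape below is hypothetical:

```python
def filter_items(items, **criteria):
    """Keep items matching every supplied attribute; criteria passed
    as None are treated as 'don't care' (optional filters)."""
    active = {k: v for k, v in criteria.items() if v is not None}
    return [i for i in items if all(i.get(k) == v for k, v in active.items())]

pets = [
    {"species": "dog", "size": "large"},
    {"species": "cat", "size": "small"},
    {"species": "dog", "size": "small"},
]
print(filter_items(pets, species="dog", size=None))     # both dogs
print(filter_items(pets, species="dog", size="small"))  # one match
```

The same shape works after a reduced-projection index query: fetch by the required attribute, then apply the optional filters in memory.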
16. Amazon Redshift (New)
• Multidimensional data layout speeds up repeated queries by sorting the
table based on the incoming query filters.
• Amazon Redshift data sharing lets you securely share read access to live
data across Redshift clusters, workgroups, AWS accounts, and Regions without
manually moving or copying the data.
• Multi-data-warehouse writes through data sharing allow writing to different
data warehouses.
• Redshift Serverless offers AI-driven scaling and optimizations
(performance/cost).
• Directly ingest streaming data into the data warehouse (materialized view)
from Kinesis Data Streams and Amazon MSK.
17. Amazon Redshift (New)
• New SQL syntax: MERGE, ROLLUP, CUBE/GROUPING SETS, QUALIFY.
• Glue Data Catalog views are created once and can be queried from Spark on
EMR on EC2, Redshift, and Athena without direct access to the underlying S3
tables.
• Redshift ML support lets you use SQL to create and train ML models.
• Redshift can query LLMs from SageMaker JumpStart (endpoints) for remote
inference.
• Generative SQL in Redshift generates SQL code, provides recommendations
(utilizing schema metadata and past query history), and surfaces insights.
• IAM Identity Center provides a unified identity across AWS analytics
services.
18. Zero-ETL Integrations
• Transactional data for business use cases needs to be exposed for analytics
and BI.
• Zero-ETL integrations with Amazon Redshift from Aurora PostgreSQL, RDS for
MySQL, and DynamoDB send data to a Redshift cluster directly within seconds
(no data pipeline required).
• OpenSearch Service zero-ETL integration with Amazon S3 allows querying data
stored directly in S3 (no pipelines to transfer data).
• Amazon DynamoDB zero-ETL integration with OpenSearch Service (data is
replicated/duplicated into the cluster).
19. Amazon Kinesis Data Stream
• Kinesis Data Streams is a real-time streaming service and provides short-
and long-term options for retaining streaming data.
• The service is billed at 25 KB per payload unit, so compress and aggregate
messages.
• It also provides producer (KPL) and consumer (KCL) libraries that
automatically compress and aggregate messages and help with sharding events
and checkpointing.
• Use enhanced fan-out to get a dedicated 2 MB per second of read throughput
per consumer when multiple consumers read from the same stream.
• Start with on-demand mode and use provisioned mode only if on-demand cannot
provide the required capacity.
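Aggregation and compression along the lines of what the KPL automates can be approximated by hand. This standalone sketch uses gzip and newline-delimited JSON; the real KPL uses its own binary aggregation format:

```python
import gzip
import json

def aggregate(records: list) -> bytes:
    """Pack many small events into one compressed payload so fewer
    25 KB payload units are consumed per batch."""
    blob = "\n".join(json.dumps(r) for r in records).encode()
    return gzip.compress(blob)

def deaggregate(payload: bytes) -> list:
    """Reverse of aggregate(): one payload back into many events."""
    return [json.loads(line)
            for line in gzip.decompress(payload).decode().splitlines()]

events = [{"id": i, "type": "click"} for i in range(100)]
payload = aggregate(events)
print(f"{len(payload)} bytes for {len(events)} events")
```

The compressed blob would be the `Data` of a single PutRecord call; the consumer side deaggregates after reading.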
20. Amazon API Gateway
• REST APIs and HTTP APIs. Always use HTTP APIs if you don't need the extra
(REST-only) features.
• Types of APIs – Regional, edge-optimized, and private APIs.
• REST API Gateway supports authorization (IAM, Cognito, or Lambda
authorizers); HTTP API Gateway also supports JWT authorizers.
• Built-in caching for API Gateway spins up a cache cluster that is charged
hourly.
• Default throttling limits: 10K requests per second and 5K burst per account.
• Custom throttling applies at the stage, resource, and method level.
• API Gateway supports multiple stages and stage variables (prefer custom
domains).
• API Gateway supports canary releases; it routes limited traffic to the
canary deployment.
• Resource policies allow or deny API access based on conditions (region,
time, account, IP address).
• AWS WAF protects APIs from XSS, blocks requests by IP/country, matches
patterns in HTTP headers, and blocks actions from specific user agents.
21. Lambda Functions
• Make functions modular and single purpose (less code to load, custom
security).
• Alternatively, a single Lambda function can catch all API requests and
branch internally.
• Too many Lambda functions can be an operational burden, and too few can
create overly broad security scopes and resource issues.
• Group functions by bounded contexts, code dependencies, scope of
permissions, etc.
• Distributed applications need orchestration (Step Functions) and/or
choreography (EventBridge) for communication, which should be configured
rather than writing your own code.
22. Lambda Functions
• The fastest and lowest-cost Lambda function is the one you remove and
replace with a built-in integration.
• API Gateway, using VTL, can directly invoke Step Functions, DynamoDB,
queues, and many other AWS services.
• EventBridge Pipes can connect DynamoDB Streams to an EventBridge bus
without needing Lambda as the connector.
23. Lambda Functions
• Lambda exposes memory configuration control (128 MB–10 GB), where
increasing the memory allocation causes a proportional increase in CPU power
and network bandwidth.
• ARM-based processors (Graviton2) offer up to 34% better price performance
over x86-based AWS Lambda.
• Use the Lambda Power Tuning tool and AWS Compute Optimizer.
• Cold start is the time taken to bring up a new execution environment in
response to a request/event. It varies from <100 ms to >1 sec.
25. Example Lambda Function
# Init code, outside handler
import boto3  # AWS SDK for Python
import cheese_burger  # hypothetical module from the slide
import sub_function  # business logic code, outside handler

def pre_handler_secret_getter(data):
    ...  # init code: fetch secrets once per execution environment

secret = pre_handler_secret_getter(None)

def handler(event, context):
    # Inside handler code
    if cheese_burger.no_bacon(event["extras"]):
        sub_function.add_bacon(event)
        return "warning"
    return "success"
26. Lambda – Prehandler INIT code Best Practices
• Import only what you need. Selectively import certain packages.
• Optimize dependencies, SDKs, and other libraries to the specific modules
required
• Reduce deployment package size.
• Avoid “monolithic” functions.
• Lazy-initialize shared libraries based on necessity (e.g. initialize the S3
client in the function where it is used).
• Handle reconnections in handler (not in init), keep alive in AWS SDKs.
• Keep state data (not secrets) which you need for subsequent invocations.
• Use provisioned concurrency or SnapStart (for Java applications).
• Code parsing large files impacts cold starts.
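The lazy-initialization bullet can be sketched with a cached factory, so an expensive client is only created on the code path that uses it; `object()` here is a stand-in for something like `boto3.client(service)`:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_client(service: str):
    """Create the client on first use only, then reuse it for the
    lifetime of the execution environment."""
    print(f"initializing {service} client")  # runs once per service name
    return object()  # stand-in for boto3.client(service)

def handler(event, context=None):
    if event.get("needs_s3"):
        s3 = get_client("s3")  # only initialized on this branch
    return "ok"
```

Invocations that never touch the S3 branch pay no initialization cost, while warm invocations that do reuse the cached client.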
29. Writing Lambda - Best Practices
• Handler Layer - Parse Config, env variables, input validation,
authentication checks, call domain layer, serialize output.
• Domain Layer - Business logic only and can be shared by multiple
handlers. Calls integration layer and unaware of underlying DB/API.
• Integration layer - Adapter pattern (interface and implementations) and
contains API/DB code.
• Testing is isolated for each layer.
• Extend the process stream lambda handler with defaults.
• Python tools - Tuna (Import time), Py-spy (most freq code path),
Pyinstrument (select code areas).
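A minimal sketch of the three layers (all names are illustrative): only the integration layer knows the datastore, so the domain layer can be tested with an in-memory fake:

```python
# Integration layer: adapter interface + implementation
class OrderStore:
    def save(self, order: dict) -> None:
        raise NotImplementedError

class MemoryOrderStore(OrderStore):
    """In-memory adapter, useful as a test double for a real DB."""
    def __init__(self):
        self.orders = []
    def save(self, order):
        self.orders.append(order)

# Domain layer: business logic only, unaware of the underlying DB
def place_order(store: OrderStore, order: dict) -> dict:
    if order.get("qty", 0) <= 0:
        raise ValueError("qty must be positive")
    store.save(order)
    return {"placed": True}

# Handler layer: validate input, call the domain, serialize output
def handler(event, context=None, store=None):
    store = store or MemoryOrderStore()
    try:
        return {"statusCode": 200, "body": place_order(store, event)}
    except ValueError as e:
        return {"statusCode": 400, "body": str(e)}
```

Swapping `MemoryOrderStore` for a DynamoDB-backed adapter changes nothing in the domain or handler layers, which is the point of the adapter pattern.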
30. Lambda - Best Practices
• Avoid using Java reflections.
• Upgrade your runtime version.
• Optimize logging by using structured JSON logging (EMF).
• Set retention policies on log groups.
• Control log level granularity.
• Separate log groups where retention policies vary.
• Powertools for AWS Lambda helps automate a bunch of best-practice guidance
in the function.
• Turn on CloudWatch Lambda Insights to investigate for brief periods
(charged by usage).
31. Lambda - Concurrency
• Concurrency is number of requests that the function is serving at any
given time.
• A single AWS Lambda execution environment can process only a single
event at a time.
• Concurrent requests require new execution environments to be
created.
• Reserved concurrency lets you set the maximum concurrency for a given
function.
• Provisioned concurrency lets you set a minimum number of pre-warmed
execution environments ready for use. (At least 60% utilization of the
function makes it cost effective.)
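Concurrency as defined above follows Little's law: concurrency ≈ arrival rate × average duration, since each environment serves one event at a time. A quick calculator (the numbers are hypothetical):

```python
import math

def required_concurrency(requests_per_second: float,
                         avg_duration_seconds: float) -> int:
    """Each execution environment handles one event at a time, so
    concurrency = arrival rate x time each request occupies one."""
    return math.ceil(requests_per_second * avg_duration_seconds)

print(required_concurrency(100, 0.5))  # 50 environments
print(required_concurrency(20, 3.0))   # 60 environments
```

The second example shows why slow functions dominate concurrency: 20 req/s at 3 s each needs more environments than 100 req/s at 0.5 s.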
32. AWS StepFunctions
• Step Functions first and always!
• Pay as you use, fully managed, auto scaling.
• Build workflows (drag and drop), selecting actions and decision logic
(Choice, Parallel, Retry).
• Export the workflow JSON in ASL (Amazon States Language), which can be
leveraged by infrastructure scripts.
• Integrates with 220+ AWS services, directly running their SDK actions.
• Supports calling external dependencies (APIs) using HTTPS endpoints.
• Test the input and output of each task for each request.
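A minimal ASL definition, sketched as a Python dict before export to JSON (the state names and Lambda ARN are placeholders):

```python
import json

definition = {
    "Comment": "Minimal two-state workflow with retry",
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            # Placeholder ARN; a real workflow points at your function
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process",
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 3}],
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

print(json.dumps(definition, indent=2))
```

The dumped JSON is what infrastructure scripts (CloudFormation, CDK, Terraform) would feed to the state machine definition.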
33. StepFunctions – Standard vs Express Workflow
• Standard workflows – long-lasting (1 yr), asynchronous, exactly-once,
charged by number of state transitions.
• Express workflows – high throughput, at-least-once (can have duplicates),
short duration (5 min), cost-effective (billed by memory allocation and time
to complete), synchronous or asynchronous.
• Use Standard workflows only when execution takes over 5 minutes or requires
exactly-once execution.
34. StepFunctions Task Tokens
• Task tokens can pause a Step Functions task indefinitely until the task
token is returned (by the called service).
• Only supported in Standard workflows.
• Each task token is unique.
• Set a timeout for the task, and extend the heartbeat interval when a task
takes longer.
35. StepFunctions Patterns
• Nested workflows: extract sub-workflows that can run as Express workflows
out of a Standard workflow.
• Use intrinsic functions for data transformations (arrays, math, strings,
JSON, UUIDs).
• Reduce state transitions and duration with the callback pattern: emit
milestone events that invoke external microservices, emit an error on no
response, and emit a timeout.
• Test API – test an individual task without running the entire workflow.
39. StepFunctions Failure Handling
• Use built-in error handling to roll back sequential system failures for
long-running transactions.
• Circuit breaker – prevent a caller service from retrying a call to another
callee service that has previously caused repeated timeouts or failures.
• Redrive a workflow from the point of failure (no need to wait for
long-running tasks to re-run).
41. StepFunctions Parallel
• The Parallel state executes multiple branches of steps using the same
input.
• Dynamic parallelism (Map state) executes the same steps for multiple
entries of an input array/map (max 40 concurrent).
• The Distributed Map state allows up to 10K parallel executions.
• Overcome payload limits by breaking workloads down into multiple child
workflows.
43. EventBridge
• Amazon EventBridge is a serverless event bus that makes it easy to connect
applications with data from a variety of sources.
• EventBridge lets multiple microservices share the events each of them
emits.
• Filter and routing rules are the core of EventBridge: they identify which
events are sent to which targets, with optional transformations.
• EventBridge is best for microservices-scale, refined event ingestion and
routing.
• Event payloads can be up to 256 KB, and the order of events is not
guaranteed.
• EventBridge archives allow events to be stored indefinitely.
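EventBridge rules match events against patterns in which each field lists its acceptable values. A much-simplified matcher (top-level fields only, none of EventBridge's content-filtering operators):

```python
def matches(pattern: dict, event: dict) -> bool:
    """True if, for every field in the pattern, the event's value is
    one of the listed values (AND across fields, OR within a field)."""
    return all(event.get(field) in allowed
               for field, allowed in pattern.items())

pattern = {
    "source": ["orders.service"],
    "detail-type": ["OrderPlaced", "OrderCancelled"],
}
event = {"source": "orders.service", "detail-type": "OrderPlaced",
         "detail": {"id": 7}}
print(matches(pattern, event))  # True
```

Fields absent from the pattern (like `detail` here) are ignored, mirroring how a rule only constrains the fields it names.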
46. EventBridge - API Destinations
• API Destinations are HTTP endpoints that can be configured as
event targets of a rule.
• An API destination consists of a connection (Basic Auth, API key, OAuth)
and an endpoint (custom/partner endpoint).
• API destinations natively integrate with applications using RESTful API
calls, eliminating the need for Lambda functions.
• EventBridge keeps the credentials in Secrets Manager, with the cost
included in EventBridge.
• API rate limits range from 1 to 300 invocations per second, with a
5-second max timeout and built-in retry after timeout.
47. EventBridge
• An EventBridge event archive is a collection of events published onto a bus
that satisfy a filter pattern to archive.
• EventBridge supports replay of events from an archive for a given time
interval. Use multiple single-purpose archives instead of one archive for all
events.
• Have a status field in the custom event metadata that identifies a
retriable versus a non-retriable event.
• Separate the external communications with different event bus
(gatekeeper/external) within the bounded system context.
• The gatekeeper bus is a custom event bus that acts as the guarded event
gate of the application boundary, controlling the flow of events in and out
of a domain boundary.
49. Vector Embeddings
• Vectors are fixed-length lists of numbers which encode all types of data like text, images,
media, graphs etc.
• Vectors are data points which capture the meaning and context of an asset/data.
• Vectors enable carrying out similarity search as a mathematical function.
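The mathematical function is typically cosine similarity between embedding vectors; a tiny illustration with made-up three-dimensional "embeddings":

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means
    similar direction (meaning), near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

cat    = [1.0, 0.9, 0.0]
kitten = [0.9, 1.0, 0.1]
truck  = [0.0, 0.1, 1.0]
print(cosine_similarity(cat, kitten))  # close to 1: similar meaning
print(cosine_similarity(cat, truck))   # close to 0: unrelated
```

Real embeddings have hundreds or thousands of dimensions, but the similarity computation is the same.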
50. OpenSearch as Vector Database
• Vector databases let you store and index vectors and metadata, providing
the ability to use low-latency queries to discover assets by degree of
similarity.
• OpenSearch, a distributed search and analytics platform, provides the
vector engine feature, which extends it to provide contextually relevant
information and the ability to search across large sets of vectors.
• OpenSearch supports k-NN algorithms like HNSW (Hierarchical Navigable
Small Worlds) and IVF (Inverted File System) for searching vectors.
• Select memory-optimized EC2 instances (e.g. the R5 family) for
memory-intensive vector searches.
• Improve batch indexing performance by disabling refresh intervals and
disabling replicas (maintain an offsite data copy).
• To improve search performance, reduce the segment count and warm up the
index.
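An index definition along these lines, built as a plain dict; the field name, dimension, and engine choice are illustrative, assuming OpenSearch's k-NN plugin mapping format:

```python
import json

index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,  # must match your embedding model
                "method": {
                    "name": "hnsw",           # HNSW graph search
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                },
            }
        }
    },
}

# Would be passed as: client.indices.create(index="docs", body=index_body)
print(json.dumps(index_body)[:60])
```

For bulk loads, the refresh-interval and replica settings mentioned above would go into the same `settings` block and be reverted after indexing.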
51. Amazon Bedrock
• Bedrock is a platform for accessing a range of foundation models (Amazon
Titan, Jurassic-2, Claude 2, Command, Llama 2, Stable Diffusion) using a
single API.
• Bedrock provides an API that connects to a foundation model and gets
responses, and a playground for testing.
• Privately customize FMs with organization-specific data.
• Ability to build agents that execute complex business tasks by dynamically
invoking APIs.
• Best performance and security without managing infrastructure.
• Billing is based on the number of input and output tokens (per million),
which differs per model.
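Token-based billing is simple arithmetic; the prices below are placeholder parameters, not actual Bedrock rates:

```python
def invocation_cost(input_tokens: int, output_tokens: int,
                    price_in_per_million: float,
                    price_out_per_million: float) -> float:
    """Cost of one model invocation given per-million-token prices."""
    return ((input_tokens / 1e6) * price_in_per_million
            + (output_tokens / 1e6) * price_out_per_million)

# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens
print(round(invocation_cost(2_000, 500, 3.0, 15.0), 5))  # 0.0135
```

Because output tokens are often priced several times higher than input tokens, capping response length is usually the bigger cost lever.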
52. Amazon SageMaker JumpStart
• JumpStart provides access to the long tail of open and closed models;
customize and optimize models and deployment.
• JumpStart accelerates the time to fine-tune and (one-click) deploy over 300
of the latest open-source models.
• Supports an API for a Python SDK-based workflow.
• JumpStart helps bring ML applications to market using pre-built solutions,
ML models, and algorithms from popular model hubs (PyTorch Hub, TensorFlow
Hub, Hugging Face).
• Guides you through the entire ML workflow for a selected model, with
examples using notebooks.
53. Amazon SageMaker Canvas
• SageMaker Canvas is a no-code workspace for business teams to build,
customize, and deploy ML and generative AI models.
• SageMaker Canvas integrates with AI services such as Bedrock (foundation
models), Amazon Textract (intelligent document processing), Amazon
Comprehend (NLP, sentiment analysis), and Amazon Rekognition (computer
vision).
• SageMaker Canvas provides ready-to-use pre-trained models, including
foundation models via Bedrock, e.g. Claude, Jurassic-2, Command, Amazon
Titan.
• SageMaker JumpStart provides publicly available models such as Falcon,
Flan-T5, MPT, and Dolly v2.
54. Amazon SageMaker Canvas
• SageMaker Canvas lets you prepare training data and build, train, and
deploy custom models.
• SageMaker Canvas lets you share models with SageMaker Studio (IDE) users,
who can customize them further using code.
• SageMaker Canvas lets you compare model responses side by side.
• Extract insights from documents using generative AI.
• Allows creating a fine-tuned model from multiple foundation models (max 3)
and training it with a custom dataset (provide input/output columns to
train). You can view stats and test the models as well.
55. Amazon SageMaker Canvas
• Canvas offers 50+ data connectors to prepare data for training.
• Data insights powered by ML help decide whether data needs to be
transformed/modified before being used as training data.
• Built-in visualizations such as correlation metrics and charts help analyze
the data.
• Supports 300+ built-in (and custom code snippet) transformations to modify
the data for building machine learning models.
• Data preparation and visualization can also be done using natural language.
• Preparation requires a machine instance type, and data can be saved to S3.
• Canvas supports different model types (predictive analysis, text analysis,
image analysis, and fine-tuned foundation models) for custom models.
• Canvas can generate highly accurate model predictions, supporting patterns
like what-if analysis, automated predictions, one-click model deployment,
and sharing predictions to QuickSight.
58. Amazon CodeWhisperer
• AI coding companion integrated into your IDE to enhance developer
productivity.
• Provides code recommendations for snippets or blocks of code based on
comments in natural language.
• Scans code to find vulnerabilities.
• Flags code that resembles open-source training data, or filters it by
default.
• Provides CLI completions and natural-language-to-bash translation in the
command line.
59. Amazon Q (Announcement)
• Explore AWS capabilities and learn AWS technologies.
• Expert in AWS Well-Architected patterns, best practices, and solution
implementations.
• Helps troubleshoot application errors with analysis and resolution.
• Troubleshoots network connections, resolving connectivity issues.
• Provides optimal solutions for use cases.
• Inside the IDE with CodeWhisperer for developers; drafts plans.
• Code transformation – language version upgrades (JDKs).
• Answers business questions after connecting with business apps (plugins).
• Integrates with QuickSight (charts) and Amazon Connect (support).
61. AWS Lake Formation
• AWS Lake Formation centralizes the governance of data analytics workloads
and provides fine-grained access control.
• Controls access to both data and metadata.
• Supports tag-based access control (TBAC), which helps decrease access
management costs.
• Database-style access grants/revokes allow expressing fine-grained access
controls (table, column, row, and cell level).
• Supports decentralized data ownership and ownership delegation, and audits
permissions/access through CloudTrail.
• AWS Lake Formation integrates with Amazon QuickSight, Glue, Athena, EMR,
Redshift, SageMaker, and third-party tools.
62. AWS re:Invent 2023
• Register for events beforehand – AWS Events app.
• Workshops/chalk talks/tech talks are the most useful (avoid keynotes).
• Carry a laptop on days you attend workshops.
• Be ready 10–15 minutes before an event.
• Reserve 2–3 morning hours of one day to attend the Expo.
• Avoid events that require long-distance travel between hotels.
• No need to take pictures/notes for recorded events.
• Don't miss lunch hours between events.