SlideShare a Scribd company logo
1 of 27
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cecilia Deng
Software Engineer
08 04 2016
Real-time Data Processing Using
AWS Lambda
Agenda
AWS Services for Real Time Data
AWS Lambda
Amazon Kinesis
Architecture & Workflow for Streaming Data Processing
Streaming Data Processing Demo
Best Practices in Building Data Processing Solutions
AWS Services for Data
Processing
Amazon
Kinesis
AWS
Lambda
AWS Lambda: Overview
Lambda functions: a piece of code with stateless execution
Triggered by events:
• Direct Sync and Async API calls
• AWS Service integrations
• 3rd party triggers
• And many more …
Makes it easy to:
• Perform data-driven auditing, analysis, and notification
• Build back-end services that perform at scale
AWS Lambda: Serverless Compute in the Cloud
Compute service that runs your code in response to events
Easy to author, deploy, maintain, secure and manage
Allows for focus on business logic
Stateless, event-driven code with native support for
Node.js, Java, and Python languages
Compute & Code without managing infrastructure
like EC2 instances and auto scaling groups
Makes it easy to Build back-end services that
perform at scale
High performance at any scale;
Cost-effective and efficient
No Infrastructure to
manage
Pay only for what you use: Lambda
automatically matches capacity to
your request rate. Purchase compute
in 100ms increments.
Bring Your
Own Code
“Productivity focused compute platform to build powerful, dynamic,
modular applications in the cloud”
Run code in a choice of
standard languages. Use
threads, processes, files, and
shell scripts normally.
Focus on business logic, not
infrastructure. You upload code;
AWS Lambda handles everything
else.
Benefits of AWS Lambda for building a server-
less data processing engine
1 2 3
Amazon Kinesis: Overview
Managed services for streaming data ingestion and processing
Amazon Kinesis
Streams
Build your own
custom applications
that process or
analyze streaming
data
Amazon Kinesis
Firehose
Easily load massive
volumes of streaming
data into Amazon S3
and Redshift
Amazon Kinesis
Analytics
Easily analyze data
streams using
standard SQL queries
Amazon Kinesis: Streaming data done the AWS way
Makes it easy to capture, deliver, and process real-time data streams
Pay as you go, no up-front costs
Right services for your specific use cases
Real-time latencies
Easy to provision, deploy, and manage
Benefits of Amazon Kinesis for stream data
ingestion and continuous processing
Real-time Ingest
Highly Scalable
Durable
Replay-able Reads
Continuous Processing
GetShardIterator and GetRecords(ShardIterator)
Allows checkpointing/ replay
Enables multi concurrent processing
KCL, Firehose, Analytics, Lambda
Enable data movement into many Stores/ Processing Engines
Managed Service
Low end-to-end latency
Data Processing/Streaming
Architecture & Workflow
Smart
Devices
Click
Stream
Log
Data
AWS Lambda and Amazon Kinesis integration
How it Works
Stream-based model:
▪ Lambda polls the stream and batches available records
▪ Batches are passed for invocation to Lambda through function
param
▪ Kinesis mapped as Event source in Lambda
Synchronous invocation:
▪ Lambda invoked as synchronous RequestResponse type
▪ Lambda function is executed at least once
▪ Each shard blocks on in order synchronous invocation
Event structure:
▪ Event received by Lambda function is a collection of records from
Kinesis stream
▪ Customer defines max batch size, not effective batch size
Streaming Architecture Workflow: Lambda + Kinesis
Data Input Kinesis Action Lambda Data Output
IT application activity
Capture the
stream
Audit
Process the
stream
SNS
Metering records Condense Redshift
Change logs Backup S3
Financial data Store RDS
Transaction orders Process SQS
Server health metrics Monitor EC2
User clickstream Analyze EMR
IoT device data Respond Backend endpoint
Custom data Custom action Custom application
Common Architecture: Lambda + Kinesis
Real Time Data Processing
Amazon
Kinesis
AWS
Lambda 1
Amazon
CloudWatch
Amazon
DynamoD
B
AWS
Lambda 2 Amazon
S3
1. Real-time event data sent to Amazon
Kinesis, allows multiple AWS Lambda
functions to process the same events.
2. In AWS Lambda, Function 1 processes
and aggregates data from incoming
events, then stores result data in
Amazon DynamoDB
3. Lambda Function 1 also sends values to
Amazon CloudWatch for simple
monitoring of metrics.
4. In AWS Lambda function, Function 2
does data manipulation of incoming
events and stores results in Amazon S3
https://s3.amazonaws.com/awslambda-reference-
architectures/stream-processing/lambda-refarch-
streamprocessing.pdf
Common Architecture: Lambda + Kinesis
Data Processing for Data Storage/Analysis
Use Lambda to process and
“fan out” to other AWS services
i.e. Storage, Database, and
BI/analytics
Amazon Kinesis stream can
continuously capture and
store terabytes of data per
hour from hundreds of
thousands of sources
Grant AWS Lambda
permissions for the relevant
stream actions via IAM
(Execution Role) during
function creation
IAM
IAM
IAM
Data Processing:
Best Practices & Tips
Best Practices
Creating a Kinesis stream
Streams
▪ Made up of Shards
▪ Each Shard ingests data up to 1MB/sec
▪ Each Shard emits data up to 2MB/sec
Data
▪ All data is stored for 24 hours, Replay data inside of 24hr window
▪ A Partition Key is supplied by producer and used to distribute the PUTs across
Shards
▪ A unique Sequence # is returned to the Producer upon a successful PUT call
▪ Make sure partition key distribution is even to optimize parallel throughput
Best Practices
Creating Lambda functions
Memory:
▪ CPU and disk proportional to the memory configured
▪ Increasing memory makes your code execute faster (if CPU bound)
▪ Increasing memory allows for larger record sizes processed
Timeout:
▪ Increasing timeout allows for longer functions, but more wait in case of
errors
Retries:
▪ For Kinesis, Lambda retries until the data expires (default 24 hours)
Permission model:
• The execution role defined for Lambda must have permission to access
the stream
Best Practices
Configuring Lambda with Kinesis as an event source
Batch size:
▪ Max number of records that Lambda will send to one invocation
▪ Not equivalent to effective batch size
▪ Effective batch size is every 250 ms
MIN(records available, batch size, 6MB)
▪ Increasing batch size allows fewer Lambda function invocations with
more data processed per function
Best Practices
Configuring Lambda with Kinesis as an event source
Starting Position:
▪ The position in the stream where Lambda starts reading
▪ Set to “Trim Horizon” for reading from start of stream (all data)
▪ Set to “Latest” for reading most recent data (LIFO) (latest data)
Best Practices
Attaching a Lambda function to a Kinesis Stream
▪ One Lambda function concurrently invoked per Kinesis shard
▪ Increasing # of shards with even distribution allows increased concurrency
▪ Lambda blocks on ordered processing for each individual shard
▪ This makes duration of the Lambda function directly impact throughput
▪ Batch size may impact duration if the Lambda function takes longer to process
more records
… …
Source
Kinesis
Destination
1
Lambda
Destination
2
FunctionsShards
Lambda will scale automaticallyScale Kinesis by adding shards
Waits for responsePolls a batch
Best Practices
Tuning throughput
▪ Maximum theoretical throughput :
# shards * 2MB / Lambda function duration (s)
▪ Effective theoretical throughput :
# shards * batch size (MB) / Lambda function duration (s)
▪ If put / ingestion rate is greater than the theoretical throughput, your processing
is at risk of falling behind
… …
Source
Kinesis
Destination
1
Lambda
Destination
2
FunctionsShards
Lambda will scale automaticallyScale Kinesis by splitting or merging shards
Waits for responsePolls a batch
Best Practices
Tuning throughput
▪ Retry execution failures until the record is expired
▪ Retry with exponential backoff up to 60s
▪ Throttles and errors impacts duration and directly impacts throughput
▪ Effective theoretical throughput :
( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry)
… …
Source
Kinesis
Destination
1
Lambda
Destination
2
FunctionsShards
Lambda will scale automaticallyScale Kinesis by splitting or merging shards
Receives errorPolls a batch
Receives error
Receives success
Best Practices
Monitoring Kinesis Streams with Amazon Cloudwatch Metrics
•GetRecords (effective throughput) : bytes, latency, records etc
•PutRecord : bytes, latency, records, etc
•GetRecords.IteratorAgeMilliseconds: how old your last processed records were.
If high, processing is falling behind. If close to 24 hours, records are close to
being dropped.
Best Practices
Monitoring Lambda functions
•Monitoring: available in Amazon CloudWatch Metrics
• Invocation count
• Duration
• Error count
• Throttle count
•Debugging: available in Amazon CloudWatch Logs
• All Metrics
• Custom logs
• RAM consumed
• Search for log events
Best Practices
Create different Lambda functions for each task, associate to same Kinesis stream
Log to
CloudWatch
Logs
Push to SNS
Get Started: Data Processing with AWS
Next Steps
1. Create your first Kinesis stream. Configure hundreds of thousands
of data producers to put data into an Amazon Kinesis stream. Ex.
data from Social media feeds.
2. Create and test your first Lambda function. Use any third party
library, even native ones. First 1M requests each month are on us!
3. Read the Developer Guide, AWS Lambda and Kinesis Tutorial, and
resources on GitHub at AWS Labs
• http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
• https://github.com/awslabs/lambda-streams-to-firehose
Thank You!
Cecilia Deng
Software Engineer

More Related Content

What's hot

Getting Started with AWS Lambda and Serverless
Getting Started with AWS Lambda and ServerlessGetting Started with AWS Lambda and Serverless
Getting Started with AWS Lambda and ServerlessAmazon Web Services
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftAmazon Web Services
 
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Amazon Web Services
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Introduction to Serverless
Introduction to ServerlessIntroduction to Serverless
Introduction to ServerlessNikolaus Graf
 
Introduction to AWS Cost Management
Introduction to AWS Cost ManagementIntroduction to AWS Cost Management
Introduction to AWS Cost ManagementAmazon Web Services
 
Serverless computing with AWS Lambda
Serverless computing with AWS Lambda Serverless computing with AWS Lambda
Serverless computing with AWS Lambda Apigee | Google Cloud
 
AWS tutorial-Part54:AWS Route53
AWS tutorial-Part54:AWS Route53AWS tutorial-Part54:AWS Route53
AWS tutorial-Part54:AWS Route53SaM theCloudGuy
 

What's hot (20)

Getting Started with AWS Lambda and Serverless
Getting Started with AWS Lambda and ServerlessGetting Started with AWS Lambda and Serverless
Getting Started with AWS Lambda and Serverless
 
AWS 101
AWS 101AWS 101
AWS 101
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Deep Dive on AWS Lambda
Deep Dive on AWS LambdaDeep Dive on AWS Lambda
Deep Dive on AWS Lambda
 
AWS Route53
AWS Route53AWS Route53
AWS Route53
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF Loft
 
Cloud Migration Workshop
Cloud Migration WorkshopCloud Migration Workshop
Cloud Migration Workshop
 
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Deep Dive Amazon EC2
Deep Dive Amazon EC2Deep Dive Amazon EC2
Deep Dive Amazon EC2
 
AWS Lambda
AWS LambdaAWS Lambda
AWS Lambda
 
Introduction to Serverless
Introduction to ServerlessIntroduction to Serverless
Introduction to Serverless
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Introducing AWS Fargate
Introducing AWS FargateIntroducing AWS Fargate
Introducing AWS Fargate
 
Introduction to AWS Cost Management
Introduction to AWS Cost ManagementIntroduction to AWS Cost Management
Introduction to AWS Cost Management
 
AWS IAM
AWS IAMAWS IAM
AWS IAM
 
Serverless computing with AWS Lambda
Serverless computing with AWS Lambda Serverless computing with AWS Lambda
Serverless computing with AWS Lambda
 
AWS tutorial-Part54:AWS Route53
AWS tutorial-Part54:AWS Route53AWS tutorial-Part54:AWS Route53
AWS tutorial-Part54:AWS Route53
 

Similar to Real-time Data Processing Using AWS Lambda

Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
SMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS LambdaSMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Amazon Web Services
 
Raleigh DevDay 2017: Real time data processing using AWS Lambda
Raleigh DevDay 2017: Real time data processing using AWS LambdaRaleigh DevDay 2017: Real time data processing using AWS Lambda
Raleigh DevDay 2017: Real time data processing using AWS LambdaAmazon Web Services
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...Amazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...Amazon Web Services
 
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017Amazon Web Services
 
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)Amazon Web Services
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosAmazon Web Services LATAM
 
Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaAmazon Web Services
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
Real-Time Processing Using AWS Lambda
Real-Time Processing Using AWS LambdaReal-Time Processing Using AWS Lambda
Real-Time Processing Using AWS LambdaAmazon Web Services
 
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaAmazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Amazon Web Services
 

Similar to Real-time Data Processing Using AWS Lambda (20)

Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
SMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS LambdaSMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS Lambda
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
 
Raleigh DevDay 2017: Real time data processing using AWS Lambda
Raleigh DevDay 2017: Real time data processing using AWS LambdaRaleigh DevDay 2017: Real time data processing using AWS Lambda
Raleigh DevDay 2017: Real time data processing using AWS Lambda
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AW...
 
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
 
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dados
 
Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS Lambda
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming Applications
 
Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
Real-Time Processing Using AWS Lambda
Real-Time Processing Using AWS LambdaReal-Time Processing Using AWS Lambda
Real-Time Processing Using AWS Lambda
 
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS Lambda
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Real-time Data Processing Using AWS Lambda

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cecilia Deng Software Engineer 08 04 2016 Real-time Data Processing Using AWS Lambda
  • 2. Agenda AWS Services for Real Time Data AWS Lambda Amazon Kinesis Architecture & Workflow for Streaming Data Processing Streaming Data Processing Demo Best Practices in Building Data Processing Solutions
  • 3. AWS Services for Data Processing Amazon Kinesis AWS Lambda
  • 4. AWS Lambda: Overview Lambda functions: a piece of code with stateless execution Triggered by events: • Direct Sync and Async API calls • AWS Service integrations • 3rd party triggers • And many more … Makes it easy to: • Perform data-driven auditing, analysis, and notification • Build back-end services that perform at scale
  • 5. AWS Lambda: Serverless Compute in the Cloud Compute service that runs your code in response to events Easy to author, deploy, maintain, secure and manage Allows for focus on business logic Stateless, event-driven code with native support for Node.js, Java, and Python languages Compute & Code without managing infrastructure like EC2 instances and auto scaling groups Makes it easy to Build back-end services that perform at scale
  • 6. High performance at any scale; Cost-effective and efficient No Infrastructure to manage Pay only for what you use: Lambda automatically matches capacity to your request rate. Purchase compute in 100ms increments. Bring Your Own Code “Productivity focused compute platform to build powerful, dynamic, modular applications in the cloud” Run code in a choice of standard languages. Use threads, processes, files, and shell scripts normally. Focus on business logic, not infrastructure. You upload code; AWS Lambda handles everything else. Benefits of AWS Lambda for building a server- less data processing engine 1 2 3
  • 7. Amazon Kinesis: Overview Managed services for streaming data ingestion and processing Amazon Kinesis Streams Build your own custom applications that process or analyze streaming data Amazon Kinesis Firehose Easily load massive volumes of streaming data into Amazon S3 and Redshift Amazon Kinesis Analytics Easily analyze data streams using standard SQL queries
  • 8. Amazon Kinesis: Streaming data done the AWS way Makes it easy to capture, deliver, and process real-time data streams Pay as you go, no up-front costs Right services for your specific use cases Real-time latencies Easy to provision, deploy, and manage
  • 9. Benefits of Amazon Kinesis for stream data ingestion and continuous processing Real-time Ingest Highly Scalable Durable Replay-able Reads Continuous Processing GetShardIterator and GetRecords(ShardIterator) Allows checkpointing/ replay Enables multi concurrent processing KCL, Firehose, Analytics, Lambda Enable data movement into many Stores/ Processing Engines Managed Service Low end-to-end latency
  • 10. Data Processing/Streaming Architecture & Workflow Smart Devices Click Stream Log Data
  • 11. AWS Lambda and Amazon Kinesis integration How it Works Stream-based model: ▪ Lambda polls the stream and batches available records ▪ Batches are passed for invocation to Lambda through function param ▪ Kinesis mapped as Event source in Lambda Synchronous invocation: ▪ Lambda invoked as synchronous RequestResponse type ▪ Lambda function is executed at least once ▪ Each shard blocks on in order synchronous invocation Event structure: ▪ Event received by Lambda function is a collection of records from Kinesis stream ▪ Customer defines max batch size, not effective batch size
  • 12. Streaming Architecture Workflow: Lambda + Kinesis Data Input Kinesis Action Lambda Data Output IT application activity Capture the stream Audit Process the stream SNS Metering records Condense Redshift Change logs Backup S3 Financial data Store RDS Transaction orders Process SQS Server health metrics Monitor EC2 User clickstream Analyze EMR IoT device data Respond Backend endpoint Custom data Custom action Custom application
  • 13. Common Architecture: Lambda + Kinesis Real Time Data Processing Amazon Kinesis AWS Lambda 1 Amazon CloudWatch Amazon DynamoD B AWS Lambda 2 Amazon S3 1. Real-time event data sent to Amazon Kinesis, allows multiple AWS Lambda functions to process the same events. 2. In AWS Lambda, Function 1 processes and aggregates data from incoming events, then stores result data in Amazon DynamoDB 3. Lambda Function 1 also sends values to Amazon CloudWatch for simple monitoring of metrics. 4. In AWS Lambda function, Function 2 does data manipulation of incoming events and stores results in Amazon S3 https://s3.amazonaws.com/awslambda-reference- architectures/stream-processing/lambda-refarch- streamprocessing.pdf
  • 14. Common Architecture: Lambda + Kinesis Data Processing for Data Storage/Analysis Use Lambda to process and “fan out” to other AWS services i.e. Storage, Database, and BI/analytics Amazon Kinesis stream can continuously capture and store terabytes of data per hour from hundreds of thousands of sources Grant AWS Lambda permissions for the relevant stream actions via IAM (Execution Role) during function creation IAM IAM IAM
  • 16. Best Practices Creating a Kinesis stream Streams ▪ Made up of Shards ▪ Each Shard ingests data up to 1MB/sec ▪ Each Shard emits data up to 2MB/sec Data ▪ All data is stored for 24 hours, Replay data inside of 24hr window ▪ A Partition Key is supplied by producer and used to distribute the PUTs across Shards ▪ A unique Sequence # is returned to the Producer upon a successful PUT call ▪ Make sure partition key distribution is even to optimize parallel throughput
  • 17. Best Practices Creating Lambda functions Memory: ▪ CPU and disk proportional to the memory configured ▪ Increasing memory makes your code execute faster (if CPU bound) ▪ Increasing memory allows for larger record sizes processed Timeout: ▪ Increasing timeout allows for longer functions, but more wait in case of errors Retries: ▪ For Kinesis, Lambda retries until the data expires (default 24 hours) Permission model: • The execution role defined for Lambda must have permission to access the stream
  • 18. Best Practices Configuring Lambda with Kinesis as an event source Batch size: ▪ Max number of records that Lambda will send to one invocation ▪ Not equivalent to effective batch size ▪ Effective batch size is every 250 ms MIN(records available, batch size, 6MB) ▪ Increasing batch size allows fewer Lambda function invocations with more data processed per function
  • 19. Best Practices Configuring Lambda with Kinesis as an event source Starting Position: ▪ The position in the stream where Lambda starts reading ▪ Set to “Trim Horizon” for reading from start of stream (all data) ▪ Set to “Latest” for reading most recent data (LIFO) (latest data)
  • 20. Best Practices Attaching a Lambda function to a Kinesis Stream ▪ One Lambda function concurrently invoked per Kinesis shard ▪ Increasing # of shards with even distribution allows increased concurrency ▪ Lambda blocks on ordered processing for each individual shard ▪ This makes duration of the Lambda function directly impact throughput ▪ Batch size may impact duration if the Lambda function takes longer to process more records … … Source Kinesis Destination 1 Lambda Destination 2 FunctionsShards Lambda will scale automaticallyScale Kinesis by adding shards Waits for responsePolls a batch
  • 21. Best Practices Tuning throughput ▪ Maximum theoretical throughput : # shards * 2MB / Lambda function duration (s) ▪ Effective theoretical throughput : # shards * batch size (MB) / Lambda function duration (s) ▪ If put / ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind … … Source Kinesis Destination 1 Lambda Destination 2 FunctionsShards Lambda will scale automaticallyScale Kinesis by splitting or merging shards Waits for responsePolls a batch
  • 22. Best Practices Tuning throughput ▪ Retry execution failures until the record is expired ▪ Retry with exponential backoff up to 60s ▪ Throttles and errors impacts duration and directly impacts throughput ▪ Effective theoretical throughput : ( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry) … … Source Kinesis Destination 1 Lambda Destination 2 FunctionsShards Lambda will scale automaticallyScale Kinesis by splitting or merging shards Receives errorPolls a batch Receives error Receives success
  • 23. Best Practices Monitoring Kinesis Streams with Amazon Cloudwatch Metrics •GetRecords (effective throughput) : bytes, latency, records etc •PutRecord : bytes, latency, records, etc •GetRecords.IteratorAgeMilliseconds: how old your last processed records were. If high, processing is falling behind. If close to 24 hours, records are close to being dropped.
  • 24. Best Practices Monitoring Lambda functions •Monitoring: available in Amazon CloudWatch Metrics • Invocation count • Duration • Error count • Throttle count •Debugging: available in Amazon CloudWatch Logs • All Metrics • Custom logs • RAM consumed • Search for log events
  • 25. Best Practices Create different Lambda functions for each task, associate to same Kinesis stream Log to CloudWatch Logs Push to SNS
  • 26. Get Started: Data Processing with AWS Next Steps 1. Create your first Kinesis stream. Configure hundreds of thousands of data producers to put data into an Amazon Kinesis stream. Ex. data from Social media feeds. 2. Create and test your first Lambda function. Use any third party library, even native ones. First 1M requests each month are on us! 3. Read the Developer Guide, AWS Lambda and Kinesis Tutorial, and resources on GitHub at AWS Labs • http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html • https://github.com/awslabs/lambda-streams-to-firehose

Editor's Notes

  1. To elaborate a bit more on how Lambda works. You can trigger invocations through direct API / SDK calls – sync or async. You can also use various AWS Service integrators like S3 (an event whenever an entry is created/updated/deleted) or SNS, third party triggers, etc Event sources like kinesis or ddb update streams make it easy to perform real time data-driven auditing, analysis, and notification with Lambda. However, with the help of API Gateway to create HTTP endpoints powered by Lambda functions, you can create a set of back-end microservices that perform at scale.
  2. Focus on business logic, not infrastructure. Upload your code; AWS Lambda handles Capacity Scaling Deployment Monitoring Logging Web service front end Security patching Fine grained pricing Buy compute time in 100ms increments Low request charge No hourly, daily, or monthly minimums No per-device fees Never pay for idle Bring your own code Create threads and processes, run batch scripts or other executables, and read/write files in /tmp. Include any library with your Lambda function code, even native libraries.
  3. continuously capture and store terabytes of data from hundreds or thousands of sources. This might include website clickstreams, financial transactions, social media feeds, application logs, and location-tracking events. fully managed to scale in the AWS cloud to support incoming data traffic, store them for up to 24 hours by default, synchronously replicating across AWS datacenters, supporting replayable retrievals of that data, in order, a within seconds of a put. The Kinesis team also provides Kinesis Firehose to automatically poll and emit data from your kinesis stream to S3 and Redshift and Elasticsearch, and Analytics to analyze using SQL queries * With amazon’s kinesis client library, you can build kinesis applications yourself, retrieving and processing data to power dashboards or feed into machine learning algorithms, emitting to services like EMR as well
  4. Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure. Real-Time: Collect real-time data streams and promptly respond to key business events and operational triggers. Flexible: Choose the service, or combination of services, for your specific data streaming use cases.
  5. Limits per shard, but can scale to many shards Data is replicated synchronously across AZs for resiliency and durability Multiple applications can process from the stream concurrently in order at different rates Data is available to be processed within seconds of a put Processing constructs include checkpointing and shard iteratorg.
  6. The nature of both of these services makes them a good fit to use together. With Kinesis to capture your data and Lambda to process opening possibilities for users to run real time streaming data processing workloads. You can run audit checks on your ID application activity whether on AWS or your private environment and alert ID team on policy violations. You can pump your live metering records to a Kinesis data stream and use Lambda to condense these before pushing to Redshift for a longer term
  7. typical use-case is to use Lambda to “fan out” to other services, such as storage, database, and BI/analytics Directed acyclic graph processing is also possible using Kinesis and Lambda – functions emit results and put the results on a stream. Firehose does not have flexible processing logic, just redirects
  8. Social Media data streams like Twitter for hashtag #serverless goes to Kinesis stream. Kinesis stream hooks up to Lambda, and searching for Lambda when it finds a match pushes to DynamoDB and show dashboard last 60 seconds. Any doing tweet. Less than 20 dollars a month. Python code samples are needed.
  9. Always FIFO Trim Horizon (start of stream) latest on reading current data.
  10. Always FIFO Trim Horizon (start of stream) latest on reading current data.