November 13, 2014 | Las Vegas, NV 
Adi Krishnan, Sr. Product Manager Amazon Kinesis
Scenarios Across Industry Segments
•Scenario 1: Accelerated Ingest-Transform-Load
•Scenario 2: Continual Metrics/KPI Extraction
•Scenario 3: Responsive Data Analysis
Data Types: IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data
Digital Ad Tech/Marketing
•Scenario 1: Advertising data aggregation
•Scenario 2: Advertising metrics like coverage, yield, conversion
•Scenario 3: Analytics on user engagement with ads, optimized bid/buy engines
Software/Technology
•Scenario 1: IT server and app logs ingestion
•Scenario 2: IT operational metrics dashboards
•Scenario 3: Devices/sensor operational intelligence
Financial Services
•Scenario 1: Market/financial transaction order data collection
•Scenario 2: Financial market data metrics
•Scenario 3: Fraud monitoring and Value-at-Risk assessment, auditing of market order data
Consumer Online/E-Commerce
•Scenario 1: Online customer engagement data aggregation
•Scenario 2: Consumer engagement metrics like page views, CTR
•Scenario 3: Customer clickstream analytics, recommendation engines
Amazon Kinesis: Managed Service for Streaming Data Ingestion and Processing
•Millions of sources producing 100s of terabytes per hour
•Front end: authentication, authorization
•Durable, highly consistent storage replicates data across three data centers (availability zones)
•Ordered stream of events supports multiple readers
•Aggregate and archive to S3
•Real-time dashboards and alarms
•Machine learning algorithms or sliding window analytics
•Aggregate analysis in Hadoop or a data warehouse
•Inexpensive: $0.028 per million puts
Real-time Ingest 
•Highly Scalable 
•Durable 
•Elastic 
•Replay-able Reads 
Continuous Processing FX 
•Elastic 
•Load-balancing incoming streams 
•Fault-tolerance, Checkpoint / Replay 
•Enable multiple processing apps in parallel 
Enable data movement into Stores/Processing Engines 
Managed Service 
Low end-to-end latency
Kinesis Stream: Managed Ability to Capture and Store Data
Putting Data into Kinesis 
Simple Put interface to store data in Kinesis
Best Practices: Putting Data in Kinesis
Determine Your Partition Key Strategy
•Kinesis as a managed buffer or a streaming map-reduce
•Ensure a high cardinality for Partition Keys with respect to shards, to prevent a "hot shard" problem
–Generate Random Partition Keys
•Streaming Map-Reduce: Leverage Partition Keys for business specific logic as applicable
–Partition Key per billing customer, per DeviceId, per stock symbol
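The "hot shard" point above can be illustrated with a small sketch. Kinesis maps each partition key to a 128-bit hash key using MD5, and each shard owns a range of hash keys. The class and method names below are hypothetical, and the sketch assumes the shards split the hash key space into equal contiguous ranges, which may not match a stream that has been split or merged unevenly.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

/** Illustrative sketch (hypothetical names): how partition keys spread over shards. */
class PartitionKeySketch {
    // The hash key space is 128 bits wide: 0 .. 2^128 - 1.
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** MD5 of the UTF-8 key bytes, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this platform", e);
        }
    }

    /** Assumption: shardCount shards own equal, contiguous hash key ranges. */
    static int shardIndexFor(String partitionKey, int shardCount) {
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        // The last range absorbs the remainder left by integer division.
        return Math.min(index, shardCount - 1);
    }

    /** Counts how many of the given keys land on each shard. */
    static int[] distribution(List<String> keys, int shardCount) {
        int[] counts = new int[shardCount];
        for (String key : keys) {
            counts[shardIndexFor(key, shardCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int shards = 4;

        // Managed-buffer style: random keys give high cardinality.
        List<String> randomKeys = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            randomKeys.add(UUID.randomUUID().toString());
        }
        System.out.println("random keys:  " + Arrays.toString(distribution(randomKeys, shards)));

        // Business keys with low cardinality: two symbols can use at most two shards.
        List<String> symbolKeys = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            symbolKeys.add(i % 10 == 0 ? "MSFT" : "AMZN");
        }
        System.out.println("two symbols:  " + Arrays.toString(distribution(symbolKeys, shards)));
    }
}
```

With only two distinct keys, at most two shards ever receive traffic, however many shards are provisioned. That is the trade-off between random keys and business-specific keys.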
Best Practices: Putting Data in Kinesis
Provisioning Adequate Shards
•For ingress needs
•For the egress needs of all consuming applications, in particular if more than 2 consumers read simultaneously
•Include headroom for catching up with data in the stream in the event of application failures
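A rough sizing calculation for the three points above might look like the sketch below. The per-shard limits used here (1 MB/s and 1,000 records/s in, 2 MB/s out) are assumptions taken from the service limits published around the time of this talk, so check them against current documentation. The class name, method name, and headroom factor are hypothetical.

```java
/** Illustrative sketch (hypothetical names): estimating a shard count. */
class ShardEstimator {
    // Assumed per-shard limits; verify against the current service limits.
    static final double INGRESS_MB_PER_SEC = 1.0;
    static final double INGRESS_RECORDS_PER_SEC = 1000.0;
    static final double EGRESS_MB_PER_SEC = 2.0;

    /**
     * @param ingressMBps    expected write volume in MB per second
     * @param recordsPerSec  expected number of records written per second
     * @param consumers      number of applications that each read the full stream
     * @param headroomFactor multiplier >= 1.0 reserving capacity for catch-up after failures
     */
    static int shardsNeeded(double ingressMBps, double recordsPerSec,
                            int consumers, double headroomFactor) {
        double forIngressBytes = ingressMBps / INGRESS_MB_PER_SEC;
        double forIngressRecords = recordsPerSec / INGRESS_RECORDS_PER_SEC;
        // Every consumer reads everything that was written, so egress scales with consumers.
        double forEgress = ingressMBps * consumers / EGRESS_MB_PER_SEC;
        double base = Math.max(forIngressBytes, Math.max(forIngressRecords, forEgress));
        return Math.max(1, (int) Math.ceil(base * headroomFactor));
    }

    public static void main(String[] args) {
        // Up to 2 consumers, ingress is the binding constraint; a third consumer makes egress dominate.
        System.out.println("2 consumers: " + shardsNeeded(10.0, 5000.0, 2, 1.0));
        System.out.println("3 consumers: " + shardsNeeded(10.0, 5000.0, 3, 1.0));
        System.out.println("3 consumers + 25% headroom: " + shardsNeeded(10.0, 5000.0, 3, 1.25));
    }
}
```

Under these assumed limits, egress only drives the shard count once more than 2 applications read the stream at the same time, which matches the threshold named in the bullet above.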
Best Practices: Putting Data in Kinesis
Pre-Batch before Puts for better efficiency
# KINESIS appender 
log4j.logger.KinesisLogger=INFO, KINESIS 
log4j.additivity.KinesisLogger=false 
log4j.appender.KINESIS=com.amazonaws.services.kinesis.log4j.KinesisAppender 
# DO NOT use a trailing %n unless you want a newline to be transmitted to KINESIS after every message 
log4j.appender.KINESIS.layout=org.apache.log4j.PatternLayout 
log4j.appender.KINESIS.layout.ConversionPattern=%m 
# mandatory properties for KINESIS appender 
log4j.appender.KINESIS.streamName=testStream 
#optional, defaults to UTF-8 
log4j.appender.KINESIS.encoding=UTF-8 
#optional, defaults to 3 
log4j.appender.KINESIS.maxRetries=3 
#optional, defaults to 2000 
log4j.appender.KINESIS.bufferSize=1000 
#optional, defaults to 20 
log4j.appender.KINESIS.threadCount=20 
#optional, defaults to 30 seconds 
log4j.appender.KINESIS.shutdownTimeout=30
https://github.com/awslabs/kinesis-log4j-appender
Best Practices: Putting Data in Kinesis
Pre-Batch before Puts for better efficiency
•Retry if rise in input rate is temporary
•Reshard to increase number of shards
•Monitor CloudWatch metrics: PutRecord.Bytes and GetRecords.Bytes metrics keep track of shard usage
Metric (Units)
•PutRecord.Bytes (Bytes)
•PutRecord.Latency (Milliseconds)
•PutRecord.Success (Count)
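The "retry if rise in input rate is temporary" advice can be sketched as a retry loop with exponential backoff. To keep the example self-contained, `ThrottledException` is a hypothetical stand-in for whatever throttling error the client in use reports. The helper name and the delay values are illustrative assumptions, not part of any SDK.

```java
import java.util.function.Supplier;

/** Illustrative sketch (hypothetical names): retrying a throttled put with backoff. */
class PutRetrySketch {
    /** Hypothetical stand-in for a "throughput exceeded" error from the client. */
    static class ThrottledException extends RuntimeException {
        ThrottledException(String message) {
            super(message);
        }
    }

    /**
     * Calls put.get() and retries up to maxRetries times when it is throttled,
     * doubling the wait between attempts. Other exceptions propagate immediately.
     */
    static <T> T withBackoff(Supplier<T> put, int maxRetries, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 0; ; attempt++) {
            try {
                return put.get();
            } catch (ThrottledException e) {
                if (attempt >= maxRetries) {
                    // Sustained throttling: retrying will not help, the stream needs more shards.
                    throw e;
                }
                sleep(delay);
                delay *= 2;
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated put: throttled on the first two attempts, then accepted.
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new ThrottledException("rate exceeded");
            }
            return "accepted on attempt " + calls[0];
        }, 3, 10L);
        System.out.println(result);
    }
}
```

If the retries are exhausted, the exception surfaces to the caller. That is the signal that the rise in input rate is not temporary and resharding is the appropriate response.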
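•Keep track of your metrics
•Log hash key values generated by your partition keys
•Log Shard-Ids
•Determine which shards receive the most (hash key) traffic.
// Set the partition key on the request before calling putRecord
putRecordRequest.setPartitionKey(String.format("myPartitionKey"));
// After the put returns, read back the ID of the shard that received the record
String shardId = putRecordResult.getShardId();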
Options:
•stream-name - The name of the stream to be scaled
•scaling-action - The action to be taken to scale. Must be one of "scaleUp", "scaleDown" or "resize"
•count - Number of shards by which to absolutely scale up or down, or resize to, or:
•pct - Percentage of the existing number of shards by which to scale up or down
https://github.com/awslabs/amazon-kinesis-scaling-utils
Sending & Reading Data from Kinesis Streams
Sending
•HTTP Post
•AWS SDK
•AWS Mobile SDK
•LOG4J
•Flume
•Fluentd
Consuming
•Get* APIs
•Kinesis Client Library + Connector Library
•Apache Storm
•Amazon Elastic MapReduce
Building Kinesis Applications: Kinesis Client Library
Open Source library for fault-tolerant, continuous processing apps
•Java client library, also available for Python developers
•Source available on GitHub
•Build app with Kinesis Client Library
•Deploy on your set of EC2 instances
•Every KCL application includes these components:
•Record processor factory: Creates the record processor
•Record processor: The processing unit that processes data from a shard of a Kinesis stream
•Worker: The processing unit that maps to each application instance
•The KCL uses the IRecordProcessor interface to communicate with your application 
•A Kinesis application must implement the KCL's IRecordProcessor interface 
•Contains the business logic for processing the data retrieved from the Kinesis stream
•One record processor maps to one shard and processes data records from that shard 
•One worker maps to one or more record processors 
•Balances shard-worker associations when worker / instance counts change 
•Balances shard-worker associations when shards split or merge
Moving data into Amazon S3, Redshift
Amazon Kinesis Connector Library
Customizable, Open Source Apps to Connect Kinesis with S3, Redshift, DynamoDB
ITransformer
•Defines the transformation of records from the Amazon Kinesis stream in order to suit the user-defined data model
IFilter
•Excludes irrelevant records from the processing.
IBuffer
•Buffers the set of records to be processed by specifying a size limit (number of records) and total byte count
IEmitter
•Makes client calls to other AWS services and persists the records stored in the buffer.
Destinations: S3, DynamoDB, Redshift, Kinesis
Amazon Kinesis Connectors 
• S3 Connector 
– Batch writes files for archive into S3 
– Uses sequence-based file naming scheme 
• Redshift Connector 
– Once written to S3, loads to Redshift 
– Provides manifest support 
– Supports user defined transformers 
• DynamoDB Connector 
– BatchPut appends to a table 
– Supports user defined transformers
Best Practices: Processing Data From Kinesis
Build applications as part of an Auto Scaling group
•Helps with application availability
•Scales in response to incoming spikes in data volume, assuming shards have been provisioned
•Select scaling metrics based on nature of Kinesis application
–Instance metrics: CPU, Memory, and others
–Kinesis Metrics: PutRecord.Bytes, GetRecords.Bytes
Metric (Units)
•PutRecord.Bytes (Bytes)
•PutRecord.Latency (Milliseconds)
•PutRecord.Success (Count)
•GetRecords.Bytes (Bytes)
•GetRecords.IteratorAge (Milliseconds)
•GetRecords.Latency (Milliseconds)
•GetRecords.Success (Count)
Best Practices: Processing Data From Kinesis
Build a flush-to-S3 consumer app
•App can specify three conditions that can trigger a buffer flush: 
–Number of records 
–Total byte count 
–Time since last flush 
•The buffer is flushed and the data is emitted to the destination when any of these thresholds is crossed. 
# Flush when buffer exceeds 8 Kinesis records, 1 KB size limit or when time since last emit exceeds 10 minutes 
bufferSizeByteLimit = 1024 
bufferRecordCountLimit = 8 
bufferMillisecondsLimit = 600000
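The three flush conditions can be sketched as a small in-memory buffer. This is not the Connector Library's own buffer implementation. The class and method names are hypothetical, thresholds are treated as "reached or exceeded", and the current time is passed in explicitly so the behaviour is easy to test.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch (hypothetical names): a buffer with three flush triggers. */
class FlushBufferSketch {
    private final long byteLimit;
    private final int recordLimit;
    private final long millisLimit;
    private final List<byte[]> records = new ArrayList<>();
    private long bufferedBytes = 0;
    private long lastFlushMillis;

    FlushBufferSketch(long byteLimit, int recordLimit, long millisLimit, long nowMillis) {
        this.byteLimit = byteLimit;
        this.recordLimit = recordLimit;
        this.millisLimit = millisLimit;
        this.lastFlushMillis = nowMillis;
    }

    void add(byte[] record) {
        records.add(record);
        bufferedBytes += record.length;
    }

    /** True when the buffer is non-empty and any one of the three thresholds is reached. */
    boolean shouldFlush(long nowMillis) {
        if (records.isEmpty()) {
            return false;
        }
        return records.size() >= recordLimit
                || bufferedBytes >= byteLimit
                || nowMillis - lastFlushMillis >= millisLimit;
    }

    /** Hands the buffered records to the caller (the emitter) and resets all counters. */
    List<byte[]> flush(long nowMillis) {
        List<byte[]> out = new ArrayList<>(records);
        records.clear();
        bufferedBytes = 0;
        lastFlushMillis = nowMillis;
        return out;
    }

    public static void main(String[] args) {
        // Same limits as the configuration above: 1 KB, 8 records, 10 minutes.
        long start = System.currentTimeMillis();
        FlushBufferSketch buffer = new FlushBufferSketch(1024, 8, 600000, start);
        for (int i = 0; i < 8; i++) {
            buffer.add(("record-" + i).getBytes());
        }
        if (buffer.shouldFlush(System.currentTimeMillis())) {
            System.out.println("flushing " + buffer.flush(System.currentTimeMillis()).size() + " records");
        }
    }
}
```

A consumer would call `shouldFlush` after each batch of records and hand the flushed list to whatever writes to S3.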
Best Practices: Processing Data From Kinesis 
•In a KCL app, ensure data being processed is persisted to a durable store like DynamoDB or S3 prior to checkpointing.
•Duplicates: Make the authoritative data repository (usually at the end of the data flow) resilient to duplicates. That way the rest of the system has a simple policy: keep retrying until you succeed.
•Idempotent Processing: Use the number of records since the previous checkpoint to get repeatable results when the record processors fail over.
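The "persist before checkpointing" and "resilient to duplicates" points can be combined in one sketch. An in-memory map stands in for the durable store, and writes are keyed by shard ID plus sequence number so that replaying a batch after a failure has no effect. All class, field, and method names are hypothetical, and a real application would use its store's conditional-write feature instead of a map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch (hypothetical names): duplicate-tolerant sink, checkpoint after persist. */
class IdempotentSinkSketch {
    /** Minimal stand-in for a stream record. */
    static class Rec {
        final String shardId;
        final String sequenceNumber;
        final String data;

        Rec(String shardId, String sequenceNumber, String data) {
            this.shardId = shardId;
            this.sequenceNumber = sequenceNumber;
            this.data = data;
        }
    }

    // Stand-in for the durable store (for example a table keyed by shard and sequence number).
    private final Map<String, String> store = new LinkedHashMap<>();
    private String lastCheckpoint = null;

    /** Writes each record at most once and returns how many were new. */
    int persist(List<Rec> batch) {
        int written = 0;
        for (Rec r : batch) {
            String key = r.shardId + "/" + r.sequenceNumber;
            if (store.putIfAbsent(key, r.data) == null) {
                written++;
            }
        }
        return written;
    }

    /** Call only after persist has returned successfully for the batch. */
    void checkpoint(List<Rec> batch) {
        if (!batch.isEmpty()) {
            lastCheckpoint = batch.get(batch.size() - 1).sequenceNumber;
        }
    }

    int size() {
        return store.size();
    }

    String lastCheckpoint() {
        return lastCheckpoint;
    }

    public static void main(String[] args) {
        IdempotentSinkSketch sink = new IdempotentSinkSketch();
        List<Rec> batch = Arrays.asList(
                new Rec("shard-0", "101", "a"),
                new Rec("shard-0", "102", "b"),
                new Rec("shard-0", "103", "c"));

        // First attempt: data is persisted, but the worker fails before it can checkpoint.
        System.out.println("first attempt wrote " + sink.persist(batch));

        // After failover the same batch is delivered again; the sink absorbs the duplicates.
        System.out.println("replay wrote " + sink.persist(batch));
        sink.checkpoint(batch);
        System.out.println("stored " + sink.size() + ", checkpoint at " + sink.lastCheckpoint());
    }
}
```

Because the sink ignores keys it has already seen, the processor can follow the simple "keep retrying until you succeed" policy without producing duplicate rows.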
Best Practices: Processing Data From Kinesis
•Creates a manifest file based on a custom set of input files
•Use a manifest stream with only one shard
•Adjust checkpoint frequency, connector buffer and filter to align with your Redshift load models
Amazon Kinesis Customer Scenarios
Collect all data of interest continuously
Faster time to market due to ease of deployment
Enable operators and partners to get to valuable data quickly
http://bit.ly/awsevals

Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014

  • 1. November 13, 2014 | Las Vegas, NV Adi Krishnan, Sr. Product Manager Amazon Kinesis
  • 4. Scenarios Across Industry Segments
       Data types: IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data
       Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics/KPI Extraction, (3) Responsive Data Analysis
       • Digital AdTech/Marketing: (1) advertising data aggregation; (2) advertising metrics like coverage, yield, conversion; (3) analytics on user engagement with ads, optimized bid/buy engines
       • Software/Technology: (1) IT server and app logs ingestion; (2) IT operational metrics dashboards; (3) devices/sensor operational intelligence
       • Financial Services: (1) market/financial transaction order data collection; (2) financial market data metrics; (3) fraud monitoring, Value-at-Risk assessment, auditing of market order data
       • Consumer Online/E-Commerce: (1) online customer engagement data aggregation; (2) consumer engagement metrics like page views, CTR; (3) customer clickstream analytics, recommendation engines
  • 6. Amazon Kinesis: Managed Service for streaming data ingestion and processing
       • Millions of sources producing 100s of terabytes per hour
       • Front end: authentication, authorization
       • Durable, highly consistent storage replicates data across three data centers (Availability Zones) in Amazon Web Services
       • Ordered stream of events supports multiple readers
       • Real-time dashboards and alarms
       • Machine learning algorithms or sliding window analytics
       • Aggregate and archive to S3
       • Aggregate analysis in Hadoop or a data warehouse
       • Inexpensive: $0.028 per million puts
  • 7. Real-time Ingest
       • Highly scalable
       • Durable
       • Elastic
       • Replay-able reads
       Continuous Processing FX
       • Elastic
       • Load-balancing incoming streams
       • Fault tolerance, checkpoint/replay
       • Enable multiple processing apps in parallel
       Enable data movement into stores/processing engines
       Managed service
       Low end-to-end latency
  • 9. Kinesis Stream: Managed Ability to Capture and Store Data
  • 10. Putting Data into Kinesis: Simple Put interface to store data in Kinesis
  • 11. Best Practices: Putting Data in Kinesis
       Determine Your Partition Key Strategy
       • Kinesis as a managed buffer or a streaming map-reduce
       • Ensure a high cardinality for partition keys with respect to shards, to prevent a "hot shard" problem
         – Generate random partition keys
       • Streaming map-reduce: leverage partition keys for business-specific logic as applicable
         – Partition key per billing customer, per DeviceId, per stock symbol
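The two partition key strategies above can be illustrated with a minimal sketch. Kinesis maps a partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The class and method names below are hypothetical, and the shard lookup assumes the stream's shards split the hash key space into equal-width ranges, which holds only for a stream that has not been unevenly split or merged.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper, not part of the AWS SDK.
class PartitionKeySketch {
    // Size of the 128-bit hash key space: 2^128.
    private static final BigInteger KEY_SPACE = BigInteger.ONE.shiftLeft(128);

    /** MD5 of the partition key, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 = treat bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Index of the owning shard, ASSUMING shardCount equal-width hash key ranges. */
    static int shardIndex(String partitionKey, int shardCount) {
        return hashKey(partitionKey)
                .multiply(BigInteger.valueOf(shardCount))
                .divide(KEY_SPACE)
                .intValue();
    }

    /** Managed-buffer style: a random key spreads records across all shards. */
    static String randomPartitionKey() {
        return UUID.randomUUID().toString();
    }

    /** Streaming map-reduce style: one key per entity keeps that entity's records on one shard. */
    static String keyForDevice(String deviceId) {
        return "device-" + deviceId;
    }
}
```

With random keys, the number of distinct keys far exceeds the number of shards, so load evens out. With business keys, a single very active customer or device can still concentrate traffic on one shard, which is the hot shard case the slide warns about.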
  • 12. Best Practices: Putting Data in Kinesis
       Provisioning Adequate Shards
       • Provision for ingress needs
       • Provision for the egress needs of all consuming applications: egress becomes the constraint with more than 2 simultaneous consumers, because a shard's read throughput is twice its write throughput
       • Include headroom for catching up with data in the stream in the event of application failures
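The sizing rule above can be written as a short calculation. This is a sketch under stated assumptions: the per-shard limits used here (1 MB/s and 1,000 records/s for writes, 2 MB/s for reads) are the documented shard limits, but they should be re-checked against the current service limits, and the class name, method name, and headroom parameter are hypothetical.

```java
// Hypothetical sizing helper; the per-shard limits are assumptions to verify.
class ShardProvisioningSketch {
    static final double WRITE_MB_PER_SEC = 1.0;        // ingest bytes per shard
    static final double WRITE_RECORDS_PER_SEC = 1000.0; // ingest records per shard
    static final double READ_MB_PER_SEC = 2.0;          // egress bytes per shard, shared by all consumers

    /**
     * @param ingressMBps      total write rate into the stream, in MB/s
     * @param recordsPerSec    total records written per second
     * @param consumers        number of applications that each read the full stream
     * @param headroomFraction extra capacity for catch-up after failures, e.g. 0.25 for 25%
     */
    static int shardsNeeded(double ingressMBps, double recordsPerSec,
                            int consumers, double headroomFraction) {
        double forBytesIn = ingressMBps / WRITE_MB_PER_SEC;
        double forRecordsIn = recordsPerSec / WRITE_RECORDS_PER_SEC;
        // Every consumer reads every byte, so egress scales with the consumer count.
        double forBytesOut = ingressMBps * consumers / READ_MB_PER_SEC;
        double base = Math.ceil(Math.max(forBytesIn, Math.max(forRecordsIn, forBytesOut)));
        return (int) Math.max(1, Math.ceil(base * (1.0 + headroomFraction)));
    }
}
```

For example, 5 MB/s of ingress needs 5 shards with one or two consumers, but a third consumer raises the egress requirement to 7.5 MB/s per MB/s-pair, so `shardsNeeded(5, 2000, 3, 0)` gives 8 shards, and 10 with 25% headroom.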
  • 13. Best Practices: Putting Data in Kinesis
       Pre-batch before Puts for better efficiency
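Pre-batching means packing several small messages into one record payload before calling Put, so that fewer, larger records are sent. The sketch below is one possible approach, not the method used by any particular library: the class name is hypothetical, messages are joined with newlines, and the size limit is a caller-supplied assumption rather than a stated service limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical batcher: groups newline-delimited messages into payloads of bounded size.
class PreBatcherSketch {
    private final int maxBatchBytes;
    private final StringBuilder current = new StringBuilder();
    private int currentBytes = 0;
    private final List<String> readyBatches = new ArrayList<>();

    PreBatcherSketch(int maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    /** Adds a message; closes the current batch first if the message would not fit. */
    void add(String message) {
        int size = message.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the '\n' delimiter
        if (currentBytes > 0 && currentBytes + size > maxBatchBytes) {
            flush();
        }
        // A single message larger than the limit still becomes its own (oversized) batch.
        current.append(message).append('\n');
        currentBytes += size;
    }

    /** Closes the current batch, if any, so it can be sent as one record payload. */
    void flush() {
        if (currentBytes == 0) {
            return;
        }
        readyBatches.add(current.toString());
        current.setLength(0);
        currentBytes = 0;
    }

    /** Payloads ready to be sent, each as the data of a single Put call. */
    List<String> readyBatches() {
        return readyBatches;
    }
}
```

The consumer then has to split each payload on the delimiter, so the delimiter must not occur inside a message.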
  • 14. Best Practices: Putting Data in Kinesis
       Pre-batch before Puts for better efficiency
       https://github.com/awslabs/kinesis-log4j-appender

       # KINESIS appender
       log4j.logger.KinesisLogger=INFO, KINESIS
       log4j.additivity.KinesisLogger=false
       log4j.appender.KINESIS=com.amazonaws.services.kinesis.log4j.KinesisAppender
       # DO NOT use a trailing %n unless you want a newline to be transmitted to KINESIS after every message
       log4j.appender.KINESIS.layout=org.apache.log4j.PatternLayout
       log4j.appender.KINESIS.layout.ConversionPattern=%m
       # mandatory properties for KINESIS appender
       log4j.appender.KINESIS.streamName=testStream
       # optional, defaults to UTF-8
       log4j.appender.KINESIS.encoding=UTF-8
       # optional, defaults to 3
       log4j.appender.KINESIS.maxRetries=3
       # optional, defaults to 2000
       log4j.appender.KINESIS.bufferSize=1000
       # optional, defaults to 20
       log4j.appender.KINESIS.threadCount=20
       # optional, defaults to 30 seconds
       log4j.appender.KINESIS.shutdownTimeout=30
  • 15.
       • Retry if the rise in input rate is temporary
       • Reshard to increase the number of shards
       • Monitor CloudWatch metrics: the PutRecord.Bytes and GetRecords.Bytes metrics keep track of shard usage

         Metric              Units
         PutRecord.Bytes     Bytes
         PutRecord.Latency   Milliseconds
         PutRecord.Success   Count

       • Keep track of your metrics
       • Log hash key values generated by your partition keys
       • Log shard IDs
       • Determine which shards receive the most (hash key) traffic

         String shardId = putRecordResult.getShardId();
         putRecordRequest.setPartitionKey(String.format("myPartitionKey"));
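One way to act on the logged shard IDs is to tally puts per shard and look for an outlier. The sketch below assumes the caller passes in the shard ID returned by each successful put (for example the value of `putRecordResult.getShardId()` above); the class and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory tally of successful puts per shard.
class ShardTrafficSketch {
    private final Map<String, Long> putsPerShard = new HashMap<>();

    /** Call once per successful put with the shard ID from the put result. */
    void recordPut(String shardId) {
        putsPerShard.merge(shardId, 1L, Long::sum);
    }

    /** Shard with the most recorded puts, or null if nothing was recorded. */
    String hottestShard() {
        String hottest = null;
        long max = -1;
        for (Map.Entry<String, Long> entry : putsPerShard.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                hottest = entry.getKey();
            }
        }
        return hottest;
    }

    /** Fraction of all recorded puts that landed on the given shard. */
    double shareOf(String shardId) {
        long total = 0;
        for (long count : putsPerShard.values()) {
            total += count;
        }
        return total == 0 ? 0.0 : (double) putsPerShard.getOrDefault(shardId, 0L) / total;
    }
}
```

A share well above 1 divided by the shard count suggests the partition key strategy is concentrating traffic and that the shard is a candidate for a split.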
  • 16. Options:
       • stream-name - The name of the stream to be scaled
       • scaling-action - The action to be taken to scale. Must be one of "scaleUp", "scaleDown" or "resize"
       • count - Number of shards by which to absolutely scale up or down, or resize to
       or:
       • pct - Percentage of the existing number of shards by which to scale up or down
       https://github.com/awslabs/amazon-kinesis-scaling-utils
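The arithmetic implied by these options can be sketched as follows. This is an interpretation of the option descriptions above and not the scaling utility's actual code: the class and method names are hypothetical, and the rounding rule (round the percentage delta up) and the floor of one shard are assumptions.

```java
// Hypothetical illustration of the option semantics; not taken from amazon-kinesis-scaling-utils.
class ScalingTargetSketch {
    /** Target shard count when scaling by an absolute count. */
    static int byCount(int currentShards, String scalingAction, int count) {
        switch (scalingAction) {
            case "scaleUp":
                return currentShards + count;
            case "scaleDown":
                return Math.max(1, currentShards - count); // assumed floor: a stream keeps at least one shard
            case "resize":
                return Math.max(1, count);
            default:
                throw new IllegalArgumentException("Unknown scaling action: " + scalingAction);
        }
    }

    /** Target shard count when scaling by a percentage of the existing shard count. */
    static int byPercent(int currentShards, String scalingAction, int pct) {
        int delta = (int) Math.ceil(currentShards * pct / 100.0); // assumed rounding: up
        switch (scalingAction) {
            case "scaleUp":
                return currentShards + delta;
            case "scaleDown":
                return Math.max(1, currentShards - delta);
            default:
                throw new IllegalArgumentException("pct applies only to scaleUp or scaleDown: " + scalingAction);
        }
    }
}
```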
  • 17. Sending & Reading Data from Kinesis Streams
       Sending: HTTP POST, AWS SDK, AWS Mobile SDK, Log4j, Flume, Fluentd
       Consuming: Get* APIs, Kinesis Client Library + Connector Library, Apache Storm, Amazon Elastic MapReduce
  • 19. Building Kinesis Applications: Kinesis Client Library
       Open source library for fault-tolerant, continuous processing apps
       • Java client library, also available for Python developers
       • Source available on GitHub
       • Build app with Kinesis Client Library
       • Deploy on your set of EC2 instances
       • Every KCL application includes these components:
         – Record processor factory: creates the record processor
         – Record processor: the processing unit that processes data from a shard of a Kinesis stream
         – Worker: the processing unit that maps to each application instance
  • 20.
       • The KCL uses the IRecordProcessor interface to communicate with your application
       • A Kinesis application must implement the KCL's IRecordProcessor interface
       • The implementation contains the business logic for processing the data retrieved from the Kinesis stream
  • 21.
       • One record processor maps to one shard and processes data records from that shard
       • One worker maps to one or more record processors
       • The KCL balances shard-worker associations when worker/instance counts change
       • The KCL balances shard-worker associations when shards split or merge
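The shard-to-worker mapping described above can be pictured with a deliberately simplified sketch. The real KCL coordinates workers through leases stored in a DynamoDB table; the round-robin assignment below is only an illustration of the resulting balance, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of balanced shard-worker associations; not KCL code.
class ShardAssignmentSketch {
    /**
     * Spreads shards across workers round-robin. Each shard gets exactly one
     * record processor, and each worker hosts one or more record processors
     * when there are at least as many shards as workers.
     */
    static Map<String, List<String>> assign(List<String> shardIds, List<String> workerIds) {
        if (workerIds.isEmpty()) {
            throw new IllegalArgumentException("At least one worker is required");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String workerId : workerIds) {
            assignment.put(workerId, new ArrayList<>());
        }
        for (int i = 0; i < shardIds.size(); i++) {
            String workerId = workerIds.get(i % workerIds.size());
            assignment.get(workerId).add(shardIds.get(i));
        }
        return assignment;
    }
}
```

Calling `assign` again with a longer worker list (an instance was added) or a longer shard list (a shard was split) yields the rebalanced mapping, which is the behavior the last two bullets describe.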
  • 22. Moving data into Amazon S3, Redshift
  • 23. Amazon Kinesis Connector Library
       Customizable, open source apps to connect Kinesis with S3, Redshift, DynamoDB
       • ITransformer: defines the transformation of records from the Amazon Kinesis stream in order to suit the user-defined data model
       • IFilter: excludes irrelevant records from the processing
       • IBuffer: buffers the set of records to be processed by specifying a size limit (# of records) and total byte count
       • IEmitter: makes client calls to other AWS services and persists the records stored in the buffer
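The four stages above form a pipeline: transform, filter, buffer, emit. The sketch below mirrors that flow with standard `java.util.function` types standing in for the connector library's interfaces. It is a simplified model and does not use the library's actual ITransformer, IFilter, IBuffer or IEmitter signatures; the class name and the record-count-only buffer are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for the connector pipeline; R = raw record type, T = data model type.
class ConnectorPipelineSketch<R, T> {
    private final Function<R, T> transformer;  // role of ITransformer
    private final Predicate<T> filter;         // role of IFilter: true means keep the record
    private final int bufferRecordLimit;       // role of IBuffer, reduced to a record count
    private final Consumer<List<T>> emitter;   // role of IEmitter
    private final List<T> buffer = new ArrayList<>();

    ConnectorPipelineSketch(Function<R, T> transformer, Predicate<T> filter,
                            int bufferRecordLimit, Consumer<List<T>> emitter) {
        this.transformer = transformer;
        this.filter = filter;
        this.bufferRecordLimit = bufferRecordLimit;
        this.emitter = emitter;
    }

    /** Runs each record through transform and filter, then buffers it and emits when full. */
    void process(List<R> records) {
        for (R record : records) {
            T item = transformer.apply(record);
            if (!filter.test(item)) {
                continue; // irrelevant record: excluded from further processing
            }
            buffer.add(item);
            if (buffer.size() >= bufferRecordLimit) {
                flush();
            }
        }
    }

    /** Hands the buffered items to the emitter and clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        emitter.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

In a real connector the emitter would write to S3, DynamoDB or Redshift; here it is any callback that accepts a batch.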
  • 28. Amazon Kinesis Connectors
       • S3 Connector
         – Batch writes files for archive into S3
         – Uses sequence-based file naming scheme
       • Redshift Connector
         – Once written to S3, loads to Redshift
         – Provides manifest support
         – Supports user-defined transformers
       • DynamoDB Connector
         – BatchPut appends to a table
         – Supports user-defined transformers
  • 29. Best Practices: Processing Data From Kinesis
       Build applications as part of an Auto Scaling group
       • Helps with application availability
       • Scales in response to incoming spikes in data volume, assuming shards have been provisioned
       • Select scaling metrics based on the nature of the Kinesis application
         – Instance metrics: CPU, memory, and others
         – Kinesis metrics: PutRecord.Bytes, GetRecords.Bytes
  • 30.
       Metric                   Units
       PutRecord.Bytes          Bytes
       PutRecord.Latency        Milliseconds
       PutRecord.Success        Count
       GetRecords.Bytes         Bytes
       GetRecords.IteratorAge   Milliseconds
       GetRecords.Latency       Milliseconds
       GetRecords.Success       Count
  • 31. Best Practices: Processing Data From Kinesis
       Build a flush-to-S3 consumer app
       • App can specify three conditions that can trigger a buffer flush:
         – Number of records
         – Total byte count
         – Time since last flush
       • The buffer is flushed and the data is emitted to the destination when any of these thresholds is crossed.

       # Flush when buffer exceeds 8 Kinesis records, 1 KB size limit or when time since last emit exceeds 10 minutes
       bufferSizeByteLimit = 1024
       bufferRecordCountLimit = 8
       bufferMillisecondsLimit = 600000
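The three flush conditions can be modelled with a small buffer class. This is a sketch, not the connector library's buffer implementation: the class name is hypothetical, the clock is passed in as a parameter to keep the example deterministic, and treating a threshold as crossed when the value reaches the limit (>=) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer that flushes on record count, byte count, or elapsed time.
class FlushBufferSketch {
    private final long bufferSizeByteLimit;
    private final int bufferRecordCountLimit;
    private final long bufferMillisecondsLimit;
    private final List<byte[]> records = new ArrayList<>();
    private long bufferedBytes = 0;
    private long lastFlushMillis;

    FlushBufferSketch(long bufferSizeByteLimit, int bufferRecordCountLimit,
                      long bufferMillisecondsLimit, long nowMillis) {
        this.bufferSizeByteLimit = bufferSizeByteLimit;
        this.bufferRecordCountLimit = bufferRecordCountLimit;
        this.bufferMillisecondsLimit = bufferMillisecondsLimit;
        this.lastFlushMillis = nowMillis;
    }

    void add(byte[] record) {
        records.add(record);
        bufferedBytes += record.length;
    }

    /** True when the buffer is non-empty and any one of the three thresholds is reached. */
    boolean shouldFlush(long nowMillis) {
        if (records.isEmpty()) {
            return false;
        }
        return records.size() >= bufferRecordCountLimit
                || bufferedBytes >= bufferSizeByteLimit
                || nowMillis - lastFlushMillis >= bufferMillisecondsLimit;
    }

    /** Returns the buffered records for emission and resets all three counters. */
    List<byte[]> flush(long nowMillis) {
        List<byte[]> out = new ArrayList<>(records);
        records.clear();
        bufferedBytes = 0;
        lastFlushMillis = nowMillis;
        return out;
    }
}
```

With the values from the slide (1024 bytes, 8 records, 600000 ms), a trickle of small records is flushed by the timer, a burst by the record count, and a few large records by the byte limit.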
  • 32. Best Practices: Processing Data From Kinesis
       • In a KCL app, ensure data being processed is persisted to a durable store, like DynamoDB or S3, prior to checkpointing.
       • Duplicates: make the authoritative data repository (usually at the end of the data flow) resilient to duplicates. That way the rest of the system has a simple policy: keep retrying until you succeed.
       • Idempotent processing: use the number of records since the previous checkpoint to get repeatable results when the record processors fail over.
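The first two practices work together: persist first, checkpoint second, and make the store tolerate replays. The sketch below simulates this with an in-memory map standing in for the durable store. Everything here is hypothetical, including the class, the use of the sequence number as the write key, and the simulated crash flag. It is not KCL code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of persist-then-checkpoint with a duplicate-tolerant store.
class CheckpointSketch {
    // Stand-in for a durable store such as DynamoDB or S3. Keying writes by
    // sequence number makes a repeated write overwrite instead of duplicating.
    private final Map<String, String> durableStore = new LinkedHashMap<>();
    private String checkpoint = null; // last sequence number known to be safely persisted

    /**
     * Persists a batch (sequence number -> payload), then advances the checkpoint.
     * If crashBeforeCheckpoint is true, the method returns after persisting,
     * which models a failure between the write and the checkpoint.
     */
    void processBatch(Map<String, String> batch, boolean crashBeforeCheckpoint) {
        String lastSequenceNumber = null;
        for (Map.Entry<String, String> record : batch.entrySet()) {
            durableStore.put(record.getKey(), record.getValue()); // idempotent upsert
            lastSequenceNumber = record.getKey();
        }
        if (crashBeforeCheckpoint || lastSequenceNumber == null) {
            return; // checkpoint not advanced: the same batch will be delivered again
        }
        checkpoint = lastSequenceNumber;
    }

    String checkpoint() {
        return checkpoint;
    }

    int storedCount() {
        return durableStore.size();
    }
}
```

Because the checkpoint only moves after the write, a crash can cause records to be processed twice but never lost, and because the write is an upsert, the second pass leaves the store unchanged.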
  • 33. Best Practices: Processing Data From Kinesis
       • Create a manifest file based on a custom set of input files
       • Use a manifest stream with only one shard
       • Adjust checkpoint frequency, connector buffer and filter to align with your Redshift load models
  • 35. Collect all data of interest continuously
  • 36. Faster time to market due to ease of deployment
  • 37. Enable operators and partners to get to valuable data quickly