© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Erik Swensson, Shree Kenghe & Erick Dame
January 26, 2016
Getting Started with Big Data
Analytic Options on AWS & Common Use Cases
Table of Contents
• Big Data Introduction for AWS
• Big Data Analytics Options on AWS
• Usage Patterns & Anti-Patterns
• Performance & Cost
• Durability & Scalability
• Interfaces
• Building Big Data Analytic Solutions – The AWS Approach
• Example Scenarios
Big Data on AWS
Immediate Availability. Deploy instantly. No hardware to
procure, no infrastructure to maintain & scale
Trusted & Secure. Designed to meet the strictest
requirements. Continuously audited, including certifications
such as ISO 27001, FedRAMP, DoD CSM, and PCI DSS.
Broad & Deep Capabilities. Over 50 services and 100s of
features to support virtually any big data application &
workload
Hundreds of Partners & Solutions. Get help from a
consulting partner or choose from hundreds of tools and
applications across the entire data management stack.
Real-time
Amazon Kinesis Firehose
Object Storage
Amazon S3
RDBMS
Amazon RDS
NoSQL
DynamoDB
Hadoop Ecosystem
Amazon EMR
Real-time
AWS Lambda
Amazon Kinesis Analytics
Data Warehousing
Amazon Redshift
Machine Learning
Amazon Machine
Learning
Business Intelligence &
Data Visualization
Amazon QuickSight
Real-time
Amazon Kinesis Streams
Elasticsearch Analytics
Amazon Elasticsearch Service
Collect Store Process & Analyze Visualize
Data Import
AWS Import/Export
Snowball
IoT
AWS IoT
Broad, Tightly Integrated Capabilities
Petabyte scale
Massively parallel
Relational data warehouse
Fully managed, zero admin
As low as $1,000/TB/Year
a lot faster
a lot cheaper
a whole lot simpler
Amazon Redshift
Amazon Redshift
• Ideal Usage Patterns - Analyze
• Sales data
• Historical data
• Gaming data
• Social trends
• Ad data
• Performance
• Massively Parallel Processing
• Columnar Storage
• Data Compression
• Zone Maps
• Direct-attached Storage
• Cost model
• No upfront costs or long term commitments
• Free backup storage equivalent to 100% of
provisioned storage
With columnar storage, you
only read the data you need
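The columnar storage point can be illustrated with a toy sketch. This is not how Amazon Redshift stores blocks internally; the table, column names, and the "values read" counter are all made up for illustration. It only shows why an aggregate over one column touches fewer values when data is laid out by column.

```python
# Toy illustration only: the records and the read counter are hypothetical.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 30.0},
]


def total_amount_row_store(records):
    """Sum 'amount' from a row layout: every field of every row is touched."""
    values_read = 0
    total = 0.0
    for record in records:
        values_read += len(record)
        total += record["amount"]
    return total, values_read


def to_columns(records):
    """Pivot row records into one list per column."""
    return {name: [r[name] for r in records] for name in records[0]}


def total_amount_column_store(columns):
    """Sum 'amount' from a column layout: only that column is touched."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_amount_row_store(rows)
col_total, col_reads = total_amount_column_store(to_columns(rows))
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts return the same total, but the column layout reads one value per row instead of every field.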
Amazon Redshift
• Scalability & Elasticity
• Resize or scale - Number or type of nodes
can be changed with a few clicks
• Durability and Availability
• Replication
• Backup
• Automated recovery from failed drives &
nodes
• Interfaces
• JDBC/ODBC interface with BI/ETL tools
• Amazon S3 or DynamoDB
• Anti-patterns
• Small datasets
• OLTP
• Unstructured Data
• Blob Data
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
Ingest streaming data
Process data in real-time
Store terabytes of data per hour
Amazon Kinesis
Amazon Kinesis Streams
• Ideal Usage Patterns – Streaming
data ingestion and processing
• Real-time data analytics
• Data feed intake and processing e.g. logs
• Real-time metrics and reporting
• Performance
• Throughput capacity in terms of shards
• Cost model
• No upfront costs or long term
commitments
• Pay as you go pricing
• Hourly charge per shard
• Charge per 1 million PUT transactions
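Because throughput and cost are both expressed in shards, a rough sizing calculation is often the first step. The sketch below assumes the commonly documented per-shard limits (1 MB/s or 1,000 records/s in, 2 MB/s out); these numbers are not in the slides, so check them against the current service limits. The function name and parameters are hypothetical.

```python
import math

# Assumed per-shard limits (not stated in the slides; verify before use).
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count from write volume, record rate, and read fan-out."""
    write_mb = records_per_sec * avg_record_kb / 1024
    read_mb = write_mb * consumers  # each consumer reads the full stream
    return max(
        1,
        math.ceil(write_mb / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb / READ_MB_PER_SEC),
    )


print(shards_needed(5000, 2, consumers=2))
print(shards_needed(3000, 0.1))
```

The estimate takes the largest of the three constraints, since whichever limit is hit first determines how many shards the stream needs.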
Amazon Kinesis Streams
• Scalability & Elasticity
• Scale – increase number of shards
• Durability and Availability
• Replication
• Cursor preservation
• Interfaces
• Input – data coming in
• Output – data going out
• Kinesis Firehose
• Anti-patterns
• Small scale consistent throughput
• Long term data storage and analytics
Launch a cluster in minutes
Pay by the hour and save with spot
MapReduce, Apache Spark, Presto
Amazon EMR
Amazon EMR
• Ideal Usage Patterns
• Log processing and analytics
• Large ETL and data movement
• Risk modeling and threat analytics
• Ad targeting and click stream analytics
• Genomics
• Predictive analytics
• Ad-hoc data mining and analytics
• Performance – driven by
• Type of instance
• Number of instances
• Cost model
• Only pay for hours the cluster is up
• EC2 instance and EMR price
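The cost model above can be sketched as simple arithmetic. The prices in the example call are placeholders, not real AWS rates, and rounding partial hours up reflects the hourly billing described in this deck; the function name is hypothetical.

```python
import math


def emr_cluster_cost(instance_count, hours_up, ec2_price_per_hour, emr_price_per_hour):
    """Cost = instances x billed hours x (EC2 hourly price + EMR hourly price)."""
    billed_hours = math.ceil(hours_up)  # assumes partial hours are billed as full hours
    return instance_count * billed_hours * (ec2_price_per_hour + emr_price_per_hour)


# Placeholder prices for illustration only.
cost = emr_cluster_cost(
    instance_count=10,
    hours_up=2.5,
    ec2_price_per_hour=0.25,
    emr_price_per_hour=0.0625,
)
print(cost)
```

Since you pay only while the cluster is up, shutting down a transient cluster when the job finishes directly reduces the `hours_up` term.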
Amazon EMR
• Scalability & Elasticity
• Resize a running cluster
• Add more core or task nodes
• Durability and Availability
• Fault tolerant for slave node (HDFS)
• Backup to S3 for resilience against master
node failures
• Interfaces
• Hive, Pig, Spark, HBase, Impala, Hunk,
Presto, other popular tools
• Anti-patterns
• Small data sets
• ACID (Atomicity, Consistency, Isolation and
Durability)
Amazon EMR Cluster
Fully managed NoSQL database
Single-Digit Millisecond latency at scale
Supports document and key-value
Amazon
DynamoDB
Amazon DynamoDB
• Ideal Usage Patterns
• Mobile apps, gaming, digital ad serving, live
voting, sensor networks, log ingestion
• Access control for web-based content,
e-commerce shopping carts
• Web session management
• Performance
• SSD
• Provision throughput by table
• Scalability & Elasticity
• No limit to the amount of data stored
• Dial up or dial down the read and write capacity
of a table
• Cost Model
• Pay as you go
• Provisioned throughput capacity (per hour)
• Indexed data storage (per GB per month)
• Data transfer in or out (per GB per month)
• Provisioned read/write performance per table.
• Predictable high performance scaled via console or API
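Provisioned throughput is expressed in capacity units, so sizing a table comes down to a small calculation. The sketch below assumes the standard definitions (one read unit covers one strongly consistent read per second of an item up to 4 KB, eventually consistent reads cost half, and one write unit covers one write per second of an item up to 1 KB). These definitions are not in the slides, and the function names are hypothetical.

```python
import math


def read_capacity_units(reads_per_sec, item_kb, strongly_consistent=True):
    """Read units: items are rounded up to 4 KB blocks; eventual consistency halves the cost."""
    units_per_read = math.ceil(item_kb / 4)
    total = reads_per_sec * units_per_read
    if not strongly_consistent:
        return math.ceil(total / 2)
    return total


def write_capacity_units(writes_per_sec, item_kb):
    """Write units: items are rounded up to 1 KB blocks."""
    return writes_per_sec * math.ceil(item_kb)


print(read_capacity_units(100, 6))
print(read_capacity_units(100, 6, strongly_consistent=False))
print(write_capacity_units(50, 2.5))
```

The resulting numbers are what you would dial up or down per table through the console or API.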
Amazon DynamoDB
• Durability and Availability
• Three Availability Zones (AZ)
• Interfaces
• AWS Management Console
• APIs
• SDKs
• Anti-patterns
• Application tied to traditional relational
database
• Joins and/or complex transactions
• BLOB data
• Large data with low I/O rate
AZ-A
AZ-B
AZ-C
Managed service designed to make it easy
for developers of all levels to use machine
learning
Based on the same ML technology used for
years by Amazon’s internal data scientists
Amazon Machine Learning uses scalable and robust
implementations of industry-standard ML algorithms
Amazon
Machine Learning
Amazon Machine Learning
• Ideal Usage Patterns
• Enable applications that flag suspicious
transactions
• Personalize application content
• Predict user activity
• Listen to social media
• Cost Model
• Pay for what you use
• No need to manage instances, only pay for
the service
• Performance
• Real-time predictions designed to return
within 100ms
• 200 transactions can be handled per second
by default (can be raised)
Amazon Machine Learning
• Durability and Availability
• No maintenance windows or scheduled
downtimes
• Designed across multiple availability
zones
• Scalability & Elasticity
• Model training up to 100GB
• Multiple ML jobs can run simultaneously
• Interfaces
• Create a data source from S3, RDS and
Redshift
• Interact with ML via console, SDKs, and
the ML API
• Anti-patterns
• Data sets larger than 100 GB for modeling
• Sequence prediction or unsupervised
clustering tasks
Event driven, fully managed
compute
No Infrastructure to Manage
Automatic Scaling
AWS Lambda
AWS Lambda
• Ideal Usage Patterns
• Real-time file processing
• Extract, Transform, Load
• Performance
• Process events within
milliseconds
• Cost Model
• Pay for what you use
• No need to manage instances,
only pay for the service
• Lambda free tier includes 1M free
requests
1. Serverless  2. Event-Driven Scale  3. Subsecond Billing
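The real-time file processing pattern can be sketched as a Python handler that reacts to an S3 object-created event. The handler below only extracts the bucket and key from each record; the bucket name, key, and return shape are made up for illustration, and a real function would go on to read and transform the object.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect the bucket and object key from each S3 record in the event."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append({"bucket": bucket, "key": key})
    return {"processed": processed}


# Hypothetical event, trimmed to the fields the handler uses.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "logs/2016/app+log.txt"},
            }
        }
    ]
}
result = handler(sample_event, None)
print(result)
```

Keeping the handler stateless, as here, fits the anti-patterns listed for Lambda: anything that must persist between invocations belongs in a store such as Amazon S3 or DynamoDB.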
AWS Lambda
• Durability and Availability
• No maintenance windows or
scheduled downtime
• Async functions are retried 3 times if
there is a failure
• Scalability & Elasticity
• Any number of functions can run
concurrently
• AWS Lambda will dynamically allocate
capacity to match the rate of incoming
events.
• Interfaces
• Lambda supports Java, Node.js, and
Python
• Trigger via event or schedule
• Anti-patterns
• Long running applications
• Stateful applications in Lambda
Set up an Elasticsearch cluster in minutes
Integrated with Logstash and Kibana
Scale an Elasticsearch cluster seamlessly
Amazon
Elasticsearch
Service
Amazon Elasticsearch
• Ideal Usage Patterns
• Analyze logs
• Analyze data stream updates from other AWS
services
• Provide customers a rich search and navigation
experience
• Usage monitoring for mobile applications
• Performance
• Depends on multiple factors including instance
type, workload, index, number of shards used, read
replicas
• Storage configurations – instance storage or EBS
storage
• Cost Model
• Pay as you go
• Only pay for compute and storage
Amazon Elasticsearch
• Durability and Availability
• Zone Awareness
• Automatic and Manual snapshots
• Scalability & Elasticity
• Add or remove instances
• Modify EBS volumes for data growth
• Interfaces
• AWS Management Console
• APIs
• SDKs
• Kibana and Logstash (ELK Stack)
• Anti-patterns
• OLTP
• Workloads that need more than 5 TB of
storage
Elasticsearch + Logstash + Kibana =
real-time analytics & visualization
Build visualizations
Perform ad-hoc analysis
Share and collaborate via storyboards
Native access on major mobile platforms
Amazon
QuickSight
Introducing Amazon QuickSight
Cloud-Powered Business Intelligence Service For
1/10th the Cost of Traditional BI Software
• No IT effort. No dimensional modeling
• Auto-discovery of all AWS data sources
• Super-fast, Parallel, In-memory Calculation Engine (SPICE)
• Fully Managed
Available in Preview
aws.amazon.com/quicksight
Scale up or down as needed
Pay for what you use
Multiple options
Do-it-yourself big data applications
Amazon EC2
The AWS Approach
• Flexible: Use the best tool for the job
• Data structure, latency, throughput, access patterns
• Scalable: Immutable (append-only)
• Batch/speed/serving layer
• Minimum Admin Overhead: Leverage AWS managed services
• No or very low admin
• Low Cost: Big data ≠ big cost
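The immutable, append-only data and batch/speed/serving layers mentioned above can be sketched in a few lines. This is a minimal in-memory illustration with hypothetical names; in the AWS services covered here the master data set might live in Amazon S3, the batch layer in Amazon EMR, and the speed layer in Amazon Kinesis, but that mapping is an assumption rather than something the slide spells out.

```python
from collections import Counter

event_log = []  # master data set: events are only ever appended, never updated


def append_event(sensor_id):
    event_log.append(sensor_id)


def build_batch_view(log, upto):
    """Batch layer: periodically recompute counts over all events up to a cutoff."""
    return Counter(log[:upto])


def build_speed_view(log, since):
    """Speed layer: count only the events that arrived after the batch cutoff."""
    return Counter(log[since:])


def serve(batch_view, speed_view, sensor_id):
    """Serving layer: merge both views to answer a query."""
    return batch_view[sensor_id] + speed_view[sensor_id]


for sensor in ["a", "b", "a"]:
    append_event(sensor)
batch_view = build_batch_view(event_log, upto=3)

append_event("a")  # arrives after the batch run
speed_view = build_speed_view(event_log, since=3)

count_a = serve(batch_view, speed_view, "a")
count_b = serve(batch_view, speed_view, "b")
print(count_a, count_b)
```

Because the log is never modified in place, the batch view can always be rebuilt from scratch, which is what makes the design straightforward to scale.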
Scenario 1: Enterprise Data Warehouse
Scenario 2: Capturing and Analyzing Sensor Data
Scenario 3: Sentiment Analysis of Social Media
Big Data
Scenarios
Scenario 1: Enterprise Data Warehouse
Data Warehouse Architecture
Data
Sources
Amazon
S3
Amazon
EMR
Amazon
S3
Amazon
Redshift
Amazon
QuickSight
Scenario 2: Capturing and Analyzing Sensor Data
Data
Sources
Amazon
S3
Amazon
Redshift
Amazon
QuickSight
Amazon
Kinesis
Enabled
App
Amazon
Kinesis
Enabled
App
Amazon
DynamoDB
Reporting
Dashboard
Customer
Access
Amazon
Kinesis
Scenario 3: Sentiment Analysis of Social Media
Social
Media Data
Amazon
EC2
AWS
Lambda
Amazon
ML
Amazon
Kinesis
Amazon
S3
Amazon
SNS
Next Steps
• Subscribe to the AWS Big Data Blog
blogs.aws.amazon.com/bigdata
• Learn more, check the tutorials, guides, and self-paced labs
aws.amazon.com/big-data
• Register for the next Big Data Webinar
Building Smart Applications with Amazon Machine Learning
aws.amazon.com/about-aws/events/monthlywebinarseries
Thu, Jan 28 2016 | 9AM PST

More Related Content

Viewers also liked

AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)Amazon Web Services
 
Managing Your Infrastructure as Code
Managing Your Infrastructure as CodeManaging Your Infrastructure as Code
Managing Your Infrastructure as CodeAmazon Web Services
 
AWS re:Invent 2016: State of the Union: Containers (CON316)
AWS re:Invent 2016: State of the Union:  Containers (CON316)AWS re:Invent 2016: State of the Union:  Containers (CON316)
AWS re:Invent 2016: State of the Union: Containers (CON316)Amazon Web Services
 
AWS re:Invent 2016: Chalk Talk: Succeeding at Infrastructure-as-Code (GPSCT312)
AWS re:Invent 2016: Chalk Talk: Succeeding at Infrastructure-as-Code (GPSCT312)AWS re:Invent 2016: Chalk Talk: Succeeding at Infrastructure-as-Code (GPSCT312)
AWS re:Invent 2016: Chalk Talk: Succeeding at Infrastructure-as-Code (GPSCT312)Amazon Web Services
 
AWS Infrastructure as Code - September 2016 Webinar Series
AWS Infrastructure as Code - September 2016 Webinar SeriesAWS Infrastructure as Code - September 2016 Webinar Series
AWS Infrastructure as Code - September 2016 Webinar SeriesAmazon Web Services
 
AWS re:Invent 2016: Getting Started with Docker on AWS (CMP209)
AWS re:Invent 2016: Getting Started with Docker on AWS (CMP209)AWS re:Invent 2016: Getting Started with Docker on AWS (CMP209)
AWS re:Invent 2016: Getting Started with Docker on AWS (CMP209)Amazon Web Services
 
AWS re:Invent 2016: From EC2 to ECS: How Capital One uses Application Load Ba...
AWS re:Invent 2016: From EC2 to ECS: How Capital One uses Application Load Ba...AWS re:Invent 2016: From EC2 to ECS: How Capital One uses Application Load Ba...
AWS re:Invent 2016: From EC2 to ECS: How Capital One uses Application Load Ba...Amazon Web Services
 
AWS January 2016 Webinar Series - Introduction to Docker on AWS
AWS January 2016 Webinar Series - Introduction to Docker on AWSAWS January 2016 Webinar Series - Introduction to Docker on AWS
AWS January 2016 Webinar Series - Introduction to Docker on AWSAmazon Web Services
 
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)Amazon Web Services
 
AWS re:Invent 2016: Creating Your Virtual Data Center: VPC Fundamentals and C...
AWS re:Invent 2016: Creating Your Virtual Data Center: VPC Fundamentals and C...AWS re:Invent 2016: Creating Your Virtual Data Center: VPC Fundamentals and C...
AWS re:Invent 2016: Creating Your Virtual Data Center: VPC Fundamentals and C...Amazon Web Services
 
AWS re:Invent 2016: Simplifying Microsoft Architectures with AWS services (WI...
AWS re:Invent 2016: Simplifying Microsoft Architectures with AWS services (WI...AWS re:Invent 2016: Simplifying Microsoft Architectures with AWS services (WI...
AWS re:Invent 2016: Simplifying Microsoft Architectures with AWS services (WI...Amazon Web Services
 
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...Amazon Web Services
 
AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403)
AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403)AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403)
AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403)Amazon Web Services
 
AWS Lambda: Event-driven Code for Devices and the Cloud
AWS Lambda: Event-driven Code for Devices and the CloudAWS Lambda: Event-driven Code for Devices and the Cloud
AWS Lambda: Event-driven Code for Devices and the CloudAmazon Web Services
 
AWS re:Invent 2016: Building the Future of DevOps with Amazon Web Services (D...
AWS re:Invent 2016: Building the Future of DevOps with Amazon Web Services (D...AWS re:Invent 2016: Building the Future of DevOps with Amazon Web Services (D...
AWS re:Invent 2016: Building the Future of DevOps with Amazon Web Services (D...Amazon Web Services
 
AWS re:Invent 2016: Workshop: Deploy a Swift Web Application on Amazon ECS (C...
AWS re:Invent 2016: Workshop: Deploy a Swift Web Application on Amazon ECS (C...AWS re:Invent 2016: Workshop: Deploy a Swift Web Application on Amazon ECS (C...
AWS re:Invent 2016: Workshop: Deploy a Swift Web Application on Amazon ECS (C...Amazon Web Services
 
AWS January 2016 Webinar Series - Introduction to Deploying Applications on AWS
AWS January 2016 Webinar Series - Introduction to Deploying Applications on AWSAWS January 2016 Webinar Series - Introduction to Deploying Applications on AWS
AWS January 2016 Webinar Series - Introduction to Deploying Applications on AWSAmazon Web Services
 
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...Amazon Web Services
 
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...Amazon Web Services
 

Viewers also liked (20)

AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
 
Managing Your Infrastructure as Code
Managing Your Infrastructure as CodeManaging Your Infrastructure as Code
Managing Your Infrastructure as Code
 
AWS re:Invent 2016: State of the Union: Containers (CON316)
AWS re:Invent 2016: State of the Union:  Containers (CON316)AWS re:Invent 2016: State of the Union:  Containers (CON316)
AWS re:Invent 2016: State of the Union: Containers (CON316)
 
AWS re:Invent 2016: Chalk Talk: Succeeding at Infrastructure-as-Code (GPSCT312)
AWS re:Invent 2016: Chalk Talk: Succeeding at Infrastructure-as-Code (GPSCT312)AWS re:Invent 2016: Chalk Talk: Succeeding at Infrastructure-as-Code (GPSCT312)
AWS re:Invent 2016: Chalk Talk: Succeeding at Infrastructure-as-Code (GPSCT312)
 
AWS Infrastructure as Code - September 2016 Webinar Series
AWS Infrastructure as Code - September 2016 Webinar SeriesAWS Infrastructure as Code - September 2016 Webinar Series
AWS Infrastructure as Code - September 2016 Webinar Series
 
AWS re:Invent 2016: Getting Started with Docker on AWS (CMP209)
AWS re:Invent 2016: Getting Started with Docker on AWS (CMP209)AWS re:Invent 2016: Getting Started with Docker on AWS (CMP209)
AWS re:Invent 2016: Getting Started with Docker on AWS (CMP209)
 
AWS re:Invent 2016: From EC2 to ECS: How Capital One uses Application Load Ba...
AWS re:Invent 2016: From EC2 to ECS: How Capital One uses Application Load Ba...AWS re:Invent 2016: From EC2 to ECS: How Capital One uses Application Load Ba...
AWS re:Invent 2016: From EC2 to ECS: How Capital One uses Application Load Ba...
 
AWS as a Data Platform
AWS as a Data PlatformAWS as a Data Platform
AWS as a Data Platform
 
AWS January 2016 Webinar Series - Introduction to Docker on AWS
AWS January 2016 Webinar Series - Introduction to Docker on AWSAWS January 2016 Webinar Series - Introduction to Docker on AWS
AWS January 2016 Webinar Series - Introduction to Docker on AWS
 
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
 
AWS re:Invent 2016: Creating Your Virtual Data Center: VPC Fundamentals and C...
AWS re:Invent 2016: Creating Your Virtual Data Center: VPC Fundamentals and C...AWS re:Invent 2016: Creating Your Virtual Data Center: VPC Fundamentals and C...
AWS re:Invent 2016: Creating Your Virtual Data Center: VPC Fundamentals and C...
 
AWS re:Invent 2016: Simplifying Microsoft Architectures with AWS services (WI...
AWS re:Invent 2016: Simplifying Microsoft Architectures with AWS services (WI...AWS re:Invent 2016: Simplifying Microsoft Architectures with AWS services (WI...
AWS re:Invent 2016: Simplifying Microsoft Architectures with AWS services (WI...
 
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
 
AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403)
AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403)AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403)
AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403)
 
AWS Lambda: Event-driven Code for Devices and the Cloud
AWS Lambda: Event-driven Code for Devices and the CloudAWS Lambda: Event-driven Code for Devices and the Cloud
AWS Lambda: Event-driven Code for Devices and the Cloud
 
AWS re:Invent 2016: Building the Future of DevOps with Amazon Web Services (D...
AWS re:Invent 2016: Building the Future of DevOps with Amazon Web Services (D...AWS re:Invent 2016: Building the Future of DevOps with Amazon Web Services (D...
AWS re:Invent 2016: Building the Future of DevOps with Amazon Web Services (D...
 
AWS re:Invent 2016: Workshop: Deploy a Swift Web Application on Amazon ECS (C...
AWS re:Invent 2016: Workshop: Deploy a Swift Web Application on Amazon ECS (C...AWS re:Invent 2016: Workshop: Deploy a Swift Web Application on Amazon ECS (C...
AWS re:Invent 2016: Workshop: Deploy a Swift Web Application on Amazon ECS (C...
 
AWS January 2016 Webinar Series - Introduction to Deploying Applications on AWS
AWS January 2016 Webinar Series - Introduction to Deploying Applications on AWSAWS January 2016 Webinar Series - Introduction to Deploying Applications on AWS
AWS January 2016 Webinar Series - Introduction to Deploying Applications on AWS
 
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration...
 
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
AWS re:Invent 2016: Automating and Scaling Infrastructure Administration with...
 

Similar to AWS January 2016 Webinar Series - Getting Started with Big Data on AWS

Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinAmazon Web Services
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinIan Massingham
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Amazon Web Services
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAmazon Web Services
 
Real-time Analytics with Open-Source
Real-time Analytics with Open-SourceReal-time Analytics with Open-Source
Real-time Analytics with Open-SourceAmazon Web Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 
HSBC and AWS Day - Database Options on AWS
HSBC and AWS Day - Database Options on AWSHSBC and AWS Day - Database Options on AWS
HSBC and AWS Day - Database Options on AWSAmazon Web Services
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewAmazon Web Services
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)Amazon Web Services
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...Amazon Web Services
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersAmazon Web Services
 

Similar to AWS January 2016 Webinar Series - Getting Started with Big Data on AWS (20)

Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
 
Real-time Analytics with Open-Source
Real-time Analytics with Open-SourceReal-time Analytics with Open-Source
Real-time Analytics with Open-Source
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
HSBC and AWS Day - Database Options on AWS
HSBC and AWS Day - Database Options on AWSHSBC and AWS Day - Database Options on AWS
HSBC and AWS Day - Database Options on AWS
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 





AWS January 2016 Webinar Series - Getting Started with Big Data on AWS

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Erik Swensson, Shree Kenghe & Erick Dame January 26, 2016 Getting Started with Big Data Analytic Options on AWS & Common Use Cases
  • 2. Table of Contents • Big Data Introduction for AWS • Big Data Analytics Options on AWS • Usage Patterns & Anti-Patterns • Performance & Cost • Durability & Scalability • Interfaces • Building Big Data Analytic Solutions – The AWS Approach • Example Scenarios
  • 3. Big Data on AWS Immediate Availability. Deploy instantly. No hardware to procure, no infrastructure to maintain & scale Trusted & Secure. Designed to meet the strictest requirements. Continuously audited, including certifications such as ISO 27001, FedRAMP, DoD CSM, and PCI DSS. Broad & Deep Capabilities. Over 50 services and 100s of features to support virtually any big data application & workload Hundreds of Partners & Solutions. Get help from a consulting partner or choose from hundreds of tools and applications across the entire data management stack.
  • 4. Real-time Amazon Kinesis Firehose Object Storage Amazon S3 RDBMS Amazon RDS NoSQL DynamoDB Hadoop Ecosystem Amazon EMR Real-time AWS Lambda Amazon Kinesis Analytics Data Warehousing Amazon Redshift Machine Learning Amazon Machine Learning Business Intelligence & Data Visualization Amazon QuickSight Real-time Amazon Kinesis Streams Elastic Search Analytics Amazon ElasticSearch Collect Store Process & Analyze Visualize Data Import Amazon Import/Export Snowball IoT Amazon IoT Broad, Tightly Integrated Capabilities
  • 5. Petabyte scale Massively parallel Relational data warehouse Fully managed, zero admin As low as $1,000/TB/Year a lot faster a lot cheaper a whole lot simpler Amazon Redshift
  • 6. Amazon Redshift • Ideal Usage Patterns - Analyze • Sales data • Historical data • Gaming data • Social trends • Ad data • Performance • Massively Parallel Processing • Columnar Storage • Data Compression • Zone Maps • Direct-attached Storage • Cost model • No upfront costs or long term commitments • Free backup storage equivalent to 100% of provisioned storage With columnar storage, you only read the data you need
  • 7. Amazon Redshift • Scalability & Elasticity • Resize or scale - Number or type of nodes can be changed with a few clicks • Durability and Availability • Replication • Backup • Automated recovery from failed drives & nodes • Interfaces • JDBC/ODBC interface with BI/ETL tools • Amazon S3 or DynamoDB • Anti-patterns • Small datasets • OLTP • Unstructured Data • Blob Data 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 8. Ingest streaming data Process data in real-time Store terabytes of data per hour Amazon Kinesis
  • 9. Amazon Kinesis Streams • Ideal Usage Patterns – Streaming data ingestion and processing • Real-time data analytics • Data feed intake and processing e.g. logs • Real-time metrics and reporting • Performance • Throughput capacity in terms of shards • Cost model • No upfront costs or long term commitments • Pay as you go pricing • Hourly charge per shard • Charge for 1 million PUT transactions
  • 10. Amazon Kinesis Streams • Scalability & Elasticity • Scale – increase number of shards • Durability and Availability • Replication • Cursor preservation • Interfaces • Input – data coming in • Output – data going out • Kinesis Firehose • Anti-patterns • Small scale consistent throughput • Long term data storage and analytics
  • 11. Launch a cluster in minutes Pay by the hour and save with spot MapReduce, Apache Spark, Presto Amazon EMR
  • 12. Amazon EMR • Ideal Usage Patterns • Log processing and analytics • Large ETL and data movement • Risk modeling and threat analytics • Ad targeting and click stream analytics • Genomics • Predictive analytics • Ad-hoc data mining and analytics • Performance – driven by • Type of instance • Number of instances • Cost model • Only pay for hours the cluster is up • EC2 instance and EMR price
  • 13. Amazon EMR • Scalability & Elasticity • Resize a running cluster • Add more core or task nodes • Durability and Availability • Fault tolerant for slave node (HDFS) • Backup to S3 for resilience against master node failures • Interfaces • Hive, Pig, Spark, HBase, Impala, Hunk, Presto, other popular tools • Anti-patterns • Small data sets • ACID (Atomicity, Consistency, Isolation and Durability) Amazon EMR Cluster Amazon EMR Cluster Amazon EMR Cluster
  • 14. Fully managed NoSQL database Single-Digit Millisecond latency at scale Supports document and key-value Amazon DynamoDB
  • 15. Amazon DynamoDB • Ideal Usage Patterns • Mobile apps, gaming, digital ad serving, live voting, sensor networks, log ingestion • Access control for web-based content, e-commerce shopping carts • Web session management • Performance • SSD • Provision throughput by table • Scalability & Elasticity • No limit to the amount of data stored • Dial up or dial down the read and write capacity of a table • Cost Model • Pay as you go • Provisioned throughput capacity (per hour) • Indexed data storage (per GB per month) • Data transfer in or out (per GB per month) • Provisioned read/write performance per table • Predictable high performance scaled via console or API
  • 16. Amazon DynamoDB • Durability and Availability • Three Availability Zones (AZ) • Interfaces • AWS Management Console • APIs • SDKs • Anti-patterns • Application tied to traditional relational database • Joins and/or complex transactions • BLOB data • Large data with low I/O rate AZ-A AZ-B AZ-C
  • 17. Managed service designed to make it easy for developers of all levels to use machine learning Based on the same ML technology used for years by Amazon’s internal data scientists Amazon Machine Learning uses scalable and robust implementations of industry-standard ML algorithms Amazon Machine Learning
  • 18. Amazon Machine Learning • Ideal Usage Patterns • Enable applications that flag suspicious transactions • Personalize application content • Predict user activity • Listen to social media • Cost Model • Pay for what you use • No need to manage instances, only pay for the service • Performance • Real-time predictions designed to return within 100ms • 200 transactions can be handled per second by default (can be raised)
  • 19. Amazon Machine Learning • Durability and Availability • No maintenance windows or scheduled downtimes • Designed across multiple availability zones • Scalability & Elasticity • Model training up to 100GB • Multiple ML jobs can run simultaneously • Interfaces • Create a data source from S3, RDS and Redshift • Interact with ML via console, SDKs, and the ML API • Anti-patterns • Massive Data Sets for modeling > 100GB • Sequence prediction or unsupervised clustering task
  • 20. Event driven, fully managed compute No Infrastructure to Manage Automatic Scaling AWS Lambda
  • 21. AWS Lambda • Ideal Usage Patterns • Real-time file processing • Extract, Transform, Load • Performance • Process events within milliseconds • Cost Model • Pay for what you use • No need to manage instances, only pay for the service • Lambda free tier includes 1M free requests 1 2 3 Serverless Event-Driven Scale Subsecond Billing
  • 22. AWS Lambda • Durability and Availability • No maintenance windows or scheduled downtime • Async functions are retried 3 times if there is a failure • Scalability & Elasticity • Any number of concurrent functions that can be run • AWS Lambda will dynamically allocate capacity to match the rate of incoming events. • Interfaces • Lambda supports Java, Node.js, and Python • Trigger via event or schedule • Anti-patterns • Long running applications • Stateful applications in Lambda
  • 23. Set up an Elasticsearch cluster in minutes Integrated with Logstash and Kibana Scale Elasticsearch cluster seamlessly Amazon Elasticsearch Service
  • 24. Amazon Elasticsearch • Ideal Usage Patterns • Analyze logs • Analyze data stream updates from other AWS services • Provide customers a rich search and navigation experience • Usage monitoring for mobile applications • Performance • Depends on multiple factors including instance type, workload, index, number of shards used, read replicas • Storage configurations –instance storage or EBS storage • Cost Model • Pay as you go • Only pay for compute and storage
  • 25. Amazon Elasticsearch • Durability and Availability • Zone Awareness • Automatic and Manual snapshots • Scalability & Elasticity • Add or remove instances • Modify EBS volumes for data growth • Interfaces • AWS Management Console • APIs • SDKs • Kibana and Logstash (ELK Stack) • Anti-patterns • OLTP • Workloads requiring more than 5 TB of storage Elasticsearch + Logstash + Kibana = real-time analytics & visualization
  • 26. Build visualizations Perform ad-hoc analysis Share and collaborate via storyboards Native access on major mobile platforms Amazon QuickSight
  • 27. Introducing Amazon QuickSight Cloud-Powered Business Intelligence Service For 1/10th the Cost of Traditional BI Software • No IT effort. No dimensional modeling • Auto-discovery of all AWS data sources • Super-fast, Parallel, In-memory Calculation Engine (SPICE) • Fully Managed Available in Preview aws.amazon.com/quicksight
  • 28. Scale up or down as needed Pay for what you use Multiple options Do-it-yourself big data applications Amazon EC2
  • 29. The AWS Approach • Flexible Use the best tool for the job • Data structure, latency, throughput, access patterns • Scalable Immutable (append-only) • Batch/speed/serving layer • Minimum Admin Overhead Leverage AWS managed services • No or very low admin • Low Cost Big data ≠ big cost
  • 30. Scenario 1: Enterprise Data Warehouse Scenario 2: Capturing and Analyzing Sensor Data Scenario 3: Sentiment Analysis of Social Media Big Data Scenarios
  • 31. Scenario 1: Enterprise Data Warehouse Data Warehouse Architecture Data Sources Amazon S3 Amazon EMR Amazon S3 Amazon Redshift Amazon QuickSight
  • 32. Scenario 2: Capturing and Analyzing Sensor Data Data Sources Amazon S3 Amazon Redshift Amazon QuickSight Amazon Kinesis Enabled App Amazon Kinesis Enabled App Amazon DynamoDB Reporting Dashboard Customer Access Amazon Kinesis 1 2 3 4 5 6 7 8 9
  • 33. Scenario 3: Sentiment Analysis of Social Media Social Media Data Amazon EC2 AWS Lambda Amazon ML Amazon Kinesis Amazon S3 Amazon SNS 1 2 4 5 6 3 7
  • 34. Next Steps • Subscribe to the AWS Big Data Blog blogs.aws.amazon.com/bigdata • Learn more, check the tutorials, guides, and self-paced labs aws.amazon.com/big-data • Register for the next Big Data Webinar Building Smart Applications with Amazon Machine Learning aws.amazon.com/about-aws/events/monthlywebinarseries Thu, Jan 28 2016 | 9AM PST
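
The Kinesis Streams cost model on slide 9 (an hourly charge per shard plus a charge per 1 million PUT transactions) can be turned into a rough monthly estimate. The following is a minimal sketch only: the `KinesisRates` class, the `monthly_stream_cost` function, the 730-hour month, and the rates in the usage example are illustrative assumptions, not figures from the deck or from current AWS pricing.

```python
from dataclasses import dataclass

# Assumption: an average month is treated as 730 hours.
HOURS_PER_MONTH = 730


@dataclass
class KinesisRates:
    """Hypothetical placeholder rates; check current pricing for your region."""
    per_shard_hour: float    # hourly charge per shard
    per_million_puts: float  # charge per 1 million PUT transactions


def monthly_stream_cost(shards: int, puts_per_second: float, rates: KinesisRates) -> float:
    """Estimate a monthly cost from shard-hours plus PUT transactions."""
    # Every shard is billed for every hour the stream exists.
    shard_cost = shards * HOURS_PER_MONTH * rates.per_shard_hour
    # PUT transactions are billed per million.
    puts_per_month = puts_per_second * 3600 * HOURS_PER_MONTH
    put_cost = (puts_per_month / 1_000_000) * rates.per_million_puts
    return shard_cost + put_cost


if __name__ == "__main__":
    # Made-up rates, used only to show the shape of the calculation.
    example_rates = KinesisRates(per_shard_hour=0.02, per_million_puts=0.02)
    estimate = monthly_stream_cost(shards=2, puts_per_second=500, rates=example_rates)
    print(f"Estimated monthly cost: {estimate:.2f}")
```

Because shards are billed by the hour whether or not they are busy, the shard term dominates for lightly used streams. That is consistent with the slide 10 anti-pattern of small-scale consistent throughput.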

Editor's Notes

  1. Follow Up Email
  2. Amazon (https://www.youtube.com/watch?v=P4KPPvEb_QI): Generates weblogs at 2 TB/day, growing 67% YoY. Legacy system: Oracle RAC. Scan rate: 1 week of data per hour. Hit the RAC node limit of 32 nodes. More data => slower queries. Migrated to Redshift. Scan rate: 15 months of data (2.25 trillion rows) in 14 min. Scaled to a 101-node DS1.8XL cluster (petabytes). More than 10X performance. 21B rows joined with 10B rows in under 2 hours, down from days. security, HasOffers loads 60M rows per day in 2 min intervals. Desk: high-concurrency user-facing portal (read/write cluster). Amazon.com/NTT: PB scale. Pinterest saw 50-100x speedups when it moved 300 TB from Hadoop to Redshift. Nokia saw a 50% reduction in costs. https://www.youtube.com/watch?v=O4wAH5FQjS8
  3. 30 Million Ad opportunities per month.
  4. Yelp uses Amazon S3 to store daily logs and photos, generating around 1.2TB of logs per day. The company also uses Amazon EMR to power approximately 20 separate batch scripts, most of those processing the logs. Features powered by Amazon Elastic MapReduce include: Yelp developers advise others working with AWS to use the boto API as well as mrjob to ensure full utilization of Amazon Elastic MapReduce job flows. Yelp runs approximately 250 Amazon Elastic MapReduce jobs per day, processing 30TB of data and is grateful for AWS Support that helped with their Hadoop application development.
  5. Dropcam - Dropcam runs video streaming and storage servers on Amazon EC2 and Amazon S3, and uses Amazon DynamoDB to scale and maintain throughput. “DynamoDB grows with the number of cameras that are connected to the service,” says Nelson. “Throughput is very steady as cameras come online. By using DynamoDB, we reduced delivery time for video events to less than 50 milliseconds,” says Nelson.
  6. Dropcam - Dropcam runs video streaming and storage servers on Amazon EC2 and Amazon S3, and uses Amazon DynamoDB to scale and maintain throughput. “DynamoDB grows with the number of cameras that are connected to the service,” says Nelson. “Throughput is very steady as cameras come online. By using DynamoDB, we reduced delivery time for video events to less than 50 milliseconds,” says Nelson.
  7. BuildFax - Uses Amazon Machine Learning to provide roof-age and job-cost estimations for insurers and builders, with property-specific values that don’t need to rely on broad, ZIP code-level estimates. Models that previously took six months or longer to create are now complete in four weeks or less. Creates opportunities for new data analytics services that BuildFax can offer to customers, such as text analysis in Amazon ML to estimate job costs with 80 percent accuracy.
  8. VidRoll - AWS Lambda enables NoOps, allowing us to start and stay at scale without having to worry about infrastructure. As an exponential organization, it is critical that our developers focus on innovation. Lambda frees us from ever having to code for issues like concurrency, distributed file systems and other ‘success problems’ that typically present themselves when systems need to scale. We save time and money with Lambda.
  9. Amazon Elasticsearch service allows you to easily and securely deploy and scale an ELK stack in minutes. Integration with Logstash is tightly coupled and a Kibana instance is automatically configured for you. The service automatically detects and replaces failed Elasticsearch nodes, reducing the overhead associated with self-managed infrastructure and Elasticsearch software.
  10. https://aws.amazon.com/solutions/case-studies/major-league-baseball-mlbam/ Major League Baseball Advanced Media, L.P., which operates MLB.com, uses Elasticsearch extensively on its advanced game day statistics application. “Elasticsearch allows us to easily and quickly build bleeding edge big data and analytics applications using the ELK stack,” said Sean Curtis, Architect at MLB.com. “By offering direct access to the Elasticsearch API while offloading administrative tasks, Amazon Elasticsearch Service gives us the manageability, flexibility and control we need.”
  11. Before we go into solving the big data architecture, I want to introduce some “tried and tested” architecture principles. Here at AWS we believe you should be using the right tool for the job: instead of using a big Swiss Army knife to drive a screw, it is best to use a screwdriver. This is especially important for big data architectures. We’ll talk about this more. Decoupled architecture (http://whatis.techtarget.com/definition/decoupled-architecture) - In general, a decoupled architecture is a framework for complex work that allows components to remain completely autonomous and unaware of each other. This has been tried and battle tested. Managed services – this is relatively new - Should I install Cassandra or MongoDB or CouchDB on AWS? You obviously can. Sometimes there are good reasons for doing this. Many customers still do this. Netflix is a great example. They run a multi-region Cassandra and are a poster child for how to do this. But for most customers, delegating this task to AWS makes more sense. You are better off spending your time on building features for your customers rather than building highly scalable distributed systems. Lambda Architecture -