SlideShare a Scribd company logo
1 of 62
AWS big data platform
agenda overview
10:00 AM Registration
10:30 AM Introduction to Big Data @ AWS
12:00 PM Lunch + Registration for Technical Sessions
12:30 PM Data Collection and Storage
1:45PM Real-time Event Processing
3:00PM Analytics (incl Machine Learning)
4:30 PM Open Q&A Roundtable
global footprint
Over 1 million active customers
across 190 countries
800+ government agencies
3,000+ educational institutions
11 regions
28 availability zones
52 edge locations
Everyday, AWS adds enough new server capacity to support
Amazon.com when it was a $7 billion global enterprise.
2014 laaS Magic Quadrant
“AWS is the overwhelming
market share leader, with
more than 5X the compute
capacity in use than the
aggregate total of the
other 14 providers.”
Enterprise
Applications
Virtual Desktop Sharing & Collaboration
Platform
Services
Analytics
Hadoop
Real-time
Streaming Data
Data
Warehouse
Data
Pipelines
App Services
Queuing &
Notifications
Workflow
App streaming
Transcoding
Email
Search
Deployment & Management
One-click web
app deployment
Dev/ops resource
management
Resource
Templates
Mobile Services
Identity
Sync
Mobile
Analytics
Push
Notifications
Administration
& Security
Identity
Management
Access
Control
Usage
Auditing
Key
Storage
Monitoring
And Logs
Core
Services
Compute
(VMs, Auto-scaling
and Load Balancing)
Storage
(Object, Block
and Archival)
CDN
Databases
(Relational, NoSQL,
Caching)
Networking
(VPC, DX, DNS)
Infrastructure Regions Availability Zones Points of Presence
broad & deep core services
rich platform services
big data pipeline
Data Answers
Collect Process Analyze
Store
Collect Process Analyze
Store
primitive patterns
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
primitive patterns
S3
Kinesis
DynamoDB
RDS (Aurora)
AWS Lambda
KCL Apps
EMR Redshift
Machine
Learning
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
primitive patterns
S3
Kinesis
DynamoDB
RDS (Aurora)
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
data collection and storage
File: media, log files (sets of records)
Stream: records (eg: device stats)
Transactional: database reads/writes
AppsDevicesLoggingFrameworks
AWS services – data collection and storage
S3
Kinesis
DynamoDB
RDS (Aurora)
benefits of streamlined data collection
Increase velocity of data
• Upgrade existing applications to log records rather
than files – driven by need for greater agility
• Build new applications that are designed for
streaming data from the outset
Example:
S3
$0.030/GB-Mo
Redshift
Starts at
$0.25/hour
EC2
Starts at
$0.02/hour
Glacier
$0.010/GB-Mo
Kinesis
$0.015/shard 1MB/s in; 2MB/out
$0.028/million puts
500MM tweets/day = ~ 5,800 tweets/sec
2k/tweet is ~12MB/sec (~1TB/day)
$0.015/hour per shard, $0.028/million PUTS
Kinesis cost is $0.765/hour
Redshift cost is $0.850/hour (for a 2TB node)
S3 cost is $1.28/hour (no compression)
Total: $2.895/hour
cost &
scale
benefits of streamlined data collection
• Instrument existing applications
• Inject code to log activity – “new big data”
• Example: WAPO Labs Social Reader (now Trove)
Existing
Application
DynamoDB table(s)
GET calls & Queries
PUT calls
Query(…
PutItem(…
benefits of streamlined data collection
Increase data granularity
Customers Devices Data Items Item Size Frequency
Challenge: compounding scale
Benefit: improved data quality
primitive patterns
AWS Lambda
KCL Apps
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
event processing – enabling capabilities
S3 Event Notifications
Kinesis stream
DynamoDB Streams
AWS Lambda
KCL Apps
real-time event processing
• Event-driven programming
• Trigger activities based on real-time input
Examples:
 Proactively detect hardware errors in device logs
 Identify fraud from activity logs
 Monitor performance SLAs
 Notify when inventory drops below a threshold
benefits of event processing
• Build / add real-time events
 Take action between data collection and analytics
• Alerts and notifications, performance and security
• Automated data enrichment (eg: aggregations)
• De-couple application modules
 Streamline development and maintenance
 Increase agility
• MVP + iterate on discrete components
Collect | Store | Analyze
Alert
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
primitive patterns
EMR Redshift
Machine
Learning
NASDAQ
• 5.5B Records are loaded to
Amazon Redshift every day
• Security Requirements for
Client Side Encryption
• Historical Data - HDFS became
too expensive
 S3 + EMR to the Rescue
Retail and
POS Analytics
Process 10’s of TB
in hours vs. 2
weeks
80-90% reduction
in costs
big data use cases
Internet of Things
Digital Advertising
Online Gaming
Log Analytics
Customer Value Scoring
Personalization Engine
Collect Process Analyze
Store
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
TempTracker
bee hive monitoring
in the AWS cloud
temperature
sensors
network card
waterproof
housing
Python (boto)
DynamoDBKinesis
App
Kinesis
ingestion
dashboard
Lambda
event
source
SNS
TempTracker: IoT sensor ingestion example
DynamoDB schema
hash range attributes
internal temperature
outside temperature
big data case study: Kaiten Sushiro
Kaiten Sushiro
• Kaiten Sushi Chain restaurant
• Gathering sensor data into Kinesis
Kaiten Sushiro data flow
2009 2010 2011 2012 2013
Move to AWS
cameras
Switch to
DynamoDB
IoT / connected devices
Simple video monitoring & security
Fast growth – “suddenly petabytes”
EC2 (live streaming)
S3 (CVR data)
DynamoDB (meta data)
CloudFront (CDN)
EMR (activity recognition)
applying analytics to
connected device data
VPC Subnet
MQTT Broker on
EC2 Instance
VPC
Internet
Gateway
EMR
Kinesis
DynamoDB
Redshift
Lambda
SNS
S3
Data Pipeline
backend analytics architecture for
connected device data
AWS big data ecosystem
S3
Kinesis
EMR
Redshift
Data Pipeline
DynamoDB
Collect Process & Analyze Visualize
AWS Professional Services
Partnering in Your Journey
Technical
Specialists
Specialty practices for
AWS skills transfer,
security, infrastructure
architecture,
application
optimization, analytics,
big data, and
operational integration
Advisory
Services
Portfolio strategy and
planning, cost/benefit
modeling, governance,
change management
and risk management
as it relates to
implementing the AWS
platform
Collaboration
Working together with
you and APN Premier
Partners you already
trust to provide you
with access to all
resources needed to
realize breakthrough
results
Proven
Process
Best practices and
patterns to help your
teams get the
foundation right, deploy
and migrate workloads,
and create a modern IT
operating model to
support your business
big data partner solutions
Solutions vetted by the AWS Partner Competency Program
Data
Enablement
Move, synchronize,
cleanse, and manage data
Data Analysis &
Visualization
Turn data into actionable
insight, enhance decision
making
Infrastructure
Intelligence
Harness data generated
from your systems and
infrastructure
Advanced
Analytics
Anticipate future events and
behaviors, conduct what-if
analysis
big data service offers
Service expertise vetted by the AWS Partner Competency Program
AWS marketplace
Advanced Analytics
Database and Data Enablement
Business Intelligence
1-click deployment to launch, on
multiple regions around the world
Pay-as-you-go pricing with no
long term contracts required
2,000+ product listings to
browse, test and buy software
Enterprise software store for business users who need simplified procurement
Amazon Machine
Learning
Amazon Aurora
smart applications
e-commerce: recommendations
made based on your past purchases
finance: alerts from your bank when
they suspect fraudulent transactions
retail: emails when items related to
things you typically buy are on sale
Amazon Machine Learning
1. Build & Train Model
 Create a datasource object (connect to Redshift, RDS, S3)
 Explore and understand your data
 Transform and train your model
2. Evaluate the Model & Optimize
 Assess model quality
 Fine-tune the model
3. Retrieve Predictions
 Batch: asynchronous, large volume prediction
 Real-time: synchronous, single-item prediction
Amazon Machine Learning
example use cases
• Fraud detection
• Demand forecasting
• Predictive customer support
• Click prediction
• Content personalization
• Document classification
Amazon Machine Learning
Currently Available in US-East-1
Amazon Aurora
Amazon’s New Relational Database Engine
a service-oriented architecture applied to
the database
• Moved the logging and storage layer
into a multi-tenant, scale-out
database-optimized storage service
• Integrated with other AWS services
like Amazon EC2, Amazon VPC,
Amazon DynamoDB, Amazon SWF,
and Amazon Route 53 for control
plane operations
• Integrated with Amazon S3 for
continuous backup with
99.999999999% durability
Control PlaneData Plane
Amazon
DynamoDB
Amazon SWF
Amazon Route 53
Logging + Storage
SQL
Transactions
Caching
Amazon S3
simplify data security
• Encryption to secure data at rest
 AES-256; hardware accelerated
 All blocks on disk and in Amazon S3 are encrypted
 Key management via AWS KMS
• SSL to secure data in transit
• Network isolation via Amazon VPC by default
• No direct access to nodes
• Supports industry standard security and data
protection certifications
Storage
SQL
Transactions
Caching
Amazon S3
Application
simplify storage management
• Read replicas are available as failover targets—no data loss
• Instantly create user snapshots—no performance impact
• Continuous, incremental backups to S3
• Automatic storage scaling up to 64 TB—no performance or
availability impact
• Automatic restriping, mirror repair, hot spot management,
encryption
Aurora storage
Highly available by default
• 6-way replication across 3 AZs
• 4 of 6 write quorum
 Automatic fallback to 3 of 4 if an AZ is unavailable
• 3 of 6 read quorum
SSD, scale-out, multi-tenant storage
• Seamless storage scalability
• Up to 64 TB database size
• Only pay for what you use
Log-structured storage
• Many small segments, each with
their own redo logs
• Log pages used to generate data pages
• Eliminates chatter between database and storage
SQL
Transactions
AZ 1 AZ 2 AZ 3
Caching
Amazon S3
self-healing, fault-tolerant
• Lose two copies or an AZ failure without read or write
availability impact
• Lose three copies without read availability impact
• Automatic detection, replication, and repair
SQL
Transaction
AZ 1 AZ 2 AZ 3
Caching
SQL
Transaction
AZ 1 AZ 2 AZ 3
Caching
Read and write availabilityRead availability
instant crash recovery
Traditional databases
• Have to replay logs since
the last checkpoint
• Single-threaded in
MySQL; requires a large
number of disk accesses
Amazon Aurora
• Underlying storage replays
redo records on demand as
part of a disk read
• Parallel, distributed,
asynchronous
Checkpointed Data Redo Log
Crash at T0 requires
a re-application of the
SQL in the redo log since
last checkpoint
T0 T0
Crash at T0 will result in redo
logs being applied to each segment
on demand, in parallel, asynchronously
write performance (console screenshot)
• MySQL Sysbench
• R3.8XL with 32 cores
and 244 GB RAM
• 4 client machines with
1,000 threads each
read performance (console screenshot)
• MySQL Sysbench
• R3.8XL with 32 cores
and 244 GB RAM
• Single client with
1,000 threads
read replica lag (console screenshot)
• Aurora Replica with 7.27 ms replica lag at 13.8 K updates/second
• MySQL 5.6 on the same hardware has ~2 s lag at 2 K updates/second
Aurora – current state
• Sign up for preview access at:
https://aws.amazon.com/rds/aurora/preview
• Now available in US West (Oregon) and EU (Ireland), in
addition to US East (N. Virginia)
• Thousands of customers already in the limited preview
• Unlimited preview: accepting all requests from late May
• Full service launch in the coming months
AWS big data platform
• Choice – platform breadth supports many use cases
• Specialization – optimal application experiences
• Managed Services – eliminate undifferentiated effort
S3
Kinesis
DynamoDB
RDS (Aurora)
AWS Lambda
KCL Apps
EMR Redshift
Machine
Learning
Thank you
Questions?

More Related Content

What's hot

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleAdam Doyle
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveCobus Bernard
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesCarole Gunst
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data PresentationMatthew Urdan
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudAmazon Web Services
 

What's hot (20)

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI Initiatives
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data Pipelines
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Big Data and Analytics on AWS
Big Data and Analytics on AWS Big Data and Analytics on AWS
Big Data and Analytics on AWS
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data Presentation
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 

Similar to The AWS Big Data Platform – Overview

AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 
¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?Software Guru
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewAmazon Web Services
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...Amazon Web Services
 
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017Amazon Web Services
 
Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100Amazon Web Services
 
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsAmazon Web Services
 
Vancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakVancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakAmazon Web Services
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSAmazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAmazon Web Services
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAmazon Web Services Korea
 

Similar to The AWS Big Data Platform – Overview (20)

AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
 
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWSAWS Summit Singapore - Architecting a Serverless Data Lake on AWS
AWS Summit Singapore - Architecting a Serverless Data Lake on AWS
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days2016 AWS Big Data Solution Days
2016 AWS Big Data Solution Days
 
¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?¿Quién es Amazon Web Services?
¿Quién es Amazon Web Services?
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
 
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017Driving Business Insights with a Modern Data Architecture  AWS Summit SG 2017
Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017
 
Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100Driving Business Outcomes with a Modern Data Architecture - Level 100
Driving Business Outcomes with a Modern Data Architecture - Level 100
 
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
 
Vancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam ElmalakVancouver keynote - AWS Innovate - Sam Elmalak
Vancouver keynote - AWS Innovate - Sam Elmalak
 
Aws meetup 20190427
Aws meetup 20190427Aws meetup 20190427
Aws meetup 20190427
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWS
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

The AWS Big Data Platform – Overview

  • 1. AWS big data platform
  • 2. agenda overview 10:00 AM Registration 10:30 AM Introduction to Big Data @ AWS 12:00 PM Lunch + Registration for Technical Sessions 12:30 PM Data Collection and Storage 1:45PM Real-time Event Processing 3:00PM Analytics (incl Machine Learning) 4:30 PM Open Q&A Roundtable
  • 3. global footprint Over 1 million active customers across 190 countries 800+ government agencies 3,000+ educational institutions 11 regions 28 availability zones 52 edge locations Everyday, AWS adds enough new server capacity to support Amazon.com when it was a $7 billion global enterprise.
  • 4. 2014 laaS Magic Quadrant “AWS is the overwhelming market share leader, with more than 5X the compute capacity in use than the aggregate total of the other 14 providers.”
  • 5. Enterprise Applications Virtual Desktop Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Data Warehouse Data Pipelines App Services Queuing & Notifications Workflow App streaming Transcoding Email Search Deployment & Management One-click web app deployment Dev/ops resource management Resource Templates Mobile Services Identity Sync Mobile Analytics Push Notifications Administration & Security Identity Management Access Control Usage Auditing Key Storage Monitoring And Logs Core Services Compute (VMs, Auto-scaling and Load Balancing) Storage (Object, Block and Archival) CDN Databases (Relational, NoSQL, Caching) Networking (VPC, DX, DNS) Infrastructure Regions Availability Zones Points of Presence
  • 6. broad & deep core services
  • 8. big data pipeline Data Answers Collect Process Analyze Store
  • 9. Collect Process Analyze Store primitive patterns Data Collection and Storage Data Processing Event Processing Data Analysis
  • 10. primitive patterns S3 Kinesis DynamoDB RDS (Aurora) AWS Lambda KCL Apps EMR Redshift Machine Learning Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 11. primitive patterns S3 Kinesis DynamoDB RDS (Aurora) Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 12. data collection and storage File: media, log files (sets of records) Stream: records (eg: device stats) Transactional: database reads/writes AppsDevicesLoggingFrameworks
  • 13. AWS services – data collection and storage S3 Kinesis DynamoDB RDS (Aurora)
  • 14. benefits of streamlined data collection Increase velocity of data • Upgrade existing applications to log records rather than files – driven by need for greater agility • Build new applications that are designed for streaming data from the outset Example:
  • 15.
  • 17. 500MM tweets/day = ~ 5,800 tweets/sec 2k/tweet is ~12MB/sec (~1TB/day) $0.015/hour per shard, $0.028/million PUTS Kinesis cost is $0.765/hour Redshift cost is $0.850/hour (for a 2TB node) S3 cost is $1.28/hour (no compression) Total: $2.895/hour cost & scale
  • 18. benefits of streamlined data collection • Instrument existing applications • Inject code to log activity – “new big data” • Example: WAPO Labs Social Reader (now Trove) Existing Application DynamoDB table(s) GET calls & Queries PUT calls Query(… PutItem(…
  • 19. benefits of streamlined data collection Increase data granularity Customers Devices Data Items Item Size Frequency Challenge: compounding scale Benefit: improved data quality
  • 20. primitive patterns AWS Lambda KCL Apps Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 21. event processing – enabling capabilities S3 Event Notifications Kinesis stream DynamoDB Streams AWS Lambda KCL Apps
  • 22. real-time event processing • Event-driven programming • Trigger activities based on real-time input Examples:  Proactively detect hardware errors in device logs  Identify fraud from activity logs  Monitor performance SLAs  Notify when inventory drops below a threshold
  • 23. benefits of event processing • Build / add real-time events  Take action between data collection and analytics • Alerts and notifications, performance and security • Automated data enrichment (eg: aggregations) • De-couple application modules  Streamline development and maintenance  Increase agility • MVP + iterate on discrete components Collect | Store | Analyze Alert
  • 24. Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis primitive patterns EMR Redshift Machine Learning
  • 25. NASDAQ • 5.5B Records are loaded to Amazon Redshift every day • Security Requirements for Client Side Encryption • Historical Data - HDFS became too expensive  S3 + EMR to the Rescue
  • 26. Retail and POS Analytics Process 10’s of TB in hours vs. 2 weeks 80-90% reduction in costs
  • 27. big data use cases Internet of Things Digital Advertising Online Gaming Log Analytics Customer Value Scoring Personalization Engine Collect Process Analyze Store Data Collection and Storage Data Processing Event Processing Data Analysis
  • 33. big data case study: Kaiten Sushiro
  • 34. Kaiten Sushiro • Kaiten Sushi Chain restaurant • Gathering sensor data into Kinesis
  • 36. 2009 2010 2011 2012 2013 Move to AWS cameras Switch to DynamoDB IoT / connected devices Simple video monitoring & security Fast growth – “suddenly petabytes”
  • 37. EC2 (live streaming) S3 (CVR data) DynamoDB (meta data) CloudFront (CDN) EMR (activity recognition)
  • 38. applying analytics to connected device data VPC Subnet MQTT Broker on EC2 Instance VPC Internet Gateway EMR Kinesis DynamoDB Redshift Lambda SNS S3 Data Pipeline
  • 39. backend analytics architecture for connected device data
  • 40. AWS big data ecosystem S3 Kinesis EMR Redshift Data Pipeline DynamoDB Collect Process & Analyze Visualize
  • 41. AWS Professional Services Partnering in Your Journey Technical Specialists Specialty practices for AWS skills transfer, security, infrastructure architecture, application optimization, analytics, big data, and operational integration Advisory Services Portfolio strategy and planning, cost/benefit modeling, governance, change management and risk management as it relates to implementing the AWS platform Collaboration Working together with you and APN Premier Partners you already trust to provide you with access to all resources needed to realize breakthrough results Proven Process Best practices and patterns to help your teams get the foundation right, deploy and migrate workloads, and create a modern IT operating model to support your business
  • 42. big data partner solutions Solutions vetted by the AWS Partner Competency Program Data Enablement Move, synchronize, cleanse, and manage data Data Analysis & Visualization Turn data into actionable insight, enhance decision making Infrastructure Intelligence Harness data generated from your systems and infrastructure Advanced Analytics Anticipate future events and behaviors, conduct what-if analysis
  • 43. big data service offers Service expertise vetted by the AWS Partner Competency Program
  • 44. AWS marketplace Advanced Analytics Database and Data Enablement Business Intelligence 1-click deployment to launch, on multiple regions around the world Pay-as-you-go pricing with no long term contracts required 2,000+ product listings to browse, test and buy software Enterprise software store for business users who need simplified procurement
  • 46. smart applications e-commerce: recommendations made based on your past purchases finance: alerts from your bank when they suspect fraudulent transactions retail: emails when items related to things you typically buy are on sale
  • 47. Amazon Machine Learning 1. Build & Train Model  Create a datasource object (connect to Redshift, RDS, S3)  Explore and understand your data  Transform and train your model 2. Evaluate the Model & Optimize  Assess model quality  Fine-tune the model 3. Retrieve Predictions  Batch: asynchronous, large volume prediction  Real-time: synchronous, single-item prediction
  • 48. Amazon Machine Learning example use cases • Fraud detection • Demand forecasting • Predictive customer support • Click prediction • Content personalization • Document classification
  • 49. Amazon Machine Learning Currently Available in US-East-1
  • 50. Amazon Aurora Amazon’s New Relational Database Engine
  • 51. a service-oriented architecture applied to the database • Moved the logging and storage layer into a multi-tenant, scale-out database-optimized storage service • Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations • Integrated with Amazon S3 for continuous backup with 99.999999999% durability Control PlaneData Plane Amazon DynamoDB Amazon SWF Amazon Route 53 Logging + Storage SQL Transactions Caching Amazon S3
  • 52. simplify data security • Encryption to secure data at rest  AES-256; hardware accelerated  All blocks on disk and in Amazon S3 are encrypted  Key management via AWS KMS • SSL to secure data in transit • Network isolation via Amazon VPC by default • No direct access to nodes • Supports industry standard security and data protection certifications Storage SQL Transactions Caching Amazon S3 Application
  • 53. simplify storage management • Read replicas are available as failover targets—no data loss • Instantly create user snapshots—no performance impact • Continuous, incremental backups to S3 • Automatic storage scaling up to 64 TB—no performance or availability impact • Automatic restriping, mirror repair, hot spot management, encryption
  • 54. Aurora storage Highly available by default • 6-way replication across 3 AZs • 4 of 6 write quorum  Automatic fallback to 3 of 4 if an AZ is unavailable • 3 of 6 read quorum SSD, scale-out, multi-tenant storage • Seamless storage scalability • Up to 64 TB database size • Only pay for what you use Log-structured storage • Many small segments, each with their own redo logs • Log pages used to generate data pages • Eliminates chatter between database and storage SQL Transactions AZ 1 AZ 2 AZ 3 Caching Amazon S3
  • 55. self-healing, fault-tolerant • Lose two copies or an AZ failure without read or write availability impact • Lose three copies without read availability impact • Automatic detection, replication, and repair SQL Transaction AZ 1 AZ 2 AZ 3 Caching SQL Transaction AZ 1 AZ 2 AZ 3 Caching Read and write availabilityRead availability
  • 56. instant crash recovery Traditional databases • Have to replay logs since the last checkpoint • Single-threaded in MySQL; requires a large number of disk accesses Amazon Aurora • Underlying storage replays redo records on demand as part of a disk read • Parallel, distributed, asynchronous Checkpointed Data Redo Log Crash at T0 requires a re-application of the SQL in the redo log since last checkpoint T0 T0 Crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously
  • 57. write performance (console screenshot) • MySQL Sysbench • R3.8XL with 32 cores and 244 GB RAM • 4 client machines with 1,000 threads each
  • 58. read performance (console screenshot) • MySQL Sysbench • R3.8XL with 32 cores and 244 GB RAM • Single client with 1,000 threads
  • 59. read replica lag (console screenshot) • Aurora Replica with 7.27 ms replica lag at 13.8 K updates/second • MySQL 5.6 on the same hardware has ~2 s lag at 2 K updates/second
  • 60. Aurora – current state • Sign up for preview access at: https://aws.amazon.com/rds/aurora/preview • Now available in US West (Oregon) and EU (Ireland), in addition to US East (N. Virginia) • Thousands of customers already in the limited preview • Unlimited preview: accepting all requests from late May • Full service launch in the coming months
  • 61. AWS big data platform • Choice – platform breadth supports many use cases • Specialization – optimal application experiences • Managed Services – eliminate undifferentiated effort S3 Kinesis DynamoDB RDS (Aurora) AWS Lambda KCL Apps EMR Redshift Machine Learning

Editor's Notes

  1. So, I wanted to get started by taking a look at how the AWS business is progressing. We now have over 1 million active customers, this is non-Amazon customers with AWS account usage activity in the past month. TALKING POINTS We define an “active customer” as non-Amazon customers who have account usage activity within the past month To support global business, we maintain 11 regions across the US, South America, Europe (Ireland and Germany), Japan, China, Singapore, and Australia. We count hundreds of thousands of customers across 190 countries This includes over 800 government agencies and over 3,000 educational institutions Scale and capacity matter. Every day, we add enough new server capacity to support Amazon.com when it was a $7B global enterprise.
  2. 500MM tweets/day = 5,800 tweets/second 2.5k/tweet =
  3. 500MM tweets/day = 5,800 tweets/second 2.5k/tweet =
  4. 12 MB/s Cost/hour Shard 1 $0.015 Requests 1000000 $0.028 Total Cost 12 shards $0.180 Hourly request $0.585 $0.765 $4.65 6.081293158 $5.25 for a peppermint mocha frappucino. You can run this app for an hour and 48 minutes per frappucino Network Transfer Rate: if you pull 1TB down per day, cost is $0.09/GB = $92.16/day.
  5.   The Nasdaq Group has been a user of Amazon Redshift since it was released and they are extremely happy with it. We’ve discussed our usage of that system at re:Invent several times, the most recent of which wasFIN401 Seismic Shift: Nasdaq’s Migration to Amazon Redshift. Currently, our system is moving an average of 5.5 billion rows into Amazon Redshift every day (14 billion on a peak day in October of 2014). In addition to our Amazon Redshift data warehouse, we have a large historical data footprint that we would like to access as a single, gigantic data set. Currently, this historical archive is spread across a large number of disparate systems, making it difficult to use. Our aim for a new, unified warehouse platform is two-fold: increase the accessibility of this historic data set to a growing number of internal groups at Nasdaq and to gain cost efficiencies in the process. For this platform, Hadoop is a clear choice: it supports a number of different SQL and other interfaces for accessing data and has an active and growing ecosystem of tools and projects.
  6. The AWS Partner Network (APN) is the global partner program for AWS. It is focused on helping our thousands of partners build successful AWS-based businesses by providing great technical and GTM support. APN Big Data Competency Partners have demonstrated technical and customer success in building solutions to help customers work with data productively, at any scale. Big data partner solutions are available across key big data use cases and solution areas, including: data enablement, data analysis/visualization, infrastructure intelligence, and advanced analytics.
  7. The APN Competency Program is designed to provide you with qualified APN Partners who have demonstrated technical proficiency and proven success in specialized solutions and vertical areas. Partners who’ve attained an APN Competency offer a variety of validated services and solutions on the AWS Cloud. We have a number of consulting partners that provide specific services and delivery capabilities around big data.
  8. Key talking points: No BI/Big data products solves all the scenarios and use cases, go to https://aws.amazon.com/marketplace/ to find a wide variety of analytical solutions. No need for multiple sales contract!!! Just Chose your type of EC2, No delays on using the product Pay-as you go. No long term commitment Success story One example is that a data scientist from Philips went into the Marketplace and processed their 37 million records in less than an hour on Saturday night. The consulting company estimated that the same task would take 3 months of engagement to deliver.  
  9. Control plane applies to storage Use Icons Another Box Collapsing The relational database reinvented using a service-oriented approach Make the layers of the database scale out and multi-tenant. Started with storage Retained drop-in compatibility with MySQL 5.6. Your existing applications should just work
  10. Kill storage label