© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Ben Snively
Specialist Solutions Architect,
Demo: Kate Werling
Solutions Architect
Big Data Meets AI
Driving Insights and Adding Intelligence to Your Solutions
What we’ll cover
• Big Data and why organizations care
• Common Challenges: Which, What, How…
• Demonstration
• Big Data Driving Machine Learning
• Demonstration
• Final Design Tenets
Visualization Variability
Big Data Is Defined Many Different Ways
Volume Velocity Variety Veracity Value
https://www.promptcloud.com
https://john-popelaars.blogspot.com
https://ww.signiant.com
https://www.linkedin.com/pulse/world-today-data-rich-information-poor-guru-p-mohapatra-pmp/
What do the analysts say?
Organizations that successfully generate business
value from their data will outperform their peers. An
Aberdeen survey saw organizations who
implemented a data lake outperforming similar
companies by 9% in organic revenue growth.*
24%
15%
Leaders Followers
Organic revenue growth
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
Most Important: Driving Value from Data
Data Is Changing → Analytics Are Adapting
Capture and store
new data at PB-EB
scale
Do new types of analytics
in a cost-effective way
• Machine learning
• Big data processing
• Real-time analytics
• Full-text search
New types of
analytics
Data Lakes Extend the Traditional Approach
Data warehouse
Business intelligence
OLTP ERP CRM LOB
• Relational and nonrelational data
• TBs–EBs scale
• Diverse analytical engines
• Low-cost storage & analytics
Devices Web Sensors Social
Data lake
Big data processing,
real-time, machine learning
Data Lakes from AWS
Analytics
• Unmatched durability and availability at EB scale
• Best security, compliance, and audit capabilities
• Object-level controls for fine-grained access
• Fastest performance by retrieving subsets of data
• The most ways to bring data in
• 2x as many integrations with partners
• Analyze with the broadest set of analytics & ML services
Machine learning | Real-time data movement | On-premises data movement
Data Lake on AWS
Managed ML Service
Deep Learning AMIs
Video and Image Recognition
Conversational Interfaces
Deep-Learning Video Camera
Natural Language Processing
Language Translation
Speech Recognition
Text-to-Speech
Interactive Analysis
Hadoop & Spark
Data Warehousing
Full-text search
Real-time analytics
Dashboards & Visualizations
Dedicated Network connection
Secure appliances
Ruggedized Shipping Container
Database migration
Connect Devices to AWS
Real-time Data Streams
Real-time Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine learning
Real-time data movement | On-premises data movement
Data Lakes, Analytics, and IoT Portfolio from AWS
Broadest, deepest set of analytic services
• Catalog & Search: Access and search metadata (DynamoDB, Elasticsearch)
• Access & User Interface: Give your users easy and secure access (API Gateway, Identity & Access Management, Cognito)
• Central Storage: Secure, cost-effective storage in Amazon S3 (S3)
• Data Ingestion: Get your data into S3 quickly and securely (Snowball, Database Migration Service, Kinesis Firehose, Direct Connect)
• Processing & Analytics: Use of predictive and prescriptive analytics to gain better understanding (QuickSight, Amazon AI, EMR, Redshift, Athena, Kinesis, RDS)
• Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified (Security Token Service, CloudWatch, CloudTrail, Key Management Service)
Data Lake Components
Metadata | User Access | Security/Governance | Data Movement | Analytics and Machine Learning
Glue ETL
Common Big Data Challenges
WHICH tool should I use?
One tool that doesn’t do anything very well
An organized suite of tools that each serve their purpose very well
Purpose-built engines.
Right tool for the right job.
WHICH tool should I use?
WHAT Data Do I Have?
Gartner:
“Through 2018, 80% of data lakes will not include effective metadata
management capabilities, making them inefficient.”
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
• Data Catalog (Discover): Apache Hive Metastore compatible; integrated with AWS services; automatic crawling
• Job Authoring (Develop): Auto-generates ETL code; Python and Apache Spark; edit, debug, and share
• Job Execution (Deploy): Serverless execution; flexible scheduling; monitoring and alerting
WHAT Data Do I Have? – AWS Glue
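The "automatic crawling" step on this slide could be set up roughly as follows. This is a minimal sketch, not the presenters' demo code: the crawler name, IAM role ARN, database name, bucket path, and schedule are all hypothetical, and the helper names `build_crawler_request` and `create_and_start_crawler` are invented for illustration. The boto3 calls (`create_crawler`, `start_crawler`) are only reached when you run it against a real account.

```python
from typing import Any, Dict, Optional


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler API."""
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Glue schedules use cron syntax, e.g. "cron(0 2 * * ? *)" for 02:00 UTC daily.
        request["Schedule"] = schedule
    return request


def create_and_start_crawler(request: Dict[str, Any]) -> None:
    """Register the crawler and run it once. Requires boto3 and AWS credentials."""
    import boto3  # lazy import so the builder above works without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values for illustration only.
example_request = build_crawler_request(
    name="clickstream-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_lake",
    s3_path="s3://example-bucket/clickstream/",
    schedule="cron(0 2 * * ? *)",
)
```

Keeping the request builder separate from the AWS call makes the configuration easy to review and test before anything is created in an account.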
On-premises data
Web app data
Amazon RDS
Other databases
Streaming data
Your data
Amazon QuickSight
WHAT Data Do I Have?
Other Ways to Populate Your Data Catalog
• Call the AWS Glue CreateTable API
• Create a table manually
• Run a Hive DDL statement
• Import from Apache Hive Metastore (Apache Hive Metastore → AWS Glue ETL → AWS Glue Data Catalog)
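As an illustration of the CreateTable option, the sketch below builds a `TableInput` structure for a Parquet dataset in S3 and passes it to boto3's `create_table`. The table name, columns, partition key, and bucket are hypothetical, as are the helper names `build_table_input` and `register_table`; the Hive class names are the standard Parquet input format, output format, and SerDe.

```python
from typing import Any, Dict, Sequence, Tuple

Column = Tuple[str, str]  # (column name, Hive type)


def build_table_input(
    name: str,
    s3_location: str,
    columns: Sequence[Column],
    partition_keys: Sequence[Column] = (),
) -> Dict[str, Any]:
    """Build the TableInput structure for an external Parquet table."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


def register_table(database: str, table_input: Dict[str, Any]) -> None:
    """Create the table in the Glue Data Catalog. Requires boto3 and AWS credentials."""
    import boto3  # lazy import so the builder above works without boto3 installed

    boto3.client("glue").create_table(DatabaseName=database, TableInput=table_input)


# Hypothetical table definition for illustration only.
example_table = build_table_input(
    name="page_views",
    s3_location="s3://example-bucket/page_views/",
    columns=[("user_id", "string"), ("url", "string"), ("ts", "timestamp")],
    partition_keys=[("dt", "string")],
)
```

Because the catalog is Hive Metastore compatible, a table registered this way should be visible to engines that read the catalog, such as Athena or Spark on EMR, without copying the data.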
MOST Important: Selecting an Agile Framework
Start with a tool that will serve the purpose
Experiment, Test, Iterate, Adopt.
Let’s look at an example:
HOW can I get started?
Evolution of Netflix Data pipeline
Aggregate and upload events to
Hadoop/Hive for batch processing
EXPERIMENT new things
Batch → Batch + Real-time
Chukwa front-end → Kafka
Kafka front-end → Kafka
ADOPT your solution
“Amazon Kinesis Streams processes multiple terabytes of log
data each day, yet events show up in our analytics in seconds,”
says Bennett. “We can discover and respond to issues in real time,
ensuring high availability and a great customer experience.”
FOCUS on business value
Select an environment that allows you to try out different tools.
Focus on tools that allow you to spend as much of your time as possible on analytics.
AGILITY for the business
Agility of Analytics
Amazon SageMaker
AWS Deep Learning AMIs
Amazon Rekognition
Amazon Lex
AWS DeepLens
Amazon Comprehend
Amazon Translate
Amazon Transcribe
Amazon Polly
Amazon Athena
Amazon EMR
Amazon Redshift
Amazon Elasticsearch Service
Amazon Kinesis
Amazon QuickSight
AWS Direct Connect
AWS Snowball
AWS Snowmobile
AWS Database Migration Service
AWS IoT Core
Amazon Kinesis Data Firehose
Amazon Kinesis Data Streams
Amazon Kinesis Video Streams
Data Lake
on AWS
Storage | Archival Storage | Data Catalog
Analytics | Machine learning
Real-time data movement | On-premises data movement
Agility - Hadoop/Spark Analytics
• Distributed processing
• Diverse analytics
• Batch/Script (Hive/Pig)
• Interactive (Spark, Presto)
• Real-time (Spark)
• Machine Learning (Spark)
• NoSQL (HBase)
• For many use cases
• Log and clickstream analysis
• Machine learning
• Real-time analytics
• Large-scale analytics
• Genomics
• ETL
YARN (Hadoop Resource Manager)
Batch | Script | Interactive | Real-time | Machine learning | NoSQL
Data Lake
on AWS
Agility - Hadoop/Spark Analytics on AWS
YARN (Hadoop Resource Manager)
Batch | Script | Interactive | Real-time | Machine learning | NoSQL
Data Lake
on AWS
Amazon S3
Amazon EMR
Managed Hadoop/Spark
Object Storage
Amazon S3 – Source of Truth, Multiple Clusters
[Diagram: Amazon S3 as the source of truth, shared by two Amazon EMR clusters]
• Interactive Spark cluster (Amazon EMR): intermediates stored in EC2 instance memory, on local disk, or in HDFS
• Transient ETL job (Amazon EMR): intermediates stored in EC2 instance memory, on local disk, or in HDFS
• Amazon S3: source of truth
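A transient ETL cluster of the kind shown on this slide could be requested along these lines. This is a sketch under stated assumptions: the release label, instance type, node count, script location, and S3 paths are hypothetical, the helper names are invented, and the default EMR roles (`EMR_DefaultRole`, `EMR_EC2_DefaultRole`) are assumed to exist in the account. Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster terminate after its last step, which is safe because the input and output both live in S3 rather than in the cluster's HDFS.

```python
from typing import Any, Dict


def build_transient_etl_cluster(
    name: str,
    script_uri: str,
    input_uri: str,
    output_uri: str,
    release_label: str = "emr-5.13.0",  # assumed release; pick one available to you
    instance_type: str = "m4.xlarge",   # assumed instance type
    core_nodes: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for the EMR RunJobFlow API for a run-once Spark job."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_nodes},
            ],
            # Transient cluster: shut down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "spark-etl",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_uri, input_uri, output_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request: Dict[str, Any]) -> str:
    """Start the cluster and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # lazy import so the builder above works without boto3 installed

    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]


# Hypothetical job for illustration only.
example_cluster = build_transient_etl_cluster(
    name="nightly-etl",
    script_uri="s3://example-bucket/jobs/etl.py",
    input_uri="s3://example-bucket/raw/",
    output_uri="s3://example-bucket/curated/",
)
```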
Fitting this into the Common Data Catalog
Amazon S3
Interactive Spark Cluster
Amazon EMR
Amazon EMR
HDFS
Transient ETL Job
Source of Truth
HDFS
Describes Data in S3
MySQL DB
instance
Customers have options
Glue Data
Catalog
Amazon Athena is an interactive query
service that makes it easy to analyze data
directly in Amazon S3 using standard
SQL.
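A query against cataloged S3 data might be submitted as in the sketch below. The database, table, SQL text, and results bucket are hypothetical, and the helper names are invented for illustration; `start_query_execution` is the boto3 Athena call that submits the query, and results are written to the S3 location given in `ResultConfiguration`.

```python
from typing import Any, Dict


def build_athena_query(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for the Athena StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID. Requires boto3 and AWS credentials."""
    import boto3  # lazy import so the builder above works without boto3 installed

    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical query for illustration only.
example_query = build_athena_query(
    sql="SELECT url, COUNT(*) AS views FROM page_views GROUP BY url ORDER BY views DESC LIMIT 10",
    database="example_lake",
    output_location="s3://example-bucket/athena-results/",
)
```

The query runs asynchronously, so a caller would normally poll `get_query_execution` with the returned ID until the state is `SUCCEEDED` before reading the results.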
Demonstration
Data Catalog and Analyzing your
Data
Machine Learning and Big Data
Big Data driving Machine Learning
Better
Decisions
Object Storage
Databases
Data warehouse
Streaming analytics
BI
Hadoop
Spark/Presto
Elasticsearch
Better
Products Machine Learning
Deep Learning/ AI
More
Users
More
Data
Click stream
User activity
Generated content
Purchases
Clicks
Likes
Sensor data
Agility in Machine Learning
Machine Learning requires new tools and interfaces
Machine Learning/Deep Learning
Business
Reporting
Data Scientists
Data Engineer
IDE
Data
Catalog
Central
Storage
SageMaker
Agility in Machine Learning – for all users
Application Services
• Designed for Application Developers
• Solution-oriented Prebuilt Models Available via APIs
• Image Analysis, Text-to-Speech, Conversational UX
Platforms
• Designed for Data Scientists to Address Common Needs
• Fully Managed Platform for Model Building
• Reduces the Heavy Lifting in Model Building & Deployment
Frameworks
• Designed for Data Scientists to Address Advanced / Emerging Needs
• Provides Maximum Flexibility to develop on the leading AI Frameworks
• Enables Expert AI Systems to be Developed & Deployed
DigitalGlobe – Using ML to Find the Right Data
Data lake:
• 100 PB of data in cloud
• Optimize storage tiers
Solution:
• Optimize their data lake storage, cut costs in half
FINRA - Data Is Central to Our Mission
Reconstruct the market from trillions of events
• Data from broker-dealers and exchanges
• Equities, Options, Fixed Income
• Build a graph of market order events
Analyze the data looking for financial fraud
• Insider trading, layering, cross-product manipulation, front running & many more
• Looking for a needle in a haystack
FINRA - From data puddles to Data Lake
FINRA in data center (silos):
• Database 1, Database 2, Database n, each with its own Storage, Query/Compute, and Catalog
FINRA in AWS (scales):
• Storage: Amazon S3
• Query/Compute: EMR Spark, EMR Presto, EMR HBase, Lambda
• Catalog: herd, Hive metastore
Enabled Machine Learning on their Data Lake
Data
Scientist
Logical ‘Database’
EMR Cluster
Still one copy of data!
Spark Cluster
DS-in-a-box
AuthN
Data
Scientist
Data
Scientist
Catalog
UDSP – Inventory – not just R
• R 3.2.5, Python (2.7.12 and 3.4.3)
• Packages
• R: 300+ Python: 100+
• Tools for Building Packages
• gcc, gfortran, make, java, maven, ant…
• IDEs
• Jupyter, RStudio Server
• Deep Learning
• CUDA, CuDNN (if GPU present)
• Theano, Caffe, Torch
• TensorFlow
Machine Learning Demonstration
Final Design Tenets/Best Practices
Core Tenets
• Loose Coupling, but highly performant
• Storage, Analytics, Metadata Management, etc.
• Future proof your analytics
• Choosing the best tool for the job
• Elasticity and multiple clusters for dedicated purposes
• Replace capacity planning with a consumption model
• Don’t forget metadata management
Use the right Storage Tier
Data structure → Fixed schema, JSON, key-value
Access patterns → Store data in the format you will access it
Data characteristics → Hot, warm, cold
Cost → Right cost
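The hot, warm, and cold tiers above map naturally onto S3 storage classes. The sketch below builds one lifecycle rule that moves objects under a prefix to `STANDARD_IA` and later to `GLACIER`. The prefix, bucket, day thresholds, and helper names are illustrative assumptions, not recommendations from the talk; the 30-day floor reflects S3's minimum age for transitions to `STANDARD_IA`.

```python
from typing import Any, Dict, List


def build_tiering_rule(
    prefix: str,
    warm_after_days: int = 30,
    cold_after_days: int = 90,
) -> Dict[str, Any]:
    """Build one S3 lifecycle rule: hot (STANDARD) -> warm (STANDARD_IA) -> cold (GLACIER)."""
    if not 30 <= warm_after_days < cold_after_days:
        raise ValueError("need 30 <= warm_after_days < cold_after_days")
    return {
        "ID": "tier-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": cold_after_days, "StorageClass": "GLACIER"},
        ],
    }


def apply_rules(bucket: str, rules: List[Dict[str, Any]]) -> None:
    """Replace the bucket's lifecycle configuration. Requires boto3 and AWS credentials."""
    import boto3  # lazy import so the builder above works without boto3 installed

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )


# Hypothetical rule for illustration only.
example_rule = build_tiering_rule("logs/", warm_after_days=30, cold_after_days=180)
```

Note that `put_bucket_lifecycle_configuration` replaces the whole configuration, so `apply_rules` should be given every rule the bucket needs, not just the new one.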
Please complete the session survey in
the mobile summit app!
Thank you!

More Related Content

What's hot

Introduction to the Security Perspective of the Cloud Adoption Framework (CAF)
Introduction to the Security Perspective of the Cloud Adoption Framework (CAF)Introduction to the Security Perspective of the Cloud Adoption Framework (CAF)
Introduction to the Security Perspective of the Cloud Adoption Framework (CAF)Amazon Web Services
 
The Future of Enterprise IT - Lessons Learned
The Future of Enterprise IT - Lessons LearnedThe Future of Enterprise IT - Lessons Learned
The Future of Enterprise IT - Lessons LearnedAmazon Web Services
 
Transforming your Business Ops Team for Cloud - AWS Summit Sydney 2018
Transforming your Business Ops Team for Cloud - AWS Summit Sydney 2018Transforming your Business Ops Team for Cloud - AWS Summit Sydney 2018
Transforming your Business Ops Team for Cloud - AWS Summit Sydney 2018Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Enable Single Sign On to Multiple AWS Accounts and Business Applicatio...
How to Enable Single Sign On to Multiple AWS Accounts and Business Applicatio...How to Enable Single Sign On to Multiple AWS Accounts and Business Applicatio...
How to Enable Single Sign On to Multiple AWS Accounts and Business Applicatio...Amazon Web Services
 
Sicurezza e conformità al GDPR con AWS
Sicurezza e conformità al GDPR con AWSSicurezza e conformità al GDPR con AWS
Sicurezza e conformità al GDPR con AWSAmazon Web Services
 
What IT Transformation Really Means for the Enterprise
What IT Transformation Really Means for the EnterpriseWhat IT Transformation Really Means for the Enterprise
What IT Transformation Really Means for the EnterpriseTom Laszewski
 
Lessons Learned Scaling Your Talent Transformation
Lessons Learned Scaling Your Talent TransformationLessons Learned Scaling Your Talent Transformation
Lessons Learned Scaling Your Talent TransformationAmazon Web Services
 
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018Amazon Web Services
 
Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018
Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018
Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018Amazon Web Services
 
How Nubank Automates Fine-Grained Security with IAM, AWS Lambda, and CI/CD (F...
How Nubank Automates Fine-Grained Security with IAM, AWS Lambda, and CI/CD (F...How Nubank Automates Fine-Grained Security with IAM, AWS Lambda, and CI/CD (F...
How Nubank Automates Fine-Grained Security with IAM, AWS Lambda, and CI/CD (F...Amazon Web Services
 
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Boaz Ziniman
 
人工智能 (AI) 與機器學習概覽 (Level 200)
人工智能 (AI) 與機器學習概覽 (Level 200)人工智能 (AI) 與機器學習概覽 (Level 200)
人工智能 (AI) 與機器學習概覽 (Level 200)Amazon Web Services
 

What's hot (20)

Introduction to the Security Perspective of the Cloud Adoption Framework (CAF)
Introduction to the Security Perspective of the Cloud Adoption Framework (CAF)Introduction to the Security Perspective of the Cloud Adoption Framework (CAF)
Introduction to the Security Perspective of the Cloud Adoption Framework (CAF)
 
The Future of Enterprise IT - Lessons Learned
The Future of Enterprise IT - Lessons LearnedThe Future of Enterprise IT - Lessons Learned
The Future of Enterprise IT - Lessons Learned
 
Transforming your Business Ops Team for Cloud - AWS Summit Sydney 2018
Transforming your Business Ops Team for Cloud - AWS Summit Sydney 2018Transforming your Business Ops Team for Cloud - AWS Summit Sydney 2018
Transforming your Business Ops Team for Cloud - AWS Summit Sydney 2018
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
Cloud Journey & Lessons Learnt
Cloud Journey & Lessons LearntCloud Journey & Lessons Learnt
Cloud Journey & Lessons Learnt
 
Amazon Container Services
Amazon Container ServicesAmazon Container Services
Amazon Container Services
 
Evolving Security in AWS
Evolving Security in AWSEvolving Security in AWS
Evolving Security in AWS
 
How to Enable Single Sign On to Multiple AWS Accounts and Business Applicatio...
How to Enable Single Sign On to Multiple AWS Accounts and Business Applicatio...How to Enable Single Sign On to Multiple AWS Accounts and Business Applicatio...
How to Enable Single Sign On to Multiple AWS Accounts and Business Applicatio...
 
DevOps: The Amazon Story
DevOps: The Amazon StoryDevOps: The Amazon Story
DevOps: The Amazon Story
 
Moving forward with AI
Moving forward with AIMoving forward with AI
Moving forward with AI
 
Sicurezza e conformità al GDPR con AWS
Sicurezza e conformità al GDPR con AWSSicurezza e conformità al GDPR con AWS
Sicurezza e conformità al GDPR con AWS
 
What IT Transformation Really Means for the Enterprise
What IT Transformation Really Means for the EnterpriseWhat IT Transformation Really Means for the Enterprise
What IT Transformation Really Means for the Enterprise
 
Lessons Learned Scaling Your Talent Transformation
Lessons Learned Scaling Your Talent TransformationLessons Learned Scaling Your Talent Transformation
Lessons Learned Scaling Your Talent Transformation
 
Security & Compliance
Security & ComplianceSecurity & Compliance
Security & Compliance
 
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
Security, Risk and Compliance of Your Cloud Journey - Tel Aviv Summit 2018
 
Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018
Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018
Mastering the Secret Sauce to SaaS - Adrian De Luca - AWS TechShift ANZ 2018
 
How Nubank Automates Fine-Grained Security with IAM, AWS Lambda, and CI/CD (F...
How Nubank Automates Fine-Grained Security with IAM, AWS Lambda, and CI/CD (F...How Nubank Automates Fine-Grained Security with IAM, AWS Lambda, and CI/CD (F...
How Nubank Automates Fine-Grained Security with IAM, AWS Lambda, and CI/CD (F...
 
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
Starting your Cloud Transformation Journey - Tel Aviv Summit 2018
 
人工智能 (AI) 與機器學習概覽 (Level 200)
人工智能 (AI) 與機器學習概覽 (Level 200)人工智能 (AI) 與機器學習概覽 (Level 200)
人工智能 (AI) 與機器學習概覽 (Level 200)
 

Similar to Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions

Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Amazon Web Services
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Amazon Web Services
 
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018Amazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
Big Data and Alexa_Voice-Enabled Analytics
Big Data and Alexa_Voice-Enabled Analytics Big Data and Alexa_Voice-Enabled Analytics
Big Data and Alexa_Voice-Enabled Analytics Amazon Web Services
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
Big Data on AWS - To infinity and beyond! - Tel Aviv Summit 2018
Big Data on AWS - To infinity and beyond! - Tel Aviv Summit 2018Big Data on AWS - To infinity and beyond! - Tel Aviv Summit 2018
Big Data on AWS - To infinity and beyond! - Tel Aviv Summit 2018Amazon Web Services
 
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...Amazon Web Services
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAmazon Web Services
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTAmazon Web Services
 
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)Amazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Amazon Web Services
 
Immersion Day - Como a AWS apoia a estratégia analítica de sua empresa
Immersion Day - Como a AWS apoia a estratégia analítica de sua empresaImmersion Day - Como a AWS apoia a estratégia analítica de sua empresa
Immersion Day - Como a AWS apoia a estratégia analítica de sua empresaAmazon Web Services LATAM
 
BDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSBDA305 Building Data Lakes and Analytics on AWS
BDA305 Building Data Lakes and Analytics on AWSAmazon Web Services
 
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSAWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWS
AWS 2019 Taipei Summit - Building Serverless Analytics Platform on AWSSteven Hsieh
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Amazon Web Services
 
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Amazon Web Services
 

Similar to Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions (20)

Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
 
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Big Data and Alexa_Voice-Enabled Analytics
Big Data and Alexa_Voice-Enabled Analytics Big Data and Alexa_Voice-Enabled Analytics
Big Data and Alexa_Voice-Enabled Analytics
 
Building a Modern Data Platform in the Cloud


Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Big Data Meets AI: Driving Insights and Adding Intelligence to Your Solutions. Ben Snively, Specialist Solutions Architect. Demo: Kate Werling, Solutions Architect.
  • 2. What we’ll cover • Big Data and why organizations care • Common Challenges: Which, What, How… • Demonstration • Big Data Driving Machine Learning • Demonstration • Final Design Tenets
  • 3. Big Data Is Defined Many Different Ways: Volume, Velocity, Variety, Veracity, Value, Visualization, Variability
  • 4. What do the analysts say? https://www.promptcloud.com https://john-popelaars.blogspot.com https://ww.signiant.com https://www.linkedin.com/pulse/world-today-data-rich-information-poor-guru-p-mohapatra-pmp/
  • 5. Most Important: Driving Value from Data. Organizations that successfully generate business value from their data will outperform their peers. An Aberdeen survey saw organizations that implemented a data lake outperforming similar companies by 9% in organic revenue growth.* Organic revenue growth: Leaders 24%, Followers 15%. *Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
  • 6. Data Is Changing → Analytics Are Adopting. Capture and store new data at PB-EB scale. Do new types of analytics in a cost-effective way. New types of analytics: • Machine learning • Big data processing • Real-time analytics • Full-text search
  • 7. Data Lakes Extend the Traditional Approach • Relational and nonrelational data • TBs–EBs scale • Diverse analytical engines • Low-cost storage & analytics. Diagram labels: Data warehouse, Business intelligence; OLTP, ERP, CRM, LOB; Devices, Web, Sensors, Social; Data lake; Big data processing, real-time, machine learning
  • 8. Data Lakes from AWS • Unmatched durability and availability at EB scale • Best security, compliance, and audit capabilities • Object-level controls for fine-grained access • Fastest performance by retrieving subsets of data • The most ways to bring data in • 2x as many integrations with partners • Analyze with the broadest set of analytics & ML services. Diagram labels: Analytics, Machine learning, Real-time data movement, On-premises data movement, Data Lake on AWS
  • 9. Data Lakes, Analytics, and IoT Portfolio from AWS: Broadest, deepest set of analytic services. Machine learning: Managed ML Service, Deep Learning AMIs, Video and Image Recognition, Conversational Interfaces, Deep-Learning Video Camera, Natural Language Processing, Language Translation, Speech Recognition, Text-to-Speech. Analytics: Interactive Analysis, Hadoop & Spark, Data Warehousing, Full-text search, Real-time analytics, Dashboards & Visualizations. On-premises data movement: Dedicated Network connection, Secure appliances, Ruggedized Shipping Container, Database migration. Real-time data movement: Connect Devices to AWS, Real-time Data Streams, Real-time Video Streams. Data Lake on AWS: Storage | Archival Storage | Data Catalog
  • 10. Data Lake Components. Catalog & Search: Access and search metadata. Access & User Interface: Give your users easy and secure access. Central Storage: Secure, cost-effective storage in Amazon S3. Data Ingestion: Get your data into S3 quickly and securely. Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified. Processing & Analytics: Use of predictive and prescriptive analytics to gain better understanding. Services shown: DynamoDB, Elasticsearch, API Gateway, Identity & Access Management, Cognito, QuickSight, Amazon AI, EMR, Redshift, Athena, Kinesis, RDS, S3, Snowball, Database Migration Service, Kinesis Firehose, Direct Connect, Security Token Service, CloudWatch, CloudTrail, Key Management Service. Layer labels: Metadata, User Access, Security/Governance, Data Movement, Analytics and Machine Learning
  • 11. Data Lake Components. Catalog & Search: Access and search metadata. Access & User Interface: Give your users easy and secure access. Central Storage: Secure, cost-effective storage in Amazon S3. Data Ingestion: Get your data into S3 quickly and securely. Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified. Processing & Analytics: Use of predictive and prescriptive analytics to gain better understanding. Services shown: DynamoDB, Elasticsearch, API Gateway, Identity & Access Management, Cognito, QuickSight, Amazon AI, EMR, Redshift, Athena, Kinesis, RDS, S3, Snowball, Database Migration Service, Kinesis Firehose, Direct Connect, Security Token Service, CloudWatch, CloudTrail, Key Management Service, Glue ETL
  • 12. Common Big Data Challenges
  • 13. WHICH tool should I use? One tool that doesn’t do anything very well… Organized suite of tools that each do their purpose very well…
  • 14. WHICH tool should I use? Purpose-built engines. Right tool for the right job.
  • 15. WHICH tool should I use?
  • 16. WHAT Data Do I Have? Gartner: “Through 2018, 80% of data lakes will not include effective metadata management capabilities, making them inefficient.” Data Lake on AWS: Storage | Archival Storage | Data Catalog
  • 17. WHAT Data Do I Have? – AWS Glue. Data Catalog (Discover): Apache Hive Metastore compatible; integrated with AWS services; automatic crawling. Job Authoring (Develop): auto-generates ETL code; Python and Apache Spark; edit, debug, and share. Job Execution (Deploy): serverless execution; flexible scheduling; monitoring and alerting
  • 18. WHAT Data Do I Have? On-premises data, Web app data, Amazon RDS, Other databases, Streaming data, Your data, Amazon QuickSight
  • 19. Other Ways to Populate Your Data Catalog: call the AWS Glue CreateTable API; create a table manually; run a Hive DDL statement; import from Apache Hive Metastore. Diagram labels: Apache Hive Metastore, AWS Glue ETL, AWS Glue Data Catalog
  • 20. HOW can I get started? MOST Important: Selecting an Agile Framework. Start with a tool that will serve the purpose. Experiment, Test, Iterate, Adopt. Let’s look at an example: Evolution of the Netflix data pipeline
  • 21. EXPERIMENT new things. Aggregate and upload events to Hadoop/Hive for batch processing. Batch → Batch + Real-time
  • 22. ADOPT your solution. Chukwa front-end → Kafka; Kafka front-end; Kafka
  • 23. FOCUS on business value. “Amazon Kinesis Streams processes multiple terabytes of log data each day, yet events show up in our analytics in seconds,” says Bennett. “We can discover and respond to issues in real time, ensuring high availability and a great customer experience.”
  • 24. AGILITY for the business. Select an environment that allows you to try out different tools. Focus on tools that allow you to spend as much time on analytics as possible…
  • 25. Agility of Analytics. Machine learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly. Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight. On-premises data movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service. Real-time data movement: AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams. Data Lake on AWS: Storage | Archival Storage | Data Catalog
  • 26. Agility - Hadoop/Spark Analytics • Distributed processing • Diverse analytics: Batch/Script (Hive/Pig), Interactive (Spark, Presto), Real-time (Spark), Machine Learning (Spark), NoSQL (HBase) • For many use cases: Log and clickstream analysis, Machine learning, Real-time analytics, Large-scale analytics, Genomics, ETL. Diagram labels: Batch, Script, Interactive, Real-time, Machine learning, NoSQL; YARN (Hadoop Resource Manager); Data Lake on AWS
  • 27. Agility - Hadoop/Spark Analytics on AWS. Amazon EMR: Managed Hadoop/Spark. Amazon S3: Object Storage. Diagram labels: Batch, Script, Interactive, Real-time, Machine learning, NoSQL; YARN (Hadoop Resource Manager); Data Lake on AWS
  • 28. Amazon S3 – Source of Truth, Multiple Clusters. Amazon S3: Source of Truth. Amazon EMR clusters: Interactive Spark Cluster, Transient ETL Job. Each cluster runs EC2 instances with memory, local disk, and HDFS; intermediates are stored on local disk or HDFS
  • 29. Fitting this into the Common Data Catalog. Amazon S3: Source of Truth. Amazon EMR clusters: Interactive Spark Cluster (HDFS), Transient ETL Job (HDFS). Describes Data in S3: MySQL DB instance, Glue Data Catalog. Customers have options
  • 30. Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL
  • 31. Demonstration: Data Catalog and Analyzing Your Data
  • 32. Machine Learning and Big Data
  • 33. Big Data Driving Machine Learning. Better Decisions: Object Storage, Databases, Data warehouse, Streaming analytics, BI, Hadoop, Spark/Presto, Elasticsearch. Better Products: Machine Learning, Deep Learning/AI. More Users, More Data: Click stream, User activity, Generated content, Purchases, Clicks, Likes, Sensor data
  • 34. Agility in Machine Learning. Machine learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly. Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight. On-premises data movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service. Real-time data movement: AWS IoT Core, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams. Data Lake on AWS: Storage | Archival Storage | Data Catalog
  • 35. Machine Learning requires new tools and interfaces. Diagram labels: Machine Learning/Deep Learning, Business Reporting, Data Scientists, Data Engineer, IDE, Data Catalog, Central Storage, SageMaker
  • 36. Agility in Machine Learning – for all users. Application Services: • Designed for Application Developers • Solution-oriented Prebuilt Models Available via APIs • Image Analysis, Text-to-Speech, Conversational UX. Platforms: • Designed for Data Scientists to Address Common Needs • Fully Managed Platform for Model Building • Reduces the Heavy Lifting in Model Building & Deployment. Frameworks: • Designed for Data Scientists to Address Advanced / Emerging Needs • Provides Maximum Flexibility to Develop on the Leading AI Frameworks • Enables Expert AI Systems to be Developed & Deployed
  • 37. Digital Globe – Using ML to Find the Right Data. Data lake: • 100 PB of data in cloud • Optimize storage tiers. Solution: • Optimize their data lake storage, cut costs in half
  • 38.
  • 39.
  • 40. FINRA - Data Is Central to Our Mission. Reconstruct the market from trillions of events: • Data from broker-dealers and exchanges • Equities, Options, Fixed Income • Build a graph of market order events. Analyze the data looking for financial fraud: • Insider trading, layering, cross-product manipulation, front running & many more • Looking for a needle in a haystack
  • 41. FINRA - From data puddles to Data Lake. FINRA in Data Center (silos): Database 1, Database 2, … Database n, each with its own Storage, Query/Compute, and Catalog. FINRA in AWS (scales): Storage: Amazon S3. Query/Compute: EMR Spark, Lambda, EMR Presto, EMR HBase. Catalog: herd, Hive metastore
  • 42. Enabled Machine Learning on their Data Lake. Still one copy of data! Diagram labels: Data Scientists, DS-in-a-box, AuthN, Spark Cluster, EMR Cluster, Catalog, Logical ‘Database’
  • 43. UDSP – Inventory – not just R • R 3.2.5, Python (2.7.12 and 3.4.3) • Packages: R: 300+, Python: 100+ • Tools for Building Packages: gcc, gfortran, make, java, maven, ant… • IDEs: Jupyter, RStudio Server • Deep Learning: CUDA, CuDNN (if GPU present); Theano, Caffe, Torch; TensorFlow
  • 44. Machine Learning Demonstration
  • 45. Final Design Tenets/Best Practices
  • 46. Core Tenets • Loose coupling, but highly performant • Storage, Analytics, Metadata Management, etc. • Future-proof your analytics • Choosing the best tool for the job • Elasticity and multiple clusters for dedicated purposes • Replace capacity planning with a consumption model • Don’t forget metadata management
  • 47. Use the Right Storage Tier. Data structure → Fixed schema, JSON, key-value. Access patterns → Store data in the format you will access it. Data characteristics → Hot, warm, cold. Cost → Right cost
  • 48. Please complete the session survey in the mobile summit app!
  • 49. Thank you!
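
Slide 19 lists calling the AWS Glue CreateTable API and running a Hive DDL statement as two ways to populate the Data Catalog. The sketch below is illustrative only and is not taken from the talk or its demo. It assumes a Parquet dataset, and the table name, columns, bucket path, and helper function names are all hypothetical. It only builds the request payload and the DDL text; nothing is sent to AWS.

```python
from typing import Dict, List, Tuple

# Standard Hive class names for Parquet-backed tables.
PARQUET_INPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
PARQUET_OUTPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"

Columns = List[Tuple[str, str]]  # (column name, Hive type) pairs


def build_table_input(name: str, columns: Columns, s3_location: str) -> Dict:
    """Build a dict shaped like the TableInput argument of Glue CreateTable."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": s3_location,
            "InputFormat": PARQUET_INPUT,
            "OutputFormat": PARQUET_OUTPUT,
            "SerdeInfo": {"SerializationLibrary": PARQUET_SERDE},
        },
    }


def build_hive_ddl(name: str, columns: Columns, s3_location: str) -> str:
    """Build the equivalent Hive DDL statement for the same external table."""
    cols = ", ".join(f"`{n}` {t}" for n, t in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{name}` ({cols}) "
        f"STORED AS PARQUET LOCATION '{s3_location}'"
    )


# Hypothetical example table; none of these names come from the slides.
columns = [("user_id", "string"), ("event_time", "timestamp")]
location = "s3://example-data-lake/clickstream/"
table_input = build_table_input("clickstream_events", columns, location)
ddl = build_hive_ddl("clickstream_events", columns, location)
```

With AWS credentials configured, the payload could be registered through boto3 with `boto3.client("glue").create_table(DatabaseName="example_db", TableInput=table_input)`, where `example_db` is again a placeholder. The DDL string could instead be run from Athena or a Hive client. Either route describes data that stays in S3, which matches the single source of truth design in slides 28 and 29.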

Editor's Notes

  1. We are at a big data and BI Summit, so I think most folks are familiar with Big Data and some form of the Vs (3Vs, 5Vs, 7Vs), each of which represents some definition of a big data system.
  2. You don’t have to take my word for it… reports on the growth of data are readily available almost everywhere you look. Top-left: growth of unstructured data is vastly outpacing structured data. Top-right: the amount of data will grow 50x between 2010 and 2020. Bottom-left: we already have PB/day customers, and we’re trending towards EB and ZB data sets. Bottom-right: data from sensors/connected devices and social media is now described in multiples of the global population.
  3. Organizations that successfully generate business value from their data will outperform their peers. An Aberdeen survey saw organizations that implemented a data lake outperforming similar companies by 9% in organic revenue growth. These leaders were able to do new types of analytics, like machine learning, over new sources like log files, clickstream data, social media, and internet-connected devices stored in the data lake. This helped them identify and act upon opportunities for business growth faster by attracting and retaining customers, boosting productivity, proactively maintaining devices, and making informed decisions. Lock, Michael (Aberdeen), Angling for Insight In Today’s Data Lake (Oct 2017), pg 7.
  4. Amazon S3 provides object storage built to store and retrieve any amount of data. S3 has unmatched durability and availability, built from the ground up to deliver a customer promise of 99.999999999% durability at exabyte scale. Only S3 automatically replicates your data in three Availability Zones within a single region, giving you unmatched resilience to single data center issues like power failures. Only S3 lets you do cross-region replication seamlessly without having to use a separate storage class. Finally, only S3 allows you to do cross-region replication where you choose any number of specified regions to replicate to. Amazon S3 has the best security, compliance, and audit capabilities of any storage service. It can automatically encrypt your data, and gives you three choices for key management: S3 Key Management, customer-provided keys, and AWS Key Management Service (KMS). Only S3 gives you encryption when replicating data across regions, and lets you use separate accounts for the source and destination regions, protecting against malicious insider deletion of data. Only S3 integrates with an AI-powered security service, Amazon Macie, to monitor, detect, and alert on anomalies that might indicate the early stages of an attack. To meet compliance regulations, you can log and audit all account activity, including how, when, and who is accessing objects in S3, through AWS CloudTrail. These features allow AWS to support security standards and compliance certifications for virtually every regulatory agency around the globe. Amazon S3 is the only storage service that lets you operate at the object level, rather than the bucket level. This allows you to set fine-grained access controls and security policies to restrict access to specific objects, and create lifecycle policies to automatically delete or tier groups of objects into lower-cost storage.
Amazon S3 is the only storage system that can retrieve only the subsets of data within an object that are needed, using S3 Select. This speeds up queries by up to 400 percent, resulting in faster queries at lower cost. AWS provides more ways to bring data into your data lake than anywhere else. These include importing real-time, streaming data with Amazon Kinesis, establishing a dedicated network connection between your premises and AWS with AWS Direct Connect, using secure appliances to transfer large amounts of data with AWS Snowball, using a ruggedized shipping container to transfer data at exabyte scale with AWS Snowmobile, and migrating your databases with AWS Database Migration Service. The Amazon S3 ecosystem has twice as many partner integrations as anyone else, with tens of thousands of consulting, systems integrator, and independent software vendor partners. This makes it easier to use S3 as primary storage, backup, archive, and disaster recovery with applications that you already own, such as those from NetApp, EMC, Veritas, and others. Once you have started to build your data lake, AWS provides the broadest and most diverse set of options to analyze and extract value from your data, whether for analytics, machine learning, or IoT use cases. You are given the tools and frameworks of your choice, with the broadest set of purpose-built services available that all run directly on the data lake, without the need to move data into a separate analytics system.
  5. Many customers spend time and effort in analysis to find the perfect tool for their needs. At the rate the ecosystem is evolving, that tool might no longer be the best if you’ve spent so much time in research.
  6. Now that I know what data I have… In the old world, you knew your schema, you got a BI tool, and you asked it questions based on the structure. You knew exactly which questions you wanted to ask, which drove a very predictable collection and storage model When think about data in the context of the 3 V’s, you need different tooling… and you’re going to want ask questions of data that isn’t structured. In the new world of data analysis your questions are going to evolve and change over time. You need to be able to collect, store and analyze data without being constrained by resources, whether compute, storage, or even the tool being used. You want a purpose-built tool to derive the type of analysis – the type of insight – that you’re looking for.
  7. With the rise of Big Data, the ecosystem is quite active, and the tools are rapidly changing… You need the ability to evolve with the tools and your own needs. Many customers spend time and effort in analysis to find the perfect tool for their needs. At the rate the ecosystem is evolving, that tool might no longer be the best choice by the time the research is done. Our recommendation: Find a tool that meets the need, then iterate on the tooling as you learn more about your actual needs. To do that, you need a good metadata management process, portable data formats, and easy access to the data. Otherwise, your data is in jail. Many customers tell us about the pain they experience when their data is locked behind a vendor-specific format or a vendor-controlled interface.
  8. Problem #1: Many organizations don’t know what they have. When you accumulate such a diversity of data, you need mechanisms to understand what data you have, where it is located, and what format it is in. This is metadata management. If metadata is not managed properly (or at all), the data is essentially lost: it takes up space, but you have no means to put it to use. A common issue, whether on-premises or in the cloud, is the lack of a metadata management approach from the outset. The Financial Industry Regulatory Authority (FINRA) oversees more than 3,900 securities firms with approximately 640,000 brokers. FINRA processes approximately 6 terabytes of data and 37 billion records on an average day to build a complete, holistic picture of market trading in the U.S. On busy days, the stock markets can generate more than 75 billion records. FINRA makes all this data useful, whether to data scientists, business users, or others, through a metadata system it developed and open sourced, called HERD. This is the same platform that is used by LinkedIn, for example. But most organizations don’t go off and build their own tooling. Ivy Tech is a community college with 60,000 online and in-person course sections, 8,300 staff, 170,000 students, and 130 locations. Ivy Tech uses metadata capabilities provided by AWS to manage its information.
  9. These are the main components of Glue. Glue comprises a data catalog, which is a central metadata repository; an ETL engine that can auto-generate Python code; and a flexible scheduler that handles dependency resolution, job monitoring, and retries. Together, these automate much of the heavy lifting involved with discovering, categorizing, cleaning, enriching, and moving data, so you can spend more time analyzing your data. Glue automatically discovers your data, determines the schema, and builds your data catalog. The Glue data catalog provides out-of-the-box integration with Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum. The ETL code Glue generates is just Python code that is entirely customizable, reusable, and portable. You can edit this code using your favorite IDE or notebook and share it with others using GitHub. And finally, Glue is serverless. There are no resources to manage, and you pay only for the resources your jobs consume while they run.
  10. Glue includes a feature called AWS Glue crawlers. These crawlers discover the metadata for your catalog automatically. They can operate not only over your relational databases and data warehouses, but also over your data lakes on S3. When crawling a source such as S3, a crawler first identifies the format of the data (for example, CSV, JSON, Parquet, or Avro), and then determines the fields and the type of each field within the data. It does a great job, but you can also go in and modify the outputs. It can also identify both Hive-compliant and non-compliant partitioning of data.
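To make the two steps described above concrete (identify the format, then infer each field and its type), here is a toy sketch in plain Python. It is not how Glue crawlers are implemented; the function names, the type labels, and the sample data are all assumptions made for illustration, and it only knows about CSV and JSON lines. It also shows how Hive-style `key=value` path segments map to partition columns.

```python
import csv
import io
import json


def detect_format(sample: str) -> str:
    """Guess whether a text sample is JSON lines or CSV (the only formats this toy knows)."""
    first_line = sample.strip().splitlines()[0]
    try:
        return "json" if isinstance(json.loads(first_line), dict) else "csv"
    except ValueError:
        return "csv"


def infer_type(values) -> str:
    """Return the narrowest of bigint, double, string that fits every value."""
    for label, cast in (("bigint", int), ("double", float)):
        try:
            for value in values:
                cast(str(value))  # compare on the text form so 9.5 is not truncated to 9
            return label
        except ValueError:
            continue
    return "string"


def infer_schema(sample: str) -> dict:
    """Map each field name in the sample to an inferred type label."""
    if detect_format(sample) == "json":
        rows = [json.loads(line) for line in sample.strip().splitlines()]
    else:
        rows = list(csv.DictReader(io.StringIO(sample.strip())))
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: infer_type(values) for name, values in columns.items()}


def hive_partitions(key: str) -> dict:
    """Extract Hive-style partition columns (key=value path segments) from an object key."""
    parts = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            name, value = segment.split("=", 1)
            parts[name] = value
    return parts


csv_sample = "id,price,city\n1,9.5,Seattle\n2,12,Austin\n"
print(infer_schema(csv_sample))
print(hive_partitions("sales/year=2018/month=03/part-0000.csv"))
```

A real crawler handles many more formats and edge cases; the point here is only the order of operations: format first, then fields and types, then partitions from the path layout.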
  11. <21-28 to be screenshot heavy>
  12. OK, I’m jazzed… I know the pitfalls. Now… what do I do? Netflix data pipeline: ~500 billion events and ~1.3 PB per day; ~8 million events and ~24 GB per second during peak demand.
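A quick back-of-the-envelope check helps put the Netflix figures above in perspective. The sketch below derives the average per-second rates and the average event size from the daily totals, and compares them with the quoted peak. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and treats the "~" figures as exact, so the results are approximations.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted above (decimal units assumed: 1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
events_per_day = 500e9
bytes_per_day = 1.3e15
peak_events_per_sec = 8e6
peak_bytes_per_sec = 24e9

avg_events_per_sec = events_per_day / SECONDS_PER_DAY
avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
avg_event_size = bytes_per_day / events_per_day
peak_event_size = peak_bytes_per_sec / peak_events_per_sec

print(f"average: {avg_events_per_sec / 1e6:.1f} M events/s, {avg_bytes_per_sec / 1e9:.1f} GB/s")
print(f"average event size: {avg_event_size:.0f} bytes; at peak: {peak_event_size:.0f} bytes")
print(f"peak-to-average event rate: {peak_events_per_sec / avg_events_per_sec:.2f}x")
```

Under these assumptions the average works out to roughly 5.8 million events and 15 GB per second, with events of about 2.6 KB each, so the quoted peak is around 1.4 times the average event rate.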
  13. There are several hundred event streams flowing through the pipeline, for example: video viewing activities, UI activities, error logs, performance events, and troubleshooting and diagnostic events. Netflix started with batch analytics, collecting data using Apache Chukwa and saving it in an S3-backed data lake. After building this, they needed to start doing real-time analytics on the data. They were able to push out the new version easily by branching off the existing pipeline and adding a Kafka-based backend.
  14. To improve reliability and scale, they shifted from the Chukwa front end pushing to Kafka to having Kafka publish and route specific messages to the consumer Kafka topics.
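The routing step described above can be pictured as a lookup from event type to the consumer topics that should receive it. The sketch below is plain Python with no Kafka client, and it is not Netflix's implementation: the routing table, topic names, and event fields are all hypothetical and exist only to illustrate the fan-out idea.

```python
from collections import defaultdict

# Hypothetical routing table: event type -> consumer topics that should receive it.
ROUTES = {
    "video_view": ["analytics.viewing", "realtime.recommendations"],
    "error_log": ["ops.errors"],
    "performance": ["ops.performance"],
}
DEFAULT_TOPIC = "events.unrouted"  # hypothetical catch-all topic


def route(event: dict) -> list:
    """Return the consumer topics an event should be published to."""
    return ROUTES.get(event.get("type"), [DEFAULT_TOPIC])


def fan_out(events) -> dict:
    """Group events by destination topic, standing in for the publish step."""
    by_topic = defaultdict(list)
    for event in events:
        for topic in route(event):
            by_topic[topic].append(event)
    return dict(by_topic)


batches = fan_out([
    {"type": "video_view", "title_id": 42},
    {"type": "error_log", "message": "timeout"},
    {"type": "ui_click"},
])
print(sorted(batches))
```

In a real deployment the grouped events would be handed to a Kafka producer; keeping the routing rule as data makes it easy to add a consumer without touching the front end.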
  15. They shifted and built the log analytics on Kinesis Data Streams. “… built on Amazon Kinesis enables us to identify ways to increase efficiency, reduce costs, and improve resiliency for the best customer experience.”
  16. Zillow Group increases machine-learning calculation performance and scalability and delivers near-real-time home-valuation data to customers using AWS. The company houses a portfolio of the largest online real-estate and home-related brands. Zillow Group runs the Zestimate, its machine-learning-based home-valuation tool, on Amazon Kinesis and Apache Spark on Amazon EMR. Zillow uses Kinesis Streams to collect public record data and MLS listings, and then updates home value estimates in near real time so home buyers and sellers can get the most up-to-date estimates. Zillow also sends the same data to its S3 data lake using Kinesis Firehose, so that all the applications can work with the most recent information. Using structured data and unstructured data such as images. DigitalGlobe went all in on AWS to meet the growing demand for commercial geo-intelligence, migrating its entire 17-year imagery archive to the cloud. DigitalGlobe is one of the world’s leading providers of high-resolution earth imagery, data, and analysis. The company used AWS Snowmobile to move 100 petabytes of data to the cloud, allowing it to move away from large file-transfer protocols and delivery workflows. DigitalGlobe also uses Amazon SageMaker to handle machine learning at scale. Dr. Walter Scott, CTO and founder at DigitalGlobe, spoke at re:Invent 2017. Cache rate improved by more than a factor of 2, reaching 83% and sometimes trending to 90%. Stripe uses Athena. Amazon.com uses DynamoDB and a suite of other serverless services in Herd. Processing delays decreased from 1 second to 100 milliseconds. Herd controls the business logic for processing all Amazon.com customer orders worldwide, orchestrating more than 1,300 workflows for everything from order processing to fulfillment-center operations to coordinating parts of the Amazon Alexa backend. A mission-critical system used by more than 300 Amazon engineering teams, Herd executes more than 4 billion workflows on peak days.
On Prime Day, DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second. According to the team, the extreme scale, consistent performance, and high availability of DynamoDB let them meet the needs of Prime Day without breaking a sweat. DynamoDB is used by Lyft to store GPS locations for all their rides, Tinder to store millions of user profiles and make billions of matches, Redfin to scale to millions of users and manage data for hundreds of millions of properties, Comcast to power their XFINITY X1 video service running on more than 20 million devices, BMW to run its car-as-a-sensor service that can scale up and down by two orders of magnitude within 24 hours, Nordstrom for their recommendations engine, reducing processing time from 20 minutes to a few seconds, Under Armour to support its connected fitness community of 200 million users, Toyota Racing to make real-time decisions on pit stops, tire changes, and race strategy, and another 100,000+ AWS customers for a wide variety of high-scale, high-performance use cases.
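The Zillow pattern described above (the same record goes to a stream for near-real-time processing and to the S3 data lake through Firehose) could look roughly like the sketch below. It uses the boto3 `put_record` calls for Kinesis Data Streams and Kinesis Data Firehose; the stream names and the `home_id` field are hypothetical, and this is an illustration rather than Zillow's actual code. The clients are passed in so the function can be tried with stand-ins.

```python
import json


def publish_listing(kinesis_client, firehose_client, listing: dict) -> None:
    """Send one listing update to a Kinesis data stream and to a Firehose delivery stream."""
    data = (json.dumps(listing) + "\n").encode("utf-8")
    kinesis_client.put_record(
        StreamName="listing-updates",          # hypothetical stream name
        Data=data,
        PartitionKey=str(listing["home_id"]),  # keeps one home's updates on one shard, in order
    )
    firehose_client.put_record(
        DeliveryStreamName="listing-updates-to-s3",  # hypothetical delivery stream
        Record={"Data": data},
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   publish_listing(boto3.client("kinesis"), boto3.client("firehose"),
#                   {"home_id": 123, "list_price": 450000})
```

Firehose can also be configured to read directly from a Kinesis data stream, which would remove the second call; the explicit version is shown here only because it mirrors the description above.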
  17. You simply put your data in S3 and submit SQL against it.
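As a minimal sketch of what "submit SQL against it" can look like from code, the function below uses the boto3 Athena calls `start_query_execution`, `get_query_execution`, and `get_query_results`. The database, table, and S3 output location in the usage comment are hypothetical, and the client is passed in so the function can be tested with a stand-in.

```python
import time


def run_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Submit a SQL query to Athena, wait for it to finish, and return the first page of rows."""
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state is reached.
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    result = athena_client.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of cells; a cell holds its text under "VarCharValue".
    # For SELECT queries the first row contains the column headers.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   rows = run_query(boto3.client("athena"),
#                    "SELECT city, COUNT(*) FROM listings GROUP BY city",
#                    "example_db", "s3://example-athena-results/")
```

Larger result sets are paginated; a fuller version would follow the `NextToken` returned by `get_query_results`.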
  18. Why is AI/ML so often discussed side by side with data at conferences? Data fuels AI/ML. AI/ML is all about finding patterns in data and using those patterns to make predictions, recognize images, create speech, and provide other intelligent capabilities. This in turn creates a flywheel effect: these new intelligent capabilities increase the user base and customer usage, which creates more data, which allows organizations to better understand their users and drive analytics and new intelligent systems.
  19. “By using Amazon SageMaker, DigitalGlobe’s cache rate improved by more than a factor of two, oftentimes being around 83% and sometimes trending to a 90% cache hit rate. This also allowed them to cut their cloud storage cost in half by better utilizing their S3 Optimized cache and retrieving less from their 100+ PB archive.” Purpose: showcase the power of ML to identify data utility. The blue dots represent what humans decided to cache (almost the whole world) and the orange dots represent what our customers requested access to over a three-month period. We were missing the mark by a long shot. http://blog.digitalglobe.com/industry/using-machine-learning-to-save-money-on-cloud-data-storage/
  20. DigitalGlobe: 2 different use cases. As the world’s leading provider of high-resolution Earth imagery, data, and analysis, DigitalGlobe works with enormous amounts of data every day. Use Case 1: As more and more imagery is collected from their growing constellation of satellites, it is critical for DG to predict and cache only the most relevant imagery at any given point in time, allowing them to take advantage of AWS’ tiered storage products to optimize their costs. They are relying on machine learning as the business grows. By analyzing 17 years of changing access patterns to this imagery data, they can predict how long to keep the data readily available in Amazon S3 before moving it to cold archive in Amazon Glacier, for example. With Amazon SageMaker’s machine learning algorithms, they can identify and predict exactly what imagery is going to be used and requested in real time, to drive down the cost of managing petabytes of data at scale, and the engineers who are using SageMaker to do this knew nothing about machine learning when they started! Use Case 2: DigitalGlobe is making it easier for people to find, access, and run compute against their entire data archive in the cloud in order to apply deep learning to satellite imagery. They plan to use Amazon SageMaker to train models against petabytes of earth observation imagery datasets using hosted Jupyter notebooks, so DigitalGlobe’s Geospatial Big Data Platform (GBDX) users can just push a button, create a model, and deploy it, all within one scalable distributed environment.
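Use Case 1 above ends with a decision about how long data stays in Amazon S3 before moving to Glacier. One way such a decision could be applied is through an S3 lifecycle rule, sketched below. This is an assumption about mechanism, not a description of DigitalGlobe's system: the function name, bucket, prefix, and the 90-day figure are hypothetical, and it presumes the predicted retention period can be grouped by key prefix, since lifecycle rules work on prefix and age rather than on individual predictions.

```python
def glacier_transition_rule(prefix: str, days_in_s3: int) -> dict:
    """Build an S3 lifecycle configuration that moves objects under `prefix` to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/').replace('/', '-')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # days_in_s3 would come from the model's predicted retention period.
                "Transitions": [{"Days": days_in_s3, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_transition_rule("imagery/region-a/", 90)
print(config["Rules"][0]["ID"])

# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-imagery-archive",
#       LifecycleConfiguration=config,
#   )
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's whole lifecycle configuration, so a real job would build all rules together before applying them.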