Big Data Analytics
with Amazon Web Services



               Dr. Matt Wood
An Online Seminar for Partners. Wednesday 1st August.
Hello, and thank you.
Big Data Analytics

   An introduction

   The story of analytics on AWS

   Integrating partners

   Partner success stories
1




INTRODUCING BIG DATA
Data for competitive
     advantage.
Using data

  Customer segmentation,
  financial modeling,
  system analysis,
  line-of-sight,
  business intelligence.
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Cost of data generation
       is falling.
Generation: lower cost, increased throughput
Collection & storage
Analytics & computation
Collaboration & sharing
Generation
HIGHLY CONSTRAINED:
  Collection & storage
  Analytics & computation
  Collaboration & sharing
Very high barrier to turning
  data into information.
Move from a
data generation challenge to
    an analytics challenge.
Enter the Cloud.
Remove the constraints.
Enable data-driven innovation.
Move to a distributed data
        approach.
Maturation of two things.
Software for distributed
      storage and analysis



Maturation of two things.

  Infrastructure for distributed
       storage and analysis
Software

  Frameworks for
  data-intensive workloads.

  Distributed by design.
Infrastructure

  Platform for
  data-intensive workloads.

  Distributed by design.
Support the
data timeline.
Generation
HIGHLY CONSTRAINED:
  Collection & storage
  Analytics & computation
  Collaboration & sharing

Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Lower the
barrier to entry.
Accelerate time to market
   and increase agility.
Enable new business
   opportunities.
Washington Post

   Pinterest

    NASA
“AWS enables Pfizer to explore
difficult or deep scientific
questions in a timely, scalable
manner and helps us make better
decisions more quickly”

Michael Miller, Pfizer
2




THE STORY OF ANALYTICS
EC2
Utility computing.
 6 years young.
Scale out systems


 Embarrassingly parallel problems.
 Queue based distribution.
 Small, medium and high scale.
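
Queue-based distribution of an embarrassingly parallel problem might look roughly like the sketch below. It is an illustration only: the in-process `queue.Queue` and threads stand in for a hosted queue and a fleet of EC2 instances, and `process_item` is a hypothetical unit of work, not something from the talk.

```python
import queue
import threading


def process_item(n):
    # Hypothetical independent unit of work; no item depends on another.
    return n * n


def run_workers(items, worker_count=4):
    """Distribute items to workers through a shared queue and collect results."""
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            value = process_item(item)
            with lock:
                results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


squares = run_workers(range(10), worker_count=3)
```

Because each worker only pulls from the queue, changing `worker_count` changes throughput without changing the work itself, which is what lets the same pattern run at small, medium and high scale.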
Cost optimization.



Achieving economies of scale

 [Chart: capacity used (up to 100%) over time, built up in layers: reserved capacity, then on-demand, with the remainder marked as unused capacity.]
Spot Instances


 Bid on unused EC2 capacity.
 Very large discount.
 Perfect for batch runs.
 Balance cost and scale.
$650 per hour
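
The idea of balancing cost and scale with Spot capacity can be sketched as simple arithmetic. The prices below are hypothetical placeholders, not real AWS rates and not figures from the talk, and `hourly_cost` is a name introduced here for illustration.

```python
def hourly_cost(instances, on_demand_price, spot_price, spot_fraction):
    """Hourly cost of a fleet where a fraction of instances run as Spot."""
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    spot_count = round(instances * spot_fraction)
    on_demand_count = instances - spot_count
    return on_demand_count * on_demand_price + spot_count * spot_price


# Hypothetical prices in dollars per instance-hour, not real AWS rates.
ON_DEMAND = 1.00
SPOT = 0.25

all_on_demand = hourly_cost(100, ON_DEMAND, SPOT, spot_fraction=0.0)
mostly_spot = hourly_cost(100, ON_DEMAND, SPOT, spot_fraction=0.8)
```

Under these assumed prices the same 100-instance batch run costs less per hour as more of it moves to Spot, at the price of capacity that may be reclaimed, which is why the slide pairs Spot with batch work.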
Map/reduce

 Pattern for distributed computing.

 Software frameworks such as
 Hadoop.

 Write two functions. Scale up.

 Complex cluster configuration
 and management.
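
"Write two functions" can be illustrated with a word count. This is a minimal single-process sketch of the pattern, not Hadoop itself: `map_words`, `reduce_counts` and `run_local` are names made up here, and `run_local` only imitates the grouping step that a framework performs across a cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_local(lines):
    """Single-process stand-in for the shuffle a framework would perform."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


counts = run_local(["big data analytics", "big data on AWS"])
```

The two functions never refer to each other or to the cluster, so the framework is free to run many copies of each in parallel.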
Amazon Elastic MapReduce

 Managed Hadoop clusters.

 Easy to provision and monitor.

 Write two functions. Scale up.

 Optimized for S3 access.
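
As a rough idea of what provisioning a managed cluster looks like in code, the sketch below builds a job description with input and output in S3. It uses the request shape of today's boto3 `run_job_flow` call rather than anything shown in the talk. The bucket layout, script names, instance types, release label and role names are all assumptions, and nothing is sent to AWS.

```python
def streaming_job_request(name, bucket):
    """Build keyword arguments describing a Hadoop streaming job on EMR.

    Nothing is sent to AWS here; the dict only describes the job.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",          # assumed release label
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 4},
            ],
            # Shut the cluster down once the step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "word count",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hadoop-streaming",
                    # Hypothetical script locations for the two functions.
                    "-files", f"s3://{bucket}/code/mapper.py,s3://{bucket}/code/reducer.py",
                    "-mapper", "mapper.py",
                    "-reducer", "reducer.py",
                    "-input", f"s3://{bucket}/input/",
                    "-output", f"s3://{bucket}/output/",
                ],
            },
        }],
    }


request = streaming_job_request("wordcount-demo", "example-bucket")
```

With credentials configured, a dict like this could be passed as `boto3.client("emr").run_job_flow(**request)`; the cluster size is then a matter of changing `InstanceCount`.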
Under the hood

 [Diagram, built up over several slides: input data in S3; code; Elastic MapReduce; name node; elastic cluster with HDFS; queries + BI via JDBC, Pig, Hive; output to S3 + SimpleDB.]
Performance
Performance
 Compute performance
Under the hood: Cluster Compute

 Intel Xeon E5-2670
 10 GigE non-blocking network
 60.5 GB memory
 Placement groups

 + GPU enabled instances
Performance
 Compute performance
IO performance



NoSQL
Unstructured data storage.
DynamoDB

 Predictable, consistent performance
 Unlimited storage
 Single digit millisecond latencies
 No schema for unstructured data
 Backed by solid state drives
...and SSDs for all.
  New Hi1 storage instances.
Under the hood: hi1.4xlarge

  2 x 1 TB SSDs
  10 GigE network
  HVM: 90k IOPS read, 9k to 75k write
  PV: 120k IOPS read, 10k to 85k write
“The hi1.4xlarge configuration is
about half the system cost for the
same throughput.”


Netflix
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
EBS
Elastic Block Store
Provisioned IOPS
 Provision required IO performance
                  +
      EBS-optimized instances
     with dedicated throughput
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Performance + ease of use
3




PARTNER INTEGRATION
Extend platform with
     partners
Innovate on behalf of
    customers
Remove undifferentiated
    heavy lifting
MapR distribution for EMR

 Rolled the Amazon Hadoop
 optimizations into MapR

 Choice for EMR customers

 Easy deployment for MapR customers
MapR distribution for EMR

 Hadoop distribution

 Integrated into EMR

 NFS and ODBC drivers

 High availability and cluster
 mirroring
Informatica on EMR

 Enterprise data toolchain

 “Swiss army knife” for data formats

 Data integration

 Available to all on EMR
AWS Marketplace
Karmasphere, Marketshare, Acunu Cassandra,
     Metamarkets, Aspera and more.


       aws.amazon.com/marketplace
4




PARTNER SUCCESS STORIES
Razorfish
3.5 billion records
71MM unique cookies
1.7MM targeted ads per day

500% improvement in return on ad spend.
Cycle Computing
 + Schrodinger
30k cores, $4200 an hour
    (compared to $10+ million)
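
The two figures on this slide can be put side by side with a little arithmetic. This is only a back-of-the-envelope sketch: it takes $10 million as the lower bound of the slide's "$10+ million" and says nothing about what that figure covered.

```python
cores = 30_000
cluster_cost_per_hour = 4200   # dollars per hour, from the slide
comparison_cost = 10_000_000   # dollars, lower bound of "$10+ million"

cost_per_core_hour = cluster_cost_per_hour / cores
hours_for_same_spend = comparison_cost / cluster_cost_per_hour

print(f"${cost_per_core_hour:.2f} per core-hour")
print(f"{hours_for_same_spend:.0f} cluster-hours for the comparison figure")
```

At the quoted rate each core costs about 14 cents an hour, and the comparison figure would pay for well over two thousand hours of the full 30k-core cluster.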
Marketshare
+ Ticketmaster
Optimize live event pricing
Reduced developer
  infrastructure
management time
 by 3 hours a day
Thank you!
Q&A
matthew@amazon.com
   @mza on Twitter
