IaaS Cloud Benchmarking:
  Approaches, Challenges, and
  Experience

                             Alexandru Iosup

                             Parallel and Distributed Systems Group
                             Delft University of Technology
                             The Netherlands

Our team: Undergrad Nassos Antoniou, Thomas de Ruiter, Ruben Verboon, …
Grad Siqi Shen, Nezih Yigitbasi, Ozan Sonmez Staff Henk Sips, Dick Epema,
Alexandru Iosup Collaborators Ion Stoica and the Mesos team (UC Berkeley),
Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad
Posea (UPB), Derrick Kondo, Emmanuel Jeannot (INRIA), Assaf Schuster, Mark
Silberstein, Orna Ben-Yehuda (Technion), ...
                                   November 12, 2012
                            MTAGS, SC’12, Salt Lake City, UT, USA
                                                                                    1
What is Cloud Computing?
3. A Useful IT Service
“Use only when you want! Pay only for what you use!”




                      November 12, 2012
                                                       4
IaaS Cloud Computing




   Many tasks

    VENI – @larGe: Massivizing Online Games using Cloud Computing
Which Applications Need
    Cloud Computing? A Simplistic
    View…

[Chart: applications placed by demand variability (low to high, vertical axis) versus
demand volume (low to high, horizontal axis). Examples on the chart: web server,
social gaming, experimental research, SW dev/test, social networking, analytics,
online gaming, taxes/office/@Home tools, pharma research, sky survey, HP engineering,
tsunami prediction, epidemic simulation, space survey, comet detection.
Annotations: "OK, so we’re done here?" ... "Not so fast!"]

                                November 12, 2012   After an idea by Helmut Krcmar
                                                                                   6
What I Learned from Grids


 • Average job size is 1 (that is, there are no [!] tightly-coupled
   jobs, only conveniently parallel jobs)


From Parallel to Many-Task Computing


A. Iosup, C. Dumitrescu, D.H.J. Epema, H. Li, L. Wolters, How
are Real Grids Used? The Analysis of Four Grid Traces and Its
Implications, Grid 2006.
A. Iosup and D.H.J. Epema, Grid Computing Workloads, IEEE
Internet Computing 15(2): 19-26 (2011)
                         November 12, 2012
                                                               7
What I Learned from Grids

  Grids are unreliable infrastructure

  • Server: 99.99999% reliable
  • Small cluster: 99.999% reliable
  • Production cluster: 5x decrease in failure rate after the first year
    [Schroeder and Gibson, DSN‘06]
  • CERN LCG jobs: 74.71% successful, 25.29% unsuccessful
    (source: dboard-gr.cern.ch, May’07)
  • DAS-2: >10% of jobs fail [Iosup et al., CCGrid’06]
  • TeraGrid: 20-45% failures [Khalili et al., Grid’06]
  • Grid3: 27% failures, 5-10 retries [Dumitrescu et al., GCC’05]
  • Grid-level availability: 70%

A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic
Resource Availability in Grids, Grid 2007, Sep 2007.
                        November 12, 2012
                                                                9
What I Learned From Grids,
 Applied to IaaS Clouds

            “The path to abundance”        or        “The killer cyclone”?
                               We just don’t know!
http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/                Tropical Cyclone Nargis (NASA, ISSS, 04/29/08)

 •     “The path to abundance”:                                        •    “The killer cyclone”:
       on-demand capacity;                                                  performance for scientific applications
       cheap for short-term tasks;                                          (compute- or data-intensive);
       great for web apps (EIP, web                                         failures, many-tasks, etc.
       crawl, DB ops, I/O)
                                                        November 12, 2012
                                                                                                                             10
This Presentation: Research Questions
    Q0: What are the workloads of IaaS clouds?

               Q1: What is the performance
            of production IaaS cloud services?
         Q2: How variable is the performance
      of widely used production cloud services?

  Q3: How do provisioning and allocation policies
  affect the performance of IaaS cloud services?
Other questions studied at TU Delft: How does virtualization affect the performance
of IaaS cloud services? What is a good model for cloud workloads? Etc.
       But … this is benchmarking =
   the process of quantifying the performance
      and other non-functional properties
                 of the system
                    November 12, 2012
                                                                                  11
Why IaaS Cloud Benchmarking?


• Establish and share best-practices in answering
  important questions about IaaS clouds

•   Use in procurement
•   Use in system design
•   Use in system tuning and operation
•   Use in performance management

• Use in training

                       November 12, 2012
                                                    12
Agenda


1.   An Introduction to IaaS Cloud Computing
2.   Research Questions or Why We Need Benchmarking?
3.   A General Approach and Its Main Challenges
4.   IaaS Cloud Workloads (Q0)
5.   IaaS Cloud Performance (Q1) and Perf. Variability (Q2)
6.   Provisioning and Allocation Policies for IaaS Clouds (Q3)
7.   Conclusion




                         November 12, 2012
                                                             13
A General Approach for
IaaS Cloud Benchmarking




             November 12, 2012
                                 14
Approach: Real Traces, Models, Real Tools,
Real-World Experimentation (+ Simulation)

 •   Formalize real-world scenarios
 •   Exchange real traces
 •   Model relevant operational elements
 •   Scalable tools for meaningful and repeatable experiments
 •   Comparative studies
 •   Simulation only when needed (long-term scenarios, etc.)
                    Rule of thumb:
        Put 10-15% project effort
             into benchmarking
                         November 12, 2012
                                                            15
10 Main Challenges in 4
    Categories*                                               * List not exhaustive

    • Methodological
        1. Experiment compression
        2. Beyond black-box testing: testing short-term dynamics and
           long-term evolution
        3. Impact of middleware

    • System-related
        1. Reliability, availability, and system-related properties
        2. Massive-scale, multi-site benchmarking
        3. Performance isolation

    • Workload-related
        1. Statistical workload models
        2. Benchmarking performance isolation under various multi-tenancy models

    • Metric-related
        1. Beyond traditional performance: variability, elasticity, etc.
        2. Closer integration with cost models

Read our article: Iosup, Prodan, and Epema, IaaS Cloud Benchmarking: Approaches,
Challenges, and Experience, MTAGS 2012 (invited paper).
                         November 12, 2012
                                                                                   16
Agenda


1. An Introduction to IaaS Cloud Computing
2. Research Questions or Why We Need Benchmarking?
3. A General Approach and Its Main Challenges
4. IaaS Cloud Workloads (Q0)
5. IaaS Cloud Performance (Q1) &
   Perf. Variability (Q2)
6. Provisioning & Allocation Policies
   for IaaS Clouds (Q3)
7. Conclusion

                      November 12, 2012
                                                        17
IaaS Cloud Workloads: Our Team



 Alexandru Iosup       Dick Epema       Mathieu Jan         Ozan Sonmez    Thomas de Ruiter
     TU Delft            TU Delft      TU Delft/INRIA         TU Delft         TU Delft

        BoTs              BoTs               BoTs              BoTs           MapReduce
     Workflows            Grids      Statistical modeling                       Big Data
      Big Data                                                            Statistical modeling
Statistical modeling




                       Radu Prodan    Thomas Fahringer Simon Ostermann
                         U.Isbk.          U.Isbk.          U.Isbk.

                        Workflows         Workflows          Workflows

                                      November 12, 2012
                                                                                           18
What I’ll Talk About
IaaS Cloud Workloads (Q0)

  1.   BoTs
  2.   Workflows
  3.   Big Data Programming Models
  4.   MapReduce workloads




                      November 12, 2012
                                          19
What is a Bag of Tasks (BoT)? A
     System View
 BoT = set of jobs sent by a user…




 …that is submitted at most Δs after the
   first job


                                                 Time [units]
• Why Bag of Tasks? From the perspective
  of the user, jobs in set are just tasks of a larger job
• A single useful result from the complete BoT
• Result can be combination of all tasks, or a selection
  of the results of most or even a single task
Iosup et al., The Characteristics and Performance of Groups of Jobs in Grids,
Euro-Par, LNCS, vol. 4641, pp. 382-393, 2007.
                           2012-2013
                                                                  20
                                                                 Q0
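To make the Δ-based definition concrete, here is a minimal sketch (not the exact procedure from the Euro-Par 2007 paper) that groups a user's jobs into BoTs: a job belongs to the current bag if it is submitted at most Δ after the bag's first job. The trace format and field names are assumptions for illustration.

```python
from collections import defaultdict

def group_into_bots(jobs, delta):
    """Group jobs into Bags of Tasks (BoTs) per user.

    jobs:  list of (user, submit_time) tuples, submit_time in seconds.
    delta: maximum gap (seconds) between a job and the first job of its BoT.
    Returns a list of BoTs, each a list of (user, submit_time) tuples.
    """
    per_user = defaultdict(list)
    for user, t in jobs:
        per_user[user].append((user, t))

    bots = []
    for user, user_jobs in per_user.items():
        user_jobs.sort(key=lambda j: j[1])
        current = [user_jobs[0]]
        bot_start = user_jobs[0][1]
        for job in user_jobs[1:]:
            if job[1] - bot_start <= delta:
                current.append(job)       # still within Delta of the BoT's first job
            else:
                bots.append(current)      # close the current BoT, start a new one
                current = [job]
                bot_start = job[1]
        bots.append(current)
    return bots

# Example: with delta=120s, the first three jobs of user "u1" form one BoT.
trace = [("u1", 0), ("u1", 30), ("u1", 100), ("u1", 500), ("u2", 10)]
print([len(b) for b in group_into_bots(trace, delta=120)])  # [3, 1, 1]
```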
Applications of the BoT Programming
Model
• Parameter sweeps
   • Comprehensive, possibly exhaustive investigation of a model
   • Very useful in engineering and simulation-based science

• Monte Carlo simulations
   • Simulation with random elements: fixed time yet limited inaccuracy
   • Very useful in engineering and simulation-based science

• Many other types of batch processing
   • Periodic computation, Cycle scavenging
   • Very useful to automate operations and reduce waste

                             2012-2013
                                                                     21
                                                                   Q0
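As an illustration of the first two items above, the sketch below builds a parameter-sweep BoT (one task per parameter combination) and a Monte Carlo BoT (many independent tasks with different random seeds, whose results are combined at the end). The parameter names and the pi-estimation task are invented for illustration only.

```python
import itertools
import random

def parameter_sweep_bot(grid):
    """One task description per point of a full Cartesian parameter grid."""
    names = sorted(grid)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(grid[n] for n in names))]

def monte_carlo_bot(n_tasks, samples_per_task):
    """Independent tasks that each estimate pi from random samples."""
    def task(seed):
        rng = random.Random(seed)
        hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                   for _ in range(samples_per_task))
        return 4.0 * hits / samples_per_task
    return [lambda s=seed: task(s) for seed in range(n_tasks)]

sweep = parameter_sweep_bot({"mesh": [64, 128], "viscosity": [0.1, 0.2, 0.5]})
print(len(sweep))                       # 6 tasks, one per parameter combination
estimates = [t() for t in monte_carlo_bot(n_tasks=8, samples_per_task=10_000)]
print(sum(estimates) / len(estimates))  # combining all task results: ~3.14
```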
BoTs Are the Dominant Programming
Model for Grid Computing (Many Tasks)




                                                            22
         Iosup and Epema: Grid Computing Workloads.

             IEEE Internet Computing 15(2): 19-26 (2011)
                                                           Q0
What is a Workflow?


WF = set of jobs with precedence constraints
   (think Directed Acyclic Graph)




                                   2012-2013
                                                23
                                               Q0
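A minimal sketch of the idea (not a real workflow engine): a workflow represented as a DAG of jobs with precedence constraints, executed in an order that respects those constraints via a topological sort. The job names are invented for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Workflow as a DAG: each job maps to the set of jobs it depends on.
workflow = {
    "extract":   set(),
    "filter":    {"extract"},
    "analyze-a": {"filter"},
    "analyze-b": {"filter"},
    "report":    {"analyze-a", "analyze-b"},
}

# A valid execution order respects every precedence constraint.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # e.g. ['extract', 'filter', 'analyze-a', 'analyze-b', 'report']

# Jobs with no unmet dependencies can run concurrently, level by level:
ts = TopologicalSorter(workflow)
ts.prepare()
while ts.is_active():
    ready = list(ts.get_ready())   # e.g. ['analyze-a', 'analyze-b'] run in parallel
    ts.done(*ready)
    print(ready)
```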
Applications of the Workflow
        Programming Model
        • Complex applications
              • Complex filtering of data
              • Complex analysis of instrument measurements


        • Applications created by non-CS scientists*
              • Workflows have a natural correspondence in the real-world,
                as descriptions of a scientific procedure
              • Visual model of a graph sometimes easier to program


        • Precursor of the MapReduce Programming Model
          (next slides)

                                                    2012-2013
*Adapted from: Carole Goble and David de Roure, Chapter in “The Fourth Paradigm”,
http://research.microsoft.com/en-us/collaboration/fourthparadigm/               24
                                                                                Q0
Workflows Exist in Grids, but with No
    Evidence of a Dominant Programming Model

  • Traces


  • Selected Findings




     •   Loose coupling
     •   Graph with 3-4 levels
     •   Average WF size is 30/44 jobs
     •   75%+ WFs are sized 40 jobs or less, 95% are sized 200 jobs or less
Ostermann et al., On the Characteristics of Grid Workflows, CoreGRID Integrated
Research in Grid Computing (CGIW), 2008.
                           2012-2013
                                                                          25
                                                                         Q0
What is “Big Data”?

• Very large, distributed aggregations of loosely structured
  data, often incomplete and inaccessible

• Easily exceeds the processing capacity of conventional
  database systems

• Principle of Big Data: “When you can, keep everything!”

• Too big, too fast, and doesn’t comply with the traditional
  database architectures

                             2011-2012
                                                                26
                                                               Q0
The Three “V”s of Big Data
• Volume
      •   More data vs. better models
      •   Data grows exponentially
      •   Analysis in near-real time to extract value
      •   Scalable storage and distributed queries
• Velocity
      • Speed of the feedback loop
      • Gain competitive advantage: fast recommendations
      • Identify fraud, predict customer churn faster
• Variety
      • The data can become messy: text, video, audio, etc.
      • Difficult to integrate into applications

Adapted from: Doug Laney, “3D data management”, META Group/Gartner report, Feb 2001.
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
                                                   2011-2012
                                                                               27
                                                                              Q0
Ecosystems of Big-Data Programming
      Models

• High-Level Language: Flume BigQuery, SQL, Meteor, JAQL, Hive, Pig, Sawzall,
  Scope, DryadLINQ, AQL
• Programming Model: PACT, MapReduce Model, Pregel, Dataflow, Algebrix
• Execution Engine: Flume Engine, Dremel Service Tree, Tera Data Engine,
  Azure Engine, Nephele, Haloop, Hadoop/YARN, Giraph, MPI/Erlang, Dryad, Hyracks
• Storage Engine: S3, GFS, Tera Data Store, Azure Data Store, HDFS, Voldemort,
  LFS, CosmosFS, Asterix B-tree
  * Plus Zookeeper, CDN, etc.

                                               2012-2013
Adapted from: Dagstuhl Seminar on Information Management in the Cloud,                            28
http://www.dagstuhl.de/program/calendar/partlist/?semnr=11321&SUOG                             Q0
Our Statistical MapReduce Models
• Real traces
   • Yahoo
   • Google
   • 2 x Social Network Provider




                               November 12, 2012
             de Ruiter and Iosup. A workload model for MapReduce.
             MSc thesis at TU Delft, Jun 2012. Available online via
                                                                         29
                                                                        Q0
Agenda


1. An Introduction to IaaS Cloud Computing
2. Research Questions or Why We Need Benchmarking?
3. A General Approach and Its Main Challenges
4. IaaS Cloud Workloads (Q0)
5. IaaS Cloud Performance (Q1) &
   Perf. Variability (Q2)
6. Provisioning & Allocation Policies
   for IaaS Clouds (Q3)
7. Conclusion

                      November 12, 2012
                                                        30
IaaS Cloud Performance: Our Team



Alexandru Iosup    Dick Epema     Nezih Yigitbasi    Athanasios Antoniou
    TU Delft         TU Delft        TU Delft             TU Delft

 Performance      Performance      Performance          Performance
  Variability     IaaS clouds       Variability           Isolation
   Isolation
Multi-tenancy
Benchmarking




                  Radu Prodan    Thomas Fahringer Simon Ostermann
                    U.Isbk.          U.Isbk.          U.Isbk.

                  Benchmarking     Benchmarking        Benchmarking

                                 November 12, 2012
                                                                           31
What I’ll Talk About
IaaS Cloud Performance (Q1)
  1.   Previous work
  2.   Experimental setup
  3.   Experimental results
  4.   Implications on real-world workloads




                       November 12, 2012
                                              32
Some Previous Work
(>50 important references across our
studies)
Virtualization Overhead
   •   Loss below 5% for computation [Barham03] [Clark04]
   •   Loss below 15% for networking [Barham03] [Menon05]
   •   Loss below 30% for parallel I/O [Vetter08]
   •   Negligible for compute-intensive HPC kernels [You06] [Panda06]


Cloud Performance Evaluation
   • Performance and cost of executing scientific workflows [Dee08]
   • Study of Amazon S3 [Palankar08]
   • Amazon EC2 for the NPB benchmark suite [Walker08] or
     selected HPC benchmarks [Hill08]
   • CloudCmp [Li10]
   • Kossmann et al.
                             November 12, 2012
                                                                        33
Q1
    Production IaaS Cloud Services

•   Production IaaS cloud: lease resources (infrastructure) to
    users, operate on the market and have active customers




                                November 12, 2012
                                                                               34
          Iosup et al., Performance Analysis of Cloud Computing Services for Many
Q1
Our Method
•   Based on a general performance technique: model the
    performance of individual components; estimate system
    performance from the workload and the component model
    [Saavedra and Smith, ACM TOCS’96] (see the sketch below)
•   Adapt to clouds:
    1. Cloud-specific elements: resource provisioning and allocation
    2. Benchmarks for single- and multi-machine jobs
    3. Benchmark CPU, memory, I/O, etc.




                            November 12, 2012
                                                                           35
      Iosup et al., Performance Analysis of Cloud Computing Services for Many
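The sketch below illustrates the Saavedra-Smith idea as used here, under simplifying assumptions: micro-benchmark the per-operation cost of each component (CPU, memory, I/O) on the target cloud, characterize a workload by its operation counts, and predict its runtime as the dot product of the two. The operation categories and all numbers are invented for illustration, not measurements.

```python
# Per-operation costs measured by micro-benchmarks on a given cloud instance type
# (illustrative numbers only).
measured_cost = {
    "flop":      0.8e-9,   # seconds per floating-point operation
    "mem_byte":  0.4e-9,   # seconds per byte moved through memory
    "io_byte":   12e-9,    # seconds per byte of disk I/O
}

# A workload characterized by how many of each operation it performs.
workload_profile = {
    "flop":     5e10,
    "mem_byte": 2e10,
    "io_byte":  1e9,
}

def predicted_runtime(profile, costs):
    """System performance estimated from workload profile + component model."""
    return sum(profile[op] * costs[op] for op in profile)

print(f"predicted runtime: {predicted_runtime(workload_profile, measured_cost):.1f} s")
```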
Single Resource
Provisioning/Release                                                   Q1




•   Time depends on instance type
•   Boot time non-negligible
                             November 12, 2012
                                                                            36
       Iosup et al., Performance Analysis of Cloud Computing Services for Many
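A minimal sketch of how such provisioning times can be measured: time the request-to-ready interval around the provider's API calls. `provision_instance` and `wait_until_ready` are hypothetical placeholders (a real benchmark would call the cloud SDK, e.g. the EC2 run/describe-instances calls); here they only simulate delays so the script runs standalone.

```python
import random
import time

def provision_instance(instance_type):
    """Hypothetical placeholder for the provider's 'start VM' call.
    A real benchmark would call the cloud SDK here; this stub only
    simulates the request latency."""
    time.sleep(random.uniform(0.1, 0.3))
    return f"i-{random.randrange(1 << 32):08x}"

def wait_until_ready(instance_id):
    """Hypothetical placeholder: poll until the VM is booted and reachable."""
    time.sleep(random.uniform(0.5, 1.5))   # stands in for the boot time

def measure_provisioning_time(instance_type):
    """Return (total, request_only) times in seconds for one VM acquisition."""
    t0 = time.time()
    instance_id = provision_instance(instance_type)   # request accepted
    t1 = time.time()
    wait_until_ready(instance_id)                      # VM booted and reachable
    t2 = time.time()
    return t2 - t0, t1 - t0

# Repeat per instance type and report the distribution, since boot time is
# non-negligible and differs across instance types (as the slide's data shows).
for _ in range(3):
    print(measure_provisioning_time("m1.small"))
```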
Multi- Resource
Provisioning/Release                                                    Q1




•   Time for multi-resource provisioning increases with the number of resources
                             November 12, 2012
                                                                            37
       Iosup et al., Performance Analysis of Cloud Computing Services for Many
Q1
 CPU Performance of a Single
 Resource
• ECU definition: “a 1.1 GHz
  2007 Opteron” ~ 4 flops
  per cycle at full pipeline,
  which means at peak
  performance one ECU
  equals 4.4 billion floating-point
  operations per second
  (4.4 GFLOPS)
• Real performance
  0.6..1.1 GFLOPS =
  ~1/7..1/4 of theoretical peak



                             November 12, 2012
                                                                            38
       Iosup et al., Performance Analysis of Cloud Computing Services for Many
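The arithmetic behind the ECU figures on this slide, as a short check (taking the 4 flops/cycle figure from the ECU definition above):

```python
clock_hz = 1.1e9          # "a 1.1 GHz 2007 Opteron"
flops_per_cycle = 4       # full pipeline, per the ECU definition above
peak_gflops = clock_hz * flops_per_cycle / 1e9
print(peak_gflops)        # 4.4 GFLOPS theoretical peak per ECU

for measured in (0.6, 1.1):                          # measured range on EC2 instances
    print(f"{measured / peak_gflops:.2f} of peak")   # ~0.14 (1/7) .. 0.25 (1/4)
```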
Q1
 HPLinpack Performance (Parallel)




• Low efficiency for parallel compute-intensive applications
• Low performance vs cluster computing and supercomputing
                            November 12, 2012
                                                                           39
      Iosup et al., Performance Analysis of Cloud Computing Services for Many
Q1
 Performance Stability (Variability)
                                                                         Q2




• High performance variability for the best-performing
  instances

                             November 12, 2012
       Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks
       Scientific Computing (IEEE TPDS 2011).
                                                                            40
Q1
Summary


• Much lower performance than theoretical peak
   • Especially CPU (GFLOPS)
• Performance variability
• Compared results with some of the commercial
  alternatives (see report)




                       November 12, 2012
                                                  41
Q1
    Implications: Simulations


• Input: real-world workload traces, grids and PPEs
• Running in
   • Original env.
   • Cloud with
     source-like perf.
   • Cloud with
     measured perf.
• Metrics
   • WT (wait time), ReT (response time), BSD(10s) (bounded slowdown,
     10s threshold; see the sketch after this slide)
   • Cost [CPU-h]


                                  November 12, 2012
                                                                                 42
            Iosup et al., Performance Analysis of Cloud Computing Services for Many
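For reference, the sketch below computes the job metrics named on this slide the way they are commonly defined in the scheduling literature (wait time, response time, and bounded slowdown with a 10 s threshold); this is the standard textbook formulation, not necessarily the exact code used in the study.

```python
def job_metrics(wait_time, run_time, threshold=10.0):
    """Wait time (WT), response time (ReT), bounded slowdown BSD(threshold)."""
    response = wait_time + run_time
    # Bounded slowdown avoids huge values for very short jobs by never
    # dividing by less than the threshold (here 10 s).
    bsd = max(1.0, response / max(run_time, threshold))
    return wait_time, response, bsd

print(job_metrics(wait_time=120.0, run_time=5.0))    # short job: BSD capped by the 10 s bound
print(job_metrics(wait_time=120.0, run_time=600.0))  # long job: BSD ~= 1.2
```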
Q1
Implications: Results




• Cost: Clouds, real >> Clouds, source
• Performance:
   • AReT: Clouds, real >> Source env. (bad)
   • AWT,ABSD: Clouds, real << Source env. (good)
                          November 12, 2012
                                                                         43
    Iosup et al., Performance Analysis of Cloud Computing Services for Many
Agenda


1. An Introduction to IaaS Cloud Computing
2. Research Questions or Why We Need Benchmarking?
3. A General Approach and Its Main Challenges
4. IaaS Cloud Workloads (Q0)
5. IaaS Cloud Performance (Q1) &
   Perf. Variability (Q2)
6. Provisioning & Allocation Policies
   for IaaS Clouds (Q3)
7. Conclusion

                      November 12, 2012
                                                        44
IaaS Cloud Performance: Our Team



Alexandru Iosup    Dick Epema     Nezih Yigitbasi    Athanasios Antoniou
    TU Delft         TU Delft        TU Delft             TU Delft

 Performance      Performance      Performance          Performance
  Variability     IaaS clouds       Variability           Isolation
   Isolation
Multi-tenancy
Benchmarking




                  Radu Prodan    Thomas Fahringer Simon Ostermann
                    U.Isbk.          U.Isbk.          U.Isbk.

                  Benchmarking     Benchmarking        Benchmarking

                                 November 12, 2012
                                                                           45
What I’ll Talk About
IaaS Cloud Performance Variability (Q2)

  1. Experimental setup
  2. Experimental results
  3. Implications on real-world workloads




                     November 12, 2012
                                            46
Q2
    Production Cloud Services

•   Production cloud: operate on the market and have active
    customers

•   IaaS/PaaS: Amazon Web Services (AWS)
    •   EC2 (Elastic Compute Cloud)
    •   S3 (Simple Storage Service)
    •   SQS (Simple Queue Service)
    •   SDB (Simple Database)
    •   FPS (Flexible Payment Service)

•   PaaS: Google App Engine (GAE)
    •   Run (Python/Java runtime)
    •   Datastore (Database) ~ SDB
    •   Memcache (Caching)
    •   URL Fetch (Web crawling)




                                 November 12, 2012
                                                                                   47
               Iosup, Yigitbasi, Epema. On the Performance Variability of
Q2
    Our Method         [1/3]
    Performance Traces

    • CloudStatus*
        • Real-time values and weekly averages for most of the
          AWS and GAE services


    • Periodic performance probes
        • Sampling rate is under 2 minutes




* www.cloudstatus.com

                                 November 12, 2012
                                                                             48
               Iosup, Yigitbasi, Epema. On the Performance Variability of
Our Method                                      [2/3]                              Q2
  Analysis
1. Find out whether variability is present
   •   Investigate over several months whether the performance metric is highly
       variable


2. Find out the characteristics of the variability
   •   Basic statistics: the five quartiles (Q0-Q4), including the median (Q2), the
       mean, and the standard deviation
   •   Derivative statistic: the IQR (Q3-Q1)
   •   CoV > 1.1 indicates high variability


3. Analyze the performance variability time patterns (see the sketch below)
   •   Investigate for each performance metric the presence of
       daily/monthly/weekly/yearly time patterns
   •   E.g., for monthly patterns, divide the dataset into twelve subsets and for
       each subset compute the statistics and plot for visual inspection
                                   November 12, 2012
                                                                                      49
              Iosup, Yigitbasi, Epema. On the Performance Variability of
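A minimal sketch of steps 2 and 3 in plain Python, assuming the trace is a list of (Unix timestamp, value) samples for one metric: it computes the five quartiles, IQR, and CoV, and groups samples by month for the pattern analysis. Variable names and the trace layout are assumptions for illustration.

```python
import statistics
from collections import defaultdict
from datetime import datetime

def variability_stats(values):
    """Five quartiles (Q0-Q4), IQR, mean, stdev, and CoV of one metric."""
    q0, q4 = min(values), max(values)
    q1, q2, q3 = statistics.quantiles(values, n=4)   # Python 3.8+
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return {
        "quartiles": (q0, q1, q2, q3, q4),
        "IQR": q3 - q1,
        "mean": mean,
        "stdev": stdev,
        "CoV": stdev / mean,          # CoV > 1.1 flags high variability
    }

def monthly_subsets(samples):
    """Split (timestamp, value) samples into per-month subsets (step 3)."""
    by_month = defaultdict(list)
    for ts, value in samples:
        by_month[datetime.fromtimestamp(ts).month].append(value)
    return by_month

# Usage: compute the statistics per month and inspect the pattern visually.
# for month, vals in sorted(monthly_subsets(trace).items()):
#     print(month, variability_stats(vals))
```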
Q2
Our Method                              [3/3]
Is Variability Present?
• Validated Assumption: The performance delivered
  by production services is variable.




                         November 12, 2012
                                                                     50
       Iosup, Yigitbasi, Epema. On the Performance Variability of
Q2
AWS Dataset (1/4): EC2                                        Variable
                                                            Performance




•   Deployment Latency [s]: Time it takes to start a small instance, from the
    startup to the time the instance is available
•   Higher IQR and range from week 41 to the end of the year
     • Possible reason: increasing EC2 user base
     • Impact on applications using EC2 for auto-scaling


                                  November 12, 2012
                                                                                   51
             Iosup, Yigitbasi, Epema. On the Performance Variability of
Q2
AWS Dataset (2/4): S3                                         Stable
                                                            Performance




•   Get Throughput [bytes/s]: Estimated rate at which an object in a bucket is
    read
•   The last five months of the year exhibit much lower IQR and range
     •   More stable performance for the last five months
     •   Probably due to software/infrastructure upgrades

                                   November 12, 2012
                                                                                  52
             Iosup, Yigitbasi, Epema. On the Performance Variability of
Q2
AWS Dataset (3/4): SQS
                       Variable Performance



                                                                       Stable
                                                                     Performance




•   Average Lag Time [s]: Time it takes for a posted message to become
    available to read. Average over multiple queues.
•   Long periods of stability (low IQR and range)
•   Periods of high performance variability also exist



                                November 12, 2012
                                                                             53
Q2
AWS Dataset (4/4): Summary


•   All services exhibit time patterns in performance
•   EC2: periods of special behavior
•   SDB and S3: daily, monthly and yearly patterns
•   SQS and FPS: periods of special behavior




                           November 12, 2012
                                                                       54
         Iosup, Yigitbasi, Epema. On the Performance Variability of
Experimental Setup (1/2): Simulations
•    Trace based simulations for three applications                        Q2
•    Input
      • GWA traces
      • Number of daily unique users
      • Monthly performance variability




    Application                           Service

    Job Execution                         GAE Run
    Selling Virtual Goods                 AWS FPS
    Game Status Maintenance               AWS SDB/GAE Datastore


                                November 12, 2012
                                                                            59
              Iosup, Yigitbasi, Epema. On the Performance Variability of
Experimental Setup (2/2): Metrics                                       Q2


• Average Response Time and Average Bounded Slowdown
• Cost in millions of consumed CPU hours
• Aggregate Performance Penalty -- APP(t)



   • Pref (Reference Performance): Average of the twelve monthly medians
   • P(t): random value sampled from the distribution corresponding to the
     current month at time t (Performance is like a box of chocolates, you
     never know what you’re gonna get ~ Forrest Gump)
   • max U(t): max number of users over the whole trace
   • U(t): number of users at time t
   • APP—the lower the better



                             November 12, 2012
                                                                             60
          Iosup, Yigitbasi, Epema. On the Performance Variability of
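The slide lists the ingredients of APP without giving the formula. The sketch below is one plausible way to combine them, stated as an assumption on our part rather than the paper's exact definition: a load-weighted average of the performance ratio P(t)/Pref, where the weight at time t is U(t)/max U(t).

```python
import random

def aggregate_performance_penalty(users, monthly_values, reference):
    """Illustrative reading of APP (an assumption, not the paper's exact formula).

    users:          list of U(t), the number of users at each time step t
    monthly_values: per time step, the performance samples of the current month,
                    from which P(t) is drawn at random
    reference:      Pref, the average of the twelve monthly medians
    Lower is better; 1.0 means no performance penalty.
    """
    max_users = max(users)
    weighted_penalty = 0.0
    total_weight = 0.0
    for u_t, month_samples in zip(users, monthly_values):
        p_t = random.choice(month_samples)        # "like a box of chocolates"
        weight = u_t / max_users
        weighted_penalty += weight * (p_t / reference)
        total_weight += weight
    return weighted_penalty / total_weight
```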
Q2
Game Status Maintenance (1/2):
Scenario
 • Maintenance of game status for a large-scale social
   game such as Farm Town or Mafia Wars which have
   millions of unique users daily

 • AWS SDB and GAE Datastore

 • We assume that the number of database operations
   depends linearly on the number of daily unique users




                           November 12, 2012
                                                                       65
         Iosup, Yigitbasi, Epema. On the Performance Variability of
Q2
Game Status Maintenance (2): Results

 AWS SDB                                  GAE
                                          Datastore




• Big discrepancy between the SDB and Datastore services
• Sep’09-Jan’10: APP of Datastore is well below that of
  SDB due to the increasing performance of Datastore
• APP of Datastore ~1 => no performance penalty
• APP of SDB ~1.4 => 40% higher performance penalty than Datastore
                             November 12, 2012
                                                                         66
           Iosup, Yigitbasi, Epema. On the Performance Variability of
Agenda


1. An Introduction to IaaS Cloud Computing
2. Research Questions or Why We Need Benchmarking?
3. A General Approach and Its Main Challenges
4. IaaS Cloud Workloads (Q0)
5. IaaS Cloud Performance (Q1) &
   Perf. Variability (Q2)
6. Provisioning & Allocation Policies
   for IaaS Clouds (Q3)
7. Conclusion

                      November 12, 2012
                                                        67
IaaS Cloud Policies: Our Team



Alexandru Iosup   Dick Epema          Bogdan Ghit          Athanasios Antoniou
    TU Delft        TU Delft            TU Delft                TU Delft

  Provisioning    Provisioning        Provisioning            Provisioning
   Allocation      Allocation          Allocation              Allocation
   Elasticity        Koala               Koala                  Isolation
      Utility                                                     Utility
    Isolation
 Multi-Tenancy




                                 Orna Agmon-Ben Yehuda       David Villegas
                                         Technion               FIU/IBM
                                     Elasticity, Utility    Elasticity, Utility

                                    November 12, 2012
                                                                                  68
What I’ll Talk About
Provisioning and Allocation Policies for IaaS
   Clouds (Q3)

  1. Experimental setup
  2. Experimental results
  3. Ad for Bogdan’s lecture (next)




                      November 12, 2012
                                                69
Q3
Provisioning and Allocation Policies*
* For User-Level Scheduling

• Provisioning                     • Allocation




• Also looked at combined
                                     The SkyMark Tool for
  Provisioning + Allocation        IaaS Cloud Benchmarking
  policies
                          November 12, 2012
        Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and
                                                                         70

          Allocation Policies for Infrastructure-as-a-Service Clouds,
Q3
Experimental Setup (1)
• Environments
  • DAS4, Florida International University (FIU)
  • Amazon EC2




• Workloads
  • Bottleneck
  • Arrival pattern




                          November 12, 2012
        Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and
                                                                         72

          Allocation Policies for Infrastructure-as-a-Service Clouds,
Q3
Experimental Setup (2)
• Performance Metrics
  • Traditional: Makespan, Job Slowdown
  • Workload Speedup One (SU1)
  • Workload Slowdown Infinite (SUinf)

• Cost Metrics
  • Actual Cost (Ca)
  • Charged Cost (Cc)

• Compound Metrics
  • Cost Efficiency (Ceff)
  • Utility


                             November 12, 2012
        Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and
                                                                         73

          Allocation Policies for Infrastructure-as-a-Service Clouds,
Q3
Performance Metrics




• Makespan very similar
• Very different job slowdown

                          November 12, 2012
        Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and
                                                                         74

          Allocation Policies for Infrastructure-as-a-Service Clouds,
Cost Metrics
Charged Cost (Cc)




                 Q: Why is OnDemand worse than Startup?
                             A: VM thrashing




                 Q: Why no OnDemand on Amazon EC2?


                        November 12, 2012
                                                          75
Cost Metrics                                                             Q3
       Actual Cost                                 Charged Cost




• Very different results between actual and charged
   • Cloud charging function an important selection criterion
• All policies better than Startup in actual cost
• Policies much better/worse than Startup in charged cost

                           November 12, 2012
         Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and
                                                                          76

           Allocation Policies for Infrastructure-as-a-Service Clouds,
Compound Metrics (Utilities)
Utility (U)




                               77
Q3
Compound Metrics




• Trade-off Utility-Cost still needs investigation
• Performance or Cost, not both:
  the policies we have studied improve one, but not both

                          November 12, 2012
        Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and
                                                                         78

          Allocation Policies for Infrastructure-as-a-Service Clouds,
Ad: Resizing MapReduce Clusters
•   Constraints:
     • Data is big and difficult to move
     • Resources need to be released fast



•   Approach (a sketch follows below):
     • Grow / shrink at the processing layer
     • Resize based on resource utilization
     • Policies for provisioning and allocation



    Stay put for the next talk in MTAGS
    2012!

Ghit, Epema. Resource Management for Dynamic MapReduce Clusters in
Multicluster Systems. MTAGS 2012.
                                      2011-2012
                                                                               79
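A toy illustration of a utilization-based resize policy of the kind described above (not the policy from the Ghit and Epema paper): grow the processing layer when utilization is high, shrink it when utilization is low, and never touch the data layer. Thresholds and step size are invented.

```python
def resize_decision(utilization, current_nodes, min_nodes=2, max_nodes=64,
                    grow_above=0.85, shrink_below=0.35, step=2):
    """Return the new number of processing-layer nodes.

    Grows/shrinks only the processing layer (the data layer, which holds the
    big, hard-to-move data, is left alone), based on measured utilization.
    Thresholds and step size are illustrative.
    """
    if utilization > grow_above:
        return min(max_nodes, current_nodes + step)
    if utilization < shrink_below:
        return max(min_nodes, current_nodes - step)   # release resources fast
    return current_nodes

print(resize_decision(0.95, current_nodes=8))   # 10: grow under high load
print(resize_decision(0.20, current_nodes=8))   # 6: shrink under low load
```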
Agenda


1. An Introduction to IaaS Cloud Computing
2. Research Questions or Why We Need Benchmarking?
3. A General Approach and Its Main Challenges
4. IaaS Cloud Workloads (Q0)
5. IaaS Cloud Performance (Q1) &
   Perf. Variability (Q2)
6. Provisioning & Allocation Policies
   for IaaS Clouds (Q3)
7. Conclusion

                      November 12, 2012
                                                        80
Agenda


1.   An Introduction to IaaS Cloud Computing
2.   Research Questions or Why We Need Benchmarking?
3.   A General Approach and Its Main Challenges
4.   IaaS Cloud Workloads (Q0)
5.   IaaS Cloud Performance (Q1) and Perf. Variability (Q2)
6.   Provisioning and Allocation Policies for IaaS Clouds (Q3)
7.   Conclusion




                         November 12, 2012
                                                             81
Conclusion Take-Home Message

• IaaS cloud benchmarking: approach + 10 challenges


• Put 10-15% project effort in benchmarking =
  understanding how IaaS clouds really work
  • Q0: Statistical workload models

  • Q1/Q2: Performance/variability

  • Q3: Provisioning and allocation


• Tools and Workloads
  • SkyMark                                      http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/

  • MapReduce                November 12, 2012
                                                                                                           82
Thank you for your attention!
Questions? Suggestions?
Observations?
 More Info:
 - http://www.st.ewi.tudelft.nl/~iosup/research.html
 - http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html
 - http://www.pds.ewi.tudelft.nl/

                                                  Do not hesitate to
                                                   contact me…
 Alexandru Iosup

 A.Iosup@tudelft.nl
 http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”)
 Parallel and Distributed Systems Group
 Delft University of Technology
                              November 12, 2012
                                                                       83
WARNING: Ads




           November 12, 2012
                               84
The Parallel and Distributed Systems
 Group at TU Delft



VENI                                    VENI                                        VENI
 Alexandru Iosup        Dick Epema          Ana Lucia     Henk Sips     Johan Pouwelse
                                           Varbanescu
   Grids/Clouds        Grids/Clouds                      HPC systems       P2P systems
   P2P systems          P2P systems       HPC systems     Multi-cores      File-sharing
     Big Data        Video-on-demand       Multi-cores   P2P systems    Video-on-demand
  Online gaming          e-Science          Big Data
                                           e-Science

 Home page
       •   www.pds.ewi.tudelft.nl
 Publications
       • see PDS publication database at publications.st.ewi.tudelft.nl
                              August 31, 2011
                                                                                          85
(TU) Delft – the Netherlands – Europe




Delft: founded 13th century, pop. 100,000
The Netherlands: pop. 16.5 M
TU Delft: founded 1842, pop. 13,000
www.pds.ewi.tudelft.nl/ccgrid2013


Dick Epema, General Chair                Delft, the Netherlands
Delft University of Technology Delft     May 13-16, 2013

Thomas Fahringer, PC Chair               Paper submission deadline:
University of Innsbruck
                                         November 22, 2012



                                       Nov 2012
SPEC Research Group (RG)
The Research Group of the
Standard Performance Evaluation Corporation

            Mission Statement
            Provide a platform for collaborative research
             efforts in the areas of computer benchmarking
             and quantitative system analysis
            Provide metrics, tools and benchmarks for
             evaluating early prototypes and research
             results as well as full-blown implementations
            Foster interactions and collaborations btw.
             industry and academia

          Find more information on: http://research.spec.org
Current Members (Mar 2012)




     Find more information on: http://research.spec.org
If you have an interest in new performance
  methods, you should join the SPEC RG
 Find a new venue to discuss your work
    Exchange with experts on how the performance of systems can be
     measured and engineered
    Find out about novel methods and current trends in performance
     engineering
    Get in contact with leading organizations in the field of performance
     evaluation
 Find a new group of potential employees
 Join a SPEC standardization process
 Performance in a broad sense:
    Classical performance metrics: response time, throughput,
     scalability, resource/cost/energy efficiency, elasticity
    Plus dependability in general: availability, reliability, and security


                Find more information on: http://research.spec.org

More Related Content

Viewers also liked

DevOps Perspectives Edition 3
DevOps Perspectives Edition 3DevOps Perspectives Edition 3
DevOps Perspectives Edition 3Paul Speers
 
Cloud Builders Meetup - Containers @ Autodesk
Cloud Builders Meetup - Containers @ AutodeskCloud Builders Meetup - Containers @ Autodesk
Cloud Builders Meetup - Containers @ AutodeskStephen Voorhees
 
Managing IaaS Resources
Managing IaaS ResourcesManaging IaaS Resources
Managing IaaS ResourcesOmar Nawaz
 
Zimory White Paper: Challenges Implementing an IaaS Cloud Exchange
Zimory White Paper: Challenges Implementing an IaaS Cloud ExchangeZimory White Paper: Challenges Implementing an IaaS Cloud Exchange
Zimory White Paper: Challenges Implementing an IaaS Cloud ExchangeZimory
 
7 things to consider when choosing your IaaS provider for ISV/SaaS
7 things to consider when choosing your IaaS provider for ISV/SaaS7 things to consider when choosing your IaaS provider for ISV/SaaS
7 things to consider when choosing your IaaS provider for ISV/SaaSFrederik Denkens
 
AWS Enterprise Summit London 2015 | Gartner Keynote - The Future of Cloud IaaS
AWS Enterprise Summit London 2015 | Gartner Keynote - The Future of Cloud IaaSAWS Enterprise Summit London 2015 | Gartner Keynote - The Future of Cloud IaaS
AWS Enterprise Summit London 2015 | Gartner Keynote - The Future of Cloud IaaSAmazon Web Services
 
Scalable Eventing Over Apache Mesos
Scalable Eventing Over Apache MesosScalable Eventing Over Apache Mesos
Scalable Eventing Over Apache MesosOlivier Paugam
 
(ENT311) Public IaaS Provider Bake-off: AWS vs Azure | AWS re:Invent 2014
(ENT311) Public IaaS Provider Bake-off: AWS vs Azure | AWS re:Invent 2014(ENT311) Public IaaS Provider Bake-off: AWS vs Azure | AWS re:Invent 2014
(ENT311) Public IaaS Provider Bake-off: AWS vs Azure | AWS re:Invent 2014Amazon Web Services
 
IaaS vs. PaaS: Windows Azure Compute Solutions
IaaS vs. PaaS: Windows Azure Compute SolutionsIaaS vs. PaaS: Windows Azure Compute Solutions
IaaS vs. PaaS: Windows Azure Compute SolutionsIdo Flatow
 
Beyond PaaS v.s IaaS: How to Manage Both
Beyond PaaS v.s IaaS: How to Manage BothBeyond PaaS v.s IaaS: How to Manage Both
Beyond PaaS v.s IaaS: How to Manage BothRightScale
 
State of the Cloud DevOps Trends
State of the Cloud DevOps TrendsState of the Cloud DevOps Trends
State of the Cloud DevOps TrendsRightScale
 
KoprowskiT-Difinify2017-SQL_ServerBackup_In_The_Cloud
KoprowskiT-Difinify2017-SQL_ServerBackup_In_The_CloudKoprowskiT-Difinify2017-SQL_ServerBackup_In_The_Cloud
KoprowskiT-Difinify2017-SQL_ServerBackup_In_The_CloudTobias Koprowski
 
AWS January 2016 Webinar Series - Managing your Infrastructure as Code
AWS January 2016 Webinar Series - Managing your Infrastructure as CodeAWS January 2016 Webinar Series - Managing your Infrastructure as Code
AWS January 2016 Webinar Series - Managing your Infrastructure as CodeAmazon Web Services
 

Viewers also liked (14)

DevOps Perspectives Edition 3
DevOps Perspectives Edition 3DevOps Perspectives Edition 3
DevOps Perspectives Edition 3
 
Cloud Builders Meetup - Containers @ Autodesk
Cloud Builders Meetup - Containers @ AutodeskCloud Builders Meetup - Containers @ Autodesk
Cloud Builders Meetup - Containers @ Autodesk
 
Managing IaaS Resources
Managing IaaS ResourcesManaging IaaS Resources
Managing IaaS Resources
 
Zimory White Paper: Challenges Implementing an IaaS Cloud Exchange
Zimory White Paper: Challenges Implementing an IaaS Cloud ExchangeZimory White Paper: Challenges Implementing an IaaS Cloud Exchange
Zimory White Paper: Challenges Implementing an IaaS Cloud Exchange
 
7 things to consider when choosing your IaaS provider for ISV/SaaS
7 things to consider when choosing your IaaS provider for ISV/SaaS7 things to consider when choosing your IaaS provider for ISV/SaaS
7 things to consider when choosing your IaaS provider for ISV/SaaS
 
AWS Enterprise Summit London 2015 | Gartner Keynote - The Future of Cloud IaaS
AWS Enterprise Summit London 2015 | Gartner Keynote - The Future of Cloud IaaSAWS Enterprise Summit London 2015 | Gartner Keynote - The Future of Cloud IaaS
AWS Enterprise Summit London 2015 | Gartner Keynote - The Future of Cloud IaaS
 
Scalable Eventing Over Apache Mesos
Scalable Eventing Over Apache MesosScalable Eventing Over Apache Mesos
Scalable Eventing Over Apache Mesos
 
(ENT311) Public IaaS Provider Bake-off: AWS vs Azure | AWS re:Invent 2014
(ENT311) Public IaaS Provider Bake-off: AWS vs Azure | AWS re:Invent 2014(ENT311) Public IaaS Provider Bake-off: AWS vs Azure | AWS re:Invent 2014
(ENT311) Public IaaS Provider Bake-off: AWS vs Azure | AWS re:Invent 2014
 
IaaS vs. PaaS: Windows Azure Compute Solutions
IaaS vs. PaaS: Windows Azure Compute SolutionsIaaS vs. PaaS: Windows Azure Compute Solutions
IaaS vs. PaaS: Windows Azure Compute Solutions
 
Beyond PaaS v.s IaaS: How to Manage Both
Beyond PaaS v.s IaaS: How to Manage BothBeyond PaaS v.s IaaS: How to Manage Both
Beyond PaaS v.s IaaS: How to Manage Both
 
State of the Cloud DevOps Trends
State of the Cloud DevOps TrendsState of the Cloud DevOps Trends
State of the Cloud DevOps Trends
 
KoprowskiT-Difinify2017-SQL_ServerBackup_In_The_Cloud
KoprowskiT-Difinify2017-SQL_ServerBackup_In_The_CloudKoprowskiT-Difinify2017-SQL_ServerBackup_In_The_Cloud
KoprowskiT-Difinify2017-SQL_ServerBackup_In_The_Cloud
 
AWS January 2016 Webinar Series - Managing your Infrastructure as Code
AWS January 2016 Webinar Series - Managing your Infrastructure as CodeAWS January 2016 Webinar Series - Managing your Infrastructure as Code
AWS January 2016 Webinar Series - Managing your Infrastructure as Code
 
IaaS, SaaS, PasS : Cloud Computing
IaaS, SaaS, PasS : Cloud ComputingIaaS, SaaS, PasS : Cloud Computing
IaaS, SaaS, PasS : Cloud Computing
 

Similar to IaaS Cloud Benchmarking: Approaches, Challenges, and Experience

Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
Jpl pervasive cloud_now_and_future_aws_sep_2010_v2
Jpl pervasive cloud_now_and_future_aws_sep_2010_v2Jpl pervasive cloud_now_and_future_aws_sep_2010_v2
Jpl pervasive cloud_now_and_future_aws_sep_2010_v2ReadMaloney
 
AWS Customer Presentation - NASA JPL Pervasive Cloud Now and Future
AWS Customer Presentation - NASA JPL Pervasive Cloud Now and FutureAWS Customer Presentation - NASA JPL Pervasive Cloud Now and Future
AWS Customer Presentation - NASA JPL Pervasive Cloud Now and FutureAmazon Web Services
 
Crowdsourcing for Search Evaluation and Social-Algorithmic Search
Crowdsourcing for Search Evaluation and Social-Algorithmic SearchCrowdsourcing for Search Evaluation and Social-Algorithmic Search
Crowdsourcing for Search Evaluation and Social-Algorithmic SearchMatthew Lease
 
Model-Driven Cloud Data Storage
Model-Driven Cloud Data StorageModel-Driven Cloud Data Storage
Model-Driven Cloud Data Storagejccastrejon
 
Dbdes mnn cloud_oct2012
Dbdes mnn cloud_oct2012Dbdes mnn cloud_oct2012
Dbdes mnn cloud_oct2012Steven Backman
 
Cloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens NimisCloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens NimisJensNimis
 
Shams.khawaja
Shams.khawajaShams.khawaja
Shams.khawajaNASAPMC
 
DDS: The data-centric future beyond message-based integration
DDS: The data-centric future beyond message-based integrationDDS: The data-centric future beyond message-based integration
DDS: The data-centric future beyond message-based integrationGerardo Pardo-Castellote
 
GrenchMark at CCGrid, May 2006.
GrenchMark at CCGrid, May 2006.GrenchMark at CCGrid, May 2006.
GrenchMark at CCGrid, May 2006.Alexandru Iosup
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudKhazret Sapenov
 
A comparative study between cloud computing and fog
A comparative study between cloud computing and fog A comparative study between cloud computing and fog
A comparative study between cloud computing and fog Manash Kumar Mondal
 
S.P.A.C.E. Exploration for Software Engineering
 S.P.A.C.E. Exploration for Software Engineering S.P.A.C.E. Exploration for Software Engineering
S.P.A.C.E. Exploration for Software EngineeringCS, NcState
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationCloudera, Inc.
 
Experimental Computer Science - Approaches and Instruments
Experimental Computer Science - Approaches and InstrumentsExperimental Computer Science - Approaches and Instruments
Experimental Computer Science - Approaches and InstrumentsFrederic Desprez
 
Above the Clouds: A View From Academia
Above the Clouds: A View From AcademiaAbove the Clouds: A View From Academia
Above the Clouds: A View From AcademiaEduserv
 
Sigir12 tutorial: Query Perfromance Prediction for IR
Sigir12 tutorial: Query Perfromance Prediction for IRSigir12 tutorial: Query Perfromance Prediction for IR
Sigir12 tutorial: Query Perfromance Prediction for IRDavid Carmel
 
STPCon fall 2012: The Testing Renaissance Has Arrived
STPCon fall 2012: The Testing Renaissance Has ArrivedSTPCon fall 2012: The Testing Renaissance Has Arrived
STPCon fall 2012: The Testing Renaissance Has ArrivedSOASTA
 
Gluecon miller horizon
Gluecon miller horizonGluecon miller horizon
Gluecon miller horizonMike Miller
 

Similar to IaaS Cloud Benchmarking: Approaches, Challenges, and Experience (20)

Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Jpl pervasive cloud_now_and_future_aws_sep_2010_v2
Jpl pervasive cloud_now_and_future_aws_sep_2010_v2Jpl pervasive cloud_now_and_future_aws_sep_2010_v2
Jpl pervasive cloud_now_and_future_aws_sep_2010_v2
 
AWS Customer Presentation - NASA JPL Pervasive Cloud Now and Future
AWS Customer Presentation - NASA JPL Pervasive Cloud Now and FutureAWS Customer Presentation - NASA JPL Pervasive Cloud Now and Future
AWS Customer Presentation - NASA JPL Pervasive Cloud Now and Future
 
Crowdsourcing for Search Evaluation and Social-Algorithmic Search
Crowdsourcing for Search Evaluation and Social-Algorithmic SearchCrowdsourcing for Search Evaluation and Social-Algorithmic Search
Crowdsourcing for Search Evaluation and Social-Algorithmic Search
 
Model-Driven Cloud Data Storage
Model-Driven Cloud Data StorageModel-Driven Cloud Data Storage
Model-Driven Cloud Data Storage
 
Dbdes mnn cloud_oct2012
Dbdes mnn cloud_oct2012Dbdes mnn cloud_oct2012
Dbdes mnn cloud_oct2012
 
Cloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens NimisCloud Computing Tutorial - Jens Nimis
Cloud Computing Tutorial - Jens Nimis
 
Cloud computingresearch
Cloud computingresearchCloud computingresearch
Cloud computingresearch
 
Shams.khawaja
Shams.khawajaShams.khawaja
Shams.khawaja
 
DDS: The data-centric future beyond message-based integration
DDS: The data-centric future beyond message-based integrationDDS: The data-centric future beyond message-based integration
DDS: The data-centric future beyond message-based integration
 
GrenchMark at CCGrid, May 2006.
GrenchMark at CCGrid, May 2006.GrenchMark at CCGrid, May 2006.
GrenchMark at CCGrid, May 2006.
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
A comparative study between cloud computing and fog
A comparative study between cloud computing and fog A comparative study between cloud computing and fog
A comparative study between cloud computing and fog
 
S.P.A.C.E. Exploration for Software Engineering
 S.P.A.C.E. Exploration for Software Engineering S.P.A.C.E. Exploration for Software Engineering
S.P.A.C.E. Exploration for Software Engineering
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your Organization
 
Experimental Computer Science - Approaches and Instruments
Experimental Computer Science - Approaches and InstrumentsExperimental Computer Science - Approaches and Instruments
Experimental Computer Science - Approaches and Instruments
 
Above the Clouds: A View From Academia
Above the Clouds: A View From AcademiaAbove the Clouds: A View From Academia
Above the Clouds: A View From Academia
 
Sigir12 tutorial: Query Perfromance Prediction for IR
Sigir12 tutorial: Query Perfromance Prediction for IRSigir12 tutorial: Query Perfromance Prediction for IR
Sigir12 tutorial: Query Perfromance Prediction for IR
 
STPCon fall 2012: The Testing Renaissance Has Arrived
STPCon fall 2012: The Testing Renaissance Has ArrivedSTPCon fall 2012: The Testing Renaissance Has Arrived
STPCon fall 2012: The Testing Renaissance Has Arrived
 
Gluecon miller horizon
Gluecon miller horizonGluecon miller horizon
Gluecon miller horizon
 

IaaS Cloud Benchmarking: Approaches, Challenges, and Experience

  • 8. What I Learned from Grids: grids are unreliable infrastructure. Server: 99.99999% reliable; small cluster: 99.999% reliable; production cluster: 5x decrease in failure rate after the first year [Schroeder and Gibson, DSN‘06]; DAS-2: >10% jobs fail [Iosup et al., CCGrid’06]; TeraGrid: 20-45% failures [Khalili et al., Grid’06]; Grid3: 27% failures, 5-10 retries [Dumitrescu et al., GCC’05]; grid-level availability: 70%. CERN LCG jobs: 74.71% successful, 25.29% unsuccessful (source: dboard-gr.cern.ch, May’07). A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, Grid 2007, Sep 2007.
  • 9. What I Learned From Grids, Applied to IaaS Clouds... or We Just Don’t Know! http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/ Tropical Cyclone Nargis (NASA, ISSS, 04/29/08). “The path to abundance”: on-demand capacity; cheap for short-term tasks; great for web apps (EIP, web crawl, DB ops, I/O). “The killer cyclone”: performance for scientific applications (compute- or data-intensive); failures, many-tasks, etc.
  • 10. This Presentation: Research Questions. Q0: What are the workloads of IaaS clouds? Q1: What is the performance of production IaaS cloud services? Q2: How variable is the performance of widely used production cloud services? Q3: How do provisioning and allocation policies affect the performance of IaaS cloud services? Other questions studied at TU Delft: How does virtualization affect the performance of IaaS cloud services? What is a good model for cloud workloads? Etc. But... this is benchmarking = the process of quantifying the performance and other non-functional properties of the system.
  • 11. Why IaaS Cloud Benchmarking? • Establish and share best-practices in answering important questions about IaaS clouds • Use in procurement • Use in system design • Use in system tuning and operation • Use in performance management • Use in training November 12, 2012 12
  • 12. Agenda 1. An Introduction to IaaS Cloud Computing 2. Research Questions or Why We Need Benchmarking? 3. A General Approach and Its Main Challenges 4. IaaS Cloud Workloads (Q0) 5. IaaS Cloud Performance (Q1) and Perf. Variability (Q2) 6. Provisioning and Allocation Policies for IaaS Clouds (Q3) 7. Conclusion November 12, 2012 13
  • 13. A General Approach for IaaS Cloud Benchmarking November 12, 2012 14
  • 14. Approach: Real Traces, Models, Real Tools, Real-World Experimentation (+ Simulation) • Formalize real-world scenarios • Exchange real traces • Model relevant operational elements • Scalable tools for meaningful and repeatable experiments • Comparative studies • Simulation only when needed (long-term scenarios, etc.) Rule of thumb: Put 10-15% project effort into benchmarking November 12, 2012 15
  • 15. 10 Main Challenges in 4 Categories* (*list not exhaustive). Methodological: 1. experiment compression; 2. beyond black-box testing through testing short-term dynamics and long-term evolution; 3. impact of middleware. Workload-related: 1. statistical workload models; 2. benchmarking performance isolation under various multi-tenancy models. System-related: 1. reliability, availability, and related system properties; 2. massive-scale, multi-site benchmarking; 3. performance isolation. Metric-related: 1. beyond traditional performance: variability, elasticity, etc.; 2. closer integration with cost models. Read our article: Iosup, Prodan, and Epema, IaaS Cloud Benchmarking: Approaches, Challenges, and Experience, MTAGS 2012 (invited paper).
  • 16. Agenda: 1. An Introduction to IaaS Cloud Computing; 2. Research Questions or Why We Need Benchmarking?; 3. A General Approach and Its Main Challenges; 4. IaaS Cloud Workloads (Q0); 5. IaaS Cloud Performance (Q1) and Perf. Variability (Q2); 6. Provisioning and Allocation Policies for IaaS Clouds (Q3); 7. Conclusion.
  • 17. IaaS Cloud Workloads: Our Team. TU Delft: Alexandru Iosup, Dick Epema, Ozan Sonmez, Thomas de Ruiter; TU Delft/INRIA: Mathieu Jan; U. Innsbruck: Radu Prodan, Thomas Fahringer, Simon Ostermann. Topics: BoTs, workflows, grids, MapReduce, Big Data, statistical modeling.
  • 18. What I’ll Talk About IaaS Cloud Workloads (Q0) 1. BoTs 2. Workflows 3. Big Data Programming Models 4. MapReduce workloads November 12, 2012 19
  • 19. What is a Bag of Tasks (BoT)? A System View. BoT = set of jobs sent by a user... ...that is submitted at most Δs after the first job. Why Bag of Tasks? From the perspective of the user, the jobs in the set are just tasks of a larger job; a single useful result is obtained from the complete BoT; the result can be a combination of all task results, a selection of the results of most tasks, or even the result of a single task. Iosup et al., The Characteristics and Performance of Groups of Jobs in Grids, Euro-Par, LNCS, vol. 4641, pp. 382-393, 2007. (Q0)
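To make the Δs-based grouping concrete, here is a minimal sketch (not the authors' tool) that splits a per-user job trace into bags of tasks; the job-record fields and the 120 s threshold are illustrative assumptions.

```python
from collections import defaultdict

def group_bots(jobs, delta_s=120):
    """Group jobs into bags of tasks (BoTs): jobs from the same user
    whose submit times fall within delta_s of the first job in the bag.
    `jobs` is a list of dicts with 'user' and 'submit_time' (seconds)."""
    per_user = defaultdict(list)
    for job in jobs:
        per_user[job["user"]].append(job)

    bots = []
    for user_jobs in per_user.values():
        user_jobs.sort(key=lambda j: j["submit_time"])
        current, start = [], None
        for job in user_jobs:
            if not current or job["submit_time"] - start <= delta_s:
                if not current:
                    start = job["submit_time"]
                current.append(job)
            else:                      # gap larger than delta_s: close the bag
                bots.append(current)
                current, start = [job], job["submit_time"]
        if current:
            bots.append(current)
    return bots
```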
  • 20. Applications of the BoT Programming Model • Parameter sweeps • Comprehensive, possibly exhaustive investigation of a model • Very useful in engineering and simulation-based science • Monte Carlo simulations • Simulation with random elements: fixed time yet limited inaccuracy • Very useful in engineering and simulation-based science • Many other types of batch processing • Periodic computation, Cycle scavenging • Very useful to automate operations and reduce waste 2012-2013 21 Q0
  • 21. BoTs Are the Dominant Programming Model for Grid Computing (Many Tasks) 22 Iosup and Epema: Grid Computing Workloads. IEEE Internet Computing 15(2): 19-26 (2011) Q0
  • 22. What is a Workflow? WF = set of jobs with precedence constraints (think directed acyclic graph). (Q0)
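As an illustration of the precedence structure (not taken from the talk's tooling), the sketch below represents a small workflow as a DAG and derives one valid execution order; the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Workflow as a DAG: each task maps to the set of tasks it depends on.
workflow = {
    "filter":  set(),
    "align":   {"filter"},
    "analyze": {"align"},
    "plot":    {"analyze"},
    "report":  {"analyze", "plot"},
}

# One valid execution order that respects all precedence constraints.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # e.g. ['filter', 'align', 'analyze', 'plot', 'report']
```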
  • 23. Applications of the Workflow Programming Model: complex applications; complex filtering of data; complex analysis of instrument measurements; applications created by non-CS scientists* (workflows have a natural correspondence in the real world, as descriptions of a scientific procedure; the visual model of a graph is sometimes easier to program); precursor of the MapReduce programming model (next slides). *Adapted from: Carole Goble and David de Roure, chapter in “The Fourth Paradigm”, http://research.microsoft.com/en-us/collaboration/fourthparadigm/ (Q0)
  • 24. Workflows Exist in Grids, but We Found No Evidence of a Dominant Programming Model. Traces and selected findings: loose coupling; graphs with 3-4 levels; average WF size is 30/44 jobs; 75%+ of WFs are sized 40 jobs or less, 95% are sized 200 jobs or less. Ostermann et al., On the Characteristics of Grid Workflows, CoreGRID Integrated Research in Grid Computing (CGIW), 2008. (Q0)
  • 25. What is “Big Data”? • Very large, distributed aggregations of loosely structured data, often incomplete and inaccessible • Easily exceeds the processing capacity of conventional database systems • Principle of Big Data: “When you can, keep everything!” • Too big, too fast, and doesn’t comply with the traditional database architectures 2011-2012 26 Q0
  • 26. The Three “V”s of Big Data. Volume: more data vs. better models; data grows exponentially; analysis in near-real time to extract value; scalable storage and distributed queries. Velocity: speed of the feedback loop; gain competitive advantage (fast recommendations); identify fraud and predict customer churn faster. Variety: the data can become messy (text, video, audio, etc.); difficult to integrate into applications. Adapted from: Doug Laney, “3D data management”, META Group/Gartner report, Feb 2001. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf (Q0)
  • 27. Ecosystems of Big-Data Programming Models. High-level language: Flume, BigQuery, SQL, Meteor, JAQL, Hive, Pig, Sawzall, Scope, DryadLINQ, AQL. Programming model: PACT, MapReduce model, Pregel, Dataflow, Algebrix. Execution engine: Flume engine, Dremel service tree, Tera Data engine, Azure engine, Nephele, Haloop, Hadoop/YARN, Giraph, MPI/Erlang, Dryad, Hyracks. Storage engine: S3, GFS, Tera Data store, Azure data store, HDFS, Voldemort, LFS, CosmosFS, Asterix B-tree store. (*Plus Zookeeper, CDN, etc.) Adapted from: Dagstuhl Seminar on Information Management in the Cloud, http://www.dagstuhl.de/program/calendar/partlist/?semnr=11321&SUOG (Q0)
  • 28. Our Statistical MapReduce Models. Real traces: Yahoo, Google, 2 x Social Network Provider. de Ruiter and Iosup, A workload model for MapReduce, MSc thesis at TU Delft, Jun 2012. Available online via ... (Q0)
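To illustrate how such a statistical model can be used to generate synthetic MapReduce jobs, here is a sketch; it is not the model from the cited thesis, and the distributions and parameter values are made-up assumptions.

```python
import random

# Hypothetical fitted distributions for a MapReduce workload model:
# inter-arrival times ~ exponential, task counts and input size ~ log-normal.
def sample_job(rng):
    return {
        "interarrival_s": rng.expovariate(1 / 30.0),          # mean 30 s between jobs
        "map_tasks":      max(1, int(rng.lognormvariate(4.0, 1.2))),
        "reduce_tasks":   max(1, int(rng.lognormvariate(2.0, 1.0))),
        "input_gb":       rng.lognormvariate(1.5, 1.0),
    }

def synthetic_trace(n_jobs, seed=42):
    rng = random.Random(seed)
    t, trace = 0.0, []
    for _ in range(n_jobs):
        job = sample_job(rng)
        t += job["interarrival_s"]
        trace.append({"submit_time": t, **job})
    return trace

print(synthetic_trace(3))
```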
  • 29. Agenda: 1. An Introduction to IaaS Cloud Computing; 2. Research Questions or Why We Need Benchmarking?; 3. A General Approach and Its Main Challenges; 4. IaaS Cloud Workloads (Q0); 5. IaaS Cloud Performance (Q1) and Perf. Variability (Q2); 6. Provisioning and Allocation Policies for IaaS Clouds (Q3); 7. Conclusion.
  • 30. IaaS Cloud Performance: Our Team. TU Delft: Alexandru Iosup, Dick Epema, Nezih Yigitbasi, Athanasios Antoniou; U. Innsbruck: Radu Prodan, Thomas Fahringer, Simon Ostermann. Topics: performance, performance variability, isolation, multi-tenancy, IaaS clouds, benchmarking.
  • 31. What I’ll Talk About IaaS Cloud Performance (Q1) 1. Previous work 2. Experimental setup 3. Experimental results 4. Implications on real-world workloads November 12, 2012 32
  • 32. Some Previous Work (>50 important references across our studies). Virtualization overhead: loss below 5% for computation [Barham03] [Clark04]; loss below 15% for networking [Barham03] [Menon05]; loss below 30% for parallel I/O [Vetter08]; negligible for compute-intensive HPC kernels [You06] [Panda06]. Cloud performance evaluation: performance and cost of executing scientific workflows [Dee08]; study of Amazon S3 [Palankar08]; Amazon EC2 for the NPB benchmark suite [Walker08] or selected HPC benchmarks [Hill08]; CloudCmp [Li10]; Kossmann et al.
  • 33. Q1 Production IaaS Cloud Services • Production IaaS cloud: lease resources (infrastructure) to users, operate on the market and have active customers November 12, 2012 34 Iosup et al., Performance Analysis of Cloud Computing Services for Many
  • 34. Q1: Our Method. Based on a general performance technique [Saavedra and Smith, ACM TOCS’96]: model the performance of individual components, then derive system performance from the workload and the component models. Adapted to clouds: 1. cloud-specific elements: resource provisioning and allocation; 2. benchmarks for single- and multi-machine jobs; 3. benchmark CPU, memory, I/O, etc. Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing (IEEE TPDS 2011).
  • 35. Single Resource Provisioning/Release Q1 • Time depends on instance type • Boot time non-negligible November 12, 2012 36 Iosup et al., Performance Analysis of Cloud Computing Services for Many
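A minimal sketch of how single-resource provisioning time can be measured today, using the boto3 SDK purely as an illustration (not the tooling used in the study); the region, AMI ID, and instance type are placeholders, and the "running" state is used as a proxy for the instance being available.

```python
import time
import boto3  # AWS SDK for Python; assumes credentials are configured

def measure_provisioning_time(image_id="ami-XXXXXXXX", instance_type="t2.micro"):
    """Return the time (s) from the run-instances request until the VM is running."""
    ec2 = boto3.client("ec2", region_name="eu-west-1")   # illustrative region
    t0 = time.time()
    resp = ec2.run_instances(ImageId=image_id, InstanceType=instance_type,
                             MinCount=1, MaxCount=1)
    instance_id = resp["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    elapsed = time.time() - t0
    ec2.terminate_instances(InstanceIds=[instance_id])   # release the resource
    return elapsed
```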
  • 36. Multi- Resource Provisioning/Release Q1 • Time for multi- resource increases with number of resources November 12, 2012 37 Iosup et al., Performance Analysis of Cloud Computing Services for Many
  • 37. Q1: CPU Performance of a Single Resource. ECU definition: “a 1.1 GHz 2007 Opteron”, ~4 flops per cycle at full pipeline, which means that at peak performance one ECU equals 4.4 GFLOPS. Real performance: 0.6-1.1 GFLOPS, i.e., roughly 1/7 to 1/4 of the theoretical peak. Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing (IEEE TPDS 2011).
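The peak figure follows directly from the ECU definition; as a quick arithmetic check (not a measurement):

```latex
\text{peak per ECU} = 1.1\,\text{GHz} \times 4\,\tfrac{\text{flops}}{\text{cycle}} = 4.4\,\text{GFLOPS},
\qquad
\frac{4.4}{4} = 1.1\,\text{GFLOPS},\quad
\frac{4.4}{7} \approx 0.63\,\text{GFLOPS},
```

so a measured 0.6-1.1 GFLOPS corresponds to roughly 1/7 to 1/4 of the theoretical peak.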
  • 38. Q1 HPLinpack Performance (Parallel) • Low efficiency for parallel compute-intensive applications • Low performance vs cluster computing and supercomputing November 12, 2012 39 Iosup et al., Performance Analysis of Cloud Computing Services for Many
  • 39. Q1 Performance Stability (Variability) Q2 • High performance variability for the best-performing instances November 12, 2012 Iosup et al., Performance Analysis of Cloud Computing Services for Many 40 Tasks Scientific Computing, (IEEE TPDS 2011).
  • 40. Q1 Summary • Much lower performance than theoretical peak • Especially CPU (GFLOPS) • Performance variability • Compared results with some of the commercial alternatives (see report) November 12, 2012 41
  • 41. Q1 Implications: Simulations. Input: real-world workload traces from grids and parallel production environments (PPEs). Running in: the original environment; a cloud with source-like performance; a cloud with measured performance. Metrics: wait time (WT), response time (ReT), bounded slowdown with a 10 s threshold (BSD(10s)); cost [CPU-h]. Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing (IEEE TPDS 2011).
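For reference, BSD(10s) presumably follows the standard bounded-slowdown definition (my reading, not spelled out on the slide), with τ = 10 s the threshold, w the wait time, and r the runtime of a job:

```latex
\mathrm{BSD}(\tau) = \max\!\left(1,\ \frac{w + r}{\max(\tau,\ r)}\right),
\qquad \tau = 10\,\text{s}.
```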
  • 42. Q1 Implications: Results • Cost: Clouds, real >> Clouds, source • Performance: • AReT: Clouds, real >> Source env. (bad) • AWT,ABSD: Clouds, real << Source env. (good) November 12, 2012 43 Iosup et al., Performance Analysis of Cloud Computing Services for Many
  • 43. Agenda: 1. An Introduction to IaaS Cloud Computing; 2. Research Questions or Why We Need Benchmarking?; 3. A General Approach and Its Main Challenges; 4. IaaS Cloud Workloads (Q0); 5. IaaS Cloud Performance (Q1) and Perf. Variability (Q2); 6. Provisioning and Allocation Policies for IaaS Clouds (Q3); 7. Conclusion.
  • 44. IaaS Cloud Performance: Our Team. TU Delft: Alexandru Iosup, Dick Epema, Nezih Yigitbasi, Athanasios Antoniou; U. Innsbruck: Radu Prodan, Thomas Fahringer, Simon Ostermann. Topics: performance, performance variability, isolation, multi-tenancy, IaaS clouds, benchmarking.
  • 45. What I’ll Talk About IaaS Cloud Performance Variability (Q2) 1. Experimental setup 2. Experimental results 3. Implications on real-world workloads November 12, 2012 46
  • 46. Q2: Production Cloud Services. Production cloud: operates on the market and has active customers. IaaS/PaaS, Amazon Web Services (AWS): EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), SQS (Simple Queueing Service), SDB (Simple Database), FPS (Flexible Payment Service). PaaS, Google App Engine (GAE): Run (Python/Java runtime), Datastore (database, ~SDB), Memcache (caching), URL Fetch (web crawling). Iosup, Yigitbasi, Epema, On the Performance Variability of Production Cloud Services.
  • 47. Q2 Our Method [1/3] Performance Traces • CloudStatus* • Real-time values and weekly averages for most of the AWS and GAE services • Periodic performance probes • Sampling rate is under 2 minutes * www.cloudstatus.com November 12, 2012 48 Iosup, Yigitbasi, Epema. On the Performance Variability of
  • 48. Q2: Our Method [2/3], Analysis. 1. Find out whether variability is present: investigate over several months whether the performance metric is highly variable. 2. Find out the characteristics of the variability: basic statistics: the five quartiles (Q0-Q4), including the median (Q2), the mean, and the standard deviation; derived statistic: the IQR (Q3-Q1); CoV > 1.1 indicates high variability. 3. Analyze the performance variability time patterns: investigate, for each performance metric, the presence of daily/weekly/monthly/yearly time patterns; e.g., for monthly patterns, divide the dataset into twelve subsets, compute the statistics for each subset, and plot them for visual inspection. Iosup, Yigitbasi, Epema, On the Performance Variability of Production Cloud Services.
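A minimal sketch of this analysis for one performance metric; the column names and the pandas-based implementation are my own assumptions, not the authors' scripts.

```python
import pandas as pd

def variability_report(df, metric="get_throughput"):
    """df has a DatetimeIndex and one column per performance metric."""
    s = df[metric].dropna()

    # Basic statistics: five quartiles (Q0-Q4), mean, standard deviation.
    q = s.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
    stats = {"quartiles": q.to_dict(), "mean": s.mean(), "std": s.std()}

    # Derived statistics: IQR, and CoV > 1.1 taken as "highly variable".
    stats["iqr"] = q[0.75] - q[0.25]
    stats["cov"] = s.std() / s.mean()
    stats["highly_variable"] = stats["cov"] > 1.1

    # Monthly time patterns: per-month medians for visual inspection.
    stats["monthly_medians"] = s.groupby(s.index.month).median().to_dict()
    return stats
```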
  • 49. Q2 Our Method [3/3] Is Variability Present? • Validated Assumption: The performance delivered by production services is variable. November 12, 2012 50 Iosup, Yigitbasi, Epema. On the Performance Variability of
  • 50. Q2 AWS Dataset (1/4): EC2 Variable Performance • Deployment Latency [s]: Time it takes to start a small instance, from the startup to the time the instance is available • Higher IQR and range from week 41 to the end of the year; possible reasons: • Increasing EC2 user base • Impact on applications using EC2 for auto-scaling November 12, 2012 51 Iosup, Yigitbasi, Epema. On the Performance Variability of
  • 51. Q2 AWS Dataset (2/4): S3 Stable Performance • Get Throughput [bytes/s]: Estimated rate at which an object in a bucket is read • The last five months of the year exhibit much lower IQR and range • More stable performance for the last five months • Probably due to software/infrastructure upgrades November 12, 2012 52 Iosup, Yigitbasi, Epema. On the Performance Variability of
  • 52. Q2 AWS Dataset (3/4): SQS Variable Performance Stable Performance • Average Lag Time [s]: Time it takes for a posted message to become available to read. Average over multiple queues. • Long periods of stability (low IQR and range) • Periods of high performance variability also exist November 12, 2012 53
  • 53. Q2 AWS Dataset (4/4): Summary • All services exhibit time patterns in performance • EC2: periods of special behavior • SDB and S3: daily, monthly and yearly patterns • SQS and FPS: periods of special behavior November 12, 2012 54 Iosup, Yigitbasi, Epema. On the Performance Variability of
  • 54. Q2: Experimental Setup (1/2), Simulations. Trace-based simulations for three applications. Input: GWA traces; number of daily unique users; monthly performance variability. Application → service: Job Execution → GAE Run; Selling Virtual Goods → AWS FPS; Game Status Maintenance → AWS SDB / GAE Datastore. Iosup, Yigitbasi, Epema, On the Performance Variability of Production Cloud Services.
  • 55. Q2: Experimental Setup (2/2), Metrics. Average response time and average bounded slowdown; cost, in millions of consumed CPU hours; Aggregate Performance Penalty, APP(t): Pref (reference performance) is the average of the twelve monthly medians; P(t) is a random value sampled from the distribution corresponding to the current month at time t (“Performance is like a box of chocolates, you never know what you’re gonna get” ~ Forrest Gump); max U(t) is the maximum number of users over the whole trace; U(t) is the number of users at time t. APP: the lower, the better. Iosup, Yigitbasi, Epema, On the Performance Variability of Production Cloud Services.
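The slide lists APP's ingredients but not the exact formula; below is a heavily hedged sketch of one plausible instantiation, in which each time step's penalty P(t)/Pref is weighted by the relative load U(t)/max U and averaged over the trace, so that values near 1 mean no penalty. The exact definition is in the cited paper and may differ.

```python
def aggregate_performance_penalty(P, U, monthly_medians):
    """P[t]: sampled performance value at time step t (e.g., latency; lower is better).
    U[t]: number of users at time step t.
    monthly_medians: the twelve monthly medians used to build the reference.
    NOTE: illustrative formula only -- see the cited paper for the exact definition."""
    p_ref = sum(monthly_medians) / len(monthly_medians)   # reference performance
    u_max = max(U)
    penalties = [(u / u_max) * (p / p_ref) for p, u in zip(P, U)]
    return sum(penalties) / len(penalties)                # lower is better
```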
  • 56. Q2 Game Status Maintenance (1/2): Scenario • Maintenance of game status for a large-scale social game such as Farm Town or Mafia Wars which have millions of unique users daily • AWS SDB and GAE Datastore • We assume that the number of database operations depends linearly on the number of daily unique users November 12, 2012 65 Iosup, Yigitbasi, Epema. On the Performance Variability of
  • 57. Q2: Game Status Maintenance (2/2), Results. AWS SDB vs. GAE Datastore: big discrepancy between the SDB and Datastore services. Sep’09-Jan’10: the APP of Datastore is well below that of SDB, due to the increasing performance of Datastore. APP of Datastore ~1 => no performance penalty; APP of SDB ~1.4 => 40% higher performance penalty than Datastore. Iosup, Yigitbasi, Epema, On the Performance Variability of Production Cloud Services.
  • 58. Agenda: 1. An Introduction to IaaS Cloud Computing; 2. Research Questions or Why We Need Benchmarking?; 3. A General Approach and Its Main Challenges; 4. IaaS Cloud Workloads (Q0); 5. IaaS Cloud Performance (Q1) and Perf. Variability (Q2); 6. Provisioning and Allocation Policies for IaaS Clouds (Q3); 7. Conclusion.
  • 59. IaaS Cloud Policies: Our Team. TU Delft: Alexandru Iosup, Dick Epema, Bogdan Ghit, Athanasios Antoniou; Technion: Orna Agmon-Ben Yehuda; FIU/IBM: David Villegas. Topics: provisioning, allocation, elasticity, utility, isolation, multi-tenancy, Koala.
  • 60. What I’ll Talk About Provisioning and Allocation Policies for IaaS Clouds (Q3) 1. Experimental setup 2. Experimental results 3. Ad for Bogdan’s lecture (next) November 12, 2012 69
  • 61. Q3: Provisioning and Allocation Policies* (*for user-level scheduling). Provisioning policies; allocation policies; also looked at combined provisioning + allocation policies. The SkyMark tool for IaaS cloud benchmarking. Villegas, Antoniou, Sadjadi, Iosup, An Analysis of Provisioning and Allocation Policies for Infrastructure-as-a-Service Clouds.
  • 62. Q3 Experimental Setup (1) • Environments • DAS4, Florida International University (FIU) • Amazon EC2 • Workloads • Bottleneck • Arrival pattern November 12, 2012 Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and 72 Allocation Policies for Infrastructure-as-a-Service Clouds,
  • 63. Q3 Experimental Setup (2) • Performance Metrics • Traditional: Makespan, Job Slowdown • Workload Speedup One (SU1) • Workload Slowdown Infinite (SUinf) • Cost Metrics • Actual Cost (Ca) • Charged Cost (Cc) • Compound Metrics • Cost Efficiency (Ceff) • Utility November 12, 2012 Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and 73 Allocation Policies for Infrastructure-as-a-Service Clouds,
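To make the traditional metrics concrete, here is a small sketch computing makespan and per-job slowdown from a finished-job list; the field names are illustrative, and the SkyMark-specific metrics (SU1, SUinf, actual/charged cost, cost efficiency, utility) are defined in the cited paper and not reproduced here.

```python
def makespan(jobs):
    """Time from the earliest submission to the latest completion."""
    start = min(j["submit_time"] for j in jobs)
    end = max(j["finish_time"] for j in jobs)
    return end - start

def job_slowdowns(jobs):
    """Slowdown = (wait time + runtime) / runtime, per job."""
    slowdowns = []
    for j in jobs:
        wait = j["start_time"] - j["submit_time"]
        runtime = j["finish_time"] - j["start_time"]
        slowdowns.append((wait + runtime) / runtime)
    return slowdowns

jobs = [
    {"submit_time": 0, "start_time": 5, "finish_time": 65},
    {"submit_time": 10, "start_time": 70, "finish_time": 100},
]
print(makespan(jobs), job_slowdowns(jobs))
```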
  • 64. Q3 Performance Metrics • Makespan very similar • Very different job slowdown November 12, 2012 Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and 74 Allocation Policies for Infrastructure-as-a-Service Clouds,
  • 65. Q3: Cost Metrics, Charged Cost (Cc). Q: Why is OnDemand worse than Startup? A: VM thrashing. Q: Why no OnDemand on Amazon EC2?
  • 66. Cost Metrics Q3 Actual Cost Charged Cost • Very different results between actual and charged • Cloud charging function an important selection criterion • All policies better than Startup in actual cost • Policies much better/worse than Startup in charged cost November 12, 2012 Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and 76 Allocation Policies for Infrastructure-as-a-Service Clouds,
  • 68. Q3 Compound Metrics • Trade-off Utility-Cost still needs investigation • Performance or Cost, not both: the policies we have studied improve one, but not both November 12, 2012 Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and 78 Allocation Policies for Infrastructure-as-a-Service Clouds,
  • 69. Ad: Resizing MapReduce Clusters. Constraints: data is big and difficult to move; resources need to be released fast. Approach: grow/shrink the MR cluster at the processing layer; resize based on resource utilization; policies for provisioning and allocation. Stay put for the next talk in MTAGS 2012! Ghit, Epema, Resource Management for Dynamic MapReduce Clusters in Multicluster Systems, MTAGS 2012.
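A toy utilization-driven resize policy in the spirit of the approach above; the thresholds and the grow/shrink step are invented for illustration, and the actual policies are described in the MTAGS 2012 paper.

```python
def resize_decision(utilization, nodes, min_nodes=2, max_nodes=50,
                    grow_above=0.8, shrink_below=0.3, step=2):
    """Return the new processing-layer size of an MR cluster, given utilization
    in [0, 1]. Data nodes are left untouched, since data is difficult to move."""
    if utilization > grow_above and nodes < max_nodes:
        return min(max_nodes, nodes + step)   # grow: cluster is overloaded
    if utilization < shrink_below and nodes > min_nodes:
        return max(min_nodes, nodes - step)   # shrink: release resources fast
    return nodes                              # keep current size

print(resize_decision(0.9, 10))  # -> 12
print(resize_decision(0.1, 10))  # -> 8
```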
  • 70. Agenda: 1. An Introduction to IaaS Cloud Computing; 2. Research Questions or Why We Need Benchmarking?; 3. A General Approach and Its Main Challenges; 4. IaaS Cloud Workloads (Q0); 5. IaaS Cloud Performance (Q1) and Perf. Variability (Q2); 6. Provisioning and Allocation Policies for IaaS Clouds (Q3); 7. Conclusion.
  • 71. Agenda 1. An Introduction to IaaS Cloud Computing 2. Research Questions or Why We Need Benchmarking? 3. A General Approach and Its Main Challenges 4. IaaS Cloud Workloads (Q0) 5. IaaS Cloud Performance (Q1) and Perf. Variability (Q2) 6. Provisioning and Allocation Policies for IaaS Clouds (Q3) 7. Conclusion November 12, 2012 81
  • 72. Conclusion Take-Home Message • IaaS cloud benchmarking: approach + 10 challenges • Put 10-15% project effort in benchmarking = understanding how IaaS clouds really work • Q0: Statistical workload models • Q1/Q2: Performance/variability • Q3: Provisioning and allocation • Tools and Workloads • SkyMark http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/ • MapReduce November 12, 2012 82
  • 73. Thank you for your attention! Questions? Suggestions? Observations? More Info: - http://www.st.ewi.tudelft.nl/~iosup/research.html - http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html - http://www.pds.ewi.tudelft.nl/ Do not hesitate to contact me… Alexandru Iosup A.Iosup@tudelft.nl http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”) Parallel and Distributed Systems Group Delft University of Technology November 12, 2012 83
  • 74. WARNING: Ads November 12, 2012 84
  • 75. The Parallel and Distributed Systems Group at TU Delft. Faculty: Alexandru Iosup, Dick Epema, Ana Lucia Varbanescu, Henk Sips, Johan Pouwelse (the group includes three VENI laureates). Topics: grids/clouds, P2P systems, HPC systems, multi-cores, Big Data, file-sharing, video-on-demand, online gaming, e-Science. Home page: www.pds.ewi.tudelft.nl. Publications: see the PDS publication database at publications.st.ewi.tudelft.nl.
  • 76. (TU) Delft – the Netherlands – Europe. Delft: founded 13th century, pop. 100,000. The Netherlands: pop. 16.5 M. TU Delft: founded 1842, pop. 13,000.
  • 77. CCGrid 2013, Delft, the Netherlands, May 13-16, 2013. General Chair: Dick Epema (Delft University of Technology). PC Chair: Thomas Fahringer (University of Innsbruck). Paper submission deadline: November 22, 2012. www.pds.ewi.tudelft.nl/ccgrid2013
  • 78. SPEC Research Group (RG), the Research Group of the Standard Performance Evaluation Corporation. Mission statement: provide a platform for collaborative research efforts in the areas of computer benchmarking and quantitative system analysis; provide metrics, tools, and benchmarks for evaluating early prototypes and research results as well as full-blown implementations; foster interactions and collaborations between industry and academia. Find more information at: http://research.spec.org
  • 79. Current Members (Mar 2012) Find more information on: http://research.spec.org
  • 80. If you have an interest in new performance methods, you should join the SPEC RG: find a new venue to discuss your work; exchange with experts on how the performance of systems can be measured and engineered; find out about novel methods and current trends in performance engineering; get in contact with leading organizations in the field of performance evaluation; find a new group of potential employees; join a SPEC standardization process. Performance in a broad sense: classical performance metrics (response time, throughput, scalability, resource/cost/energy efficiency, elasticity), plus dependability in general (availability, reliability, and security). Find more information at: http://research.spec.org

Editor's Notes

  1. 1 minute. Cloud = infinite stack of computers at your disposal: a fine-grained, flexible, dynamic, and cost-effective source of computers for any application.
  2. We see each day that a commercially acceptable server needs to be 99.99999% reliable. That’s one error every … hours! As soon as we increase in size, failures become more frequent. A top cluster manufacturer advertises only “three nines” reliability. We are not unhappy with this value. [goal for grids] It is well known that the reliability of a system is lower in the beginning, and towards the end of its life (the famous bath-tub shape). Production clusters have been found to be xx to yy % reliable, after passing the ??-year margin [Sch06]. With Grids, we seem to still be in the beginning: errors occur very frequently. Recent studies have shown that ... [Ios06, Kha06] [Ios06] A. Iosup and D. H. J. Epema. Grenchmark: A framework for analyzing, testing, and comparing grids. In CCGRID , pages 313–320. IEEE Computer Society, 2006. [Kha06] O. Khalili, Jiahue He, et al. Measuring the performance and reliability of production computational grids. In GRID . IEEE Computer Society, 2006. [Dum05] C. Dumitrescu, I. Raicu, and I. T. Foster. Experiences in running workloads over grid3. In H. Zhuge and G. Fox, editors, GCC , volume 3795 of LNCS , pages 274–286, 2005.
  3. Specific research questions
  4. high-level research question
  5. A production cloud is a cloud operating on the market and having active customers. AWS and GAE are the two largest in terms of the number of customers.
  6. A production cloud is a cloud operating on the market and having active customers. AWS and GAE are the two largest in terms of the number of customers.
  7. Our analysis comprises three steps.
  8. We verify our assumption in this slide and we just show the result for EC2. In the paper we have similar results for other services.
  9. To measure create/delete/read times, CloudStatus uses a simple set of data entities; we refer to the combination of all these entities as a ’User Group’.
  10. We show in slide 20 that the performance of Datastore increases in the last months of the year; APP gives a hint for provider selection.
  11. high-level research question