Solaris/Linux Performance
       Measurement, Tools and Tuning
       Adrian Cockcroft, acockcroft@netflix.com
       May 1, 2009




2009                                              5/1/09   Page 1
Abstract

       •  This course focuses on the measurement sources and tuning
          parameters available in Unix and Linux, including TCP/IP
          measurement and tuning, workload analysis, and complex storage
          subsystems, with a deep dive on advanced Solaris metrics
          such as microstates and extended system accounting.
       •  The meaning and behavior of metrics are covered in detail.
          Common fallacies, misleading indicators, sources of
          measurement error and other traps for the unwary will be
          exposed.
       •  Free tools for Capacity Planning are covered in detail in a
          different slide deck, interleaved for this event.
       •  Updated slide decks live at http://www.slideshare.net/adrianco


2009                Solaris/Linux Performance Measurement and Tuning    5/1/09   Slide 2
Sources

       •  Adrian Cockcroft
          –  Sun Microsystems 1988-2004, Distinguished Engineer
          –  eBay Research Labs 2004-2007, Distinguished Engineer
          –  Netflix 2007, Director - Web Engineering – Personalization Systems
          –  CMG 2007 Michelson Award Winner for lifetime contribution to
             computer measurement
          –  Note: I am a Netflix employee, but this material does not refer to and
             is not endorsed by Netflix. It is based on the author's work over the
             last 20+ years.
       •  Books by the author
          –  Sun Performance and Tuning, Prentice Hall, 1994, 1998 (2nd Ed)
          –  Resource Management, Prentice Hall, 2000
          –  Capacity Planning for Internet Services, Prentice Hall, 2001




Contents

       •  Capacity Planning and Performance Definitions
       •  Workload Characteristics and Analysis
       •  Implications of Virtualization and Cloud Computing
       •  Metric collection interfaces
       •  Free Tools for capacity planning (separate slide deck)
       •  CPU - measurement issues and virtualization
       •  Network - Internet Servers and TCP/IP essentials
       •  Memory – The memory-go-round, Swap space instrumentation
       •  Disks - virtualization, SSDs, filesystems, simple disks and RAID
       •  Quick tips and Recipes
       •  References


Definitions




Capacity Planning Definitions

       •  Capacity
           –  Resource utilization and headroom
       •  Planning
           –  Predicting future needs by analyzing historical data and
              modeling future scenarios
       •  Performance Monitoring
           –  Collecting and reporting on performance data
       •  Unix/Linux (apologies to users of OSX, HP-UX, AIX etc.)
           –  Emphasis on Solaris and Linux
           –  Much of the discussion is independent of the OS



Measurement Terms and Definitions

       •  Bandwidth - gross work per unit time [unattainable]
       •  Throughput - net work per unit time
       •  Peak throughput - at maximum acceptable response time
       •  Utilization - busy time relative to elapsed time [can be misleading]
       •  Queue length - number of requests waiting
       •  Service time - time to process a unit of work after waiting
       •  Response time - time to complete a unit of work including waiting
       •  Key Performance Indicator (KPI) – a measurement you have
          decided to watch because it has some business value



Service Level Agreements (SLA)

       •  Behavioral goals for the system in terms of KPIs
       •  Response time target
         –  Rule of thumb: Estimate 95th percentile response time as
            three times mean response time
         –  e.g. if SLA says 1 second response, measured average
            should be less than 333ms
       •  Utilization Target (a proxy for Response Time)
         –  Specified as a minimum and maximum
         –  Minimum utilization target to keep costs down
         –  Maximum utilization target for good response times and
            capacity headroom for future workload fluctuations
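
The three-times rule of thumb can be checked against one simple case. The sketch below assumes response times are exponentially distributed, which is an assumption made here for illustration and not something the slide states. For that distribution the 95th percentile is mean × ln(20), roughly 3.0 × mean. The function names are hypothetical.

```python
import math


def p95_from_mean_exponential(mean_ms: float) -> float:
    """95th percentile of an exponential distribution with the given mean."""
    return -mean_ms * math.log(1.0 - 0.95)


def mean_target_for_sla(sla_ms: float, factor: float = 3.0) -> float:
    """Mean response time to aim for so that factor * mean stays within the SLA."""
    return sla_ms / factor


if __name__ == "__main__":
    # A 333 ms mean gives a 95th percentile just under the 1 second SLA.
    print(p95_from_mean_exponential(333.0))
    # Working backwards from a 1000 ms SLA gives the 333 ms target on the slide.
    print(mean_target_for_sla(1000.0))
```

Real response time distributions often have fatter tails than this, so the factor of three is a starting estimate to be checked against measured percentiles.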


Capacity Planning Requirements

       •  We care about CPU, Memory, Network and Disk resources, and
          Application response times
       •  We need to know how much of each resource we are using
          now, and will use in the future
       •  We need to know how much headroom we have to handle
          higher loads
       •  We want to understand how headroom varies, and how it relates
          to application response times and throughput
       •  The application workload must be characterized so we can
          understand and manage system behaviours
       •  We want to be able to find the bottleneck in an under-performing
          system


Workloads




Workload Characteristics: One by One

       Constant Workloads
       •  e.g. Numerical computation, compute intensive batch

       •  Trivial to model, utilization and duration define the work




Simple Random Arrivals

       •  Random arrival of transactions with fixed mean service time
         –  Little’s Law: QueueLength = Throughput * ResponseTime
         –  Utilization Law: Utilization = Throughput * ServiceTime

       •  Complex models are often reduced to this model
         –  By averaging over longer time periods since the formulas only
            work if you have stable averages
         –  By wishful thinking (i.e. how to fool yourself)

       •  e.g. Unix Load Average is actually CPU Queue Length
         –  Throughput up a little, load average up a lot = slow system
         –  So load average is a proxy metric for response time
         –  High load average per CPU implies slow response times
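
As a worked example of the two laws, here is a minimal sketch. The workload numbers (50 requests/s, 10 ms service time, 40 ms response time) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def queue_length(throughput_per_s: float, response_time_s: float) -> float:
    """Little's Law: mean number of requests in the system."""
    return throughput_per_s * response_time_s


def utilization(throughput_per_s: float, service_time_s: float) -> float:
    """Utilization Law: fraction of time the server is busy."""
    return throughput_per_s * service_time_s


if __name__ == "__main__":
    # Hypothetical workload: 50 requests/s, 10 ms service, 40 ms response.
    print(queue_length(50.0, 0.040))  # about 2 requests in the system
    print(utilization(50.0, 0.010))   # about 0.5, i.e. 50% busy
```

Both results are only meaningful when the inputs are stable averages over the same interval, as the slide cautions.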

Mixed random arrivals of transactions with stable mean service times
       •  Think of the grocery store checkout analogy
          –  Trolleys full of shopping vs. baskets full of shopping
          –  Baskets are quick to service, but get stuck behind trolleys
          –  Relative mixture of transaction types starts to matter

       •  Many transactional systems handle a mixture
          –  Databases, web services

       •  Consider separating fast and slow transactions
          –  So that we have a “10 items or less” line just for baskets
          –  Separate pools of servers for different services
          –  Don’t mix OLTP with DSS queries in databases

       •  Performance is often thread-limited
          –  Thread limit and slow transactions constrain maximum throughput
          –  Throughput = Queue / ResponseTime

       •  Model using analytical solvers like PDQ
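
To see how a few slow transactions cap a thread-limited service, the sketch below applies Throughput = Queue / ResponseTime with the queue pinned at the thread limit. The pool size of 100 and the 90/10 mix of 10 ms and 1 s transactions are hypothetical figures, not taken from the slides.

```python
def mixed_mean_response(mix):
    """Mean response time for (fraction, response_seconds) pairs summing to 1."""
    return sum(fraction * response for fraction, response in mix)


def max_throughput(thread_limit: int, mean_response_s: float) -> float:
    """Throughput ceiling when every thread is busy: Queue / ResponseTime."""
    return thread_limit / mean_response_s


if __name__ == "__main__":
    threads = 100
    # Hypothetical mix: 90% "baskets" at 10 ms, 10% "trolleys" at 1 s.
    mixed = mixed_mean_response([(0.9, 0.010), (0.1, 1.0)])
    print(max_throughput(threads, mixed))   # ceiling for the shared pool
    print(max_throughput(threads, 0.010))   # ceiling for a baskets-only pool
```

With these numbers the shared pool tops out at under a tenth of what a pool serving only the fast transactions could deliver, which is the argument for separating them.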

Load dependent servers – non-stable mean service times
       •  Mean service time increases at high throughput
         –  Due to non-scalable algorithms, lock contention
         –  System runs out of memory and starts paging or frequent GC

       •  Systems have “tipping points”
         –  Hysteresis means they don’t come back when load drops
         –  This is why you have to kill catatonic systems

       •  Model using simulation tools like Hyperformix, Opnet
         –  Behaviour is non-linear and hard to model
         –  Practical option is to avoid tipping points
         –  Best designs shed load to be stable at the limit

Self-similar / fractal workloads – bursty rather than random

       •  Self-similar
         –  Looks “random” at close up, stays “random” as you zoom out
         –  Work arrives in bursts, transactions aren’t independent
         –  Bursts cluster together in super-bursts, etc.

       •  Network packet streams tend to be fractal

       •  Common in practice, too hard to model
         –  Probably the most common reason why your model is wrong!




State Dependent Services

       •  Personalized services that store user history
         –  Transactions for new users are quick
         –  Transactions for users with lots of state/history are slower
         –  As user base builds state and ages you get into lots of
            trouble…

       •  Social Networks, Recommendation Services
         –  Facebook, Flickr, Netflix, Pandora, Twitter etc.

       •  “Abandon hope all ye who enter here”
         –    Not tractable to model, repeatable tests are tricky
         –    Long fat tail response time distribution and timeouts
         –    Excessively long service times for some users
         –    Solutions: careful algorithm design, lots of caching

Workload Modelling Survivalism

       •  Simplify the workload algorithms
          –  move from hard or impossible to simpler models
          –  use caching and pre-compute to get constant service times

       •  Stand further away
          –  averaging is your friend – gets rid of complex fluctuations

       •  Minimalist Models
          –  most models are far too complex – the classic beginner's error…
          –  the art of modelling is to only model what really matters

       •  Don’t model details you don’t use
          –  model peak hour of the week, not day to day fluctuations
          –  e.g. “Will the web site survive next Sunday night?”


Metrics




Measurement Data Interfaces

   •  Several generic raw access methods
       –    Read the kernel directly
       –    Structured system data
       –    Process data
       –    Network data
       –    Accounting data
       –    Application data
   •  Command based data interfaces
       –  Scrape data from vmstat, iostat, netstat, sar, ps
       –  Higher overhead, lower resolution, missing metrics
   •  Data available is always platform and release specific…




Reading kernel memory - kvm

   •    The only way to get data in very old Unix variants
   •    Use kernel namelist symbol table and open /dev/kmem
   •    Solaris wraps up interface in kvm library
   •    Advantages
         –  Still the only way to get at some kinds of data
         –  Low overhead, fast bulk data capture
   •    Disadvantages
         –    Too much intimate implementation detail exposed
         –    No locking protection to ensure consistent data
         –    Highly non-portable, unstable over releases and patches
         –    Tools break when kernel moves between 32 and 64bit address
              support

Structured Kernel Statistics - kstat

   •  Solaris 2 introduced kstat and extended usage in each release
   •  Used by Solaris 2 vmstat, iostat, sar, network interface stats, etc.
   •  Advantages
       –  The recommended and supported Solaris metric access API
       –  Does not require setuid root commands to access for reads
       –  Individual named metrics stable over releases
       –  Consistent data using locking, but low overhead
       –  Unchanged when kernel moves to 64bit address support
       –  Extensible to add metrics without breaking existing code
   •  Disadvantages
       –  Somewhat complex hierarchical kstat_chain structure
       –  State changes (device online/offline) cause kstat_chain rebuild
Kernel Trace - TNF, Dtrace, ktrace

   •  Solaris, Linux, Windows and other Unixes have similar features
       –  Solaris has TNF probes and prex command to control them
       –  User level probe library for hires tracepoints allows
          instrumentation of multithreaded applications
       –  Kernel level probes allow disk I/O and scheduler tracing
   •  Advantages
       –  Low overhead, microsecond resolution
       –  I/O trace capability is extremely useful
   •  Disadvantages
       –  Too much data to process with simple tracing capabilities
       –  Trace buffer can overflow or cause locking issues

Dtrace – Dynamic Tracing
   •  One of the most exciting new features in Solaris 10, rave reviews
   •  Book: "Solaris Performance and Tools" by Richard McDougall
      and Brendan Gregg
   •  Advantages
       –  No overhead when it is not in use
       –  Low overhead probes can be put anywhere/everywhere
       –  Trace data is correlated and filtered at source, get exactly the
          data you want, very sophisticated data providers included
       –  Bundled, supported, designed to be safe for production
          systems
   •  Disadvantages
       –  Solaris specific, but being ported to BSD/Linux
       –  No high level tools support yet
       –  Yet another (awk-like) scripting language to learn


Hardware counters
   •  Solaris cpustat for X86 and UltraSPARC pipeline and cache counters
   •  Solaris busstat for server backplanes and I/O buses, corestat for
      multi-core systems
   •  Intel Trace Collector, Vampir for Linux
   •  Most modern CPUs and systems have counters
   •  Advantages
       –  See what is really happening, more accurate than kernel stats
       –  Cache usage useful for tuning code algorithms
       –  Pipeline usage useful for HPC tuning for megaflops
       –  Backplane and memory bank usage useful for database servers
   •  Disadvantages
       –  Raw data is confusing, lots of architectural background info
          needed
       –  Most tools focus on developer code tuning


Configuration information
   •    System configuration data comes from too many sources!
         –  Solaris device tree displayed by prtconf and prtdiag
         –  Solaris 8 adds dynamic configuration notification device picld
         –  SCSI device info using iostat -E in Solaris
         –  Logical volume info from product specific vxprint and metastat
         –  Hardware RAID info from product specific tools
         –  Critical storage config info must be accessed over ethernet…
         –  Linux device tree in /proc is a bit easier to navigate
   •    It is very hard to combine all this data!
   •    DMTF CIM objects try to address this, but no-one seems to use them…
   •    Free tool - Config Engine: http://www.cfengine.org




Application instrumentation Examples
   •    Oracle V$ Tables – detailed metrics used by many tools
   •    Apache logging for web services
   •    ARM standard instrumentation
   •    Custom do-it-yourself and log file scraping
   •    Advantages
         –  Focussed application specific information
         –  Business metrics are needed to do real capacity planning
   •    Disadvantages
         –  No common access methods
         –  ARM is a collection interface only, vendor specific tools, data
         –  Very few applications are instrumented, even fewer have support
            from performance tools vendors



Kernel values, tunables and defaults
       •  There is often far too much emphasis on kernel tweaks
           –  There really are few “magic bullet” tunables
           –  It rarely makes a significant difference
       •  Fix the system configuration or tune the application instead!
       •  Very few adjustable components
           –  “No user serviceable parts inside”
           –  But Unix has so much history people think it is like a 70’s car
           –  Solaris really is dynamic, adaptive and self-tuning
           –  Most other “traditional Unix” tunables are just advisory limits
           –  Tweaks may be workarounds for bugs/problems
           –  Patch or OS release removes the problem - remove the tweak
       •  Solaris Tunable Parameters Reference Manual (if you must…)
          –  http://docs.sun.com/app/docs/doc/817-0404



Process based data - /proc

   •    Used by ps, proctool and debuggers, pea.se, proc(1) tools on Solaris
   •    Solaris and Linux both have /proc/pid/metric hierarchy
   •    Linux also includes system information in /proc rather than kstat
   •    Advantages
         –  The recommended and supported process access API
         –  Metric data structures reasonably stable over releases
         –  Consistent data using locking
         –  Solaris microstate data provides accurate process state timers
   •    Disadvantages
         –  High overhead for open/read/close for every process
         –  Linux reports data as ascii text, Solaris as binary structures
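
As an illustration of the Linux ASCII format, the sketch below pulls CPU time out of a /proc/<pid>/stat line. It relies on the field order documented in the Linux proc(5) manual page (utime is field 14, stime is field 15, both in clock ticks). The function name and the default of 100 ticks per second are assumptions made for this example. Solaris exposes binary structures instead and is not covered here.

```python
import os


def parse_proc_stat(text: str, ticks_per_s: int = 100) -> dict:
    """Extract pid, command name and CPU seconds from a Linux /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the last ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()  # fields[0] is field 3 (state) in proc(5)
    utime_ticks = int(fields[11])  # field 14: user-mode clock ticks
    stime_ticks = int(fields[12])  # field 15: kernel-mode clock ticks
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],
        "usr_s": utime_ticks / ticks_per_s,
        "sys_s": stime_ticks / ticks_per_s,
    }


if __name__ == "__main__":
    path = "/proc/self/stat"
    if os.path.exists(path):  # only present on Linux-style /proc
        with open(path) as f:
            print(parse_proc_stat(f.read(), os.sysconf("SC_CLK_TCK")))
```

Doing this open/read/close for every process on each sample is where the overhead noted above comes from.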


Network protocol data

   •    Based on a streams module interface in Solaris
   •    Solaris 2 ndd interface used to configure protocols and interfaces
   •    Solaris 2 mib interface used by netstat -s and snmpd to get TCP stats etc.
   •    Advantages
         –    Individual named metrics reasonably stable over releases
         –    Consistent data using locking
         –    Extensible to add metrics without breaking existing code
         –    Solaris ndd can retune TCP online without reboot
         –    System data is often also made available via SNMP protocol
   •    Disadvantages
         –  Underlying API is not supported, SNMP access is preferred




Tracing and profiling
       •  Tracing Tools
          –    truss - shows system calls made by a process
          –    sotruss / apitrace - shows shared library calls
          –    prex - controls TNF tracing for user and kernel code
          –    snoop/tcpdump – network traces for analysis with wireshark
       •  Profiling Tools
          –    Compiler profile feedback using -xprofile=collect and use
          –    Sampled profile relink using -p and prof/gprof
          –    Function call tree profile recompile using -pg and gprof
          –    Shared library call profiling setenv LD_PROFILE and gprof
       •  Accurate CPU timing for process using /usr/proc/bin/ptime
       •  Microstate process information using pea.se and pw.se
         10:40:16 name lwmx    pid   ppid    uid   usr%   sys%  wait%  chld%    size     rss    pf
         nis_cachemgr     5    176      1      0   1.40   0.19   0.00   0.00   16320   11584   0.0
         jre              1  17255   3184   5743  11.80   0.19   0.00   0.00  178112  110336   0.0
         sendmail         1  16751      1      0   1.01   0.43   0.00   0.43   18624   16384   0.0
         se.sparc.5.6     1  16741   1186   9506   5.90   0.47   0.00   0.00   16320   14976   0.0
         imapd            1  16366    198   5710   6.88   1.09   1.02   0.00   34048   29888   0.1
         dtmail          10  16364   9070   5710   0.75   1.12   0.00   0.00  102144   94400   0.0




Free Tools
                (See Separate Slide Deck)
           http://www.slideshare.net/adrianco




Headroom




What would you say if you were asked:

       How busy is that system?
       A: I have no idea…
       A: 10%
       A: Why do you want to know?
       A: I’m sorry, you don’t understand your question….




Headroom Estimation

       •  CPU Capacity
         –  Relatively easy to figure out
       •  Network Usage
         –  Use bytes not packets/s
       •  Memory Capacity
         –  Tricky - easier in Solaris 8
       •  Disk Capacity
         –  Can be very complex




Headroom


       •       Headroom is available usable resources
                    –  Total Capacity minus Peak Utilization and Margin
                    –  Applies to CPU, RAM, Net, Disk and OS

       [Chart: usr+sys CPU for Peak Period. CPU % (0 to 100) against Time, with bands labelled Utilization, Headroom and Margin.]
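
The headroom definition is simple arithmetic; a minimal sketch follows. The figures used (100% capacity, 55% peak, 10% margin) are hypothetical, and flooring the result at zero is a choice made here so that an overloaded resource reports no headroom instead of a negative value.

```python
def headroom(total_capacity: float, peak_utilization: float, margin: float) -> float:
    """Headroom = total capacity - peak utilization - margin, floored at zero."""
    return max(0.0, total_capacity - peak_utilization - margin)


if __name__ == "__main__":
    # Hypothetical CPU figures in percent of the whole machine.
    print(headroom(100.0, 55.0, 10.0))  # 35.0
    # Peak already eats into the margin: no usable headroom is left.
    print(headroom(100.0, 95.0, 10.0))  # 0.0
```

The same function applies to RAM, network or disk as long as all three inputs are in the same units.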


Utilization

                •  Utilization is the proportion of busy time
                •  Always defined over a time interval

       [Figure: two charts. "usr+sys CPU for Peak Period": CPU % (0 to 100) against Time, showing Utilization. "OnCPU Scheduling for Each CPU": OnCPU state and mean CPU utilization (0.56) against Microseconds (1 to 41).]
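
Because utilization is always defined over an interval, the same activity gives different numbers depending on the window chosen. The sketch below computes busy time as a fraction of a window. The busy intervals are hypothetical values in microseconds, not the ones plotted on the slide.

```python
def utilization(busy_intervals, window_start, window_end):
    """Fraction of [window_start, window_end) covered by busy intervals.

    busy_intervals: non-overlapping (start, end) pairs in the same time unit.
    """
    if window_end <= window_start:
        raise ValueError("window must have positive length")
    busy = 0
    for start, end in busy_intervals:
        # Clip each interval to the window before adding it up.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            busy += overlap
    return busy / (window_end - window_start)


if __name__ == "__main__":
    on_cpu = [(0, 10), (20, 30), (45, 60)]  # hypothetical, in microseconds
    print(utilization(on_cpu, 0, 50))   # 0.5 over the first 50 us
    print(utilization(on_cpu, 0, 10))   # 1.0 over the first 10 us
    print(utilization(on_cpu, 10, 20))  # 0.0 over the next 10 us
```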



Response Time
       •  Response Time = Queue time + Service time
       •  The Usual Assumptions…
           –  Steady state averages
           –  Random arrivals
           –  Constant service time
           –  M servers processing the same queue

       •  Approximations
           –  Queue length = Throughput * Response Time (Little's Law)
           –  Utilization = Throughput * Service Time (utilization law)
           –  Response Time = Service Time / (1 - Utilization^M)
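
The response time approximation can be evaluated directly, reading the M in the formula as an exponent: R = S / (1 - U^M). The sketch below is only as good as the assumptions listed on this slide. The function name and the example values are hypothetical.

```python
def response_time(service_time: float, utilization: float, servers: int = 1) -> float:
    """Approximate R = S / (1 - U**M) for 0 <= U < 1 and M >= 1 servers."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    if servers < 1:
        raise ValueError("servers must be >= 1")
    return service_time / (1.0 - utilization ** servers)


if __name__ == "__main__":
    # Service time of 1 unit, so the result is the response time increase factor.
    for m in (1, 2, 4):
        for u in (0.5, 0.9):
            print(f"M={m} U={u:.0%}: R/S = {response_time(1.0, u, m):.2f}")
```

At 50% utilization one server doubles the response time, while two servers sharing the queue add only about a third.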




Response Time Curves
       The traditional view of Utilization as a proxy for response time
       Systems with many CPUs can run at higher utilization levels, but degrade more
       rapidly when they run out of capacity
       Headroom margin should be set according to a response time target.

       [Chart: Response Time Curves, R = S / (1 - (U%)^m). Response Time Increase Factor (0 to 10) against Total System Utilization % (0 to 100), one curve each for 1, 2, 4, 8, 16, 32 and 64 CPUs, with the headroom margin marked.]
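
Setting the margin from a response time target amounts to inverting the curve formula, read as R = S / (1 - U^m), which gives U = (1 - S/R)^(1/m). The sketch below does that. The target factor of 2 is a hypothetical choice, and the function name is invented for this example.

```python
def max_utilization(response_factor: float, cpus: int) -> float:
    """Highest utilization (0..1) at which R/S stays within response_factor.

    Inverts R = S / (1 - U**m) to give U = (1 - 1/factor) ** (1/m).
    """
    if response_factor < 1.0:
        raise ValueError("response_factor must be >= 1 (R cannot be below S)")
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    return (1.0 - 1.0 / response_factor) ** (1.0 / cpus)


if __name__ == "__main__":
    # Hypothetical target: response time may grow to at most 2x service time.
    for m in (1, 2, 4, 8, 16, 32, 64):
        u = max_utilization(2.0, m)
        print(f"{m:2d} CPUs: run up to {u:.0%}, margin {1 - u:.0%}")
```

The shrinking margin on larger systems is the same effect the curves show: more CPUs tolerate higher utilization but leave less room before the knee.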




So what's the problem with Utilization?
   •     Unsafe assumptions! Complex adaptive systems are not simple!

   •     Random arrivals?
          –  Bursty traffic with long tail arrival rate distribution

   •     Constant service time?
          –  Variable clock rate CPUs, inverse load dependent service time

          –  Complex transactions, request and response dependent

   •     M servers processing the same queue?
          –  Virtual servers with varying non-integral concurrency
          –  Non-identical servers or CPUs, Hyperthreading, Multicore, NUMA

   •     Measurement Errors?
          –  Mechanisms with built in bias, e.g. sampling from the scheduler clock
          –  Platform and release specific systemic changes in accounting of interrupt time


Variable Clock Rate CPUs

       •    Laptop and other low power devices do this all the time
             –  Watch CPU usage of a video application and toggle mains/battery power….
       •    Server CPU Power Optimization - AMD PowerNow!™
             –    AMD Opteron server CPU detects overall utilization and reduces clock rate
             –    Actual speeds vary, but for example could reduce from 2.6GHz to 1.2GHz
             –    Changes are not understood or reported by operating system metrics
             –    Speed changes can occur every few milliseconds (thermal shock issues)
             –    Dual core speed varies per socket, Quad core varies per core
             –    Quad core can dynamically stop entire cores to save power
       •    Possible scenario:
             –    You estimate 20% utilization at 2.6GHz
             –    You see 45% reported in practice (at 1.2GHz)
             –    Load doubles, reported utilization drops to 40% (at 2.6GHz)
             –    Actual mapping of utilization to clock rate is unknown at this point


       •    Note: Older and "low power" Opterons used in blades fix clock rate
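
The scenario above can be reproduced with a rough model in which the work done per second scales linearly with clock rate. That linearity is an assumption made for this sketch (memory-bound code would scale less), and the function name is hypothetical.

```python
def utilization_at_clock(util_at_ref: float, ref_ghz: float, actual_ghz: float) -> float:
    """Utilization the same work would show at a different clock rate.

    Assumes work completed per second scales linearly with clock rate.
    """
    if ref_ghz <= 0 or actual_ghz <= 0:
        raise ValueError("clock rates must be positive")
    return util_at_ref * ref_ghz / actual_ghz


if __name__ == "__main__":
    # 20% estimated at 2.6GHz, but the CPU has slowed itself to 1.2GHz.
    print(f"{utilization_at_clock(0.20, 2.6, 1.2):.0%}")
    # Load doubles and the CPU returns to full speed: 40% at 2.6GHz.
    print(f"{utilization_at_clock(0.40, 2.6, 2.6):.0%}")
```

The linear model gives a figure in the low forties for the slowed-down case, in line with the 45% in the scenario, and it shows why reported utilization can fall while load rises.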


Virtual Machine Monitors

       •  VMware, Xen, IBM LPARs etc.
          –  Non-integral and non-constant fractions of a machine
          –  Naive operating systems and applications that don't expect this
             behavior
          –  However, lots of recent tools development from vendors


       •  Average CPU count must be reported for each measurement
          interval


       •  VMM overhead varies, application scaling characteristics may
          be affected


Threaded CPU Pipelines

       •    CPU microarchitecture optimizations
             –  Extra register sets working with one execution pipeline
             –  When the CPU stalls on a memory read, it switches registers/threads
             –  Operating system sees multiple schedulable entities (CPUs)
       •    Intel Hyperthreading
             –    Each CPU core has an extra thread to use spare cycles
             –    Typical benefit is 20%, so total capacity is 1.2 CPUs
             –    I.e. Second thread much slower when first thread is busy
             –    Hyperthreading aware optimizations in recent operating systems
       •    Sun “CoolThreads”
             –    "Niagara" SPARC CPU has eight cores, one shared floating point unit
             –    Each CPU core has four threads, but each core is a very simple design
             –    Behaves like 32 slow CPUs for integer, snail like uniprocessor for FP
             –    Overall throughput is very high, performance per watt is exceptional
             –    Niagara 2 has dedicated FPU and 8 threads per core (total 64 threads)



Measurement Errors
       •    Mechanisms with built in bias
             –  e.g. sampling from the scheduler clock underestimates CPU usage
             –  Solaris 9 and before, Linux, AIX, HP-UX “sampled CPU time”
             –  Solaris 10 and HP-UX “measured CPU time” far more accurate
             –  Solaris microstate process accounting has always been accurate; in Solaris 10
                microstates are also used to generate system-wide CPU metrics

       •    Accounting of interrupt time
             –  Platform and release specific systemic changes
             –  Solaris 8 - sampled interrupt time spread over usr/sys/idle
             –  Solaris 9 - sampled interrupt time accumulated into sys only
             –  Solaris 10 - accurate interrupt time spread over usr/sys/idle
             –  Solaris 10 Update 1 - accurate interrupt time in sys only



CPU time measurements
       •  Biased sample CPU measurements
          –  See 1998 Paper "Unix CPU Time Measurement Errors"
          –  Microstate measurements are accurate, but are platform and tool specific.
             Sampled metrics are less accurate at low utilization
       •  CPU time is sampled by the 100Hz clock interrupt
          –    sampling theory says this is accurate for an unbiased sample
          –    the sample is very biased, as the clock also schedules the CPU
          –    daemons that wake up on the clock timer can hide in the gaps
          –    problem gets worse as the CPU gets faster
       •  Increase clock interrupt rate? (Solaris)
          –  set hires_tick=1 sets rate to 1000Hz, good for realtime wakeups
          –  harder to hide CPU usage, but slightly higher overhead
       •  Use measured CPU time at per-process level
          –    microstate accounting takes timestamp on each state change
          –    very accurate and also provides extra information
          –    still doesn’t allow for interrupt overhead
          –    prstat -m and the pea.se command use this accurate measurement
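
The sampling bias described on this slide can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real scheduler: it assumes a 100Hz tick, a hypothetical daemon that wakes 1 microsecond after every tick and runs for a fixed burst, and a sampler that charges a whole tick to whatever is on the CPU at the tick instant. The function name and parameters are illustrative only.

```python
TICK_US = 10_000  # 100Hz clock interrupt: one tick every 10,000 microseconds


def sampled_vs_measured(burst_us, ticks=1000, tick_us=TICK_US):
    """Return (sampled_utilization, measured_utilization) for a daemon that
    wakes just after each clock tick and runs for burst_us microseconds.

    Assumed model: the sampler looks at the CPU only at the tick instant and
    charges the whole tick to whatever is running at that moment.
    """
    sampled_busy_ticks = 0
    measured_busy_us = 0
    for _ in range(ticks):
        # The daemon starts 1us after the tick that woke it.
        start, end = 1, 1 + burst_us
        # Measured (microstate-style) accounting sees the real busy time.
        measured_busy_us += min(end, tick_us) - start
        # The next sample happens at offset tick_us; the daemon is counted
        # only if it is still running at that instant.
        if end > tick_us:
            sampled_busy_ticks += 1
    return sampled_busy_ticks / ticks, measured_busy_us / (ticks * tick_us)


sampled, measured = sampled_vs_measured(burst_us=4_000)
print(f"sampled {sampled:.0%}, measured {measured:.0%}")
```

In this model a daemon that uses 40% of the CPU is never on the CPU when the sample is taken, so the sampled figure stays at zero while the measured figure reports the real usage.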


More CPU Measurement Issues

       •  Load average differences
          –  Just includes CPU queue (Solaris)
          –  Includes CPU and Disk (Linux) – which is a broken metric
       •  Wait for I/O is a misleading subset of idle time
          –  Metric removed in Solaris 10 – always zero
          –  Ignore it in all other Unix/Linux releases
          –  Only makes sense on uni-processor systems




How to plot Headroom

       •  Measure and report absolute CPU power if you can get it…
       •  Plot shows headroom in blue, margin in red, total power tracking day/night
          workload variation, plotted as mean + two standard deviations.
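
A minimal sketch of the arithmetic behind such a plot, assuming the slide's "mean + two standard deviations" is applied to samples of absolute CPU power used in one interval. The slide does not define how margin is sized, so `margin_fraction` is a made-up parameter, and the function name and units are hypothetical.

```python
from statistics import mean, pstdev


def headroom(used, total_capacity, margin_fraction=0.2):
    """Estimate headroom from samples of absolute CPU power used.

    used            -- samples of capacity consumed, in the same absolute
                       units as total_capacity (hypothetical units, e.g. MHz)
    margin_fraction -- share of total capacity held back as a safety margin
                       (an assumed value, not taken from the slides)
    """
    peak = mean(used) + 2 * pstdev(used)  # mean + two standard deviations
    margin = margin_fraction * total_capacity
    return {
        "peak": peak,
        "margin": margin,
        "headroom": total_capacity - margin - peak,
    }


print(headroom([400, 500, 600], total_capacity=2000))
```

Recomputing this for every measurement interval lets the headroom track total capacity as it changes, rather than assuming a fixed 100%.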




“Cockcroft Headroom Plot”
  •    Scatter plot of response time
       (ms) vs. Throughput (KB) from
       iostat metrics
  •    Histograms on axes
  •    Throughput time series plot
  •    Shows distributions and shape
       of response time
  •    Fits throughput weighted
       inverse Gaussian curve
  •    Coded using "R" statistics
       package
  •    Blogged development at
  http://perfcap.blogspot.com/search?q=chp



How busy is that system again?

       •  Check your assumptions…
       •  Record and plot absolute capacity for each measurement interval
       •  Plot response time as a function of throughput, not just utilization
       •  SOA response characteristics are complicated…
       •  More detailed discussion in CMG06 Paper and blog entries
          –  “Utilization is Virtually Useless as a Metric” - Adrian Cockcroft - CMG06



                     http://perfcap.blogspot.com/search?q=utilization
                         http://perfcap.blogspot.com/search?q=chp




CPU




CPU Capacity Measurements

       •  CPU Capacity is defined by CPU type and clock rate, or a
          benchmark rating like SPECint_rate2000
       •  CPU throughput - CPU scheduler transaction rate
          –  measured as the number of voluntary context switches
       •  CPU Queue length
          –  CPU load average gives an approximation via a time
             decayed average of number of jobs running and ready to run
       •  CPU response time
          –  Solaris microstate accounting measures scheduling delay
       •  CPU utilization
          –  Defined as busy time divided by elapsed time for each CPU
          –  Badly distorted and undermined by virtualization……

Controlling and Monitoring CPUs in Solaris

       •  psrinfo - show CPU status and clock rate
       •  corestat - show internal behavior of multi-core CPUs
       •  psradm - enable/disable CPUs
       •  pbind - bind a process to a CPU
       •  psrset - create sets of CPUs to partition a system
          –  At least one CPU must remain in the default set, to run kernel services
             like NFS threads
          –  All CPUs still take interrupts from their assigned sources
          –  Processes can be bound to sets
       •  mpstat shows per-CPU counters (per set in Solaris 9)
       CPU minf mjf xcal   intr ithr   csw icsw migr smtx    srw syscl   usr sys   wt idl
       0     45   1    0    232    0   780 234 106 201         0   950    72 28     0   0
       1     29   1    0    243    0   810 243 115 186         0 1045     69 31     0   0
       2     27   1    0    235    0   827 243 110 199         0 1000     75 25     0   0
       3     26   0    0    217    0   794 227 120 189         0   925    70 30     0   0
       4      9   0    0    234   92   403   94   84 1157      0   625    66 34     0   0



Monitoring CPU mutex lock statistics

       •  To fix mutex contention, change the application workload or upgrade to a newer
          OS release
       •  Locking strategies are too complex to be patched
       •  lockstat command
          –    very powerful and easy to use
          –    Solaris 8 extends lockstat to include kernel CPU time profiling
          –    dynamically changes all locks to be instrumented
          –    displays lots of useful data about which locks are contending
       # lockstat sleep 5
       Adaptive mutex spin: 3318 events
       Count indv cuml rcnt     spin Lock                   Caller
       -------------------------------------------------------------------------------
       601 18% 18% 1.00          1 flock_lock             cleanlocks+0x10
       302   9% 27% 1.00         7 0xf597aab0             dev_get_dev_info+0x4c
       251   8% 35% 1.00         1 0xf597aab0             mod_rele_dev_by_major+0x2c
       245   7% 42% 1.00         3 0xf597aab0             cdev_size+0x74
       160   5% 47% 1.00         7 0xf5b3c738             ddi_prop_search_common+0x50




Network




Network interface and NFS metrics

       •  Network interface throughput counters from kstat on Solaris
           –    rbytes, obytes - received and output byte counts
           –    multircv, multixmt - multicast packet counts
           –    brdcstrcv, brdcstxmt - broadcast packet counts
           –    norcvbuf, noxmtbuf - buffer allocation failure counts
       •  Linux netstat shows byte throughput (Solaris doesn’t)
       •  NFS Client Statistics Shown in iostat on Solaris
       crun% iostat -xnP
                                  extended device statistics
       r/s w/s    kr/s   kw/s wait actv wsvc_t asvc_t %w %b device
       0.0 0.0     0.0    0.0 0.0 0.0      0.0    0.0   0 0 crun:vold(pid363)
       0.0 0.0     0.0    0.0 0.0 0.0      0.0    0.0   0 0 servdist:/usr/dist
       0.0 0.5     0.0    7.9 0.0 0.0      0.0   20.7   0 1 servhome:/export/home/adrianc
       0.0 0.0     0.0    0.0 0.0 0.0      0.0    0.0   0 0 servhome:/var/mail
       0.0 1.3     0.0   10.4 0.0 0.2      0.0 128.0    0 2 c0t2d0s0
       0.0 0.0     0.0    0.0 0.0 0.0      0.0    0.0   0 0 c0t2d0s2




TCP - A Simple Approach

       •  Capacity and Throughput Metrics to Watch
       •  Connections
         –  Current number of established connections
         –  New outgoing connection rate (active opens)
         –  Outgoing connection attempt failure rate
         –  New incoming connection rate (passive opens)
         –  Incoming connection attempt failure rate (resets)
       •  Throughput
         –  Input and output byte rates
         –  Input and output segment rates
         –  Output byte retransmit percentage


Obtaining Measurements

       •  Get the TCP MIB via SNMP or netstat -s
       •  Standard TCP metric names:
         –  tcpCurrEstab: current number of established connections
         –  tcpActiveOpens: number of outgoing connections since boot
         –  tcpAttemptFails: number of outgoing failures since boot
         –  tcpPassiveOpens: number of incoming connections since boot
         –  tcpOutRsts: number of resets sent to reject connection
         –  tcpEstabResets: resets sent to terminate established
            connections
         –  (tcpOutRsts - tcpEstabResets): incoming connection failures
         –  tcpOutDataSegs, tcpInDataSegs: data transfer in segments
         –  tcpRetransSegs: retransmitted segments
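
The counters listed above are cumulative since boot, so the rates on the previous slide come from differencing two snapshots. The sketch below assumes the snapshots have already been collected into dictionaries keyed by the MIB names on this slide. Collecting them via SNMP or by parsing `netstat -s` is left out because the output format varies by platform. The function name is hypothetical, the retransmit percentage is computed on segments rather than bytes, and counter wrap is not handled.

```python
def tcp_rates(prev, curr, interval_s):
    """Convert two snapshots of cumulative TCP MIB counters into rates.

    prev, curr -- dicts keyed by the MIB names listed on the slide
    interval_s -- seconds between the two snapshots
    """
    def delta(name):
        return curr[name] - prev[name]

    out_segs = delta("tcpOutDataSegs")
    return {
        "established": curr["tcpCurrEstab"],  # a gauge, not a counter
        "active_opens_per_s": delta("tcpActiveOpens") / interval_s,
        "attempt_fails_per_s": delta("tcpAttemptFails") / interval_s,
        "passive_opens_per_s": delta("tcpPassiveOpens") / interval_s,
        # incoming connection failures = tcpOutRsts - tcpEstabResets
        "incoming_fails_per_s":
            (delta("tcpOutRsts") - delta("tcpEstabResets")) / interval_s,
        # segment-based retransmit percentage (assumption: segments, not bytes)
        "retransmit_pct":
            100.0 * delta("tcpRetransSegs") / out_segs if out_segs else 0.0,
    }


# Made-up snapshots taken 30 seconds apart, for illustration only.
before = {"tcpCurrEstab": 10, "tcpActiveOpens": 100, "tcpAttemptFails": 5,
          "tcpPassiveOpens": 1000, "tcpOutRsts": 50, "tcpEstabResets": 20,
          "tcpOutDataSegs": 10000, "tcpRetransSegs": 100}
after = {"tcpCurrEstab": 12, "tcpActiveOpens": 160, "tcpAttemptFails": 8,
         "tcpPassiveOpens": 1600, "tcpOutRsts": 80, "tcpEstabResets": 35,
         "tcpOutDataSegs": 20000, "tcpRetransSegs": 300}
print(tcp_rates(before, after, interval_s=30))
```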

Internet Server Issues

       •  TCP Connections are expensive
         –  TCP is optimized for reliable data on long lived connections
         –  Making a connection uses a lot more CPU than moving data
         –  Connection setup handshake involves several round trip
            delays
         –  Each open connection consumes about 1 KB plus data buffers
       •  Pending connections cause “listen queue” issues
       •  Each new connection goes through a “slow start” ramp up
       •  Other TCP Issues
         –  TCP window size can limit throughput on high-latency, high-speed links
         –  Lost or delayed data causes time-outs and retransmissions
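
The window limit can be put into numbers with the standard bandwidth-delay relationship: a sender can have at most one window of unacknowledged data in flight per round trip. The sketch below is illustrative only; the function names and the example window, link speed and round-trip time are assumptions, not values from the slides.

```python
def max_tcp_throughput(window_bytes, rtt_s):
    """Upper bound on one connection's throughput in bytes/s: at most one
    window of unacknowledged data can be in flight per round trip."""
    return window_bytes / rtt_s


def window_needed(link_bits_per_s, rtt_s):
    """Bandwidth-delay product in bytes: the window needed to fill the link."""
    return link_bits_per_s / 8 * rtt_s


# Example values chosen for illustration: a 64KB window over a 100ms path,
# and the window a 1Gbit/s link would need over the same path.
print(max_tcp_throughput(64 * 1024, 0.1))
print(window_needed(1_000_000_000, 0.1))
```

With these example numbers a single connection is capped at well under 1 MB/s, however fast the link is, unless the window is raised toward the bandwidth-delay product.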


TCP Sequence Diagram for HTTP Get




Stalled HTTP Get and Persistent HTTP




Memory




Memory Capacity Measurements

       •  Physical Memory Capacity Utilization and Limits
          –  Kernel memory, Shared Memory segments
          –  Executable code, stack and heap
          –  File system cache usage, Unused free memory
       •  Virtual Memory Capacity - Paging/Swap Space
          –  When there is no more available swap, Unix stops working
       •  Memory Throughput
          –  Hardware counter metrics can track CPU to Memory traffic
          –  Page in and page out rates
       •  Memory Response Time
          –  Platform specific hardware memory latency makes a difference, but
             hard to measure
          –  Time spent waiting for page-in is part of Solaris microstate
             accounting


Page Size Optimization

       •  Systems may support large pages for reduced overhead
         –  Solaris support is more dynamic/flexible than Linux at present
       •  Intimate Shared Memory locks large pages in RAM
         –  No swap space reservation
         –  Used for large database server Shared Global Area
       •  No good metrics to track usage and fragmentation issues
       •  Solaris ppgsz command can set heap and stack pagesize
       •  SPARC Architecture
         –  Base page size is 8KB, Large pages are 4MB
       •  Intel/AMD x86 Architectures
         –  Base page size is 4KB, Large pages are 2MB

Cache principles

       •  Temporal locality - “close in time”
          –  If you need something frequently, keep it near you
          –  If you don’t use it for a while, put it back
          –  If you change it, save the change by putting it back
       •  Spatial locality - “close in space - nearby”
          –  If you go to get one thing, get other stuff that is nearby
          –  You may save a trip by prefetching things
          –  You can waste bandwidth if you fetch too much you don’t use
       •  Caches work well with randomness
          –  Randomness prevents worst case behaviour
          –  Deterministic patterns often cause cache busting accesses
       •  Very careful cache friendly tuning can give great speedups


The memory go round - Unix/Linux

       •  Memory usage flows between subsystems
       [Diagram: pages flow between the Free RAM List (head to tail) and four subsystems]
          –  Kernel Memory Buffers: kernel alloc, kernel free
          –  System V Shared Memory: shmget, shm_unlink
          –  Process Stack and Heap: brk, pagein, reclaim, exit, pageout scanner
          –  Filesystem Cache: read, write, mmap, reclaim, delete, pageout scanner


The memory go round - Solaris 8 and Later

       •  Memory usage flows between subsystems
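       [Diagram: pages flow between the Free RAM List (head to tail), the Filesystem Cache and three other subsystems; a single pageout scanner is shown]
          –  Kernel Memory Buffers: kernel alloc, kernel free
          –  System V Shared Memory: shmget, shm_unlink
          –  Process Stack and Heap: brk, pagein, exit, pageout scanner
          –  Filesystem Cache: read, write, mmap, delete, reclaim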

Solaris Swap Space

       •  Swap is very confusing and badly instrumented!
       # se swap.se
       ani_max 54814 ani_resv 19429 ani_free 37981 availrmem 13859 swapfs_minfree 1972
          ramres 11887 swap_resv 19429 swap_alloc 16833 swap_avail 47272 swap_free
          49868
       Misleading data printed by swap -s
       134664 K allocated + 20768 K reserved = 155432 K used, 378176 K available
       Corrected labels:
       134664 K allocated + 20768 K unallocated = 155432 K reserved, 378176 K available
       Mislabelled sar -r 1
       freeswap (really swap available) 756352 blocks
       Useful swap data:
       Total swap 520 M available 369 M   reserved 151 M   Total disk 428 M   Total RAM 92 M
       # swap -s
       total: 134056k bytes allocated + 20800k reserved = 154856k used, 378752k available
       # sar -r 1
       18:40:51 freemem freeswap
       18:40:52    4152   756912




Disk




Disk Capacity Measurements

       •  Detailed metrics vary by platform
       •  Easy for the simple disk cases
       •  Hard for cached RAID subsystems
       •  Almost Impossible for shared disk subsystems and SANs
          –  Another system or volume can be sharing a backend
             spindle, when it gets busy your own volume can saturate,
             even though you did not change your own workload!




Storage Utilization

       •  Storage virtualization broke utilization metrics a long time ago
       •  Host server measures busy time on a "disk"
           –  Simple disk, "single server" response time gets high near 100%
              utilization
           –  Cached RAID LUN, one I/O stream can report 100% utilization, but
              full capacity supports many threads of I/O since there are many
              disks and RAM buffering

       •  New metric - "Capability Utilization"
           –  Adjusted to report proportion of actual capacity for current workload
              mix
           –  Measured by tools such as Ortera Atlas (http://www.ortera.com)




Solaris Filesystem Issues

       ufs - standard, reliable, good for lots of small files
          ufs with transaction log - faster writes and recovery
       tmpfs - fastest if you have enough RAM, volatile
       NFS
          NFS2 - safe and common, 8KB blocks, slow writes
          NFS3 - more readahead and writebehind, faster
              default 32KB block size - fast sequential, may be slow random
              default TCP instead of UDP, more robust over WAN
          NFS4 - adds stateful behavior
          cachefs - good for read-mostly NFS speedup
       Veritas VxFS - useful on old Solaris releases
       Solaris 8 UFS Upgrade
          ufs was extended to be more competitive with VxFS
          transaction log, unbuffered direct access option and snapshot backup capability
             now available “for free” with Solaris 8



Solaris 10 ZFS - What it doesn't have....

       •  Nice features
          –    No extra cost - it's bundled in a free OS
          –    No volume manager - it's built in
          –    No space management - file systems use a common pool
          –    No long wait for newfs to finish - create a 3TB file system in a second
          –    No fsck - its transactional commit means it's consistent on disk
          –    No slow writes - disk write caches are enabled and flushed reliably
          –    No random or small writes - all writes are large batched sequential
          –    No rsync - snapshots can be differenced and replicated remotely
          –    No silent data corruption - all data is checksummed as it is read
          –    No bad archives - all the data in the file system is scrubbed regularly
          –    No penalty for software RAID - RAID-Z has a clever optimization
          –    No downtime - mirroring, RAID-Z and hot spares
          –    No immediate maintenance - double parity disks if you need them
       •  Wish-list
          –  No way to know how much performance headroom you have!
          –  No clustering support

Linux Filesystems

       •  There are a large number of options!
          –  http://en.wikipedia.org/wiki/Comparison_of_file_systems
       •  EXT3
          –    Common default for many Linux distributions
          –    Efficient for CPU and space, small block size
          –    relatively simple for reliability and recovery
          –    Journalling support options can improve performance
          –    EXT4 came out of development at the end of 2008
       •  XFS
          –  Based on Silicon Graphics XFS, mature and reliable
          –  Better for large files and streaming throughput
          –  High Performance Computing heritage



Disk Configurations

       •  Sequential access is ~10 times faster than random
          –  Sequential rates are now about 50-100 MB/s per disk
          –  Random rates are 166 operations/sec, (250/sec at 15000rpm)
          –  The size of each random read should be as big as possible
       •  Reads should be cached in main memory
          –    “The only good fast read is the one you didn’t have to do”
          –    Database shared memory or filesystem cache is microseconds
          –    Disk subsystem cache is milliseconds, plus extra CPU load
          –    Underlying disk is ~6ms, as it's unlikely that data is in cache
       •  Writes should be cached in nonvolatile storage
          –  Allows write cancellation and coalescing optimizations
          –  NVRAM inside the system - Direct access to Flash storage
          –  Solid State Disks based on Flash are the "Next Big Thing"

Disk Throughput
       [Chart: disk_rK/s and disk_wK/s series, y-axis from 0 to 14000]

Max and Avg Disk Utilization (Same data)
       [Chart: disk_max% and disk_avg% series, y-axis from 0 to 100]

Data from iostat

       •  What can we see here?
       extended disk statistics
       disk      r/s   w/s    Kr/s    Kw/s wait actv   svc_t   %w   %b  notes
       sd7       0.1   1.7     0.1    13.3  0.0  0.2   109.8    0    1  root ufs
       sd15    534.2  17.5  1320.4    35.0  0.0  0.3     0.6    0   26  solid state disk
       sd45    291.9  23.0   603.2    49.8  0.0  0.2     0.6    0   15  solid state disk
       sd60      3.1   0.0    25.3     0.0  0.0  0.0     7.8    0    2  stripe 8K RR
       sd61      3.3   0.0    26.4     0.0  0.0  0.0     7.6    0    2  stripe 8K RR
       sd62      3.2   0.0    26.1     0.0  0.0  0.0     8.1    0    3  stripe 8K RR
       sd63      3.8   0.0    30.1     0.0  0.0  0.0     7.2    0    3  stripe 8K RR
       sd64      3.6   0.0    28.8     0.0  0.0  0.0     7.4    0    3  stripe 8K RR
       sd65      3.8   0.0    31.2     0.0  0.0  0.0     7.3    0    3  stripe 8K RR
       sd67      9.7   1.5    77.8     4.3  0.0  0.1     9.0    0    8  stripe
       sd68     10.7   1.4    85.3     4.2  0.0  0.1     9.0    0   10  stripe
       sd69     10.0   1.5    79.9     4.2  0.0  0.1     9.0    0    9  stripe
       sd70     10.4   1.0    83.1     3.2  0.0  0.1     9.1    0    9  stripe
       sd71      9.9   1.4    78.8     4.6  0.0  0.1     8.7    0    9  stripe
       sd72     10.0   1.1    79.9     3.7  0.0  0.1     8.5    0    8  stripe
       sd75      0.0  27.6     0.0   297.3  0.0  0.0     1.1    0    2  cached write log
       sd210    12.1   0.3   108.9     0.6  0.0  0.1     9.8    0   10  stripe
       sd211    12.9   0.4   114.8     0.7  0.0  0.1    10.6    0   11  stripe
       sd212    12.0   0.6   107.1     1.3  0.0  0.1    11.1    0   10  stripe
       sd213    13.8   0.3   122.2     0.9  0.0  0.2    11.1    0   11  stripe
       sd214    12.5   0.5   112.1     1.0  0.0  0.1    10.3    0   10  stripe
       sd215    12.1   0.3   109.5     0.8  0.0  0.1    10.5    0   10  stripe




Simple Disks

       •  Utilization shows capacity usage
         Measured using iostat %b
       •  Response time is svc_t
         svc_t increases due to waiting in the queues caused by bursty
           loads
       •  Service time per I/O is Util/IOPS
         Calculate as (%b/100)/(rps+wps)
         Decreases due to optimization of queued requests as load
          increases




Single Disk Parameters

       •  e.g. Seagate 18GB ST318203FC
         –  Obtain from www.seagate.com
         –  RPM = 10000, so one revolution = 6.0ms = 166 revolutions/s
         –  Avg read seek = 5.2ms
         –  Avg write seek = 6.0ms
         –  Avg transfer rate = 24.5 MB/s
         –  Random IOPS
           •  Approx 166/s for small requests
           •  Approx 24.5 MB/s divided by request size for large requests
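
The arithmetic behind these figures can be sketched as follows. The function names are hypothetical, and the 1 MB request size in the last line is an example value rather than something from the data sheet.

```python
def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm


def revolutions_per_s(rpm):
    """Revolutions per second, used on the slide as the small-request
    random IOPS estimate."""
    return rpm / 60


def large_request_iops(transfer_mb_per_s, request_mb):
    """Transfer-limited requests per second: transfer rate / request size."""
    return transfer_mb_per_s / request_mb


print(rotation_ms(10_000))             # 6.0
print(int(revolutions_per_s(10_000)))  # 166
print(large_request_iops(24.5, 1.0))   # 24.5
```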




Mirrored Disks

       •  All writes go to both disks
       •  Read policy alternatives
          –  All reads from one side
          –  Alternate from side to side
          –  Split by block number to reduce seek
          –  Read both and use first to respond
       •  Simple Capacity Assumption
          –  Assume duplicated interconnects
          –  Same capacity as unmirrored




Concatenated and Fat Stripe Disks
       •  Request size less than interlace
       •  Requests go to one disk
       •  Single threaded requests
         –  Same capacity as single disk
       •  Multithreaded requests
         –  Same service time as one disk
         –  Throughput of N disks if more than N threads are evenly
            distributed




Striped Disks

       •  Request size more than interlace
       •  Requests split over N disks
         –  Single and multithreaded requests
         –  N = request size / interlace
         –  Throughput of N disks
       •  Service Time Reduction
         –  Reduced size of request reduces service time for large
            transfers
         –  Need to wait for all disks to complete - slowest dominates




RAID5 for Small Requests
       •  Writes must calculate parity
         –  Read parity and old data blocks
         –  Calculate new parity
         –  Write log and data and parity
         –  Triple service time
         –  One third throughput of one disk
       •  Read performs like stripe
         –  Throughput of N-1, service of one
         –  Degraded mode throughput about one




RAID5 for Large Requests
       •  Write full stripe and parity
       •  Capacity similar to stripe
          –  Similar read and write performance
          –  Throughput of N-1 disks
          –  Service time for size reduced by N-1
          –  Less interconnect load than mirror
       •  Degraded Mode
          –  Throughput halved and service similar
          –  Extra CPU used to regenerate data




Cached RAID5

       •  Nonvolatile cache
         –  No need for recovery log disk
       •  Fast service time for writes
         –  Interconnect transfer time only
       •  Cache optimizes RAID5
         –  Makes all backend writes full stripe




Cached Stripe

       •  Write caching for stripes
         –  Greatly reduced service time
         –  Very worthwhile for small transfers
         –  Large transfers should not be cached
         –  In many cases, 128KB is crossover point from small to large
       •  Optimizations
         –  Rewriting same block cancels in cache
         –  Small sequential writes coalesce




Capacity Model Measurements

       •  Derived from iostat outputs
       extended disk statistics
       disk       r/s    w/s     Kr/s      Kw/s wait actv         svc_t      %w   %b
       sd9       33.1    8.7    271.4      71.3    0.0     2.3        15.8    0   27
       •  Utilization U = %b / 100 = 0.27
       •  Throughput X = r/s + w/s = 41.8
       •  Size K = (Kr/s + Kw/s) / X = 8.2K
       •  Concurrency N = actv = 2.3
       •  Service time S = U / X = 6.5ms
       •  Response time R = svc_t = 15.8ms
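
The derivation above can be reproduced with a short sketch that takes the fields of one iostat row and returns the model terms. The function and key names are hypothetical; the input values are the sd9 row shown on this slide.

```python
def capacity_model(r_s, w_s, kr_s, kw_s, actv, svc_t, pct_busy):
    """Derive the capacity model terms from one row of iostat -x output."""
    u = pct_busy / 100          # Utilization U = %b / 100
    x = r_s + w_s               # Throughput X = r/s + w/s
    return {
        "U": u,
        "X": x,
        "K_kb": (kr_s + kw_s) / x if x else 0.0,  # Size K = (Kr/s + Kw/s) / X
        "N": actv,                                # Concurrency N = actv
        "S_ms": 1000 * u / x if x else 0.0,       # Service time S = U / X
        "R_ms": svc_t,                            # Response time R = svc_t
    }


m = capacity_model(r_s=33.1, w_s=8.7, kr_s=271.4, kw_s=71.3,
                   actv=2.3, svc_t=15.8, pct_busy=27)
print(round(m["X"], 1), round(m["K_kb"], 1), round(m["S_ms"], 1))  # 41.8 8.2 6.5
```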




Cache Throughput

       •  Hard to model clustering and write cancellation
          improvements
       •  Make pessimistic assumption that throughput is unchanged
       •  Primary benefit of cache is fast response time
       •  Writes can flood cache and saturate back-end disks
         –  Service times suddenly go from 3ms to 300ms
         –  Very hard to figure out when this will happen
         –  Paranoia is a good policy….




Concluding Summary
       Walk out of here with the most useful content fresh in your mind!




Quick Tips #1 - Disk

       •  The system will usually have a disk bottleneck
       •  Track how busy the busiest disk is
       •  Look for unbalanced, busy or slow disks with iostat
       •  Options: timestamp, look for busy controllers, ignore idle disks:
       % iostat -xnzCM -T d 30
       Tue Jan 21 09:19:21 2003                 extended device statistics
          r/s    w/s   Mr/s    Mw/s wait actv wsvc_t asvc_t %w %b device
        141.0    8.6    0.6     0.0 0.0 1.5      0.0   10.0   0 25 c0
          3.3    0.0    0.0     0.0 0.0 0.0      0.0    6.5   0   2 c0t0d0
        137.7    8.6    0.6     0.0 0.0 1.5      0.0   10.1   0 74 c0t1d0

       •  Watch out for sd_max_throttle limiting throughput when set too low
       •  Watch out for the RAID cache being flooded on writes, which causes a sudden, very
          large increase in write service time



Quick Tips #2 - Network

       •  If you ever see a slow machine that also appears to be idle, you should
          suspect a network lookup problem. i.e. the system is waiting for some
          other system to respond.
       •  Poor Network Filesystem response times may be hard to see
          –    Use iostat -xn 30 on a Solaris client
          –    wsvc_t is the time spent in the client waiting to send a request
          –    asvc_t is the time spent in the server responding
          –    %b will show 100% whenever any requests are being processed; it does NOT
               mean that the network server is maxed out, as an NFS server is a complex
               system that can serve many requests at once.

       •  Name server delays are also hard to detect
          –  Overloaded LDAP or NIS servers can cause problems
          –  DNS configuration errors or server problems often cause 30s delays as the
             request times out


Quick Tips #3 - Memory

       •  Avoid the common vmstat misconceptions
          –  The first line is average since boot, so ignore it
       •  Linux, Other Unix and earlier Solaris Releases
          –  Ignore “free” memory
          –  Use high page scanner “sr” activity as your RAM shortage indicator
       •  Solaris 8 and Later Releases
          –  Use “free” memory to see how much is left for code to use
          –  Use non-zero page scanner “sr” activity as your RAM shortage indicator
       •  Don’t panic when you see page-ins and page-outs in vmstat
       •  Normal filesystem activity uses paging
       solaris9% vmstat 30
       kthr      memory            page            disk           faults     cpu
       r b w   swap free re    mf pi po fr de sr f0 s0 s1 s6    in   sy  cs us sy id
       0 0 0 2367832 91768 3   31 2 1 1 0 0 0 0 0 0            511  404 350 0 0 99
       0 0 0 2332728 75704 3   29 0 0 0 0 0 0 0 0 0            508  537 410 0 0 99
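
The two vmstat rules above (skip the first line, then watch "sr") can be sketched as follows. This is a minimal example that assumes the Solaris column order shown in the sample output, where "sr" is the twelfth field. The function names are hypothetical, and the threshold for "high" scanning on older releases is left to the caller because the text gives no number.

```python
# Sample rows copied from the vmstat output shown on the slide.
SAMPLE = """\
 kthr      memory            page            disk           faults     cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s0 s1 s6   in   sy   cs us sy id
 0 0 0 2367832 91768 3 31 2 1 1 0 0 0 0 0 0 511 404 350 0 0 99
 0 0 0 2332728 75704 3 29 0 0 0 0 0 0 0 0 0 508 537 410 0 0 99
"""

SR_INDEX = 11  # r b w swap free re mf pi po fr de sr ...


def scan_rates(vmstat_text):
    """Return the 'sr' value of each sample, skipping the since-boot line."""
    rates = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Data rows are all numeric; header rows are not.
        if len(fields) <= SR_INDEX or not all(f.isdigit() for f in fields):
            continue
        rates.append(int(fields[SR_INDEX]))
    return rates[1:]  # the first data row is the average since boot


def ram_shortage(vmstat_text, sr_limit=0):
    """True if any sample scans faster than sr_limit pages per second.

    sr_limit=0 matches the 'non-zero sr' rule for Solaris 8 and later.
    For older releases pass a site-chosen value for 'high' scanning.
    """
    return any(sr > sr_limit for sr in scan_rates(vmstat_text))


if __name__ == "__main__":
    print(scan_rates(SAMPLE), ram_shortage(SAMPLE))
```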




Quick Tips #4 - CPU

       •  Look for a long run queue (vmstat procs r) - and add CPUs
          –  To speed up with a zero run queue you need faster CPUs, not more of them

       •  Check for CPU system time dominating user time
          –  Most systems should have lots more Usr than Sys, as they are running
             application code
          –  But... dedicated NFS servers should be 100% Sys
          –  And... dedicated web servers have high Sys as well
          –  So... assume that lots of network service drives Sys time

       •  Watch out for processes that hog the CPU
          –  Big problem on user desktop systems - look for looping web browsers
          –  Web search engines may get queries that loop
          –  Use resource management or limit cputime (ulimit -t) in startup scripts to
             terminate web queries
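
The cputime limit mentioned above can also be applied from a launcher program instead of a shell startup script. The sketch below is an assumption about how one might do that on a Unix system with Python's standard `resource` and `subprocess` modules. The function name is hypothetical, and it lowers only the soft limit, which corresponds to `ulimit -S -t`.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with a soft CPU-time limit, similar to 'ulimit -S -t'."""
    def set_limit():
        # Runs in the child before exec, so only the child is limited.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        soft = cpu_seconds
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)
        # Only the soft limit is lowered; the hard limit is left unchanged.
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    return subprocess.run(argv, preexec_fn=set_limit,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Hypothetical usage: a looping child is stopped after about 1 CPU second.
    result = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
    print("exit status:", result.returncode)  # negative: killed by a signal
```

The limit counts CPU seconds, not elapsed time, so a query that is blocked waiting for I/O is not affected. Only one that loops on the CPU is terminated.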



Quick Tips #5 - I/O Wait

       •  Look for processes blocked waiting for disk I/O (vmstat procs b)
          –  This is what causes CPU time to be counted as wait not idle
          –  Nothing else ever causes CPU wait time!
       •  CPU wait time is a subset of idle time, consumes no resources
          –  CPU wait time is not calculated properly on multiprocessor machines
             on older Solaris releases; it is greatly inflated!
          –  CPU wait time is no longer calculated; it is reported as zero in Solaris 10
          –  Bottom line - don’t worry about CPU wait time, it’s a broken metric
       •  Look at individual process wait time using microstates
          –  prstat -m or SE toolkit process monitoring
       •  Look at I/O wait time using iostat asvc_t


Quick Tips #6 - iostat

       •  For Solaris remember “expenses” iostat -xPncez 30
       •  Add -M for Megabytes, and -T d for timestamped logging
       •  Use a 30 second interval to avoid spikes in load. Watch
          asvc_t, which is the response time on Solaris
       •  Look for regular disks over 5% busy that have response
          times of more than 10ms as a problem.
       •  If you have cached hardware RAID, look for response
          times of more than 5ms as a problem.
       •  Ignore large response times on idle disks that have
          filesystems - it’s not a problem and the cause is the fsflush
          process
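
The rules of thumb above translate into a small filter. This is a sketch under stated assumptions: each row is a dictionary with "device", "%b" and "asvc_t" keys (hypothetical names chosen to match the iostat column headings), and the "ignore idle disks" exemption is assumed to apply to cached RAID as well, which the text does not state explicitly.

```python
def disk_problems(rows, cached_raid=False):
    """Return device names that break the response-time rules of thumb."""
    # From the text: over 5% busy and over 10 ms for regular disks,
    # over 5 ms for cached hardware RAID.
    limit_ms = 5.0 if cached_raid else 10.0
    problems = []
    for row in rows:
        # Assumption: idle devices (5% busy or less) are ignored in both
        # cases, since fsflush can inflate response times on idle disks.
        if row["%b"] > 5.0 and row["asvc_t"] > limit_ms:
            problems.append(row["device"])
    return problems


if __name__ == "__main__":
    # Values for c0t0d0 and c0t1d0 are taken from the earlier iostat sample.
    rows = [
        {"device": "c0t0d0", "%b": 2.0, "asvc_t": 6.5},
        {"device": "c0t1d0", "%b": 74.0, "asvc_t": 10.1},
    ]
    print(disk_problems(rows))
```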

Recipe to fix a slow system
       •  Essential Background Information
          –    What is the business function of the system?
          –    Who and where are the users?
          –    Who says there is a problem, and what is slow?
          –    What changed recently and what is on the way?
       •  What is the system configuration?
          –  CPU/RAM/Disk/Net/OS/Patches, what application software is in use?
       •  What are the busy processes on the system doing?
          –  Use top, prstat, pea.se or /usr/ucb/ps uax | head
       •  Report CPU and disk utilization levels with iostat -xPncezM -T d 30
          –  What is making the disks busy?
       •  What is the network name service configuration?
          –  How much network activity is there? Use netstat -i 30 or nx.se 30
       •  Is there enough memory?
          –  Check free memory and the scan rate with vmstat 30


Further Reading - Books

       General Solaris/Unix/Linux Performance Tuning
          –  System Performance Tuning (2nd Edition) by Gian-Paolo D. Musumeci and Mike
             Loukides; O'Reilly & Associates
       Solaris Performance Tuning Books
          –  Solaris Performance and Tools, Richard McDougall, Jim Mauro, Brendan Gregg;
             Prentice Hall
          –  Configuring and Tuning Databases on the Solaris Platform, Allan Packer; Prentice Hall
          –  Sun Performance and Tuning, by Adrian Cockcroft and Rich Pettit; Prentice Hall
       Sun BluePrints™
          –  Capacity Planning for Internet Services, Adrian Cockcroft and Bill Walker; Prentice Hall
          –  Resource Management, Richard McDougall, Adrian Cockcroft et al.; Prentice Hall
       Linux
          –  Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D.
             Sherer
          –  Google has a Linux specific search mode http://www.google.com/linux




Questions?
                            (The End)





Solaris Linux Performance, Tools and Tuning

  • 1. Solaris/Linux Performance Measurement, Tools and Tuning Adrian Cockcroft, acockcroft@netflix.com May 1, 2009 2009 5/1/09 Page 1
  • 2. Abstract •  This course focuses on the measurement sources and tuning parameters available in Unix and Linux, including TCP/IP measurement and tuning, workload analysis, complex storage subsystems, and with a deep dive on advanced Solaris metrics such as microstates and extended system accounting. •  The meaning and behavior of metrics is covered in detail. Common fallacies, misleading indicators, sources of measurement error and other traps for the unwary will be exposed. •  Free tools for Capacity Planning are covered in detail in a different slide deck, interleaved for this event. •  Updated slide decks live at http://www.slideshare.net/adrianco 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 2
  • 3. Sources •  Adrian Cockcroft –  Sun Microsystems 1988-2004, Distinguished Engineer –  eBay Research Labs 2004-2007, Distinguished Engineer –  Netflix 2007, Director - Web Engineering – Personalization Systems –  CMG 2007 Michelson Award Winner for lifetime contribution to computer measurement –  Note: I am a Netflix employee, but this material does not refer to and is not endorsed by Netflix. It is based on the author's work over the last 20+ years. •  Books by the author –  Sun Performance and Tuning, Prentice Hall, 1994, 1998 (2nd Ed) –  Resource Management, Prentice Hall, 2000 –  Capacity Planning for Internet Services, Prentice Hall, 2001 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 3
  • 4. Contents •  Capacity Planning and Performance Definitions •  Workload Characteristics and Analysis •  Implications of Virtualization and Cloud Computing •  Metric collection interfaces •  Free Tools for capacity planning (separate slide deck) •  CPU - measurement issues and virtualization •  Network - Internet Servers and TCP/IP essentials •  Memory – The memory-go-round, Swap space instrumentation •  Disks - virtualization, SSDs, filesystems, simple disks and RAID •  Quick tips and Recipes •  References 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 4
  • 5. Definitions 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 5
  • 6. Capacity Planning Definitions •  Capacity –  Resource utilization and headroom •  Planning –  Predicting future needs by analyzing historical data and modeling future scenarios •  Performance Monitoring –  Collecting and reporting on performance data •  Unix/Linux (apologies to users of OSX, HP-UX, AIX etc.) –  Emphasis on Solaris and Linux –  Much of the discussion is independent of the OS 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 6
  • 7. Measurement Terms and Definitions •  Bandwidth - gross work per unit time [unattainable] •  Throughput - net work per unit time •  Peak throughput - at maximum acceptable response time •  Utilization - busy time relative to elapsed time [can be misleading] •  Queue length - number of requests waiting •  Service time - time to process a unit of work after waiting •  Response time - time to complete a unit of work including waiting •  Key Performance Indicator (KPI) – a measurement you have decided to watch because it has some business value 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 7
  • 8. Service Level Agreements (SLA) •  Behavioral goals for the system in terms of KPIs •  Response time target –  Rule of thumb: Estimate 95th percentile response time as three times mean response time –  e.g. if SLA says 1 second response, measured average should be less than 333ms •  Utilization Target (a proxy for Response Time) –  Specified as a minimum and maximum –  Minimum utilization target to keep costs down –  Maximum utilization target for good response times and capacity headroom for future workload fluctuations 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 8
  • 9. Capacity Planning Requirements •  We care about CPU, Memory, Network and Disk resources, and Application response times •  We need to know how much of each resource we are using now, and will use in the future •  We need to know how much headroom we have to handle higher loads •  We want to understand how headroom varies, and how it relates to application response times and throughput •  The application workload must be characterized so we can understand and manage system behaviours •  We want to be able to find the bottleneck in an under-performing system 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 9
  • 10. Workloads 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 10
  • 11. Workload Characteristics: One by One Constant Workloads •  e.g. Numerical computation, compute intensive batch •  Trivial to model, utilization and duration define the work 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 11
  • 12. Simple Random Arrivals •  Random arrival of transactions with fixed mean service time –  Little’s Law: QueueLength = Throughput * Response –  Utilization Law: Utilization = Throughput * ServiceTime •  Complex models are often reduced to this model –  By averaging over longer time periods since the formulas only work if you have stable averages –  By wishful thinking (i.e. how to fool yourself) •  e.g. Unix Load Average is actually CPU Queue Length –  Throughput up a little, load average up a lot = slow system –  So load average is a proxy metric for response time –  High load average per CPU implies slow response times 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 12
  • 13. Mixed random arrivals of transactions with stable mean service times •  Think of the grocery store checkout analogy –  Trolleys full of shopping vs. baskets full of shopping –  Baskets are quick to service, but get stuck behind trolleys –  Relative mixture of transaction types starts to matter •  Many transactional systems handle a mixture –  Databases, web services •  Consider separating fast and slow transactions –  So that we have a “10 items or less” line just for baskets –  Separate pools of servers for different services –  Don’t mix OLTP with DSS queries in databases •  Performance is often thread-limited –  Thread limit and slow transactions constrains maximum throughput –  Throughput = Queue / ResponseTime •  Model using analytical solvers like PDQ 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 13
  • 14. Load dependent servers – non-stable mean service times •  Mean service time increases at high throughput –  Due to non-scalable algorithms, lock contention –  System runs out of memory and starts paging or frequent GC •  Systems have “tipping points” –  Hysteresis means they don’t come back when load drops –  This is why you have to kill catatonic systems •  Model using simulation tools like Hyperformix, Opnet –  Behaviour is non-linear and hard to model –  Practical option is to avoid tipping points –  Best designs shed load to be stable at the limit 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 14
  • 15. Self-similar / fractal workloads – bursty rather than random •  Self-similar –  Looks “random” at close up, stays “random” as you zoom out –  Work arrives in bursts, transactions aren’t independent –  Bursts cluster together in super-bursts, etc. •  Network packet streams tend to be fractal •  Common in practice, too hard to model –  Probably the most common reason why your model is wrong! 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 15
  • 16. State Dependent Services •  Personalized services that store user history –  Transactions for new users are quick –  Transactions for users with lots of state/history are slower –  As user base builds state and ages you get into lots of trouble… •  Social Networks, Recommendation Services –  Facebook, Flickr, Netflix, Pandora, Twitter etc. •  “Abandon hope all ye who enter here” –  Not tractable to model, repeatable tests are tricky –  Long fat tail response time distribution and timeouts –  Excessively long service times for some users –  Solutions: careful algorithm design, lots of caching 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 16
  • 17. Workload Modelling Survivalism •  Simplify the workload algorithms –  move from hard or impossible to simpler models –  use caching and pre-compute to get constant service times •  Stand further away –  averaging is your friend – gets rid of complex fluctuations •  Minimalist Models –  most models are far too complex – the classic beginner’s error… –  the art of modelling is to only model what really matters •  Don’t model details you don’t use –  model peak hour of the week, not day to day fluctuations –  e.g. “Will the web site survive next Sunday night?”
  • 18. Metrics
  • 19. Measurement Data Interfaces •  Several generic raw access methods –  Read the kernel directly –  Structured system data –  Process data –  Network data –  Accounting data –  Application data •  Command based data interfaces –  Scrape data from vmstat, iostat, netstat, sar, ps –  Higher overhead, lower resolution, missing metrics •  Data available is always platform and release specific… 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 19
  • 20. Reading kernel memory - kvm •  The only way to get data in very old Unix variants •  Use kernel namelist symbol table and open /dev/kmem •  Solaris wraps up interface in kvm library •  Advantages –  Still the only way to get at some kinds of data –  Low overhead, fast bulk data capture •  Disadvantages –  Too much intimate implementation detail exposed –  No locking protection to ensure consistent data –  Highly non-portable, unstable over releases and patches –  Tools break when kernel moves between 32 and 64bit address support 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 20
  • 21. Structured Kernel Statistics - kstat •  Solaris 2 introduced kstat and extended usage in each release •  Used by Solaris 2 vmstat, iostat, sar, network interface stats, etc. •  Advantages –  The recommended and supported Solaris metric access API –  Does not require setuid root commands to access for reads –  Individual named metrics stable over releases –  Consistent data using locking, but low overhead –  Unchanged when kernel moves to 64bit address support –  Extensible to add metrics without breaking existing code •  Disadvantages –  Somewhat complex hierarchical kstat_chain structure –  State changes (device online/offline) cause kstat_chain rebuild 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 21
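As a small illustration of named kstat metrics, the sketch below parses text in the `module:instance:name:statistic<TAB>value` form printed by the Solaris `kstat -p` command. The sample text and its values are made up for the example, and `parse_kstat_p` is a hypothetical helper, not part of any Solaris library.

```python
# Parse "kstat -p" style output into a dictionary keyed by the
# four-part kstat name. SAMPLE is invented text for illustration only.

SAMPLE = (
    "unix:0:system_misc:ncpus\t4\n"
    "unix:0:system_misc:avenrun_1min\t66\n"
)


def parse_kstat_p(text):
    """Return {(module, instance, name, statistic): value_string}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest as the value
        if len(parts) != 2:
            continue  # skip blank lines and names without a value
        key, value = parts
        module, instance, name, statistic = key.split(":", 3)
        stats[(module, int(instance), name, statistic)] = value.strip()
    return stats


for key, value in parse_kstat_p(SAMPLE).items():
    print(key, value)
```

On a real system the text would come from running `kstat -p` rather than from a constant, and values would need converting from strings according to each statistic's type.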
  • 22. Kernel Trace - TNF, Dtrace, ktrace •  Solaris, Linux, Windows and other Unixes have similar features –  Solaris has TNF probes and prex command to control them –  User level probe library for hires tracepoints allows instrumentation of multithreaded applications –  Kernel level probes allow disk I/O and scheduler tracing •  Advantages –  Low overhead, microsecond resolution –  I/O trace capability is extremely useful •  Disadvantages –  Too much data to process with simple tracing capabilities –  Trace buffer can overflow or cause locking issues 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 22
  • 23. Dtrace – Dynamic Tracing •  One of the most exciting new features in Solaris 10, rave reviews •  Book: "Solaris Performance and Tools" by Richard McDougall and Brendan Gregg •  Advantages –  No overhead when it is not in use –  Low overhead probes can be put anywhere/everywhere –  Trace data is correlated and filtered at source, get exactly the data you want, very sophisticated data providers included –  Bundled, supported, designed to be safe for production systems •  Disadvantages –  Solaris specific, but being ported to BSD/Linux –  No high level tools support yet –  Yet another (awk-like) scripting language to learn
  • 24. Hardware counters •  Solaris cpustat for X86 and UltraSPARC pipeline and cache counters •  Solaris busstat for server backplanes and I/O buses, corestat for multi-core systems •  Intel Trace Collector, Vampir for Linux •  Most modern CPUs and systems have counters •  Advantages –  See what is really happening, more accurate than kernel stats –  Cache usage useful for tuning code algorithms –  Pipeline usage useful for HPC tuning for megaflops –  Backplane and memory bank usage useful for database servers •  Disadvantages –  Raw data is confusing, lots of architectural background info needed –  Most tools focus on developer code tuning 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 24
  • 25. Configuration information •  System configuration data comes from too many sources! –  Solaris device tree displayed by prtconf and prtdiag –  Solaris 8 adds dynamic configuration notification device picld –  SCSI device info using iostat -E in Solaris –  Logical volume info from product specific vxprint and metastat –  Hardware RAID info from product specific tools –  Critical storage config info must be accessed over ethernet… –  Linux device tree in /proc is a bit easier to navigate •  It is very hard to combine all this data! •  DMTF CIM objects try to address this, but no-one seems to use them… •  Free tool - Config Engine: http://www.cfengine.org 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 25
  • 26. Application instrumentation Examples •  Oracle V$ Tables – detailed metrics used by many tools •  Apache logging for web services •  ARM standard instrumentation •  Custom do-it-yourself and log file scraping •  Advantages –  Focussed application specific information –  Business metrics are needed to do real capacity planning •  Disadvantages –  No common access methods –  ARM is a collection interface only, vendor specific tools, data –  Very few applications are instrumented, even fewer have support from performance tools vendors 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 26
  • 27. Kernel values, tunables and defaults •  There is often far too much emphasis on kernel tweaks –  There really are few “magic bullet” tunables –  It rarely makes a significant difference •  Fix the system configuration or tune the application instead! •  Very few adjustable components –  “No user serviceable parts inside” –  But Unix has so much history people think it is like a 70’s car –  Solaris really is dynamic, adaptive and self-tuning –  Most other “traditional Unix” tunables are just advisory limits –  Tweaks may be workarounds for bugs/problems –  Patch or OS release removes the problem - remove the tweak •  Solaris Tunable Parameters Reference Manual (if you must…) –  http://docs.sun.com/app/docs/doc/817-0404
  • 28. Process based data - /proc •  Used by ps, proctool and debuggers, pea.se, proc(1) tools on Solaris •  Solaris and Linux both have /proc/pid/metric hierarchy •  Linux also includes system information in /proc rather than kstat •  Advantages –  The recommended and supported process access API –  Metric data structures reasonably stable over releases –  Consistent data using locking –  Solaris microstate data provides accurate process state timers •  Disadvantages –  High overhead for open/read/close for every process –  Linux reports data as ASCII text, Solaris as binary structures
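Because Linux reports process data as text, a few lines of parsing are enough to pull CPU time out of `/proc/<pid>/stat`. The sketch below works on a made-up sample line and relies on the field positions documented in proc(5) (utime is field 14, stime is field 15, both in clock ticks). `parse_proc_stat` is a hypothetical helper name.

```python
# Extract user and system CPU time from a Linux /proc/<pid>/stat line.
# SAMPLE_STAT is an invented line; real lines have more trailing fields.

SAMPLE_STAT = (
    "1234 (my daemon) S 1 1234 1234 0 -1 4194560 "
    "500 0 0 0 150 50 0 0 20 0 1 0 1000 10000000 300"
)


def parse_proc_stat(line):
    # The command name is in parentheses and may contain spaces, so split
    # around the last ")" instead of splitting the whole line on whitespace.
    lparen = line.index("(")
    rparen = line.rfind(")")
    rest = line[rparen + 1:].split()  # rest[0] is field 3 (state)
    return {
        "pid": int(line[:lparen]),
        "comm": line[lparen + 1:rparen],
        "state": rest[0],
        "utime_ticks": int(rest[11]),  # field 14
        "stime_ticks": int(rest[12]),  # field 15
    }


info = parse_proc_stat(SAMPLE_STAT)
print(info)
```

On a live Linux system the line could be read from `/proc/self/stat`, and ticks converted to seconds by dividing by `os.sysconf("SC_CLK_TCK")`. Doing this for every process on each sample is the open/read/close overhead the slide warns about.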
  • 29. Network protocol data •  Based on a streams module interface in Solaris •  Solaris 2 ndd interface used to configure protocols and interfaces •  Solaris 2 mib interface used by netstat -s and snmpd to get TCP stats etc. •  Advantages –  Individual named metrics reasonably stable over releases –  Consistent data using locking –  Extensible to add metrics without breaking existing code –  Solaris ndd can retune TCP online without reboot –  System data is often also made available via SNMP protocol •  Disadvantages –  Underlying API is not supported, SNMP access is preferred
  • 30. Tracing and profiling •  Tracing Tools –  truss - shows system calls made by a process –  sotruss / apitrace - shows shared library calls –  prex - controls TNF tracing for user and kernel code –  snoop/tcpdump – network traces for analysis with wireshark •  Profiling Tools –  Compiler profile feedback using -xprofile=collect and use –  Sampled profile relink using -p and prof/gprof –  Function call tree profile recompile using -pg and gprof –  Shared library call profiling setenv LD_PROFILE and gprof •  Accurate CPU timing for process using /usr/proc/bin/ptime •  Microstate process information using pea.se and pw.se
    10:40:16 name   lwmx   pid  ppid   uid   usr%  sys% wait% chld%    size     rss   pf
    nis_cachemgr       5   176     1     0   1.40  0.19  0.00  0.00   16320   11584  0.0
    jre                1 17255  3184  5743  11.80  0.19  0.00  0.00  178112  110336  0.0
    sendmail           1 16751     1     0   1.01  0.43  0.00  0.43   18624   16384  0.0
    se.sparc.5.6       1 16741  1186  9506   5.90  0.47  0.00  0.00   16320   14976  0.0
    imapd              1 16366   198  5710   6.88  1.09  1.02  0.00   34048   29888  0.1
    dtmail            10 16364  9070  5710   0.75  1.12  0.00  0.00  102144   94400  0.0
  • 31. Free Tools (See Separate Slide Deck) http://www.slideshare.net/adrianco
  • 32. Headroom
  • 33. What would you say if you were asked: How busy is that system? A: I have no idea… A: 10% A: Why do you want to know? A: I’m sorry, you don’t understand your question…. 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 33
  • 34. Headroom Estimation •  CPU Capacity –  Relatively easy to figure out •  Network Usage –  Use bytes not packets/s •  Memory Capacity –  Tricky - easier in Solaris 8 •  Disk Capacity –  Can be very complex 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 34
  • 35. Headroom •  Headroom is available usable resources –  Total Capacity minus Peak Utilization and Margin –  Applies to CPU, RAM, Net, Disk and OS
    [Figure: usr+sys CPU for Peak Period. CPU % (0 to 100) plotted against Time, with bands labelled Utilization, Headroom and Margin.]
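The headroom definition is simple arithmetic once a capacity, a peak and a margin have been chosen. In the sketch below the utilization samples and the 10% margin are assumptions for illustration.

```python
# Headroom = Total Capacity - Peak Utilization - Margin.
# The sample values and the margin are illustrative assumptions.

def headroom(total_capacity, utilization_samples, margin):
    """Return headroom in the same units as total_capacity."""
    peak = max(utilization_samples)
    return total_capacity - peak - margin


cpu_pct_samples = [35.0, 42.0, 65.0, 58.0, 40.0]  # usr+sys % over the peak period
print(f"CPU headroom = {headroom(100.0, cpu_pct_samples, margin=10.0):.1f}%")
```

The same function applies to RAM, network or disk as long as capacity, samples and margin share one unit.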
  • 36. Utilization •  Utilization is the proportion of busy time •  Always defined over a time interval
    [Figure: OnCPU Scheduling for Each CPU, with OnCPU and Mean CPU Util (0.56) plotted over an x-axis of 1 to 41 microseconds.]
    [Figure: usr+sys CPU for Peak Period. CPU % (0 to 100) plotted against Time, showing Utilization.]
  • 37. Response Time •  Response Time = Queue time + Service time •  The Usual Assumptions… –  Steady state averages –  Random arrivals –  Constant service time –  M servers processing the same queue •  Approximations –  Queue length = Throughput * Response Time (Little's Law) –  Utilization = Throughput * Service Time (utilization law) –  Response Time = Service Time / (1 - Utilization^M)
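The approximation Response Time = Service Time / (1 - Utilization^M) can be tabulated directly. This is a sketch of that one formula under the slide's assumptions (steady state, random arrivals, constant service time); the utilization levels and server counts are arbitrary example inputs.

```python
# Response time approximation for M servers sharing one queue:
#   R = S / (1 - U**M), with U expressed as a fraction between 0 and 1.

def response_time(service_time, utilization, servers):
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization ** servers)


for servers in (1, 2, 4, 8):
    row = ", ".join(
        f"U={u:.0%}: {response_time(1.0, u, servers):.2f}"
        for u in (0.5, 0.8, 0.9)
    )
    print(f"M={servers}: {row}")
```

With a service time of 1.0 the printed values are response time increase factors, which makes the effect of adding servers easy to compare.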
  • 38. Response Time Curves •  The traditional view of Utilization as a proxy for response time •  Systems with many CPUs can run at higher utilization levels, but degrade more rapidly when they run out of capacity •  Headroom margin should be set according to a response time target.
    [Figure: Response Time Curves, R = S / (1 - (U%)^m). Response Time Increase Factor (0.00 to 10.00) plotted against Total System Utilization % (0 to 100) for one, two, four, eight, 16, 32 and 64 CPUs, with the headroom margin marked.]
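One way to set a margin from a response time target is to invert R = S / (1 - U^m) and solve for the highest utilization that keeps R/S at or below the target factor, giving U = (1 - 1/factor)^(1/m). The sketch below does that; the target factor of 2 is an assumed example, not a recommendation from the slides.

```python
# Invert R = S / (1 - U**m) to find the utilization at which the
# response time increase factor R/S reaches a chosen target.

def max_utilization(target_factor, cpus):
    """Highest utilization (0..1) that keeps R/S <= target_factor."""
    if target_factor <= 1.0:
        raise ValueError("target factor must be greater than 1")
    return (1.0 - 1.0 / target_factor) ** (1.0 / cpus)


for cpus in (1, 2, 4, 8, 16, 32, 64):
    limit = max_utilization(2.0, cpus)
    print(f"{cpus:2d} CPUs: run up to {limit:.0%}, margin {1 - limit:.0%}")
```

The output shows the pattern described on the slide: larger systems tolerate higher utilization, so the margin shrinks, but the curve beyond that point is much steeper.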
  • 39. So what's the problem with Utilization? •  Unsafe assumptions! Complex adaptive systems are not simple! •  Random arrivals? –  Bursty traffic with long tail arrival rate distribution •  Constant service time? –  Variable clock rate CPUs, inverse load dependent service time –  Complex transactions, request and response dependent •  M servers processing the same queue? –  Virtual servers with varying non-integral concurrency –  Non-identical servers or CPUs, Hyperthreading, Multicore, NUMA •  Measurement Errors? –  Mechanisms with built in bias, e.g. sampling from the scheduler clock –  Platform and release specific systemic changes in accounting of interrupt time 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 39
  • 40. Variable Clock Rate CPUs •  Laptop and other low power devices do this all the time –  Watch CPU usage of a video application and toggle mains/battery power… •  Server CPU Power Optimization - AMD PowerNow!™ –  AMD Opteron server CPU detects overall utilization and reduces clock rate –  Actual speeds vary, but for example could reduce from 2.6GHz to 1.2GHz –  Changes are not understood or reported by operating system metrics –  Speed changes can occur every few milliseconds (thermal shock issues) –  Dual core speed varies per socket, Quad core varies per core –  Quad core can dynamically stop entire cores to save power •  Possible scenario: –  You estimate 20% utilization at 2.6GHz –  You see 45% reported in practice (at 1.2GHz) –  Load doubles, reported utilization drops to 40% (at 2.6GHz) –  Actual mapping of utilization to clock rate is unknown at this point •  Note: Older and "low power" Opterons used in blades fix clock rate
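The scenario above can be approximated with simple scaling, assuming (as a simplification) that the work done per second is proportional to clock rate. Real systems will deviate from this, which is why the slide quotes roughly 45% rather than the 43% the arithmetic gives.

```python
# Rescale a utilization figure from one clock rate to another, assuming
# work per second scales linearly with clock rate (a simplification).

def utilization_at_clock(util_pct, from_ghz, to_ghz):
    return util_pct * from_ghz / to_ghz


estimated = 20.0  # % utilization expected at full speed (2.6GHz)
slowed = utilization_at_clock(estimated, from_ghz=2.6, to_ghz=1.2)
print(f"{estimated:.0f}% at 2.6GHz looks like {slowed:.1f}% at 1.2GHz")

doubled = 2 * estimated  # load doubles and the CPU returns to 2.6GHz
print(f"after load doubles: {doubled:.0f}% reported at 2.6GHz")
```

Without the clock rate recorded for each interval, the reported figure dropping from about 45% to 40% would look like the load went down when it actually doubled.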
  • 41. Virtual Machine Monitors •  VMware, Xen, IBM LPARs etc. –  Non-integral and non-constant fractions of a machine –  Naive operating systems and applications that don't expect this behavior –  However, lots of recent tools development from vendors •  Average CPU count must be reported for each measurement interval •  VMM overhead varies, application scaling characteristics may be affected
  • 42. Threaded CPU Pipelines •  CPU microarchitecture optimizations –  Extra register sets working with one execution pipeline –  When the CPU stalls on a memory read, it switches registers/threads –  Operating system sees multiple schedulable entities (CPUs) •  Intel Hyperthreading –  Each CPU core has an extra thread to use spare cycles –  Typical benefit is 20%, so total capacity is 1.2 CPUs –  I.e. second thread is much slower when first thread is busy –  Hyperthreading aware optimizations in recent operating systems •  Sun “CoolThreads” –  "Niagara" SPARC CPU has eight cores, one shared floating point unit –  Each CPU core has four threads, but each core is a very simple design –  Behaves like 32 slow CPUs for integer, snail-like uniprocessor for FP –  Overall throughput is very high, performance per watt is exceptional –  Niagara 2 has dedicated FPU and 8 threads per core (total 64 threads)
  • 43. Measurement Errors •  Mechanisms with built in bias –  e.g. sampling from the scheduler clock underestimates CPU usage –  Solaris 9 and before, Linux, AIX, HP-UX “sampled CPU time” –  Solaris 10 and HP-UX “measured CPU time” far more accurate –  Solaris microstate process accounting always accurate but in Solaris 10 microstates are also used to generate system-wide CPU •  Accounting of interrupt time –  Platform and release specific systemic changes –  Solaris 8 - sampled interrupt time spread over usr/sys/idle –  Solaris 9 - sampled interrupt time accumulated into sys only –  Solaris 10 - accurate interrupt time spread over usr/sys/idle –  Solaris 10 Update 1 - accurate interrupt time in sys only 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 43
  • 44. CPU time measurements •  Biased sample CPU measurements –  See 1998 Paper "Unix CPU Time Measurement Errors" –  Microstate measurements are accurate, but are platform and tool specific. Sampled metrics are more inaccurate at low utilization •  CPU time is sampled by the 100Hz clock interrupt –  sampling theory says this is accurate for an unbiased sample –  the sample is very biased, as the clock also schedules the CPU –  daemons that wake up on the clock timer can hide in the gaps –  problem gets worse as the CPU gets faster •  Increase clock interrupt rate? (Solaris) –  set hires_tick=1 sets rate to 1000Hz, good for realtime wakeups –  harder to hide CPU usage, but slightly higher overhead •  Use measured CPU time at per-process level –  microstate accounting takes timestamp on each state change –  very accurate and also provides extra information –  still doesn’t allow for interrupt overhead –  prstat -m and the pea.se command use this accurate measurement
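A toy simulation can show why clock-tick sampling is biased. It assumes a hypothetical daemon that is woken by every 100Hz tick, starts running 1 microsecond later, and runs for 3 ms before sleeping again. This is an idealized model made up for illustration, not a description of any real scheduler.

```python
import random

TICK_US = 10_000  # 100Hz clock interrupt period, in microseconds


def cpu_busy(t_us, run_us, offset_us=1):
    """True if the daemon is on CPU at time t_us.

    The daemon starts offset_us after each tick and runs for run_us.
    """
    phase = t_us % TICK_US
    return offset_us <= phase < offset_us + run_us


def tick_sampled_utilization(run_us, ticks=1000):
    """Sample only at the tick instants, as a clock-driven sampler would."""
    hits = sum(cpu_busy(k * TICK_US, run_us) for k in range(ticks))
    return hits / ticks


def randomly_sampled_utilization(run_us, samples=100_000, seed=42):
    """Sample at random instants, which gives an unbiased estimate."""
    rng = random.Random(seed)
    span = 1000 * TICK_US
    hits = sum(cpu_busy(rng.randrange(span), run_us) for _ in range(samples))
    return hits / samples


run_us = 3_000
print(f"actual utilization:   {run_us / TICK_US:.0%}")
print(f"sampled at the tick:  {tick_sampled_utilization(run_us):.0%}")
print(f"sampled at random:    {randomly_sampled_utilization(run_us):.1%}")
```

In this model the tick sampler never sees the daemon on CPU, so it reports zero for a process that actually uses 30% of a CPU, while unbiased sampling recovers the true figure.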
  • 45. More CPU Measurement Issues •  Load average differences –  Just includes CPU queue (Solaris) –  Includes CPU and Disk (Linux) – which is a broken metric •  Wait for I/O is a misleading subset of idle time –  Metric removed in Solaris 10 – always zero –  Ignore it in all other Unix/Linux releases –  Only makes sense on uni-processor systems 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 45
  • 46. How to plot Headroom •  Measure and report absolute CPU power if you can get it… •  Plot shows headroom in blue, margin in red, total power tracking day/night workload variation, plotted as mean + two standard deviations.
  • 47. “Cockcroft Headroom Plot” •  Scatter plot of response time (ms) vs. Throughput (KB) from iostat metrics •  Histograms on axes •  Throughput time series plot •  Shows distributions and shape of response time •  Fits throughput weighted inverse gaussian curve •  Coded using "R" statistics package •  Blogged development at http://perfcap.blogspot.com/search?q=chp
  • 48. How busy is that system again? •  Check your assumptions… •  Record and plot absolute capacity for each measurement interval •  Plot response time as a function of throughput, not just utilization •  SOA response characteristics are complicated… •  More detailed discussion in CMG06 Paper and blog entries –  “Utilization is Virtually Useless as a Metric” - Adrian Cockcroft - CMG06 http://perfcap.blogspot.com/search?q=utilization http://perfcap.blogspot.com/search?q=chp 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 48
  • 49. CPU
  • 50. CPU Capacity Measurements •  CPU Capacity is defined by CPU type and clock rate, or a benchmark rating like SPECrateInt2000 •  CPU throughput - CPU scheduler transaction rate –  measured as the number of voluntary context switches •  CPU Queue length –  CPU load average gives an approximation via a time decayed average of number of jobs running and ready to run •  CPU response time –  Solaris microstate accounting measures scheduling delay •  CPU utilization –  Defined as busy time divided by elapsed time for each CPU –  Badly distorted and undermined by virtualization…… 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 50
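A time decayed average of the run queue can be sketched as an exponentially damped update. The 5 second sample interval and 60 second period below are assumptions chosen to resemble a one-minute load average; the exact constants and sampling details differ between operating systems and releases.

```python
import math

# Exponentially decayed average of run queue length samples.
# interval_s and period_s are illustrative assumptions.

def decayed_load_average(run_queue_samples, interval_s=5.0,
                         period_s=60.0, initial=0.0):
    decay = math.exp(-interval_s / period_s)
    load = initial
    for jobs in run_queue_samples:
        # Old value fades by "decay"; the new sample fills the remainder.
        load = load * decay + jobs * (1.0 - decay)
    return load


one_minute = decayed_load_average([2] * 12)     # 12 samples = 60 seconds
long_run = decayed_load_average([2] * 1000)
print(f"after 1 minute at queue length 2: {one_minute:.2f}")
print(f"after a long time at queue length 2: {long_run:.2f}")
```

The slow convergence is why load average lags behind sudden changes in the real queue length.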
  • 51. Controlling and CPUs in Solaris •  psrinfo - show CPU status and clock rate •  corestat - show internal behavior of multi-core CPUs •  psradm - enable/disable CPUs •  pbind - bind a process to a CPU •  psrset - create sets of CPUs to partition a system –  At least one CPU must remain in the default set, to run kernel services like NFS threads –  All CPUs still take interrupts from their assigned sources –  Processes can be bound to sets •  mpstat shows per-CPU counters (per set in Solaris 9)
    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
      0   45   1    0  232    0 780  234  106  201   0   950  72  28  0   0
      1   29   1    0  243    0 810  243  115  186   0  1045  69  31  0   0
      2   27   1    0  235    0 827  243  110  199   0  1000  75  25  0   0
      3   26   0    0  217    0 794  227  120  189   0   925  70  30  0   0
      4    9   0    0  234   92 403   94   84 1157   0   625  66  34  0   0
  • 52. Monitoring CPU mutex lock statistics •  To fix mutex contention change the application workload or upgrade to a newer OS release •  Locking strategies are too complex to be patched •  lockstat command –  very powerful and easy to use –  Solaris 8 extends lockstat to include kernel CPU time profiling –  dynamically changes all locks to be instrumented –  displays lots of useful data about which locks are contending
    # lockstat sleep 5
    Adaptive mutex spin: 3318 events
    Count indv cuml rcnt spin Lock        Caller
    -------------------------------------------------------------------------------
      601  18%  18% 1.00    1 flock_lock  cleanlocks+0x10
      302   9%  27% 1.00    7 0xf597aab0  dev_get_dev_info+0x4c
      251   8%  35% 1.00    1 0xf597aab0  mod_rele_dev_by_major+0x2c
      245   7%  42% 1.00    3 0xf597aab0  cdev_size+0x74
      160   5%  47% 1.00    7 0xf5b3c738  ddi_prop_search_common+0x50
  • 53. Network
  • 54. Network interface and NFS metrics •  Network interface throughput counters from kstat on Solaris –  rbytes, obytes - read and output byte counts –  multircv, multixmt - multicast byte counts –  brdcstrcv, brdcstxmt - broadcast byte counts –  norcvbuf, noxmtbuf - buffer allocation failure counts •  Linux netstat shows byte throughput (Solaris doesn’t) •  NFS Client Statistics Shown in iostat on Solaris
    crun% iostat -xnP
                        extended device statistics
     r/s  w/s  kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
     0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 crun:vold(pid363)
     0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 servdist:/usr/dist
     0.0  0.5   0.0   7.9  0.0  0.0    0.0   20.7   0   1 servhome:/export/home/adrianc
     0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 servhome:/var/mail
     0.0  1.3   0.0  10.4  0.0  0.2    0.0  128.0   0   2 c0t2d0s0
     0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 c0t2d0s2
  • 55. TCP - A Simple Approach •  Capacity and Throughput Metrics to Watch •  Connections –  Current number of established connections –  New outgoing connection rate (active opens) –  Outgoing connection attempt failure rate –  New incoming connection rate (passive opens) –  Incoming connection attempt failure rate (resets) •  Throughput –  Input and output byte rates –  Input and output segment rates –  Output byte retransmit percentage 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 55
  • 56. Obtaining Measurements •  Get the TCP MIB via SNMP or netstat -s •  Standard TCP metric names: –  tcpCurrEstab: current number of established connections –  tcpActiveOpens: number of outgoing connections since boot –  tcpAttemptFails: number of outgoing failures since boot –  tcpPassiveOpens: number of incoming connections since boot –  tcpOutRsts: number of resets sent to reject connection –  tcpEstabResets: resets sent to terminate established connections –  (tcpOutRsts - tcpEstabResets): incoming connection failures –  tcpOutDataSegs, tcpInDataSegs: data transfer in segments –  tcpRetransSegs: retransmitted segments 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 56
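Since the MIB counters are cumulative since boot, the useful numbers come from differencing two snapshots. The sketch below uses the metric names from this slide; the snapshot values are invented, and the retransmit percentage is computed on segments (not bytes) because segments are what the listed counters provide. `tcp_rates` is a hypothetical helper name.

```python
# Turn two snapshots of cumulative TCP MIB counters into rates.
# Snapshot values are invented for illustration.

def tcp_rates(prev, curr, interval_s):
    delta = {name: curr[name] - prev[name] for name in prev}
    out_segs = delta["tcpOutDataSegs"]
    return {
        "active_opens_per_s": delta["tcpActiveOpens"] / interval_s,
        "passive_opens_per_s": delta["tcpPassiveOpens"] / interval_s,
        "outgoing_failures_per_s": delta["tcpAttemptFails"] / interval_s,
        # Resets sent to reject connections = all resets sent minus
        # resets sent to terminate established connections.
        "incoming_failures_per_s":
            (delta["tcpOutRsts"] - delta["tcpEstabResets"]) / interval_s,
        "retransmit_pct":
            100.0 * delta["tcpRetransSegs"] / out_segs if out_segs else 0.0,
    }


prev = {"tcpActiveOpens": 1000, "tcpPassiveOpens": 5000,
        "tcpAttemptFails": 40, "tcpOutRsts": 500, "tcpEstabResets": 100,
        "tcpOutDataSegs": 100_000, "tcpRetransSegs": 200}
curr = {"tcpActiveOpens": 1060, "tcpPassiveOpens": 5600,
        "tcpAttemptFails": 46, "tcpOutRsts": 530, "tcpEstabResets": 112,
        "tcpOutDataSegs": 103_000, "tcpRetransSegs": 230}

rates = tcp_rates(prev, curr, interval_s=60)
for name, value in rates.items():
    print(f"{name:26s} {value:.2f}")
```

tcpCurrEstab is left out of the differencing because it is a current value rather than a cumulative counter.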
  • 57. Internet Server Issues •  TCP Connections are expensive –  TCP is optimized for reliable data on long lived connections –  Making a connection uses a lot more CPU than moving data –  Connection setup handshake involves several round trip delays –  Each open connection consumes about 1 KB plus data buffers •  Pending connections cause “listen queue” issues •  Each new connection goes through a “slow start” ramp up •  Other TCP Issues –  TCP windows can limit high latency high speed links –  Lost or delayed data causes time-outs and retransmissions 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 57
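The point about TCP windows limiting high latency links follows from the fact that a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window / round trip time. The window size and latency below are assumed example values.

```python
# Upper bound on TCP throughput from window size and round trip time,
# and the window needed to fill a link. Inputs are illustrative.

def max_tcp_throughput(window_bytes, rtt_s):
    """Bytes per second achievable with one window in flight per RTT."""
    return window_bytes / rtt_s


def window_needed(link_bits_per_s, rtt_s):
    """Bandwidth-delay product in bytes."""
    return link_bits_per_s / 8.0 * rtt_s


limit = max_tcp_throughput(65535, rtt_s=0.100)
print(f"64KB window, 100 ms RTT: {limit * 8 / 1e6:.1f} Mbit/s at most")
print(f"window to fill 1 Gbit/s at 100 ms: "
      f"{window_needed(1e9, 0.100) / 1e6:.1f} MB")
```

With these inputs a single connection is limited to about 5 Mbit/s regardless of how fast the link is.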
  • 58. TCP Sequence Diagram for HTTP Get
  • 59. Stalled HTTP Get and Persistent HTTP
  • 60. Memory
  • 61. Memory Capacity Measurements •  Physical Memory Capacity Utilization and Limits –  Kernel memory, Shared Memory segment –  Executable code, stack and heap –  File system cache usage, Unused free memory •  Virtual Memory Capacity - Paging/Swap Space –  When there is no more available swap, Unix stops working •  Memory Throughput –  Hardware counter metrics can track CPU to Memory traffic –  Page in and page out rates •  Memory Response Time –  Platform specific hardware memory latency makes a difference, but hard to measure –  Time spent waiting for page-in is part of Solaris microstate accounting 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 61
  • 62. Page Size Optimization •  Systems may support large pages for reduced overhead –  Solaris support is more dynamic/flexible than Linux at present •  Intimate Shared Memory locks large pages in RAM –  No swap space reservation –  Used for large database server Shared Global Area •  No good metrics to track usage and fragmentation issues •  Solaris ppgsz command can set heap and stack pagesize •  SPARC Architecture –  Base page size is 8KB, Large pages are 4MB •  Intel/AMD x86 Architectures –  Base page size is 4KB, Large pages are 2MB 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 62
  • 63. Cache principles •  Temporal locality - “close in time” –  If you need something frequently, keep it near you –  If you don’t use it for a while, put it back –  If you change it, save the change by putting it back •  Spatial locality - “close in space - nearby” –  If you go to get one thing, get other stuff that is nearby –  You may save a trip by prefetching things –  You can waste bandwidth if you fetch too much you don’t use •  Caches work well with randomness –  Randomness prevents worst case behaviour –  Deterministic patterns often cause cache busting accesses •  Very careful cache friendly tuning can give great speedups
  • 64. The memory go round - Unix/Linux •  Memory usage flows between subsystems
    [Diagram: memory moves between the Free RAM List (Head and Tail) and Kernel Memory Buffers (kernel alloc, kernel free), System V Shared Memory (shmget, shm_unlink), Process Stack and Heap (brk, pagein, exit, reclaim, pageout scanner) and Filesystem Cache (read, write, mmap, delete, reclaim, pageout scanner).]
  • 65. The memory go round - Solaris 8 and Later •  Memory usage flows between subsystems
    [Diagram: memory moves between the Free RAM List (Head and Tail) and Kernel Memory Buffers (kernel alloc, kernel free), System V Shared Memory (shmget, shm_unlink), Process Stack and Heap (brk, pagein, exit, pageout scanner) and Filesystem Cache (read, write, mmap, delete, reclaim).]
  • 66. Solaris Swap Space •  Swap is very confusing and badly instrumented!
    # se swap.se
    ani_max 54814 ani_resv 19429 ani_free 37981 availrmem 13859 swapfs_minfree 1972
    ramres 11887 swap_resv 19429 swap_alloc 16833 swap_avail 47272 swap_free 49868
    Misleading data printed by swap -s
    134664 K allocated + 20768 K reserved = 155432 K used, 378176 K available
    Corrected labels:
    134664 K allocated + 20768 K unallocated = 155432 K reserved, 378176 K available
    Mislabelled sar -r 1
    freeswap (really swap available) 756352 blocks
    Useful swap data: Total swap 520 M
    available 369 M reserved 151 M Total disk 428 M Total RAM 92 M
    # swap -s
    total: 134056k bytes allocated + 20800k reserved = 154856k used, 378752k available
    # sar -r 1
    18:40:51 freemem freeswap
    18:40:52    4152   756912
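The relabelling suggested on this slide can be applied mechanically to a `swap -s` line. The sketch below parses the sample line shown on the slide and renames the fields (reserved becomes unallocated, used becomes reserved). `relabel_swap` and its dictionary keys are hypothetical names for this example.

```python
import re

# Sample "swap -s" output line, copied from the slide.
SWAP_S = ("total: 134056k bytes allocated + 20800k reserved = "
          "154856k used, 378752k available")


def relabel_swap(line):
    """Parse a swap -s line and apply the corrected labels."""
    allocated, unallocated, reserved, available = (
        int(n) for n in re.findall(r"(\d+)k", line)
    )
    return {
        "allocated_k": allocated,
        "unallocated_k": unallocated,  # printed as "reserved" by swap -s
        "reserved_k": reserved,        # printed as "used" by swap -s
        "available_k": available,
        "total_k": reserved + available,
    }


swap = relabel_swap(SWAP_S)
for name, value in swap.items():
    print(f"{name:14s} {value:8d} K  ({value / 1024:.0f} M)")
```

The total is derived as reserved plus available, which is the only sum of the printed fields that gives total swap.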
  • 67. Disk
  • 68. Disk Capacity Measurements •  Detailed metrics vary by platform •  Easy for the simple disk cases •  Hard for cached RAID subsystems •  Almost Impossible for shared disk subsystems and SANs –  Another system or volume can be sharing a backend spindle, when it gets busy your own volume can saturate, even though you did not change your own workload! 2009 Solaris/Linux Performance Measurement and Tuning 5/1/09 Slide 68
  • 69. Storage Utilization •  Storage virtualization broke utilization metrics a long time ago •  Host server measures busy time on a "disk" –  Simple disk, "single server" response time gets high near 100% utilization –  Cached RAID LUN, one I/O stream can report 100% utilization, but full capacity supports many threads of I/O since there are many disks and RAM buffering •  New metric - "Capability Utilization" –  Adjusted to report proportion of actual capacity for current workload mix –  Measured by tools such as Ortera Atlas (http://www.ortera.com)
  • 70. Solaris Filesystem Issues •  ufs - standard, reliable, good for lots of small files •  ufs with transaction log - faster writes and recovery •  tmpfs - fastest if you have enough RAM, volatile •  NFS –  NFS2 - safe and common, 8KB blocks, slow writes –  NFS3 - more readahead and writebehind, faster; default 32KB block size - fast sequential, may be slow random; default TCP instead of UDP, more robust over WAN –  NFS4 - adds stateful behavior •  cachefs - good for read-mostly NFS speedup •  Veritas VxFS - useful on old Solaris releases •  Solaris 8 UFS Upgrade –  ufs was extended to be more competitive with VxFS –  transaction log, unbuffered direct access option and snapshot backup capability now available “for free” with Solaris 8
  • 71. Solaris 10 ZFS - What it doesn't have.... •  Nice features –  No extra cost - it's bundled in a free OS –  No volume manager - it's built in –  No space management - file systems use a common pool –  No long wait for newfs to finish - create a 3TB file system in a second –  No fsck - its transactional commit means it's consistent on disk –  No slow writes - disk write caches are enabled and flushed reliably –  No random or small writes - all writes are large batched sequential –  No rsync - snapshots can be differenced and replicated remotely –  No silent data corruption - all data is checksummed as it is read –  No bad archives - all the data in the file system is scrubbed regularly –  No penalty for software RAID - RAID-Z has a clever optimization –  No downtime - mirroring, RAID-Z and hot spares –  No immediate maintenance - double parity disks if you need them •  Wish-list –  No way to know how much performance headroom you have! –  No clustering support
  • 72. Linux Filesystems •  There are a large number of options! –  http://en.wikipedia.org/wiki/Comparison_of_file_systems •  EXT3 –  Common default for many Linux distributions –  Efficient for CPU and space, small block size –  Relatively simple, which helps reliability and recovery –  Journalling support options can improve performance –  EXT4 came out of development at the end of 2008 •  XFS –  Based on Silicon Graphics XFS, mature and reliable –  Better for large files and streaming throughput –  High Performance Computing heritage
  • 73. Disk Configurations •  Sequential access is ~10 times faster than random –  Sequential rates are now about 50-100 MB/s per disk –  Random rates are about 166 operations/sec at 10000rpm (250/sec at 15000rpm) –  The size of each random read should be as big as possible •  Reads should be cached in main memory –  “The only good fast read is the one you didn’t have to do” –  Database shared memory or filesystem cache is microseconds –  Disk subsystem cache is milliseconds, plus extra CPU load –  Underlying disk is ~6ms, as it's unlikely that data is in cache •  Writes should be cached in nonvolatile storage –  Allows write cancellation and coalescing optimizations –  NVRAM inside the system - Direct access to Flash storage –  Solid State Disks based on Flash are the "Next Big Thing"
  • 74. Disk Throughput [Chart: disk_wK/s and disk_rK/s plotted over time, vertical axis 0 to 14000]
  • 75. Max and Avg Disk Utilization (Same data) [Chart: disk_max% and disk_avg% plotted over time, vertical axis 0 to 100]
  • 76. Data from iostat •  What can we see here?
      extended disk statistics
      disk     r/s   w/s    Kr/s   Kw/s  wait  actv  svc_t  %w  %b
      sd7      0.1   1.7     0.1   13.3   0.0   0.2  109.8   0   1
      sd15   534.2  17.5  1320.4   35.0   0.0   0.3    0.6   0  26
      sd45   291.9  23.0   603.2   49.8   0.0   0.2    0.6   0  15
      sd60     3.1   0.0    25.3    0.0   0.0   0.0    7.8   0   2
      sd61     3.3   0.0    26.4    0.0   0.0   0.0    7.6   0   2
      sd62     3.2   0.0    26.1    0.0   0.0   0.0    8.1   0   3
      sd63     3.8   0.0    30.1    0.0   0.0   0.0    7.2   0   3
      sd64     3.6   0.0    28.8    0.0   0.0   0.0    7.4   0   3
      sd65     3.8   0.0    31.2    0.0   0.0   0.0    7.3   0   3
      sd67     9.7   1.5    77.8    4.3   0.0   0.1    9.0   0   8
      sd68    10.7   1.4    85.3    4.2   0.0   0.1    9.0   0  10
      sd69    10.0   1.5    79.9    4.2   0.0   0.1    9.0   0   9
      sd70    10.4   1.0    83.1    3.2   0.0   0.1    9.1   0   9
      sd71     9.9   1.4    78.8    4.6   0.0   0.1    8.7   0   9
      sd72    10.0   1.1    79.9    3.7   0.0   0.1    8.5   0   8
      sd75     0.0  27.6     0.0  297.3   0.0   0.0    1.1   0   2
      sd210   12.1   0.3   108.9    0.6   0.0   0.1    9.8   0  10
      sd211   12.9   0.4   114.8    0.7   0.0   0.1   10.6   0  11
      sd212   12.0   0.6   107.1    1.3   0.0   0.1   11.1   0  10
      sd213   13.8   0.3   122.2    0.9   0.0   0.2   11.1   0  11
      sd214   12.5   0.5   112.1    1.0   0.0   0.1   10.3   0  10
      sd215   12.1   0.3   109.5    0.8   0.0   0.1   10.5   0  10
      [Callouts on the slide, in order of appearance: sd7 root ufs; solid state disks; stripe 8K RR; stripe; cached write log; stripe]
  • 77. Simple Disks •  Utilization shows capacity usage –  Measured using iostat %b •  Response time is svc_t –  svc_t increases due to waiting in the queues caused by bursty loads •  Service time per I/O is Util/IOPS –  Calculate as (%b/100)/(rps+wps) –  Decreases due to optimization of queued requests as load increases
  • 78. Single Disk Parameters •  e.g. Seagate 18GB ST318203FC –  Obtain from www.seagate.com –  RPM = 10000, so one revolution = 6.0ms, or 166 revolutions/s –  Avg read seek = 5.2ms –  Avg write seek = 6.0ms –  Avg transfer rate = 24.5 MB/s –  Random IOPS •  Approx 166/s for small requests •  Approx 24.5/size (size in MB) for large requests
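The arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a disk model: the function names are made up for the example, and the only inputs are the slide's own figures (10000 RPM, 24.5 MB/s). It follows the slide's rule of thumb that small random requests complete at roughly one per revolution, while large requests are limited by transfer rate divided by request size.

```python
def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def small_request_iops(rpm):
    """Slide's rule of thumb: about one small random request per revolution."""
    return rpm / 60.0


def large_request_iops(transfer_mb_per_s, request_mb):
    """Slide's rule of thumb: transfer rate divided by request size (in MB)."""
    return transfer_mb_per_s / request_mb


if __name__ == "__main__":
    print(rotation_ms(10000))                 # one revolution at 10000 RPM
    print(int(small_request_iops(10000)))     # small-request IOPS, truncated
    print(int(small_request_iops(15000)))     # the 15000 RPM figure from slide 73
    print(large_request_iops(24.5, 1.0))      # 1 MB requests at 24.5 MB/s
```

Truncating the 10000 RPM result gives the slide's 166/s, and the 15000 RPM case gives the 250/s quoted on slide 73.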
  • 79. Mirrored Disks •  All writes go to both disks •  Read policy alternatives –  All reads from one side –  Alternate from side to side –  Split by block number to reduce seek –  Read both and use first to respond •  Simple Capacity Assumption –  Assume duplicated interconnects –  Same capacity as unmirrored
  • 80. Concatenated and Fat Stripe Disks •  Request size less than interlace •  Requests go to one disk •  Single threaded requests –  Same capacity as single disk •  Multithreaded requests –  Same service time as one disk –  Throughput of N disks if more than N threads are evenly distributed
  • 81. Striped Disks •  Request size more than interlace •  Requests split over N disks –  Single and multithreaded requests –  N = request size / interlace –  Throughput of N disks •  Service Time Reduction –  Reduced size of request reduces service time for large transfers –  Need to wait for all disks to complete - slowest dominates
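The distinction between a fat stripe (request smaller than the interlace) and a true stripe (request larger than the interlace) can be sketched as a small helper. This is an illustrative sketch only: the function names are hypothetical, and rounding up and capping N at the number of disks in the stripe are assumptions added for the example. It also ignores alignment, so a small request that straddles an interlace boundary is still counted as touching one disk.

```python
import math


def disks_touched(request_kb, interlace_kb, stripe_disks):
    """Number of disks one request is split over (N = request size / interlace).

    Requests no larger than the interlace go to a single disk, as in the
    concatenated / fat stripe case. Rounding up and capping at the stripe
    width are assumptions made for this sketch.
    """
    if request_kb <= interlace_kb:
        return 1
    return min(stripe_disks, math.ceil(request_kb / interlace_kb))


def per_disk_transfer_kb(request_kb, interlace_kb, stripe_disks):
    """Size of the piece each disk transfers; smaller pieces mean shorter service time."""
    return request_kb / disks_touched(request_kb, interlace_kb, stripe_disks)


if __name__ == "__main__":
    print(disks_touched(4, 128, 6))         # small request on a fat stripe
    print(disks_touched(32, 8, 6))          # 32KB request over an 8KB interlace
    print(disks_touched(64, 8, 6))          # capped at the 6 disks in the stripe
    print(per_disk_transfer_kb(32, 8, 6))   # each disk moves a quarter of the request
```

The request only completes when the slowest of the N disks finishes, so the reduced per-disk transfer size is a best case for service time.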
  • 82. RAID5 for Small Requests •  Writes must calculate parity –  Read parity and old data blocks –  Calculate new parity –  Write log and data and parity –  Triple service time –  One third throughput of one disk •  Read performs like stripe –  Throughput of N-1 disks, service time of one disk –  Degraded mode throughput about that of one disk
  • 83. RAID5 for Large Requests •  Write full stripe and parity •  Capacity similar to stripe –  Similar read and write performance –  Throughput of N-1 disks –  Service time for size reduced by N-1 –  Less interconnect load than mirror •  Degraded Mode –  Throughput halved and service time similar –  Extra CPU used to regenerate data
  • 84. Cached RAID5 •  Nonvolatile cache –  No need for recovery log disk •  Fast service time for writes –  Interconnect transfer time only •  Cache optimizes RAID5 –  Makes all backend writes full stripe
  • 85. Cached Stripe •  Write caching for stripes –  Greatly reduced service time –  Very worthwhile for small transfers –  Large transfers should not be cached –  In many cases, 128KB is crossover point from small to large •  Optimizations –  Rewriting same block cancels in cache –  Small sequential writes coalesce
  • 86. Capacity Model Measurements •  Derived from iostat outputs
      extended disk statistics
      disk     r/s   w/s    Kr/s   Kw/s  wait  actv  svc_t  %w  %b
      sd9     33.1   8.7   271.4   71.3   0.0   2.3   15.8   0  27
  •  Utilization U = %b / 100 = 0.27 •  Throughput X = r/s + w/s = 41.8 •  Size K = (Kr/s + Kw/s) / X = 8.2K •  Concurrency N = actv = 2.3 •  Service time S = U / X = 6.5ms •  Response time R = svc_t = 15.8ms
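The derivation on this slide can be written as a short Python function, shown here as a sketch. The function name and dictionary keys are chosen for the example; the inputs are the columns of the sd9 row above, and the formulas are the ones listed on the slide, with the size calculation parenthesized as (Kr/s + Kw/s) / X.

```python
def capacity_model(rps, wps, kr_per_s, kw_per_s, actv, svc_t_ms, pct_busy):
    """Derive the capacity model quantities from one iostat -x row."""
    utilization = pct_busy / 100.0            # U = %b / 100
    throughput = rps + wps                    # X = r/s + w/s
    if throughput <= 0:
        raise ValueError("no I/O in this interval, size and service time are undefined")
    return {
        "U": utilization,
        "X": throughput,
        "K": (kr_per_s + kw_per_s) / throughput,   # average size in KB
        "N": actv,                                 # concurrency
        "S_ms": 1000.0 * utilization / throughput, # service time S = U / X
        "R_ms": svc_t_ms,                          # response time R = svc_t
    }


if __name__ == "__main__":
    # Values from the sd9 row on the slide.
    model = capacity_model(33.1, 8.7, 271.4, 71.3, 2.3, 15.8, 27)
    for name, value in model.items():
        print(f"{name:5s} {value:.1f}" if name != "U" else f"{name:5s} {value:.2f}")
```

For the sd9 row this reproduces the slide's figures after rounding: X = 41.8, K = 8.2, S = 6.5ms. The gap between S (6.5ms) and R (15.8ms) is time spent queued.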
  • 87. Cache Throughput •  Hard to model clustering and write cancellation improvements •  Make pessimistic assumption that throughput is unchanged •  Primary benefit of cache is fast response time •  Writes can flood cache and saturate back-end disks –  Service times suddenly go from 3ms to 300ms –  Very hard to figure out when this will happen –  Paranoia is a good policy….
  • 88. Concluding Summary Walk out of here with the most useful content fresh in your mind!
  • 89. Quick Tips #1 - Disk •  The system will usually have a disk bottleneck •  Track how busy the busiest disk is •  Look for unbalanced, busy or slow disks with iostat •  Options: timestamp, look for busy controllers, ignore idle disks:
      % iostat -xnzCM -T d 30
      Tue Jan 21 09:19:21 2003
                          extended device statistics
          r/s    w/s   Mr/s   Mw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
        141.0    8.6    0.6    0.0   0.0   1.5     0.0    10.0   0  25  c0
          3.3    0.0    0.0    0.0   0.0   0.0     0.0     6.5   0   2  c0t0d0
        137.7    8.6    0.6    0.0   0.0   1.5     0.0    10.1   0  74  c0t1d0
  •  Watch out for sd_max_throttle limiting throughput when set too low •  Watch out for the RAID cache being flooded on writes, which causes a sudden, very large increase in write service time
  • 90. Quick Tips #2 - Network •  If you ever see a slow machine that also appears to be idle, you should suspect a network lookup problem, i.e. the system is waiting for some other system to respond. •  Poor Network Filesystem response times may be hard to see –  Use iostat -xn 30 on a Solaris client –  wsvc_t is the time spent in the client waiting to send a request –  asvc_t is the time spent in the server responding –  %b will show 100% whenever any requests are being processed. It does NOT mean that the network server is maxed out, as an NFS server is a complex system that can serve many requests at once. •  Name server delays are also hard to detect –  Overloaded LDAP or NIS servers can cause problems –  DNS configuration errors or server problems often cause 30s delays as the request times out
  • 91. Quick Tips #3 - Memory •  Avoid the common vmstat misconceptions –  The first line is average since boot, so ignore it •  Linux, Other Unix and earlier Solaris Releases –  Ignore “free” memory –  Use high page scanner “sr” activity as your RAM shortage indicator •  Solaris 8 and Later Releases –  Use “free” memory to see how much is left for code to use –  Use non-zero page scanner “sr” activity as your RAM shortage indicator •  Don’t panic when you see page-ins and page-outs in vmstat •  Normal filesystem activity uses paging
      solaris9% vmstat 30
       kthr      memory            page            disk          faults      cpu
       r b w   swap  free  re  mf pi po fr de sr f0 s0 s1 s6   in   sy   cs us sy id
       0 0 0 2367832 91768  3  31  2  1  1  0  0  0  0  0  0  511  404  350  0  0 99
       0 0 0 2332728 75704  3  29  0  0  0  0  0  0  0  0  0  508  537  410  0  0 99
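The two vmstat rules above (skip the first sample, then watch the page scanner rate) can be sketched in Python. This is an assumption-laden illustration: the function names are invented for the example, and the position of the sr column (index 11, counting from zero) matches the Solaris 9 layout shown on the slide but will differ on other platforms or with a different set of disk columns. The non-zero test is the Solaris 8 and later rule; on other systems the slide recommends looking for a high rate instead.

```python
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s0 s1 s6   in   sy   cs us sy id
 0 0 0 2367832 91768  3  31  2  1  1  0  0  0  0  0  0  511  404  350  0  0 99
 0 0 0 2332728 75704  3  29  0  0  0  0  0  0  0  0  0  508  537  410  0  0 99
"""

# Assumption: sr is the 12th field in this particular vmstat layout.
SR_COLUMN = 11


def scan_rates(text, sr_column=SR_COLUMN):
    """Return the sr value of every data row except the first.

    The first data row is the average since boot, so it is dropped.
    Header rows are skipped because their first field is not a number.
    """
    rates = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= sr_column or not fields[0].isdigit():
            continue
        rates.append(int(fields[sr_column]))
    return rates[1:]


def ram_shortage(text):
    """Solaris 8 and later rule from the slide: any non-zero scan rate."""
    return any(rate > 0 for rate in scan_rates(text))


if __name__ == "__main__":
    print(scan_rates(SAMPLE))
    print(ram_shortage(SAMPLE))
```

On the slide's sample the only usable interval has a scan rate of zero, so no RAM shortage is indicated, even though free memory looks low.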
  • 92. Quick Tips #4 - CPU •  Look for a long run queue (vmstat procs r) - and add CPUs –  To speed up with a zero run queue you need faster CPUs, not more of them •  Check for CPU system time dominating user time –  Most systems should have lots more Usr than Sys, as they are running application code –  But... dedicated NFS servers should be 100% Sys –  And... dedicated web servers have high Sys as well –  So... assume that lots of network service drives Sys time •  Watch out for processes that hog the CPU –  Big problem on user desktop systems - look for looping web browsers –  Web search engines may get queries that loop –  Use resource management or limit cputime (ulimit -t) in startup scripts to terminate web queries
  • 93. Quick Tips #5 - I/O Wait •  Look for processes blocked waiting for disk I/O (vmstat procs b) –  This is what causes CPU time to be counted as wait not idle –  Nothing else ever causes CPU wait time! •  CPU wait time is a subset of idle time and consumes no resources –  CPU wait time is not calculated properly on multiprocessor machines on older Solaris releases, where it is greatly inflated! –  CPU wait time is no longer calculated in Solaris 10 and is reported as zero –  Bottom line - don’t worry about CPU wait time, it’s a broken metric •  Look at individual process wait time using microstates –  prstat -m or SE toolkit process monitoring •  Look at I/O wait time using iostat asvc_t
  • 94. Quick Tips #6 - iostat •  For Solaris remember “expenses”: iostat -xPncez 30 •  Add -M for Megabytes, and -T d for timestamped logging •  Use a 30 second interval to avoid spikes in load. Watch asvc_t, which is the response time for Solaris •  Treat regular disks that are over 5% busy and have response times of more than 10ms as a problem. •  If you have cached hardware RAID, treat response times of more than 5ms as a problem. •  Ignore large response times on idle disks that have filesystems - it's not a problem and the cause is the fsflush process
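The rule of thumb for regular disks (over 5% busy and asvc_t above 10ms) can be applied mechanically to iostat -xn output. The sketch below assumes the 11-column layout shown on slide 89, with the device name last; the function name and the sample text are for illustration only, and the thresholds are parameters so the 5ms figure for cached hardware RAID can be substituted.

```python
SAMPLE = """\
    r/s    w/s   Mr/s   Mw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
  141.0    8.6    0.6    0.0   0.0   1.5     0.0    10.0   0  25  c0
    3.3    0.0    0.0    0.0   0.0   0.0     0.0     6.5   0   2  c0t0d0
  137.7    8.6    0.6    0.0   0.0   1.5     0.0    10.1   0  74  c0t1d0
"""


def slow_busy_devices(text, busy_pct=5.0, resp_ms=10.0):
    """Return (device, %b, asvc_t) for devices over both thresholds.

    Assumes 11 whitespace-separated fields per data row, in the order
    r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device.
    """
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue
        try:
            values = [float(field) for field in fields[:10]]
        except ValueError:
            continue  # header row, not data
        asvc_t, busy = values[7], values[9]
        if busy > busy_pct and asvc_t > resp_ms:
            flagged.append((fields[10], busy, asvc_t))
    return flagged


if __name__ == "__main__":
    for device, busy, asvc_t in slow_busy_devices(SAMPLE):
        print(f"{device}: {busy:.0f}% busy, {asvc_t:.1f}ms response")
```

On the slide 89 sample only c0t1d0 is flagged: c0t0d0 is nearly idle, and the c0 controller line sits exactly at 10.0ms, which is not over the threshold.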
  • 95. Recipe to fix a slow system •  Essential Background Information –  What is the business function of the system? –  Who and where are the users? –  Who says there is a problem, and what is slow? –  What changed recently and what is on the way? •  What is the system configuration? –  CPU/RAM/Disk/Net/OS/Patches, what application software is in use? •  What are the busy processes on the system doing? –  Use top, prstat, pea.se or /usr/ucb/ps uax | head •  Report CPU and disk utilization levels, iostat -xPncezM -T d 30 –  What is making the disks busy? •  What is the network name service configuration? –  How much network activity is there? Use netstat -i 30 or nx.se 30 •  Is there enough memory? –  Check free memory and the scan rate with vmstat 30
  • 96. Further Reading - Books •  General Solaris/Unix/Linux Performance Tuning –  System Performance Tuning (2nd Edition), Gian-Paolo D. Musumeci and Mike Loukides; O'Reilly & Associates •  Solaris Performance Tuning Books –  Solaris Performance and Tools, Richard McDougall, Jim Mauro, Brendan Gregg; Prentice Hall –  Configuring and Tuning Databases on the Solaris Platform, Allan Packer; Prentice Hall –  Sun Performance and Tuning, Adrian Cockcroft and Rich Pettit; Prentice Hall •  Sun BluePrints™ –  Capacity Planning for Internet Services, Adrian Cockcroft and Bill Walker; Prentice Hall –  Resource Management, Richard McDougall, Adrian Cockcroft et al.; Prentice Hall •  Linux –  Linux Performance Tuning and Capacity Planning, Jason R. Fink and Matthew D. Sherer –  Google has a Linux specific search mode http://www.google.com/linux
  • 97. Questions? (The End)