SlideShare a Scribd company logo
1 of 74
Download to read offline
Static Analysis
   and Code Optimizations
in Glasgow Haskell Compiler

           Ilya Sergey
        ilya.sergey@gmail.com




              12.12.12

                                1
The Goal

Discuss what happens when we run
       ghc -O MyProgram.hs




                                   2
The Plan
•   Recall how laziness is implemented in GHC and what drawbacks it
    might cause;

•   Introduce the worker/wrapper transformation -
    an optimization technique implemented in GHC;

•   Realize why we need static analysis to do the transformations;

•   Take a brief look at the GHC compilation pipeline and the Core
    language;

•   Meet two types of static analysis: forward and backwards;

•   Recall some basics of denotational semantics and take a look at the
    mathematical basics of some analyses in GHC;

•   Introduce and motivate the CPR analysis.



                                                                          3
Why Laziness
 Might be Harmful
            and

How the Harm Can Be Reduced



                              4
module Main where

import System.Environment
import Text.Printf

main = do
    [n] <- map read `fmap` getArgs
    printf "%fn" (mysum n)

mysum :: Double -> Double
mysum n = myfoldl (+) 0 [1..n]

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl f z0 xs0 = lgo z0 xs0
             where
                lgo z []      = z
                lgo z (x:xs) = lgo (f z x) xs




                                                5
Compile and run
> ghc --make -RTS -rtsopts Sum.hs
> time ./Sum 1e6 +RTS -K100M
500000500000.0

real! 0m0.583s
user! 0m0.509s
sys! 0m0.068s


Compile optimized and run
> ghc --make -fforce-recomp -RTS -rtsopts -O Sum.hs
> time ./Sum 1e6
500000500000.0

real! 0m0.153s
user! 0m0.101s
sys! 0m0.011s


                                                      6
Collecting Runtime Statistics

  Profiling results for the non-optimized program
> ghc --make -RTS -rtsopts -fforce-recomp Sum.hs
> ./Sum 1e6 +RTS -sstderr -K100M



              225,137,464 bytes allocated in the heap
              195,297,088 bytes copied during GC
                      107 MB total memory in use

        INIT       time    0.00s     (    0.00s   elapsed)
        MUT        time    0.21s     (    0.24s   elapsed)
        GC         time    0.36s     (    0.43s   elapsed)
        EXIT       time    0.00s     (    0.00s   elapsed)
        Total      time    0.58s     (    0.67s   elapsed)

        %GC        time      63.2%       (64.0% elapsed)


                                                             7
Collecting Runtime Statistics

  Profiling results for the optimized program
> ghc --make -RTS -rtsopts -fforce-recomp -O Sum.hs
> ./Sum 1e6 +RTS -sstderr -K100M



              92,082,480 bytes allocated in the heap
                  30,160 bytes copied during GC
                      1 MB total memory in use

        INIT      time    0.00s     (    0.00s   elapsed)
        MUT       time    0.07s     (    0.08s   elapsed)
        GC        time    0.00s     (    0.00s   elapsed)
        EXIT      time    0.00s     (    0.00s   elapsed)
        Total     time    0.07s     (    0.08s   elapsed)

        %GC       time       1.1%       (1.4% elapsed)


                                                            8
Time Profiling

  Profiling results for the non-optimized program

> ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs
> ./Sum 1e6 +RTS -p -K100M



          !   total time =         0.24 secs
          !   total alloc = 124,080,472 bytes

          COST CENTRE MODULE   %time %alloc

          mysum       Main      52.7   74.1
          myfoldl.lgo Main      43.6   25.8
          myfoldl     Main       3.7    0.0




                                                         9
Time Profiling

    Profiling results for the optimized program

> ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs
> ./Sum 1e6 +RTS -p -K100M



          !   total time =           0.14 secs
          !   total alloc =    92,080,364 bytes

          COST CENTRE MODULE    %time %alloc

          mysum       Main       92.1   99.9
          myfoldl.lgo Main        7.9    0.0




                                                            10
Memory Profiling
 Profiling results for the non-optimized program
> ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs
> ./Sum 1e6 +RTS -hy -p -K100M
> hp2ps -e8in -c Sum.hp

         Sum 1e6 +RTS -p -hy -K100M               3,127,720 bytes x seconds                 Wed Dec 12 15:01 2012
            bytes




          18M


          16M
                                                                                                    Double

          14M


          12M


          10M                                                                                       *



           8M


           6M
                                                                                                    BLACKHOLE
           4M


           2M


           0M
             0.0    0.0     0.0       0.1   0.1      0.1      0.1     0.1     0.2   0.2   seconds



                                                                                                                    11
Memory Profiling
 Profiling results for the optimized program
> ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs
> ./Sum 1e6 +RTS -hy -p -K100M
> hp2ps -e8in -c Sum.hp

          Sum 1e6 +RTS -p -hy -K100M         2,377 bytes x seconds                          Wed Dec 12 15:02 2012
             bytes




                                                                                     ARR_WORDS
            30k

                                                                                     []


            25k                                                                      MUT_ARR_PTRS_CLEAN


                                                                                     Handle__

            20k
                                                                                     MUT_VAR_CLEAN


                                                                                     ->[]
            15k
                                                                                     Buffer


                                                                                     WEAK
            10k

                                                                                     IO


             5k                                                                      ForeignPtrContents


                                                                                     (,)

             0k
               0.0      0.0      0.0   0.1      0.1       0.1        0.1   seconds




                                                                                                                    12
The Problem
     Too Many Allocation of Double objects

The cause:
Too many thunks allocated for lazily computed values
mysum :: Double -> Double
mysum n = myfoldl (+) 0 [1..n]

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl f z0 xs0 = lgo z0 xs0
             where
                lgo z []      = z
                lgo z (x:xs) = lgo (f z x) xs


In our example the computation of Double values is
delayed by the calls to lgo.
                                                       13
Intermezzo
          Call-by-Value                         Call-by-Need

Arguments of a function call              Arguments of a function call
are fully evaluated                       are not evaluated
before the invocation.                    before the invocation.
                                          Instead, a pointer (thunk)
                                          to the code is created, and,
                                          once evaluated, the value is
                                          memoized.
Thunk (Urban Dictionary):
To sneak up on someone and bean him with a heavy blow
to the back of the head.

“Jim got thunked going home last night. Serves him right for walking
in a dark alley with all his paycheck in his pocket.”
                                                                         14
How to thunk a thunk

•   Apply its delayed value as a function;

•   Examine its value in a case-expression.

    case p of
      (a, b) -> f a b


p will be evaluated to the weak-head normal form,
sufficient to examine whether it is a pair.

However, its components will remain unevaluated (i.e., thunks).

Remark:
Only evaluation of boxed values can be delayed via thunks.


                                                                  15
Our Example from CBN’s Perspective
  mysum :: Double -> Double
  mysum n = myfoldl (+) 0 [1..n]

  myfoldl :: (a -> b -> a) -> a -> [b] -> a
  myfoldl f z0 xs0 = lgo z0 xs0
               where
                  lgo z []      = z
                  lgo z (x:xs) = lgo (f z x) xs


      mysum 3
 =) myfoldl (+) 0 (1:2:3:[])
 =) lgo z1 (1:2:3:[])                   z1 -> 0
 =) lgo z2 (2:3:[])                     z2 -> 1 + !z1
 =) lgo z3 (3:[])                       z3 -> 2 + !z2
 =) lgo z4 []                           z4 -> 3 + !z3
 =) !z4
                          Now GC can do the job...
                                                        16
Getting Rid of Redundant Thunks

Obvious Solution:
Replace CBN by CBV, so no need in thunk.

Obvious Problem:
The semantics of a “lazy” program can
change unpredictably.

   f x e = if x > 0
           then x + 1
           else e

   f 5 (error “Urk”)



                                           17
Getting Rid of Redundant Thunks
Let’s reformulate:
Replace CBN by CBV only for strict functions,
i.e., those that always evaluate their argument
to the WHNF.

 f x e = if x > 0
         then x + 1
         else e

 f 5 (error “Urk”)



 •   f is strict in x

 •   f is non-strict (lazy) in e

                                                  18
A Convenient Definition of Strictness
Definition:
A function f of one argument is strict iff

           f undefined = undefined

Strictness is formulated similarly for functions of
multiple arguments.

 f x e = if x > 0
         then x + 1
         else e

 f 5 (error “Urk”)



                                                      19
Enforcing CBV for Function Calls
      Worker/Wrapper Transformation
            Splitting a function into two parts

f :: (Int, Int) -> Int
f p = e

                                +
f :: (Int, Int) -> Int
f p = case p of (a, b) -> $wf a b

$wf :: Int -> Int -> Int
$wf a b = let p = (a, b) in e


 •   The worker does all the job, but takes unboxed;

 •   The wrapper serves as an impedance matcher and inlined at every
     call site.
                                                                       20
Some Redundant Job Done?

f :: (Int, Int) -> Int
f p = case p of (a, b) -> $wf a b

$wf :: Int -> Int -> Int
$wf a b = let p = (a, b) in e




•   f takes the pair apart and passes components to $wf;

•   $wf construct the pair again.




                                                           21
Strictness to the Rescue
   A strict function always examines its parameter.

So, we just rely on a smart rewriter of case-expressions.

 f :: (Int, Int) -> Int
 f p = (case p of (a, b) -> a) + 1

                            +
 f :: (Int, Int) -> Int
 f p = case p of (a, b) -> $wf a

 $wf :: Int -> Int
 $wf a = let p = (a, error “Urk”)
         in
            (case p of (a, b) -> a) + 1



                                                            22
Strictness to the Rescue
   A strict function always examines its parameter.

So, we just rely on a smart rewriter of case-expressions.

 f :: (Int, Int) -> Int
 f p = (case p of (a, b) -> a) + 1

                            +
 f :: (Int, Int) -> Int
 f p = case p of (a, b) -> $wf a

 $wf :: Int -> Int
 $wf a = a + 1




                                                            23
Our Example
Step 1: Inline myfoldl

mysum :: Double -> Double
mysum n = myfoldl (+) 0 [1..n]

myfoldl :: (a -> b -> a) -> a -> [b] -> a
myfoldl f z0 xs0 = lgo z0 xs0
             where
                lgo z []      = z
                lgo z (x:xs) = lgo (f z x) xs




                                                24
Our Example
Step 2: Analyze Strictness and Absence

mysum :: Double -> Double
mysum n = lgo 0 n
  where
    lgo :: Double -> [Double] -> Double
    lgo z []      = z
    lgo z (x:xs) = lgo (z + x) xs



Result: lgo is strict in its both arguments




                                              25
Our Example
Step 3: Worker/Wrapper Split

mysum :: Double -> Double
mysum n = lgo 0 n
  where
    lgo :: Double -> [Double] -> Double
    lgo z []      = z
    lgo z (x:xs) = lgo (z + x) xs




                                          26
Our Example
Step 3: Worker/Wrapper Split

mysum :: Double -> Double
mysum n = lgo 0 n
  where
    lgo :: Double -> [Double] -> Double
    lgo z xs      = case z of D# d -> $wlgo d xs

    $wlgo :: Double# -> [Double] -> Double
    $wlgo d []     = D# d
    $wlgo d (x:xs) = lgo ((D# d) + x) xs



$wlgo takes unboxed doubles as an argument.



                                                   27
Our Example
Step 4: Inline lgo in the Worker

mysum :: Double -> Double
mysum n = lgo 0 n
  where
    lgo :: Double -> [Double] -> Double
    lgo z xs      = case z of D# d -> $wlgo d xs

    $wlgo :: Double# -> [Double] -> Double
    $wlgo d []     = D# d
    $wlgo d (x:xs) = lgo ((D# d) + x) xs




                                                   28
Our Example
Step 4: Inline lgo in the Worker

mysum :: Double -> Double
mysum n = lgo 0 n
  where
    lgo :: Double -> [Double] -> Double
    lgo z xs      = case z of D# d -> $wlgo d xs

     $wlgo :: Double# -> [Double] -> Double
     $wlgo d []     = D# d
     $wlgo d (x:xs)
        = case ((D# d) + x) of D# d' -> $wlgo d' xs




 •    lgo is invoked just once;

 •    No intermediate thunks for d is constructed.

                                                      29
A Brief Look
at GHC’s Guts



                30
GHC Compilation Pipeline
A number of Intermediate Languages

•   Haskell Source

•   Core

•   Spineless Tagless G-Machine

•   C--

•   C / Machine Code / LLVM Code


 Most of interesting optimizations
           happen here


                                      31
32
GHC Core


•   A tiny language, to which Haskell sources are de-sugared;

•   Based on explicitly typed System F with type equality
    coercions;

•   Used as a base platform for analyses and optimizations;

•   All names are fully-qualified;

•   if-then-else is compiled to case-expressions;

•   Variables have additional metadata;

•   Type class constraints are compiled into record parameters.



                                                                  33
Core Syntax
data Expr b
  = Var! Id
  | Lit   Literal
  | App   (Expr b) (Expr b)
  | Lam   b (Expr b)
  | Let   (Bind b) (Expr b)
  | Case (Expr b) b Type [Alt b]!
  | Cast (Expr b) Coercion
  | Tick (Tickish Id) (Expr b)
  | Type Type
  | Coercion Coercion

data Bind b = NonRec b (Expr b)
!           | Rec [(b, (Expr b))]

type Alt b = (AltCon, [b], Expr b)

data AltCon
  = DataAlt DataCon
  | LitAlt Literal
  | DEFAULT
                                     34
Core Output (Demo)




•A factorial function
• mysum




                         35
How to Get Core

Desugared Core
> ghc -ddump-ds Sum.hs



Core with Strictness Annotations
> ghc -ddump-stranal Sum.hs



Core after Worker/Wrapper Split
> ghc -ddump-worker-wrapper Sum.hs




            More at http://www.haskell.org/ghc/docs/2.10/users_guide/user_41.html
                                                                                    36
Strictness and Absence
        Analyses
      in a Nutshell


                         37
Two Types
of Modular Program Analyses
• Forward analysis
 •   “Run” the program with abstract input and infer the abstract
     result;

 •   Examples: sign analysis, interval analysis, type checking/
     inference.

• Backwards analysis
 •   From the expected abstract result of the program infer the
     abstract values of its inputs.



                                                                    38
Strictness from the definition
     as a forward analysis
                     f ?=?
 A function with multiple parameters
                 f x y z = ...
(f ? > >), (f > ? >), (f > > ?)
    What if there are nested, recursive definitions?



                                                      39
Strictness as a backwards analysis
              (Informally)

                  f x y z = ...

     If the result of f applied to some arguments
           is going to be evaluated to WHNF,
         what can we say about its parameters?


Backwards analysis provides this contextual information.


                                                           40
Defining the Contexts (formally)
            Denotational Semantics
 •   Answers the question what a program is;

 •   Introduced by Dana Scott and Christopher Strachey to reason
     about imperative programs as state transformers;

 •   The effect of program execution is modeled by relating a
     program to a mathematical function;

 •   Main purpose: constructing different domains for program
     interpretation and analysis;

 •   Secondary purpose: introducing ordering on program objects.




                                                                   41
Simple Denotational Semantics of Core

Definition
Domain - a set of meanings for different programs

What is the meaning of undefined
or a non-terminating program?

                ?   - “bottom”

 JundefinedK = ?
 Jf x = f xK = ?
                                                    42
Simple Denotational Semantics of Core
 ? is the least defined element in our domain
 Once evaluated, it terminates the program

 Adding bottom to a set of values is called lifting

 Example:      Z?
   ...     2        1    0    1    2 ...

                         ?
                                                      43
Simple Denotational Semantics of Core
    Denotational semantics of a literal is itself

                    J1K = 1

   ...     2        1    0    1    2 ...

                        ?
            Should be interpreted as

    ...? v     2, ? v     1, ? v 0, ? v 1, . . .

                                                    44
Elements of Domain Theory
Partial order v
 x v y - x is “less defined than” y
•   reflexive:   8x x v x
•   transitive: if x v y and y v z then x v z

•   antisymmetric: if x v y and y v x then x = y

Least upper bound z = x t y
xvz
yvz
x v z 0 and y v z 0 =) z v z 0

                                                   45
Simple Denotational Semantics of Core
Algebraic Data Types
data Maybe a = Nothing | Just a




           Just (Just ?)              Just 2

           Nothing                Just ?

                         ?

                                               46
Simple Denotational Semantics of Core
Monotone functions
     f is monotone i↵ x v y () f x v f y


  Denotational semantics of first-order Core functions -
   monotone functions on the lifted domain of values.


  Complete domain for denotational semantics of Core
               is defined recursively.




                                                          47
Simple Denotational Semantics of Core
Monotone functions as domain elements
                              8
       ⇢                      < 1       if x = 0
           1 if x = 0
f x=                     g x=   2       if x = 1
           ? otherwise        :
                                ?       otherwise


 Functions are compared point-wise: f v g


 Recursive definitions are computed as successive
 chains of increasingly more defined functions.

                                                    48
Projections: Defining Usage Contexts

Definition:
A monotone function   p is a projection if for every object d
                           pdvd                Shrinking
                       p(p d) = p d            Idempotent

In point-free style

                            p v ID
                       p p=p


                                                                49
Intuition behind Projections
 •   Projections remove information from objects;

 •   Projections is a way to describe which parts of an
     object are essential for the computation;

 •   Projection will be used as a synonym to context.

Examples
ID = x.x
BOT = x.?
F1 = (x, y).(?, y)
F2 = g. p.g(F1 p) - a projection if g is monotone


                                                          50
More Facts about Projections

Theorem:
If P is a set of projections then
tP exists and is a projection.



Lemma:
Let p1 and p2 be projections.
Then p1 v p2 =) p1 p2 = p1 .




                                    51
Higher-Order Projections

Let p, q be projections, then

                ⇢
                    p f   q   if f is a function
(q ! p)f =
                    ?         otherwise

            ⇢
                (p d1 , q d2 ) if f is a pair and f = (d1 , d2 )
(p, q)f =
                ?              otherwise


These are projections, too.

                                                                   52
Modeling Usage with Projections
               f = x. . . .
What does it mean “f is not using its argument”?
                f z=f ?
                                  What happens
or                               to the argument


     (ID ! ID)f = (BOT ! ID)f


                 What happens
                 to the result
                                                   53
Modeling Usage with Projections

      (ID ! ID)f = (BOT ! ID)f

                    m
(ID ! ID)f = (ID ! ID)((BOT ! ID)f )
|   {z  }    |   {z  } |   {z  }
     p              p             q

                    m
               p f = p (q f )

q is a safe projection in the context of p

                                             54
Safety Condition for Projections
                        p f = p (q f )

p   defines a context, i.e., how we are going to use a value;
q   defines, how much information we can remove
    from the object, so it won’t change from p’s perspective.

The goal of a backwards absence/strictness analysis -
to find a safe projection for a given value and a context

•   The context: how the result of the function
    is going to be used;

•   The output: how arguments can be safely changed.

                                                                55
Safe Usage Projections: Example
                       p f = p (q f )

f :: (Int, Int, Int)   -> [a] -> (Int, Bool)
f (a, b, c) = case a   of
               0 ->    error "urk"
               _ ->    y -> case b of
                               0 -> (c, null y)
                               _ -> (c, False)




           p                               q
       ID ! ID                          ID ! ID
 ID ! ID ! (BOT , ID)        (ID, ID, BOT ) ! ID ! ID
  ID ! ID ! (ID, BOT )          ID ! BOT ! ID

                                                        56
What about Strictness?
Usage context is modeled by the identity projection.
Unfortunately, it is to weak for the strictness property.
 The problem:
  •    ID treats ⊥ as any other value;

  •    It is not helpful to establish a context for detecting f ⊥ = ⊥.

 A solution:
   •   Introduce a specific element in the domain for “true divergence”;

   •   Devise a specific projection that maps ⊥ to the true divergence.


                                                                          57
Extending the Domain for True Divergence

               - lightning bolt




                    ?



              8f f      =

                                           58
Modeling Strictness with Projections
               S   =
              S ?=
               S x = x, otherwise

Checking if the function f uses its argument strictly
               S   f =S    f   S
Indeed,
           (S f ) ? = (S f S) ?
     =)     S (f ?) = S (f (S ?))
     =)     S (f ?) = S (f )
     =)     S (f ?) = S
     =)      S (f ?) =
     =)          f ?=?
                                                        59
Conservative Nature of the Analysis
•   From the backwards perspective each function is a
    “projection transformer”: it transforms a result context to a
    safe projection (not always the best one);

•   The set of all safe projections of a function is incomputable,
    as it requires examining all contexts;

•   Instead, the optimal “threshold” result projection is chosen.

          v
         ID
            ⇤
          q
         q2
          q1

                      p1       p2        ⇤   p3   v
                                     p
                                                                     60
How to screw the Strictness Analysis
   fact :: Int -> Int
   fact n = if n == 0
            then n
            else n * (fact $ n - 1)


   Let’s take a look on the strictness signatures (demo)


  Conclusion
  Polymorphism and type classes introduce implicit calls
  to non-strict functions and constructors, which make it
  harder to infer strictness.




                                                            61
Forward Analysis Example


Constructed Product Result
         Analysis

 Defines if a function can profitably return
       multiple results in registers.



                                             62
Example and Motivation
 dm :: Int -> Int -> (Int, Int)
 dm x y = (x `div` y, x `mod` y)



        We would like to express that
     dm can return its result pair unboxed.

   Unboxed tuples are built-in types in GHC.

     The calling convention for a function
        that returns an unboxed tuple
arranges to return the components on registers.

                                                  63
Worker/Wrapper Split to the Rescue

   dm :: Int -> Int -> (Int, Int)
   dm x y = case $wdmy, y of
            (x `div` x x `mod` y)
              (# r1, r2 #) -> (r1, r2)



   $wdm :: Int -> Int -> (# Int, Int #)
   $wdm x y = (# x `div` y, x `mod` y #)




      •   The worker does actually all the job;

      •   The wrapper serves as an impedance
          matcher;


                                                  64
The Essence of the Transformation
  If the result of the worker is scrutinized immediately...
  case dm x y of
    (p, q) -> e

  Inline the worker
  case (case $wdm x y of
         (# r1, r2 #) -> (r1, r2)) of
    (p, q) -> e

  The tuple is returned unboxed
  case $wdm x y of
    (# p, q #) -> e


  The result pair construction has been moved
  from the body of dm to its call site.

                                                              65
General CPR Worker/Wrapper Split
   An arbitrary function returning a product
   f :: Int -> (Int, Int)
   f x = e


  The wrapper
   f :: Int -> (Int, Int)
   f x = case $wf x of
           (# r1, r2 #) -> (r1, r2)


   The worker
   $wf :: Int -> (# Int, Int #)
   $wf = case e of
           (r1, r2) -> (# r1, r2 #)


                                               66
When is the W/W Split Beneficial?
     f :: Int -> (Int, Int)
     f x = case $wf x of
             (# r1, r2 #) -> (r1, r2)

     $wf :: Int -> (# Int, Int #)
     $wf = case e of
             (r1, r2) -> (# r1, r2 #)



      •   The worker takes the pair apart;

      •   The wrapper reconstructs it again.

The insight
Things are getting worse unless the case expression in $wf
is certain to cancel with the construction of the pair in e.

                                                               67
When is the W/W Split Beneficial?

 We should only perform the CPR W/W transformation
 if the result of the function is allocated by the function itself.


Definition:
A function has the CPR (constructed product result) property,
if it allocates its result product itself.



    The goal of the CPR analysis is to infer this property.


                                                                      68
CPR Analysis Informally


•   The analysis is modular: it’s based on the function
    definition only, but not its uses;

•   Implemented in the form of an augmented type
    system, which tracks explicit product constructions;

•   Forwards analysis: assumes all arguments are
    non-explicitly constructed products.




                                                           69
Examples
Has CPR property                 is CPR
f :: Int -> (Int, Int)
f x y = if x <= y
        then (x, y)
        else f (x - 1) (y + 1)

                                   depends on CPR(f)
Does not have CPR property
g :: Int -> (Int, Int)
f x y = if x <= y
        then (x, y)
        else genRange x

                                    external function
CPR property in Core metadata: demo
                                                        70
A program that benefits from CPR
 tak :: Int -> Int -> Int -> Int

 tak x y z = if not(y   < x)   then z
             else tak   (tak   (x-1) y z)
 ! !                    (tak   (y-1) z x)
      !                 (tak   (z-1) x y)

 main = do
 ! [xs,ys,zs] <- getArgs
 ! print (tak (read xs) (read ys) (read zs))




  •   Taken from the nofib benchmark suite

  •   A result from tak is consumed by itself,
      so both parts of the worker collapse

  •   Memory consumption gain: 99.5%
                                                 71
nofib: Strictness + Absence + CPR
   ----------------------------------------------------
            Program            Size    Allocs  Runtime
   ----------------------------------------------------
               ansi           -1.3%    -12.1%     0.00
             banner           -1.4%    -18.7%     0.00
             boyer2           -1.3%    -31.8%     0.00
           clausify           -1.3%    -35.0%     0.03
      comp_lab_zift           -1.3%     +0.2%    +0.0%
          compress2           -1.4%    -32.7%    +1.4%
                cse           -1.4%    -15.8%     0.00
            mandel2           -1.4%    -28.0%     0.00
             puzzle           -1.3%    +16.5%     0.16
               rfib           -1.4%    -99.7%     0.02
               x2n1           -1.2%    -81.2%     0.01
                    ... and 90 more ...
   ----------------------------------------------------
                Min           -1.5%    -95.0%   -16.2%
                Max           -0.7%    +16.5%    +3.2%
     Geometric Mean           -1.3%    -16.9%    -3.3%



                                                          72
Conclusion
•   Lazy programs allocate a lot of thunks;
    it might cause performance problems due to a big chunk of GC work;

•   Allocating thunks can be avoided by changing call/return contract
    of a function;

•   Worker/Wrapper transformation is a cheap way to enforce argument
    unboxing/evaluation;

•   We need Strictness and Absence analysis so the W/W split would not
    change a program semantics;

•   We need CPR analysis so CPR W/W split would be beneficial;

•   There are two types of analyses: forward and backwards;
    Strictness and Absence are backwards ones, CPR is a forward analysis;

•   Projections are a convenient way to model contexts
    in a backwards analysis.

                                                                        Thanks
                                                                                 73
References
•   Profiling and optimization

    •   B. O’Sullivan et al. Real World Haskell, Chapter 25

    •   E. Z.Yang. Anatomy of a Thunk Leak
        http://blog.ezyang.com/2011/05/anatomy-of-a-thunk-leak/
        The Haskell Heap
        http://blog.ezyang.com/2011/04/the-haskell-heap/


•   Strictness and CPR Analyses
    •   http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand

    •   http://www.haskell.org/haskellwiki/Lazy_vs._non-strict


    •   C. Baker-Finch et al. Constructed Product Result Analysis for Haskell

•   Denotational Semantics and Projections

    •   G. Winskel. Formal Semantics of Programming Languages

    •   P. Wadler, R. J. M. Hughes. Projections for strictness analysis.


                                                                                74

More Related Content

What's hot

Kernel Recipes 2015: Introduction to Kernel Power Management
Kernel Recipes 2015: Introduction to Kernel Power ManagementKernel Recipes 2015: Introduction to Kernel Power Management
Kernel Recipes 2015: Introduction to Kernel Power ManagementAnne Nicolas
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~Kohei KaiGai
 
計算機性能の限界点とその考え方
計算機性能の限界点とその考え方計算機性能の限界点とその考え方
計算機性能の限界点とその考え方Naoto MATSUMOTO
 
Implementing distributed mclock in ceph
Implementing distributed mclock in cephImplementing distributed mclock in ceph
Implementing distributed mclock in ceph병수 박
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCinside-BigData.com
 
A synchronous scheduling service for distributed real-time Java
A synchronous scheduling service for distributed real-time JavaA synchronous scheduling service for distributed real-time Java
A synchronous scheduling service for distributed real-time JavaUniversidad Carlos III de Madrid
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStromKohei KaiGai
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)Kohei KaiGai
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging RubyAman Gupta
 
Q4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad coresQ4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad coresLinaro
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby SystemsEngine Yard
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelAnne Nicolas
 
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1Yukio Saito
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidiaMail.ru Group
 
한컴MDS_Virtual Target Debugging with TRACE32
한컴MDS_Virtual Target Debugging with TRACE32한컴MDS_Virtual Target Debugging with TRACE32
한컴MDS_Virtual Target Debugging with TRACE32HANCOM MDS
 
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PROIDEA
 

What's hot (20)

Kernel Recipes 2015: Introduction to Kernel Power Management
Kernel Recipes 2015: Introduction to Kernel Power ManagementKernel Recipes 2015: Introduction to Kernel Power Management
Kernel Recipes 2015: Introduction to Kernel Power Management
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 
計算機性能の限界点とその考え方
計算機性能の限界点とその考え方計算機性能の限界点とその考え方
計算機性能の限界点とその考え方
 
No Heap Remote Objects for Distributed real-time Java
No Heap Remote Objects for Distributed real-time JavaNo Heap Remote Objects for Distributed real-time Java
No Heap Remote Objects for Distributed real-time Java
 
Implementing distributed mclock in ceph
Implementing distributed mclock in cephImplementing distributed mclock in ceph
Implementing distributed mclock in ceph
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
 
2011.jtr.pbasanta.
2011.jtr.pbasanta.2011.jtr.pbasanta.
2011.jtr.pbasanta.
 
A synchronous scheduling service for distributed real-time Java
A synchronous scheduling service for distributed real-time JavaA synchronous scheduling service for distributed real-time Java
A synchronous scheduling service for distributed real-time Java
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
 
PG-Strom
PG-StromPG-Strom
PG-Strom
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging Ruby
 
Q4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad coresQ4.11: Sched_mc on dual / quad cores
Q4.11: Sched_mc on dual / quad cores
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
 
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
한컴MDS_Virtual Target Debugging with TRACE32
한컴MDS_Virtual Target Debugging with TRACE32한컴MDS_Virtual Target Debugging with TRACE32
한컴MDS_Virtual Target Debugging with TRACE32
 
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
PLNOG20 - Paweł Małachowski - Stress your DUT–wykorzystanie narzędzi open sou...
 

Viewers also liked

Static Code Analysis
Static Code AnalysisStatic Code Analysis
Static Code AnalysisAnnyce Davis
 
Haskell Symposium 2010: An LLVM backend for GHC
Haskell Symposium 2010: An LLVM backend for GHCHaskell Symposium 2010: An LLVM backend for GHC
Haskell Symposium 2010: An LLVM backend for GHCdterei
 
Static code analysis with sonar qube
Static code analysis with sonar qubeStatic code analysis with sonar qube
Static code analysis with sonar qubeHayi Nukman
 
Best Practices of Static Code Analysis in the SDLC
Best Practices of Static Code Analysis in the SDLCBest Practices of Static Code Analysis in the SDLC
Best Practices of Static Code Analysis in the SDLCParasoft_Mitchell
 
Sonar Tool - JAVA code analysis
Sonar Tool - JAVA code analysisSonar Tool - JAVA code analysis
Sonar Tool - JAVA code analysisPrashant Gupta
 
Poster Analysis Source Code
Poster Analysis Source CodePoster Analysis Source Code
Poster Analysis Source Codekirstysals
 
Static code analysis
Static code analysisStatic code analysis
Static code analysisRune Sundling
 
Source Code Analysis with SAST
Source Code Analysis with SASTSource Code Analysis with SAST
Source Code Analysis with SASTBlueinfy Solutions
 
RIPS - static code analyzer for vulnerabilities in PHP
RIPS - static code analyzer for vulnerabilities in PHPRIPS - static code analyzer for vulnerabilities in PHP
RIPS - static code analyzer for vulnerabilities in PHPSorina Chirilă
 
ScalaプログラマのためのHaskell入門
ScalaプログラマのためのHaskell入門ScalaプログラマのためのHaskell入門
ScalaプログラマのためのHaskell入門Yasuaki Takebe
 
Hp fortify source code analyzer(sca)
Hp fortify source code analyzer(sca)Hp fortify source code analyzer(sca)
Hp fortify source code analyzer(sca)Nagaraju Repala
 
これから Haskell を書くにあたって
これから Haskell を書くにあたってこれから Haskell を書くにあたって
これから Haskell を書くにあたってTsuyoshi Matsudate
 
A2 - broken authentication and session management(OWASP thailand chapter Apri...
A2 - broken authentication and session management(OWASP thailand chapter Apri...A2 - broken authentication and session management(OWASP thailand chapter Apri...
A2 - broken authentication and session management(OWASP thailand chapter Apri...Noppadol Songsakaew
 
A8 cross site request forgery (csrf) it 6873 presentation
A8 cross site request forgery (csrf)   it 6873 presentationA8 cross site request forgery (csrf)   it 6873 presentation
A8 cross site request forgery (csrf) it 6873 presentationAlbena Asenova-Belal
 
Simplified Security Code Review Process
Simplified Security Code Review ProcessSimplified Security Code Review Process
Simplified Security Code Review ProcessSherif Koussa
 
Java Source Code Analysis using SonarQube
Java Source Code Analysis using SonarQubeJava Source Code Analysis using SonarQube
Java Source Code Analysis using SonarQubeAngelin R
 
Static Analysis For Security and DevOps Happiness w/ Justin Collins
Static Analysis For Security and DevOps Happiness w/ Justin CollinsStatic Analysis For Security and DevOps Happiness w/ Justin Collins
Static Analysis For Security and DevOps Happiness w/ Justin CollinsSonatype
 
OWASP A1 - Injection | The art of manipulation
OWASP A1 - Injection | The art of manipulationOWASP A1 - Injection | The art of manipulation
OWASP A1 - Injection | The art of manipulationPavan M
 

Viewers also liked (20)

Static Code Analysis
Static Code AnalysisStatic Code Analysis
Static Code Analysis
 
Static Code Analysis
Static Code AnalysisStatic Code Analysis
Static Code Analysis
 
Haskell Symposium 2010: An LLVM backend for GHC
Haskell Symposium 2010: An LLVM backend for GHCHaskell Symposium 2010: An LLVM backend for GHC
Haskell Symposium 2010: An LLVM backend for GHC
 
Static code analysis with sonar qube
Static code analysis with sonar qubeStatic code analysis with sonar qube
Static code analysis with sonar qube
 
Best Practices of Static Code Analysis in the SDLC
Best Practices of Static Code Analysis in the SDLCBest Practices of Static Code Analysis in the SDLC
Best Practices of Static Code Analysis in the SDLC
 
Sonar Tool - JAVA code analysis
Sonar Tool - JAVA code analysisSonar Tool - JAVA code analysis
Sonar Tool - JAVA code analysis
 
Poster Analysis Source Code
Poster Analysis Source CodePoster Analysis Source Code
Poster Analysis Source Code
 
Static code analysis
Static code analysisStatic code analysis
Static code analysis
 
Source Code Analysis with SAST
Source Code Analysis with SASTSource Code Analysis with SAST
Source Code Analysis with SAST
 
RIPS - static code analyzer for vulnerabilities in PHP
RIPS - static code analyzer for vulnerabilities in PHPRIPS - static code analyzer for vulnerabilities in PHP
RIPS - static code analyzer for vulnerabilities in PHP
 
ScalaプログラマのためのHaskell入門
ScalaプログラマのためのHaskell入門ScalaプログラマのためのHaskell入門
ScalaプログラマのためのHaskell入門
 
Hp fortify source code analyzer(sca)
Hp fortify source code analyzer(sca)Hp fortify source code analyzer(sca)
Hp fortify source code analyzer(sca)
 
これから Haskell を書くにあたって
これから Haskell を書くにあたってこれから Haskell を書くにあたって
これから Haskell を書くにあたって
 
A2 - broken authentication and session management(OWASP thailand chapter Apri...
A2 - broken authentication and session management(OWASP thailand chapter Apri...A2 - broken authentication and session management(OWASP thailand chapter Apri...
A2 - broken authentication and session management(OWASP thailand chapter Apri...
 
A8 cross site request forgery (csrf) it 6873 presentation
A8 cross site request forgery (csrf)   it 6873 presentationA8 cross site request forgery (csrf)   it 6873 presentation
A8 cross site request forgery (csrf) it 6873 presentation
 
Simplified Security Code Review Process
Simplified Security Code Review ProcessSimplified Security Code Review Process
Simplified Security Code Review Process
 
Java Source Code Analysis using SonarQube
Java Source Code Analysis using SonarQubeJava Source Code Analysis using SonarQube
Java Source Code Analysis using SonarQube
 
Static Analysis For Security and DevOps Happiness w/ Justin Collins
Static Analysis For Security and DevOps Happiness w/ Justin CollinsStatic Analysis For Security and DevOps Happiness w/ Justin Collins
Static Analysis For Security and DevOps Happiness w/ Justin Collins
 
Secure Code Review 101
Secure Code Review 101Secure Code Review 101
Secure Code Review 101
 
OWASP A1 - Injection | The art of manipulation
OWASP A1 - Injection | The art of manipulationOWASP A1 - Injection | The art of manipulation
OWASP A1 - Injection | The art of manipulation
 

Similar to Static Analysis and Code Optimizations in Glasgow Haskell Compiler

Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection HeroTier1app
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbageTier1 App
 
**Understanding_CTS_Log_Messages.pdf
**Understanding_CTS_Log_Messages.pdf**Understanding_CTS_Log_Messages.pdf
**Understanding_CTS_Log_Messages.pdfagnathavasi
 
Postgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even FasterPostgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even FasterEDB
 
Практический опыт профайлинга и оптимизации производительности Ruby-приложений
Практический опыт профайлинга и оптимизации производительности Ruby-приложенийПрактический опыт профайлинга и оптимизации производительности Ruby-приложений
Практический опыт профайлинга и оптимизации производительности Ruby-приложенийOlga Lavrentieva
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rFerdinand Jamitzky
 
Ertss2010 multicore scheduling
Ertss2010 multicore schedulingErtss2010 multicore scheduling
Ertss2010 multicore schedulingNicolas Navet
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdwKohei KaiGai
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Tier1 App
 
Become a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsBecome a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsTier1app
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...MLconf
 
Become a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceBecome a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceTier1app
 
Disruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on LinuxDisruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on LinuxNaoto MATSUMOTO
 
Bluestore oio adaptive_throttle_analysis
Bluestore oio adaptive_throttle_analysisBluestore oio adaptive_throttle_analysis
Bluestore oio adaptive_throttle_analysis병수 박
 
Deep review of LMS process
Deep review of LMS processDeep review of LMS process
Deep review of LMS processRiyaj Shamsudeen
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-optJeff Larkin
 
Quickly Locate Poorly Performing DB2 for z/OS Batch SQL
Quickly Locate Poorly Performing DB2 for z/OS Batch SQL Quickly Locate Poorly Performing DB2 for z/OS Batch SQL
Quickly Locate Poorly Performing DB2 for z/OS Batch SQL softbasemarketing
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the wayMark Price
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 

Similar to Static Analysis and Code Optimizations in Glasgow Haskell Compiler (20)

Become a Garbage Collection Hero
Become a Garbage Collection HeroBecome a Garbage Collection Hero
Become a Garbage Collection Hero
 
Pick diamonds from garbage
Pick diamonds from garbagePick diamonds from garbage
Pick diamonds from garbage
 
**Understanding_CTS_Log_Messages.pdf
**Understanding_CTS_Log_Messages.pdf**Understanding_CTS_Log_Messages.pdf
**Understanding_CTS_Log_Messages.pdf
 
Postgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even FasterPostgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even Faster
 
Практический опыт профайлинга и оптимизации производительности Ruby-приложений
Практический опыт профайлинга и оптимизации производительности Ruby-приложенийПрактический опыт профайлинга и оптимизации производительности Ruby-приложений
Практический опыт профайлинга и оптимизации производительности Ruby-приложений
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with r
 
Ertss2010 multicore scheduling
Ertss2010 multicore schedulingErtss2010 multicore scheduling
Ertss2010 multicore scheduling
 
Multicore scheduling in automotive ECUs
Multicore scheduling in automotive ECUsMulticore scheduling in automotive ECUs
Multicore scheduling in automotive ECUs
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
 
Become a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsBecome a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day Devops
 
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
Anima Anadkumar, Principal Scientist, Amazon Web Services, Endowed Professor,...
 
Become a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceBecome a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo Conference
 
Disruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on LinuxDisruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on Linux
 
Bluestore oio adaptive_throttle_analysis
Bluestore oio adaptive_throttle_analysisBluestore oio adaptive_throttle_analysis
Bluestore oio adaptive_throttle_analysis
 
Deep review of LMS process
Deep review of LMS processDeep review of LMS process
Deep review of LMS process
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
 
Quickly Locate Poorly Performing DB2 for z/OS Batch SQL
Quickly Locate Poorly Performing DB2 for z/OS Batch SQL Quickly Locate Poorly Performing DB2 for z/OS Batch SQL
Quickly Locate Poorly Performing DB2 for z/OS Batch SQL
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the way
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 

Recently uploaded

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Static Analysis and Code Optimizations in Glasgow Haskell Compiler

  • 1. Static Analysis and Code Optimizations in Glasgow Haskell Compiler Ilya Sergey ilya.sergey@gmail.com 12.12.12 1
  • 2. The Goal Discuss what happens when we run ghc -O MyProgram.hs 2
  • 3. The Plan • Recall how laziness is implemented in GHC and what drawbacks it might cause; • Introduce the worker/wrapper transformation - an optimization technique implemented in GHC; • Realize why we need static analysis to do the transformations; • Take a brief look at the GHC compilation pipeline and the Core language; • Meet two types of static analysis: forward and backwards; • Recall some basics of denotational semantics and take a look at the mathematical basics of some analyses in GHC; • Introduce and motivate the CPR analysis. 3
  • 4. Why Laziness Might be Harmful and How the Harm Can Be Reduced 4
  • 5. module Main where import System.Environment import Text.Printf main = do [n] <- map read `fmap` getArgs printf "%fn" (mysum n) mysum :: Double -> Double mysum n = myfoldl (+) 0 [1..n] myfoldl :: (a -> b -> a) -> a -> [b] -> a myfoldl f z0 xs0 = lgo z0 xs0 where lgo z [] = z lgo z (x:xs) = lgo (f z x) xs 5
  • 6. Compile and run > ghc --make -RTS -rtsopts Sum.hs > time ./Sum 1e6 +RTS -K100M 500000500000.0 real! 0m0.583s user! 0m0.509s sys! 0m0.068s Compile optimized and run > ghc --make -fforce-recomp -RTS -rtsopts -O Sum.hs > time ./Sum 1e6 500000500000.0 real! 0m0.153s user! 0m0.101s sys! 0m0.011s 6
  • 7. Collecting Runtime Statistics Profiling results for the non-optimized program > ghc --make -RTS -rtsopts -fforce-recomp Sum.hs > ./Sum 1e6 +RTS -sstderr -K100M 225,137,464 bytes allocated in the heap 195,297,088 bytes copied during GC 107 MB total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 0.21s ( 0.24s elapsed) GC time 0.36s ( 0.43s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.58s ( 0.67s elapsed) %GC time 63.2% (64.0% elapsed) 7
  • 8. Collecting Runtime Statistics Profiling results for the optimized program > ghc --make -RTS -rtsopts -fforce-recomp -O Sum.hs > ./Sum 1e6 +RTS -sstderr -K100M 92,082,480 bytes allocated in the heap 30,160 bytes copied during GC 1 MB total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 0.07s ( 0.08s elapsed) GC time 0.00s ( 0.00s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.07s ( 0.08s elapsed) %GC time 1.1% (1.4% elapsed) 8
  • 9. Time Profiling Profiling results for the non-optimized program > ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs > ./Sum 1e6 +RTS -p -K100M ! total time = 0.24 secs ! total alloc = 124,080,472 bytes COST CENTRE MODULE %time %alloc mysum Main 52.7 74.1 myfoldl.lgo Main 43.6 25.8 myfoldl Main 3.7 0.0 9
  • 10. Time Profiling Profiling results for the optimized program > ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs > ./Sum 1e6 +RTS -p -K100M ! total time = 0.14 secs ! total alloc = 92,080,364 bytes COST CENTRE MODULE %time %alloc mysum Main 92.1 99.9 myfoldl.lgo Main 7.9 0.0 10
  • 11. Memory Profiling Profiling results for the non-optimized program > ghc --make -RTS -rtsopts -prof -fforce-recomp Sum.hs > ./Sum 1e6 +RTS -hy -p -K100M > hp2ps -e8in -c Sum.hp Sum 1e6 +RTS -p -hy -K100M 3,127,720 bytes x seconds Wed Dec 12 15:01 2012 bytes 18M 16M Double 14M 12M 10M * 8M 6M BLACKHOLE 4M 2M 0M 0.0 0.0 0.0 0.1 0.1 0.1 0.1 0.1 0.2 0.2 seconds 11
  • 12. Memory Profiling Profiling results for the optimized program > ghc --make -RTS -rtsopts -prof -fforce-recomp -O Sum.hs > ./Sum 1e6 +RTS -hy -p -K100M > hp2ps -e8in -c Sum.hp Sum 1e6 +RTS -p -hy -K100M 2,377 bytes x seconds Wed Dec 12 15:02 2012 bytes ARR_WORDS 30k [] 25k MUT_ARR_PTRS_CLEAN Handle__ 20k MUT_VAR_CLEAN ->[] 15k Buffer WEAK 10k IO 5k ForeignPtrContents (,) 0k 0.0 0.0 0.0 0.1 0.1 0.1 0.1 seconds 12
  • 13. The Problem Too Many Allocation of Double objects The cause: Too many thunks allocated for lazily computed values mysum :: Double -> Double mysum n = myfoldl (+) 0 [1..n] myfoldl :: (a -> b -> a) -> a -> [b] -> a myfoldl f z0 xs0 = lgo z0 xs0 where lgo z [] = z lgo z (x:xs) = lgo (f z x) xs In our example the computation of Double values is delayed by the calls to lgo. 13
  • 14. Intermezzo Call-by-Value Call-by-Need Arguments of a function call Arguments of a function call are fully evaluated are not evaluated before the invocation. before the invocation. Instead, a pointer (thunk) to the code is created, and, once evaluated, the value is memoized. Thunk (Urban Dictionary): To sneak up on someone and bean him with a heavy blow to the back of the head. “Jim got thunked going home last night. Serves him right for walking in a dark alley with all his paycheck in his pocket.” 14
  • 15. How to thunk a thunk • Apply its delayed value as a function; • Examine its value in a case-expression. case p of (a, b) -> f a b p will be evaluated to the weak-head normal form, sufficient to examine whether it is a pair. However, its components will remain unevaluated (i.e., thunks). Remark: Only evaluation of boxed values can be delayed via thunks. 15
  • 16. Our Example from CBN’s Perspective mysum :: Double -> Double mysum n = myfoldl (+) 0 [1..n] myfoldl :: (a -> b -> a) -> a -> [b] -> a myfoldl f z0 xs0 = lgo z0 xs0 where lgo z [] = z lgo z (x:xs) = lgo (f z x) xs mysum 3 =) myfoldl (+) 0 (1:2:3:[]) =) lgo z1 (1:2:3:[]) z1 -> 0 =) lgo z2 (2:3:[]) z2 -> 1 + !z1 =) lgo z3 (3:[]) z3 -> 2 + !z2 =) lgo z4 [] z4 -> 3 + !z3 =) !z4 Now GC can do the job... 16
  • 17. Getting Rid of Redundant Thunks Obvious Solution: Replace CBN by CBV, so no need in thunk. Obvious Problem: The semantics of a “lazy” program can change unpredictably. f x e = if x > 0 then x + 1 else e f 5 (error “Urk”) 17
  • 18. Getting Rid of Redundant Thunks Let’s reformulate: Replace CBN by CBV only for strict functions, i.e., those that always evaluate their argument to the WHNF. f x e = if x > 0 then x + 1 else e f 5 (error “Urk”) • f is strict in x • f is non-strict (lazy) in e 18
  • 19. A Convenient Definition of Strictness Definition: A function f of one argument is strict iff f undefined = undefined Strictness is formulated similarly for functions of multiple arguments. f x e = if x > 0 then x + 1 else e f 5 (error “Urk”) 19
  • 20. Enforcing CBV for Function Calls Worker/Wrapper Transformation Splitting a function into two parts f :: (Int, Int) -> Int f p = e + f :: (Int, Int) -> Int f p = case p of (a, b) -> $wf a b $wf :: Int -> Int -> Int $wf a b = let p = (a, b) in e • The worker does all the job, but takes unboxed; • The wrapper serves as an impedance matcher and inlined at every call site. 20
  • 21. Some Redundant Job Done? f :: (Int, Int) -> Int f p = case p of (a, b) -> $wf a b $wf :: Int -> Int -> Int $wf a b = let p = (a, b) in e • f takes the pair apart and passes components to $wf; • $wf construct the pair again. 21
  • 22. Strictness to the Rescue A strict function always examines its parameter. So, we just rely on a smart rewriter of case-expressions. f :: (Int, Int) -> Int f p = (case p of (a, b) -> a) + 1 + f :: (Int, Int) -> Int f p = case p of (a, b) -> $wf a $wf :: Int -> Int $wf a = let p = (a, error “Urk”) in (case p of (a, b) -> a) + 1 22
  • 23. Strictness to the Rescue A strict function always examines its parameter. So, we just rely on a smart rewriter of case-expressions. f :: (Int, Int) -> Int f p = (case p of (a, b) -> a) + 1 + f :: (Int, Int) -> Int f p = case p of (a, b) -> $wf a $wf :: Int -> Int $wf a = a + 1 23
  • 24. Our Example Step 1: Inline myfoldl mysum :: Double -> Double mysum n = myfoldl (+) 0 [1..n] myfoldl :: (a -> b -> a) -> a -> [b] -> a myfoldl f z0 xs0 = lgo z0 xs0 where lgo z [] = z lgo z (x:xs) = lgo (f z x) xs 24
  • 25. Our Example Step 2: Analyze Strictness and Absence mysum :: Double -> Double mysum n = lgo 0 n where lgo :: Double -> [Double] -> Double lgo z [] = z lgo z (x:xs) = lgo (z + x) xs Result: lgo is strict in its both arguments 25
  • 26. Our Example Step 3: Worker/Wrapper Split mysum :: Double -> Double mysum n = lgo 0 n where lgo :: Double -> [Double] -> Double lgo z [] = z lgo z (x:xs) = lgo (z + x) xs 26
  • 27. Our Example Step 3: Worker/Wrapper Split mysum :: Double -> Double mysum n = lgo 0 n where lgo :: Double -> [Double] -> Double lgo z xs = case z of D# d -> $wlgo d xs $wlgo :: Double# -> [Double] -> Double $wlgo d [] = D# d $wlgo d (x:xs) = lgo ((D# d) + x) xs $wlgo takes unboxed doubles as an argument. 27
  • 28. Our Example Step 4: Inline lgo in the Worker mysum :: Double -> Double mysum n = lgo 0 n where lgo :: Double -> [Double] -> Double lgo z xs = case z of D# d -> $wlgo d xs $wlgo :: Double# -> [Double] -> Double $wlgo d [] = D# d $wlgo d (x:xs) = lgo ((D# d) + x) xs 28
  • 29. Our Example Step 4: Inline lgo in the Worker mysum :: Double -> Double mysum n = lgo 0 n where lgo :: Double -> [Double] -> Double lgo z xs = case z of D# d -> $wlgo d xs $wlgo :: Double# -> [Double] -> Double $wlgo d [] = D# d $wlgo d (x:xs) = case ((D# d) + x) of D# d' -> $wlgo d' xs • lgo is invoked just once; • No intermediate thunks for d is constructed. 29
  • 30. A Brief Look at GHC’s Guts 30
  • 31. GHC Compilation Pipeline A number of Intermediate Languages • Haskell Source • Core • Spineless Tagless G-Machine • C-- • C / Machine Code / LLVM Code Most of interesting optimizations happen here 31
  • 32. 32
  • 33. GHC Core • A tiny language, to which Haskell sources are de-sugared; • Based on explicitly typed System F with type equality coercions; • Used as a base platform for analyses and optimizations; • All names are fully-qualified; • if-then-else is compiled to case-expressions; • Variables have additional metadata; • Type class constraints are compiled into record parameters. 33
  • 34. Core Syntax data Expr b = Var! Id | Lit Literal | App (Expr b) (Expr b) | Lam b (Expr b) | Let (Bind b) (Expr b) | Case (Expr b) b Type [Alt b]! | Cast (Expr b) Coercion | Tick (Tickish Id) (Expr b) | Type Type | Coercion Coercion data Bind b = NonRec b (Expr b) ! | Rec [(b, (Expr b))] type Alt b = (AltCon, [b], Expr b) data AltCon = DataAlt DataCon | LitAlt Literal | DEFAULT 34
  • 35. Core Output (Demo) •A factorial function • mysum 35
  • 36. How to Get Core Desugared Core > ghc -ddump-ds Sum.hs Core with Strictness Annotations > ghc -ddump-stranal Sum.hs Core after Worker/Wrapper Split > ghc -ddump-worker-wrapper Sum.hs More at http://www.haskell.org/ghc/docs/2.10/users_guide/user_41.html 36
  • 37. Strictness and Absence Analyses in a Nutshell 37
  • 38. Two Types of Modular Program Analyses • Forward analysis • “Run” the program with abstract input and infer the abstract result; • Examples: sign analysis, interval analysis, type checking/ inference. • Backwards analysis • From the expected abstract result of the program infer the abstract values of its inputs. 38
  • 39. Strictness from the definition as a forward analysis f ?=? A function with multiple parameters f x y z = ... (f ? > >), (f > ? >), (f > > ?) What if there are nested, recursive definitions? 39
  • 40. Strictness as a backwards analysis (Informally) f x y z = ... If the result of f applied to some arguments is going to be evaluated to WHNF, what can we say about its parameters? Backwards analysis provides this contextual information. 40
  • 41. Defining the Contexts (formally) Denotational Semantics • Answers the question what a program is; • Introduced by Dana Scott and Christopher Strachey to reason about imperative programs as state transformers; • The effect of program execution is modeled by relating a program to a mathematical function; • Main purpose: constructing different domains for program interpretation and analysis; • Secondary purpose: introducing ordering on program objects. 41
  • 42. Simple Denotational Semantics of Core Definition Domain - a set of meanings for different programs What is the meaning of undefined or a non-terminating program? ? - “bottom” JundefinedK = ? Jf x = f xK = ? 42
  • 43. Simple Denotational Semantics of Core ? is the least defined element in our domain Once evaluated, it terminates the program Adding bottom to a set of values is called lifting Example: Z? ... 2 1 0 1 2 ... ? 43
  • 44. Simple Denotational Semantics of Core Denotational semantics of a literal is itself J1K = 1 ... 2 1 0 1 2 ... ? Should be interpreted as ...? v 2, ? v 1, ? v 0, ? v 1, . . . 44
  • 45. Elements of Domain Theory Partial order v x v y - x is “less defined than” y • reflexive: 8x x v x • transitive: if x v y and y v z then x v z • antisymmetric: if x v y and y v x then x = y Least upper bound z = x t y xvz yvz x v z 0 and y v z 0 =) z v z 0 45
  • 46. Simple Denotational Semantics of Core Algebraic Data Types data Maybe a = Nothing | Just a Just (Just ?) Just 2 Nothing Just ? ? 46
  • 47. Simple Denotational Semantics of Core Monotone functions f is monotone i↵ x v y () f x v f y Denotational semantics of first-order Core functions - monotone functions on the lifted domain of values. Complete domain for denotational semantics of Core is defined recursively. 47
  • 48. Simple Denotational Semantics of Core Monotone functions as domain elements 8 ⇢ < 1 if x = 0 1 if x = 0 f x= g x= 2 if x = 1 ? otherwise : ? otherwise Functions are compared point-wise: f v g Recursive definitions are computed as successive chains of increasingly more defined functions. 48
  • 49. Projections: Defining Usage Contexts Definition: A monotone function p is a projection if for every object d pdvd Shrinking p(p d) = p d Idempotent In point-free style p v ID p p=p 49
  • 50. Intuition behind Projections • Projections remove information from objects; • Projections is a way to describe which parts of an object are essential for the computation; • Projection will be used as a synonym to context. Examples ID = x.x BOT = x.? F1 = (x, y).(?, y) F2 = g. p.g(F1 p) - a projection if g is monotone 50
  • 51. More Facts about Projections Theorem: If P is a set of projections then tP exists and is a projection. Lemma: Let p1 and p2 be projections. Then p1 v p2 =) p1 p2 = p1 . 51
  • 52. Higher-Order Projections Let p, q be projections, then ⇢ p f q if f is a function (q ! p)f = ? otherwise ⇢ (p d1 , q d2 ) if f is a pair and f = (d1 , d2 ) (p, q)f = ? otherwise These are projections, too. 52
  • 53. Modeling Usage with Projections f = x. . . . What does it mean “f is not using its argument”? f z=f ? What happens or to the argument (ID ! ID)f = (BOT ! ID)f What happens to the result 53
  • 54. Modeling Usage with Projections (ID ! ID)f = (BOT ! ID)f m (ID ! ID)f = (ID ! ID)((BOT ! ID)f ) | {z } | {z } | {z } p p q m p f = p (q f ) q is a safe projection in the context of p 54
  • 55. Safety Condition for Projections p f = p (q f ) p defines a context, i.e., how we are going to use a value; q defines, how much information we can remove from the object, so it won’t change from p’s perspective. The goal of a backwards absence/strictness analysis - to find a safe projection for a given value and a context • The context: how the result of the function is going to be used; • The output: how arguments can be safely changed. 55
  • 56. Safe Usage Projections: Example p f = p (q f ) f :: (Int, Int, Int) -> [a] -> (Int, Bool) f (a, b, c) = case a of 0 -> error "urk" _ -> y -> case b of 0 -> (c, null y) _ -> (c, False) p q ID ! ID ID ! ID ID ! ID ! (BOT , ID) (ID, ID, BOT ) ! ID ! ID ID ! ID ! (ID, BOT ) ID ! BOT ! ID 56
  • 57. What about Strictness? Usage context is modeled by the identity projection. Unfortunately, it is to weak for the strictness property. The problem: • ID treats ⊥ as any other value; • It is not helpful to establish a context for detecting f ⊥ = ⊥. A solution: • Introduce a specific element in the domain for “true divergence”; • Devise a specific projection that maps ⊥ to the true divergence. 57
  • 58. Extending the Domain for True Divergence - lightning bolt ? 8f f = 58
  • 59. Modeling Strictness with Projections S = S ?= S x = x, otherwise Checking if the function f uses its argument strictly S f =S f S Indeed, (S f ) ? = (S f S) ? =) S (f ?) = S (f (S ?)) =) S (f ?) = S (f ) =) S (f ?) = S =) S (f ?) = =) f ?=? 59
  • 60. Conservative Nature of the Analysis • From the backwards perspective each function is a “projection transformer”: it transforms a result context to a safe projection (not always the best one); • The set of all safe projections of a function is incomputable, as it requires examining all contexts; • Instead, the optimal “threshold” result projection is chosen. v ID ⇤ q q2 q1 p1 p2 ⇤ p3 v p 60
  • 61. How to screw the Strictness Analysis fact :: Int -> Int fact n = if n == 0 then n else n * (fact $ n - 1) Let’s take a look on the strictness signatures (demo) Conclusion Polymorphism and type classes introduce implicit calls to non-strict functions and constructors, which make it harder to infer strictness. 61
  • 62. Forward Analysis Example Constructed Product Result Analysis Defines if a function can profitably return multiple results in registers. 62
  • 63. Example and Motivation dm :: Int -> Int -> (Int, Int) dm x y = (x `div` y, x `mod` y) We would like to express that dm can return its result pair unboxed. Unboxed tuples are built-in types in GHC. The calling convention for a function that returns an unboxed tuple arranges to return the components on registers. 63
  • 64. Worker/Wrapper Split to the Rescue dm :: Int -> Int -> (Int, Int) dm x y = case $wdmy, y of (x `div` x x `mod` y) (# r1, r2 #) -> (r1, r2) $wdm :: Int -> Int -> (# Int, Int #) $wdm x y = (# x `div` y, x `mod` y #) • The worker does actually all the job; • The wrapper serves as an impedance matcher; 64
  • 65. The Essence of the Transformation If the result of the worker is scrutinized immediately... case dm x y of (p, q) -> e Inline the worker case (case $wdm x y of (# r1, r2 #) -> (r1, r2)) of (p, q) -> e The tuple is returned unboxed case $wdm x y of (# p, q #) -> e The result pair construction has been moved from the body of dm to its call site. 65
  • 66. General CPR Worker/Wrapper Split An arbitrary function returning a product f :: Int -> (Int, Int) f x = e The wrapper f :: Int -> (Int, Int) f x = case $wf x of (# r1, r2 #) -> (r1, r2) The worker $wf :: Int -> (# Int, Int #) $wf = case e of (r1, r2) -> (# r1, r2 #) 66
  • 67. When is the W/W Split Beneficial? f :: Int -> (Int, Int) f x = case $wf x of (# r1, r2 #) -> (r1, r2) $wf :: Int -> (# Int, Int #) $wf = case e of (r1, r2) -> (# r1, r2 #) • The worker takes the pair apart; • The wrapper reconstructs it again. The insight Things are getting worse unless the case expression in $wf is certain to cancel with the construction of the pair in e. 67
  • 68. When is the W/W Split Beneficial? We should only perform the CPR W/W transformation if the result of the function is allocated by the function itself. Definition: A function has the CPR (constructed product result) property, if it allocates its result product itself. The goal of the CPR analysis is to infer this property. 68
  • 69. CPR Analysis Informally • The analysis is modular: it’s based on the function definition only, but not its uses; • Implemented in the form of an augmented type system, which tracks explicit product constructions; • Forwards analysis: assumes all arguments are non-explicitly constructed products. 69
  • 70. Examples Has CPR property is CPR f :: Int -> (Int, Int) f x y = if x <= y then (x, y) else f (x - 1) (y + 1) depends on CPR(f) Does not have CPR property g :: Int -> (Int, Int) f x y = if x <= y then (x, y) else genRange x external function CPR property in Core metadata: demo 70
  • 71. A program that benefits from CPR tak :: Int -> Int -> Int -> Int tak x y z = if not(y < x) then z else tak (tak (x-1) y z) ! ! (tak (y-1) z x) ! (tak (z-1) x y) main = do ! [xs,ys,zs] <- getArgs ! print (tak (read xs) (read ys) (read zs)) • Taken from the nofib benchmark suite • A result from tak is consumed by itself, so both parts of the worker collapse • Memory consumption gain: 99.5% 71
  • 72. nofib: Strictness + Absence + CPR ---------------------------------------------------- Program Size Allocs Runtime ---------------------------------------------------- ansi -1.3% -12.1% 0.00 banner -1.4% -18.7% 0.00 boyer2 -1.3% -31.8% 0.00 clausify -1.3% -35.0% 0.03 comp_lab_zift -1.3% +0.2% +0.0% compress2 -1.4% -32.7% +1.4% cse -1.4% -15.8% 0.00 mandel2 -1.4% -28.0% 0.00 puzzle -1.3% +16.5% 0.16 rfib -1.4% -99.7% 0.02 x2n1 -1.2% -81.2% 0.01 ... and 90 more ... ---------------------------------------------------- Min -1.5% -95.0% -16.2% Max -0.7% +16.5% +3.2% Geometric Mean -1.3% -16.9% -3.3% 72
  • 73. Conclusion • Lazy programs allocate a lot of thunks; it might cause performance problems due to a big chunk of GC work; • Allocating thunks can be avoided by changing call/return contract of a function; • Worker/Wrapper transformation is a cheap way to enforce argument unboxing/evaluation; • We need Strictness and Absence analysis so the W/W split would not change a program semantics; • We need CPR analysis so CPR W/W split would be beneficial; • There are two types of analyses: forward and backwards; Strictness and Absence are backwards ones, CPR is a forward analysis; • Projections are a convenient way to model contexts in a backwards analysis. Thanks 73
  • 74. References • Profiling and optimization • B. O’Sullivan et al. Real World Haskell, Chapter 25 • E. Z.Yang. Anatomy of a Thunk Leak http://blog.ezyang.com/2011/05/anatomy-of-a-thunk-leak/ The Haskell Heap http://blog.ezyang.com/2011/04/the-haskell-heap/ • Strictness and CPR Analyses • http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand • http://www.haskell.org/haskellwiki/Lazy_vs._non-strict • C. Baker-Finch et al. Constructed Product Result Analysis for Haskell • Denotational Semantics and Projections • G. Winskel. Formal Semantics of Programming Languages • P. Wadler, R. J. M. Hughes. Projections for strictness analysis. 74