Drift Analysis - A Tutorial

       Per Kristian Lehre

        ASAP Research Group
     School of Computer Science
    University of Nottingham, UK
PerKristian.Lehre@nottingham.ac.uk


          July 6, 2012
What is Drift Analysis?




       Prediction of the long-term behaviour of a process X
           (hitting time, stability, occupancy time, etc.)
       from properties of the one-step change ∆.
What is Drift Analysis?¹

   [Figure: a process Yk on the interval (a, b) with "drift" of
    magnitude at least ε0 toward the boundary a]

     ¹ NB! Drift is a different concept from genetic drift in evolutionary biology.
Runtime of (1+1) EA on Linear Functions [3]




                             Droste, Jansen & Wegener (2002)
Runtime of (1+1) EA on Linear Functions [2]




                 Doerr, Johannsen, and Winzen (GECCO 2010)
Some history
   Origins
       Stability of equilibria in ODEs (Lyapunov, 1892)
       Stability of Markov chains (see e.g. [14])
       1982 paper by Hajek [6]
           Simulated annealing [19]
   Drift Analysis of Evolutionary Algorithms
       Introduced to EC in 2001 by He and Yao [7, 8]
           (1+1) EA on linear functions: O(n ln n) [7]
           (1+1) EA on maximum matching by Giel and Wegener [5]
       Simplified drift in 2008 by Oliveto and Witt [18]
       Multiplicative drift by Doerr et al. [2]
           (1+1) EA on linear functions: en ln(n) + O(n) [22]
       Variable drift by Johannsen [11] and Mitavskiy et al. [15]
       Population drift by Lehre [12]
About this tutorial...




       Assumes little or no background in probability theory
       Main focus will be on drift theorems and their proofs
           Some theorems are presented in a simplified form
           Full details are available in the references
       A few simple applications will be shown

       Please feel free to interrupt me with questions!
General Assumptions




   [Figure: a process Yk between the boundaries a and b]

          Xk : a stochastic process² in some general state space X
          Yk := g(Xk ), where g : X → R is a "distance function"
          Two stopping times τa and τb

              τa := min{k ≥ 0 | Yk ≤ a}   τb := min{k ≥ 0 | Yk ≥ b}

          where we assume −∞ ≤ a < b < ∞ and Y0 ∈ (a, b).

     ² not necessarily Markovian.
Overview of Tutorial

   [Figure: a process Yk on the interval (a, b) with drift E[Yk+1 − Yk | Fk]]

   Drift Condition³                     Statement                       Note
   E[Yk+1 | Fk] ≤ Yk − ε0              E[τa] ≤ Y0/ε0                   Additive drift [7, 10]
                                       Pr(τa > B1) ≤ ...               [6]
                                       Pr(τb < B2) ≤ ...               Simplified drift [6, 17]
   E[Yk+1 | Fk] ≥ Yk − ε0              Y0/ε0 ≤ E[τa]                   Additive drift (lower b.) [7, 9]
   E[Yk+1 | Fk] ≤ Yk                   E[τa] ≤ ...                     Supermartingale [16]
   E[Yk+1 | Fk] ≤ (1 − δ)Yk            E[τa] ≤ ln(Y0/a)/δ              Multiplicative drift [2, 4]
                                       Pr(τa > B3) ≤ ...               [1]
   E[Yk+1 | Fk] ≥ (1 − δ)Yk            ... ≤ E[τa]                     Multipl. drift (lower b.) [13]
   E[Yk+1 | Fk] ≤ Yk − h(Yk)           E[τa] ≤ ∫_a^{Y0} dz/h(z)        Variable drift [11]
   E[e^{λYk+1} | Fk] ≤ e^{λYk}/α0      Pr(τb < B) ≤ ...                Population drift [12]

   ³ Some drift theorems need additional conditions.
Part 1 - Basic Probability Theory
Basic Probability Theory

   [Figure: a random variable X mapping the sample space Ω to R]

   Probability Triple (Ω, F , Pr)
       Ω : Sample space
       F : σ-algebra (family of events)
       Pr : F → R probability function
       (satisfying the probability axioms)
   Events
       E ∈ F
   Random Variable
       X : Ω → R and X⁻¹ : B → F
       X = y ⇐⇒ {ω ∈ Ω | X(ω) = y}
   Expectation
       E[X] := Σ_y y Pr(X = y).
Conditional Expectation

       E[X | E] := Σ_x x Pr(X = x | E) = Σ_x x Pr(X = x ∧ E) / Pr(E)

       E[X | Z = z] is defined analogously, and

       E[X | Z](ω) = E[X | Z = z] ,  where z = Z(ω)

   Definition
   Y = E[X | G ] if
    1. Y is G -measurable, i.e. Y⁻¹(A) ∈ G for all A ∈ B
    2. E[|Y |] < ∞
    3. E[Y I_F] = E[X I_F] for all F ∈ G
Three Properties

      Linearity: E[a1 X1 + a2 X2 | G ] = a1 E[X1 | G ] + a2 E[X2 | G ]

      If X is G -measurable, then E[X | G ] = X

      Tower property: if H ⊂ G , then E[E[X | G ] | H ] = E[X | H ]
Stochastic Processes and Filtration


   Definition
        A stochastic process is a sequence of random variables Y1 , Y2 , . . .
       A filtration is an increasing family of sub-σ-algebras of F

                             F0 ⊆ F1 ⊆ · · · ⊆ F

       A stochastic process Yk is adapted to a filtration Fk
       if Yk is Fk -measurable for all k

   =⇒ Informally, Fk represents the information that has been
      revealed about the process during the first k steps.
Stopping Time


  Definition
   A random variable τ : Ω → N is called a stopping time if for all k ≥ 0

                            {τ ≤ k} ∈ Fk


      The information obtained until step k is sufficient
      to decide whether the event {τ ≤ k} is true or not.

  Example
      The smallest k such that Yk < a in a stochastic process.
      The runtime of an evolutionary algorithm
Martingales

  Definition (Supermartingale)
  Any process Y s.t. for all k
    1. Y is adapted to F
    2. E[|Yk |] < ∞
    3. E[Yk+1 | Fk ] ≤ Yk

  [Figure: a sample path of a supermartingale for k = 0, . . . , 100]



   Example
   Let ∆1 , ∆2 , . . . be rvs with −∞ < E[∆k+1 | Fk ] ≤ −ε0 for k ≥ 0.
   Then the following sequence is a super-martingale

               Yk := ∆1 + · · · + ∆k

     E[Yk+1 | Fk ] = ∆1 + · · · + ∆k + E[∆k+1 | Fk ]
                   ≤ ∆1 + · · · + ∆k − ε0 < Yk
   The compensated process Zk is also a super-martingale:
               Yk := ∆1 + · · · + ∆k       Zk := Yk + kε0


     E [Zk+1 | Fk ] = ∆1 + · · · + ∆k + E [∆k+1 | Fk ] + (k + 1)ε0
                       ≤ ∆1 + · · · + ∆k − ε0 + (k + 1)ε0 = Zk
Martingales

   Lemma
   If Y is a supermartingale, then E[Yk | F0 ] ≤ Y0 for all fixed k ≥ 0.

   Proof.
            E[Yk | F0 ] = E[E[Yk | Fk−1 ] | F0 ]
                        ≤ E[Yk−1 | F0 ] ≤ · · · ≤ E[Y0 | F0 ] = Y0

   by induction on k.

   Example
   Where is the process Y in the previous example after k steps?

                 Y0 = Z0 ≥ E[Zk | F0 ] = E[Yk + kε0 | F0 ]

   Hence, E[Yk | F0 ] ≤ Y0 − ε0 k, which is not surprising...
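This prediction is easy to check numerically. The sketch below simulates the example above with an invented step distribution of mean −ε0 (seed and parameters chosen for illustration only) and compares the empirical mean of Yk with Y0 − ε0 k:

```python
import random

# Illustration of the example above: i.i.d. steps with mean -eps0,
# so after k steps E[Y_k | F_0] = Y_0 - eps0*k (here Y_0 = 0).
rng = random.Random(7)
eps0, k, trials = 0.5, 50, 5000

def walk(steps):
    y = 0.0
    for _ in range(steps):
        y += rng.choice([-1.5, 0.5])   # E[step] = -0.5 = -eps0
    return y

mean_y = sum(walk(k) for _ in range(trials)) / trials
print(mean_y)   # close to -eps0 * k = -25
```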
Part 2 - Additive Drift
Additive Drift

   [Figure: process Yk on [0, b] with drift at least ε0 toward a = 0]

     (C1+) ∀k    E[Yk+1 − Yk | Yk > 0 ∧ Fk ] ≤ −ε0
     (C1−) ∀k    E[Yk+1 − Yk | Yk > 0 ∧ Fk ] ≥ −ε0

   Theorem ([7, 9, 10])
   Given a sequence (Yk , Fk ) over an interval [0, b] ⊂ R.
   Define τ := min{k ≥ 0 | Yk = 0}, and assume E [τ | F0 ] < ∞.
       If (C1+) holds for an ε0 > 0, then E [τ | F0 ] ≤ Y0 /ε0 ≤ b/ε0 .
       If (C1−) holds for an ε0 > 0, then E [τ | F0 ] ≥ Y0 /ε0 .
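A minimal simulation sketch of the theorem (all parameters invented for illustration): a random walk stepping −1 with probability p and +1 otherwise has drift ε0 = 2p − 1 toward 0, and its mean hitting time of 0 from Y0 should be about Y0/ε0:

```python
import random

def hitting_time(y0, p, rng):
    """Walk: step -1 w.p. p, +1 w.p. 1-p; drift eps0 = 2p - 1 toward 0."""
    y, t = y0, 0
    while y > 0:
        y += -1 if rng.random() < p else 1
        t += 1
    return t

rng = random.Random(1)
y0, p = 20, 0.75                  # drift eps0 = 0.5
eps0 = 2 * p - 1
avg = sum(hitting_time(y0, p, rng) for _ in range(2000)) / 2000
print(avg, y0 / eps0)             # empirical mean close to Y0/eps0 = 40
```

For this particular walk both drift conditions (C1+) and (C1−) hold with the same ε0, so the additive drift theorem pins the expectation down exactly.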
Obtaining Supermartingales from Drift Conditions
   Definition (Stopped Process)
   Let Y be a stochastic process and τ a stopping time.

                          Yk   if k < τ
               Yk∧τ :=
                          Yτ   if k ≥ τ

  (C1) E[Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −ε0

       Yk is not necessarily a supermartingale,
       because (C1) only assumes drift when Yk > a.
       But the "stopped process" Yk∧τa is a supermartingale, so

                   ∀k    Y0 ≥ E[Yk∧τa | F0 ]

       and likewise for the compensated stopped process,

                   ∀k    Y0 ≥ E[Yk∧τa + (k ∧ τa )ε0 | F0 ]
Dominated Convergence Theorem

  Theorem
  Suppose Xk is a sequence of random variables such that
  for each outcome in the sample space

                     lim_{k→∞} Xk = X.

  Let Y ≥ 0 be a random variable with E[Y ] < ∞ such that
  for each outcome in the sample space, and for each k,

                         |Xk | ≤ Y.

  Then it holds that

              lim_{k→∞} E[Xk ] = E[lim_{k→∞} Xk ] = E[X]
Proof of Additive Drift Theorem
    (C1+) ∀k     E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≤ −ε0
    (C1−) ∀k     E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≥ −ε0
   Theorem
   Given a sequence (Yk , Fk ) over an interval [0, b] ⊂ R
   Define τ := min{k ≥ 0 | Yk = 0}, and assume E [τ | F0 ] < ∞.
       If (C1+) holds for an ε0 > 0, then E [τ | F0 ] ≤ Y0 /ε0 .
       If (C1−) holds for an ε0 > 0, then E [τ | F0 ] ≥ Y0 /ε0 .
    Proof.
    By (C1+), Zk := Yk∧τ + ε0 (k ∧ τ ) is a super-martingale, so

                    Y0 = E[Z0 | F0 ] ≥ E[Zk | F0 ]   ∀k.

    Since Yk is bounded to [0, b], and τ has finite expectation,
    the dominated convergence theorem applies and

       Y0 ≥ lim_{k→∞} E[Zk | F0 ] = E[Yτ + ε0 τ | F0 ] = ε0 E[τ | F0 ] ,

    since Yτ = 0. Hence E[τ | F0 ] ≤ Y0 /ε0 .
Proof of Additive Drift Theorem
    (C1+) ∀k      E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≤ −ε0
    (C1−) ∀k      E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≥ −ε0
   Theorem
   Given a sequence (Yk , Fk ) over an interval [0, b] ⊂ R
   Define τ := min{k ≥ 0 | Yk = 0}, and assume E [τ | F0 ] < ∞.
       If (C1+) holds for an ε0 > 0, then E [τ | F0 ] ≤ Y0 /ε0 .
       If (C1−) holds for an ε0 > 0, then E [τ | F0 ] ≥ Y0 /ε0 .
    Proof.
    By (C1−), Zk := Yk∧τ + ε0 (k ∧ τ ) is a sub-martingale, so

                    Y0 = E[Z0 | F0 ] ≤ E[Zk | F0 ]   ∀k.

    Since Yk is bounded to [0, b], and τ has finite expectation,
    the dominated convergence theorem applies and

       Y0 ≤ lim_{k→∞} E[Zk | F0 ] = E[Yτ + ε0 τ | F0 ] = ε0 E[τ | F0 ] ,

    since Yτ = 0. Hence E[τ | F0 ] ≥ Y0 /ε0 .
Examples: (1+1) EA

  Algorithm 1: (1+1) EA
   1: Sample x(0) uniformly at random from {0, 1}^n .
   2: for k = 0, 1, 2, . . . do
   3:    Set y := x(k) , and flip each bit of y with probability 1/n.
   4:    Set
                          y      if f (y) ≥ f (x(k) )
              x(k+1) :=
                          x(k)   otherwise.
   5: end for
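The pseudocode above translates directly into Python. A sketch (the fitness function OneMax and the `max_iters` cap are additions for illustration, so the loop is guaranteed to terminate):

```python
import random

def one_plus_one_ea(f, n, f_max, max_iters=1_000_000, seed=0):
    """(1+1) EA: flip each bit w.p. 1/n, keep offspring y if f(y) >= f(x)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    for k in range(max_iters):
        if f(x) == f_max:                        # stop at the optimum
            return k
        y = [b ^ (rng.random() < 1 / n) for b in x]
        if f(y) >= f(x):
            x = y
    return max_iters

# OneMax as a stand-in fitness: f(x) = number of 1-bits, optimum value n
runtime = one_plus_one_ea(sum, n=50, f_max=50)
print(runtime)
```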
Law of Total Probability

             E[X] = Pr(E) E[X | E] + Pr(Ē) E[X | Ē]
Example 1: (1+1) EA on LeadingOnes

                 Lo(x) := Σ_{i=1}^{n} Π_{j=1}^{i} xj

        x = 1111111111111111 0 ∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗
            (leading 1-bits)  (left-most 0-bit, then remaining bits)

      Let Yk := n − Lo(x(k) ) count the "remaining" bits in step k ≥ 0.
      Let E be the event that only the left-most 0-bit flipped in y.
      The sequence Yk is non-increasing, so

        E[Yk+1 − Yk | Yk > 0 ∧ Fk ] ≤ (−1) Pr(E | Yk > 0 ∧ Fk )
                                    = (−1)(1/n)(1 − 1/n)^{n−1} ≤ −1/(en).

      By the additive drift theorem, E[τ | F0 ] ≤ enY0 ≤ en².
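The en² bound can be checked empirically. A sketch (run count, problem size, and seed invented for illustration):

```python
import math, random

def lo_runtime(n, rng):
    """Iterations until the (1+1) EA optimises LeadingOnes on n bits."""
    lo = lambda x: next((i for i, b in enumerate(x) if b == 0), n)  # Lo(x)
    x = [rng.randint(0, 1) for _ in range(n)]
    t = 0
    while lo(x) < n:
        y = [b ^ (rng.random() < 1 / n) for b in x]
        if lo(y) >= lo(x):
            x = y
        t += 1
    return t

rng = random.Random(0)
n = 30
avg = sum(lo_runtime(n, rng) for _ in range(20)) / 20
print(avg, math.e * n * n)   # empirical mean below the drift bound e*n^2
```

Consistent with the remark later in the tutorial, the empirical mean lands near ((e − 1)/2)n², comfortably inside the en² bound.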
Example 2: (1+1) EA on Linear Functions
       Given some constants w1 , . . . , wn ∈ [wmin , wmax ], define

                     f (x) := w1 x1 + w2 x2 + · · · + wn xn

       Let Yk be the function value that "remains" at time k, i.e.

          Yk := Σ_{i=1}^{n} wi − Σ_{i=1}^{n} wi xi(k) = Σ_{i=1}^{n} wi (1 − xi(k) ).

       Let Ei be the event that only bit i flipped in y; then

          E[Yk+1 − Yk | Fk ] ≤ Σ_{i=1}^{n} Pr(Ei | Fk ) E[Yk+1 − Yk | Ei ∧ Fk ]

            ≤ (1/n)(1 − 1/n)^{n−1} Σ_{i=1}^{n} wi (xi(k) − 1) ≤ −Yk /(en) ≤ −wmin /(en)

       By the additive drift theorem, E[τ | F0 ] ≤ en² (wmax /wmin ).
Remarks on Example Applications

   Example 1: (1+1) EA on LeadingOnes
       The upper bound en2 is very accurate.
       The exact expression is c(n)n2 , where c(n) → (e − 1)/2 [20].

   Example 2: (1+1) EA on Linear Functions
       The upper bound en2 (wmax /wmin ) is correct, but very loose.
       The linear function BinVal has (wmax /wmin ) = 2n−1 .
        The tightest known bound is en ln(n) + O(n) [22].



  =⇒ A poor choice of distance function gives an imprecise bound!
What is a good distance function?


   Theorem ([8])
    Assume Y is a homogeneous Markov chain, and let τ be the time to
    absorption. Then the function g(x) := E[τ | Y0 = x] satisfies

        g(x) = 0                           if x is an absorbing state
        E [g(Yk+1 ) − g(Yk ) | Fk ] = −1   otherwise.


       Distance function g gives exact expected runtime!
       But g requires complete knowledge of the expected runtime!
       Still provides insight into what is a good distance function:
            a good approximation (or guess) for the remaining runtime
Part 3 - Variable Drift
Drift may be Position-Dependent

   [Figure: a process with variable, position-dependent drift]

    Idea: Find a function g : R → R
  s.t. the transformed stochastic
  process g(X1 ), g(X2 ), g(X3 ), . . .
  has constant drift.

   [Figure: the transformed process with constant drift]
Jensen’s Inequality




    Theorem
    If g : R → R is concave, then E[g(X) | F ] ≤ g(E[X | F ]).

        If g''(x) < 0, then g is concave.
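A quick numeric illustration (sample distribution and seed invented): for the concave function g(x) = ln x, the sample mean of ln X never exceeds ln of the sample mean, exactly as Jensen's inequality predicts.

```python
import math, random

rng = random.Random(42)
xs = [rng.uniform(0.5, 10.0) for _ in range(10_000)]

lhs = sum(math.log(x) for x in xs) / len(xs)   # ~ E[ln X]
rhs = math.log(sum(xs) / len(xs))              # ~ ln E[X]
print(lhs, rhs)   # lhs <= rhs for any sample, by Jensen / AM-GM
```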
Multiplicative Drift

   [Figure: process Yk on [a, b] with drift δYk toward a]

        (M) ∀k     E[Yk+1 | Yk > a ∧ Fk ] ≤ (1 − δ)Yk

   Theorem ([2, 4])
   Given a sequence (Yk , Fk ) over an interval [a, b] ⊂ R, a > 0
   Define τa := min{k ≥ 0 | Yk = a}, and assume E [τa | F0 ] < ∞.
       If (M) holds for a δ > 0, then E [τa | F0 ] ≤ ln(Y0 /a)/δ.

    Proof.
    g(s) := ln(s/a) is concave, so by Jensen's inequality

      E[g(Yk+1 ) − g(Yk ) | Yk > a ∧ Fk ]
             ≤ ln(E[Yk+1 | Yk > a ∧ Fk ]) − ln(Yk ) ≤ ln(1 − δ) ≤ −δ,

    and the additive drift theorem applied to g(Yk ) gives
    E[τa | F0 ] ≤ g(Y0 )/δ = ln(Y0 /a)/δ.
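A simulation sketch of the theorem (step distribution, seed, and parameters invented): each step multiplies Yk by a random factor with mean 1 − δ, so condition (M) holds, and the mean hitting time stays below ln(Y0/a)/δ.

```python
import math, random

def mult_drift_time(y0, a, delta, rng):
    """Y_{k+1} = U * Y_k with E[U] = 1 - delta; return first k with Y_k <= a."""
    y, t = y0, 0
    while y > a:
        y *= rng.uniform(1 - 2 * delta, 1)   # mean factor 1 - delta
        t += 1
    return t

rng = random.Random(3)
y0, a, delta = 100.0, 1.0, 0.1
avg = sum(mult_drift_time(y0, a, delta, rng) for _ in range(2000)) / 2000
print(avg, math.log(y0 / a) / delta)   # empirical mean below ln(100)/0.1
```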
Example: Linear Functions Revisited
       For any c ∈ (0, 1), define the distance at time k as

                  Yk := c·wmin + Σ_{i=1}^{n} wi (1 − xi(k) )

       We have already seen that

            E[Yk+1 − Yk | Fk ] ≤ (1/(en)) Σ_{i=1}^{n} wi (xi(k) − 1)

                               = −(Yk − c·wmin )/(en) ≤ −Yk (1 − c)/(en)

       By the multiplicative drift theorem (a := c·wmin and δ := (1 − c)/(en))

                  E[τa | F0 ] ≤ (en/(1 − c)) · ln(1 + n·wmax /(c·wmin ))
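To see how much this gains over the additive bound, compare the two on BinVal, where wmax/wmin = 2^(n−1). A sketch (n and c chosen for illustration):

```python
import math

n = 20
wmin, wmax = 1.0, 2.0 ** (n - 1)     # BinVal weights w_i = 2^(i-1)
c = 0.5

additive = math.e * n * n * (wmax / wmin)                        # en^2 * wmax/wmin
mult = (math.e * n / (1 - c)) * math.log(1 + n * wmax / (c * wmin))
print(additive, mult)   # exponential additive bound vs O(n log(n*wmax)) bound
```

Already at n = 20 the additive bound is in the hundreds of millions while the multiplicative bound is in the low thousands: the distance function, not the algorithm, was the problem.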
Variable Drift Theorem

   [Figure: process Yk on [a, b] with drift h(Yk ) toward a]

         (V) ∀k    E[Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −h(Yk )

    Theorem ([15, 11])
    Given a sequence (Yk , Fk ) over an interval [a, b] ⊂ R, a > 0.
    Define τa := min{k ≥ 0 | Yk = a}, and assume E[τa | F0 ] < ∞.
    If there exists a function h : R → R such that
        h(x) > 0 and h'(x) > 0 for all x ∈ [a, b], and
        drift condition (V) holds, then

                  E[τa | F0 ] ≤ ∫_a^{Y0} 1/h(z) dz

    =⇒ The multiplicative drift theorem is the special case h(x) = δx.
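That special case is easy to verify numerically: with h(x) = δx, the variable-drift integral should reproduce the multiplicative drift bound ln(Y0/a)/δ. A sketch (values and the midpoint-rule helper are invented for illustration):

```python
import math

a, y0, delta = 1.0, 100.0, 0.1

def integral(f, lo, hi, m=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    w = (hi - lo) / m
    return sum(f(lo + (i + 0.5) * w) for i in range(m)) * w

var_bound = integral(lambda z: 1.0 / (delta * z), a, y0)
mult_bound = math.log(y0 / a) / delta
print(var_bound, mult_bound)   # both approximately 46.05
```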
Variable Drift Theorem: Proof

                       g(x) := ∫_a^x 1/h(z) dz

   [Figure: the graph of 1/h; the area under 1/h between Yk − h(Yk ) and Yk
    is at least h(Yk ) · 1/h(Yk ) = 1, and E [Yk+1 | Fk ] ≤ Yk − h(Yk ).]

   Proof.
   The function g is concave (g′′ < 0), so by Jensen's inequality

            E [g(Yk ) − g(Yk+1 ) | Fk ] ≥ g(Yk ) − g(E [Yk+1 | Fk ])

                                        ≥ ∫_{Yk −h(Yk )}^{Yk} 1/h(z) dz ≥ 1
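The bound can be sanity-checked numerically for the multiplicative special case h(x) = δx (a sketch with illustrative parameters, not part of the original slides): a process that loses a δ-fraction of its value per step should hit the lower end of the interval within ∫_a^{Y0} dz/(δz) = ln(Y0/a)/δ steps in expectation.

```python
import math
import random

def hitting_time(y0, delta, a, rng):
    """Steps until Y_k <= a, where each of the Y_k units survives with
    probability 1 - delta, so E[Y_k - Y_{k+1} | Y_k] = delta * Y_k = h(Y_k)."""
    y, k = y0, 0
    while y > a:
        y = sum(1 for _ in range(y) if rng.random() < 1 - delta)
        k += 1
    return k

rng = random.Random(0)
y0, delta, a = 100, 0.5, 1
mean_tau = sum(hitting_time(y0, delta, a, rng) for _ in range(5000)) / 5000
bound = math.log(y0 / a) / delta  # = ∫_a^{y0} dz / (delta * z)
print(f"empirical E[tau] ~ {mean_tau:.2f}, variable-drift bound = {bound:.2f}")
```

The empirical mean stays below ln(100)/0.5 ≈ 9.21, as the theorem predicts.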
Part 4 - Supermartingale
Supermartingale
    [Interval: a = 0 ——— Yk ——— b]

     (S1) ∀k   E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≤ 0
     (S2) ∀k   Var [Yk+1 | Yk > 0 ∧ Fk ] ≥ σ²
   Theorem (See e.g. [16])
   Given a sequence (Yk , Fk ) over an interval [0, b] ⊂ R.
   Define τ := min{k ≥ 0 | Yk = 0}, and assume E [τ | F0 ] < ∞.
       If (S1) and (S2) hold for σ > 0, then E [τ | F0 ] ≤ Y0 (2b − Y0 )/σ²
   Proof.
   Let Zk := b² − (b − Yk )², and note that b − Yk ≤ E [b − Yk+1 | Fk ].

     E [Zk+1 − Zk | Fk ] = −E [(b − Yk+1 )² | Fk ] + (b − Yk )²
             ≤ −E [(b − Yk+1 )² | Fk ] + E [b − Yk+1 | Fk ]²
             = −Var [b − Yk+1 | Fk ] = −Var [Yk+1 | Fk ] ≤ −σ²

   The additive drift theorem applied to Zk thus gives
   E [τ | F0 ] ≤ Z0 /σ² = Y0 (2b − Y0 )/σ².
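As an illustrative sketch (parameters mine, not from the slides): a fair ±1 random walk on {0, …, b}, reflected at b, satisfies (S1), and satisfies (S2) with σ = 1 in the interior. For this particular walk the expected hitting time of 0 is in fact exactly Y0(2b − Y0), so the simulation should match the bound closely.

```python
import random

def hit_zero(y0, b, rng):
    """Fair +/-1 random walk on {0, ..., b}, reflected at b; steps to reach 0."""
    y, k = y0, 0
    while y > 0:
        y = y - 1 if (y == b or rng.random() < 0.5) else y + 1
        k += 1
    return k

rng = random.Random(1)
y0, b = 5, 10
mean_tau = sum(hit_zero(y0, b, rng) for _ in range(20000)) / 20000
bound = y0 * (2 * b - y0)  # Y0 (2b - Y0) / sigma^2 with sigma = 1
print(f"empirical E[tau] ~ {mean_tau:.1f}, bound = {bound}")
```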
Part 5 - Hajek’s Theorem
Hajek’s Theorem⁴

   Theorem
   If there exist λ, ε0 > 0 and D < ∞ such that for all k ≥ 0
    (C1) E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −ε0
    (C2) (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D
   then for any δ ∈ (0, 1)
    (2.9) Pr (τa > B | F0 ) ≤ e^{η(Y0 −a−B(1−δ)ε0 )}
      (*) Pr (τb < B | Y0 < a) ≤ BD/((1 − δ)ηε0 ) · e^{η(a−b)}
   for some η ≥ min{λ, δε0 λ²/D} > 0.

          If λ, ε0 , D ∈ O(1) and b − a ∈ Ω(n),
          then there exists a constant c > 0 such that

                              Pr (τb ≤ e^{cn} | F0 ) ≤ e^{−Ω(n)}

     ⁴
         The theorem presented here is a corollary to Theorem 2.3 in [6].
Stochastic Dominance - (|Yk+1 − Yk | | Fk ) ⪯ Z
   Definition
   Y ⪯ Z if Pr (Z ≤ c) ≤ Pr (Y ≤ c) for all c ∈ R

   [Figure: the cdf of Y lies on or above the cdf of Z.]

   Example
    1. If Y ≤ Z, then Y ⪯ Z.
    2. Let (Ω, d) be a metric space, and V (x) := d(x, x∗ ).
       Then |V (Xk+1 ) − V (Xk )| ⪯ d(Xk+1 , Xk )

   [Figure: points Xk , Xk+1 and x∗ , with the distances d(Xk , Xk+1 ),
    V (Xk ) and V (Xk+1 ) forming a triangle.]
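Example 2 is just the triangle inequality |d(x′, x∗) − d(x, x∗)| ≤ d(x, x′), which gives the dominance pointwise. A tiny illustrative check on random points in R² (my own sketch, not from the slides):

```python
import math
import random

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

rng = random.Random(2)
x_star = (0.0, 0.0)  # the target point x*
for _ in range(1000):
    x_k = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    x_next = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    v_change = abs(dist(x_next, x_star) - dist(x_k, x_star))
    # Triangle inequality: |V(X_{k+1}) - V(X_k)| <= d(X_k, X_{k+1}),
    # which implies the stochastic dominance pointwise.
    assert v_change <= dist(x_k, x_next) + 1e-12
print("dominance holds pointwise on 1000 random pairs")
```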
Condition (C2) implies that “long jumps” must be rare

   Assume that
    (C2) (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D

   Then for any j ≥ 0,

            Pr (|Yk+1 − Yk | ≥ j) = Pr (e^{λ|Yk+1 −Yk |} ≥ e^{λj})
                                  ≤ E [e^{λ|Yk+1 −Yk |}] e^{−λj}
                                  ≤ E [e^{λZ}] e^{−λj}
                                  = De^{−λj} .

   Markov’s inequality
       If X ≥ 0, then Pr (X ≥ k) ≤ E [X] /k.
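The chain of inequalities can be verified exactly for a concrete Z; a sketch with Z ∼ Bin(n, p), λ = ln 2 and D = E[e^{λZ}] = (1 + p)^n (the parameter values are mine):

```python
import math

n, p = 20, 0.1
lam = math.log(2)   # lambda = ln 2, so e^{lam} = 2
D = (1 + p) ** n    # E[e^{lam Z}] for Z ~ Bin(n, p)

def binom_tail(j):
    """Exact Pr(Z >= j) for Z ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(j, n + 1))

# Markov's inequality applied to e^{lam Z} gives Pr(Z >= j) <= D e^{-lam j}.
for j in range(n + 1):
    assert binom_tail(j) <= D * math.exp(-lam * j) + 1e-12
print("Pr(Z >= j) <= D * e^{-lam j} holds for all j in 0..n")
```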
Moment Generating Function (mgf) E [e^{λZ}]
   Definition
   The mgf of a rv X is MX (λ) := E [e^{λX}] for all λ ∈ R.
       The n-th derivative at t = 0 is MX^{(n)}(0) = E [X^n ],
       hence MX provides all moments of X, thus the name.
       If X and Y are independent rvs and a, b ∈ R, then

       M_{aX+bY}(t) = E [e^{t(aX+bY )}] = E [e^{taX}] E [e^{tbY}] = MX (at)MY (bt)

   Example
       Let X := Σ_{i=1}^n Xi where the Xi are independent rvs with
       Pr (Xi = 1) = p and Pr (Xi = 0) = 1 − p. Then

         M_{Xi}(λ) = (1 − p)e^{λ·0} + pe^{λ·1} = 1 − p + pe^{λ}
         MX (λ) = M_{X1}(λ)M_{X2}(λ) · · · M_{Xn}(λ) = (1 − p + pe^{λ})^n .
Moment Generating Functions

   Distribution                                        mgf
   Bernoulli      Pr (X = 1) = p                       1 − p + pe^t
   Binomial       X ∼ Bin(n, p)                        (1 − p + pe^t )^n
   Geometric      Pr (X = k) = (1 − p)^{k−1} p         pe^t /(1 − (1 − p)e^t )
   Uniform        X ∼ U (a, b)                         (e^{tb} − e^{ta})/(t(b − a))
   Normal         X ∼ N (µ, σ²)                        exp(tµ + σ²t²/2)

   The mgf of X ∼ Bin(n, p) at t = ln(2) is

                  (1 − p + pe^t )^n = (1 + p)^n ≤ e^{pn} .
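The closed forms can be cross-checked by Monte Carlo; a sketch (my own parameter choices) comparing the empirical mgf of Bin(n, p) at t = ln 2 with (1 − p + pe^t)^n = (1 + p)^n ≤ e^{pn}:

```python
import math
import random

n, p, t = 30, 1 / 30, math.log(2)
closed_form = (1 - p + p * math.exp(t)) ** n   # = (1 + p)^n since e^t = 2

rng = random.Random(3)
samples = 100_000
# Empirical E[e^{tX}] for X ~ Bin(n, p), sampling X as a sum of n Bernoullis.
est = sum(math.exp(t * sum(rng.random() < p for _ in range(n)))
          for _ in range(samples)) / samples

print(f"Monte Carlo E[e^(tX)] ~ {est:.3f}, closed form {closed_form:.3f}, "
      f"e^(pn) = {math.exp(p * n):.3f}")
```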
Condition (C2) often holds trivially

   Example ((1+1) EA)

   Choose x(0) uniformly from {0, 1}^n
   for k = 0, 1, 2, ...
       Set x′ := x(k) , and flip each bit of x′ with probability p.
       If f (x′ ) ≥ f (x(k) ), then x(k+1) := x′ else x(k+1) := x(k)

   Assume
       Fitness function f has a unique maximum x∗ ∈ {0, 1}^n .
       Distance function is g(x) = H(x, x∗ )
   Then
       |g(x(k+1) ) − g(x(k) )| ⪯ Z where Z := H(x(k) , x′ )
       Z ∼ Bin(n, p), so E [e^{λZ}] ≤ e^{np} for λ = ln(2)
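The pseudocode translates almost line by line; a minimal Python sketch of the (1+1) EA, with OneMax as an example fitness function (my choice of f and parameters, not from the slides):

```python
import random

def one_plus_one_ea(f, n, p, max_iters, rng):
    """(1+1) EA: flip each bit independently w.p. p, accept if not worse."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(max_iters):
        y = [1 - b if rng.random() < p else b for b in x]
        if f(y) >= f(x):
            x = y
        if f(x) == n:          # n is the optimum of OneMax (our example f)
            return x
    return x

rng = random.Random(4)
n = 30
onemax = sum                    # OneMax(x) = number of one-bits
best = one_plus_one_ea(onemax, n, 1 / n, 50_000, rng)
print("best fitness found:", onemax(best))
```

With mutation rate p = 1/n the expected optimisation time on OneMax is O(n ln n), so 50 000 iterations is far more than enough here.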
Simple Application: (1+1) EA on Needle
   (1+1) EA with mutation rate p = 1/n on

      Needle(x) := Π_{i=1}^n xi           Yk := H(x(k) , 0^n )
                                           a := (3/4)n
                                           b := n

   Condition (C2) satisfied⁵ with D = E [e^{λZ}] ≤ e where λ = ln(2).
   Condition (C1) satisfied for ε0 := 1/2 because

           E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ (n − a)p − ap = −ε0 .

   Thus, η ≥ min{λ, δε0 λ²/D} > 1/25 when δ = 1/2, and

        Pr (τa > n + k | F0 ) ≤ e^{(1/25)(Y0 −a−(n+k)(1−δ)ε0 )} ≤ e^{−k/100}

        Pr (τb < e^{n/200} | F0 ) = e^{−Ω(n)}

    ⁵
        See previous slide.
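The drift estimate (n − a)p − ap used for (C1) can be checked empirically; a sketch (parameters mine) of the one-step drift of Yk = H(x^{(k)}, 0^n) on the Needle plateau, where every offspring is accepted:

```python
import random

n = 100
p = 1.0 / n
rng = random.Random(5)

def empirical_drift(y, trials):
    """Estimate E[Y_{k+1} - Y_k | Y_k = y] on the Needle plateau, where the
    offspring is always accepted: change = (# zero-bits flipped to one)
    minus (# one-bits flipped to zero)."""
    total = 0
    for _ in range(trials):
        up = sum(rng.random() < p for _ in range(n - y))
        down = sum(rng.random() < p for _ in range(y))
        total += up - down
    return total / trials

y = 3 * n // 4 + 5               # a state just above a = (3/4)n
drift = empirical_drift(y, 20_000)
exact = (n - y) * p - y * p
print(f"empirical drift ~ {drift:.3f}, exact (n - y)p - yp = {exact:.3f}")
```

The drift is clearly negative above a = (3/4)n, which is what makes reaching b = n (the needle) take exponential time.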
Proof overview
   Theorem (2.3 in [6])
   Assume that there exist 0 < ρ < 1 and D ≥ 1 such that
    (D1) E [e^{ηYk+1} | Yk > a ∧ Fk ] ≤ ρe^{ηYk}
    (D2) E [e^{ηYk+1} | Yk ≤ a ∧ Fk ] ≤ De^{ηa}
   Then
    (2.6) E [e^{ηYk+1} | F0 ] ≤ ρ^k e^{ηY0} + De^{ηa}(1 − ρ^k )/(1 − ρ).
    (2.8) Pr (Yk ≥ b | F0 ) ≤ ρ^k e^{η(Y0 −b)} + De^{η(a−b)}(1 − ρ^k )/(1 − ρ).
     (*) Pr (τb < B | Y0 < a) ≤ e^{η(a−b)}BD/(1 − ρ)
    (2.9) Pr (τa > k | F0 ) ≤ e^{η(Y0 −a)}ρ^k

   Lemma
   Assume that there exists an ε0 > 0 such that
    (C1) E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −ε0
    (C2) (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D < ∞ for some λ > 0.
   Then (D1) and (D2) hold for some η > 0 and ρ < 1.
Theorem
 (D1) E [e^{ηYk+1} | Yk > a ∧ Fk ] ≤ ρe^{ηYk}
 (D2) E [e^{ηYk+1} | Yk ≤ a ∧ Fk ] ≤ De^{ηa}
Assume that (D1) and (D2) hold. Then
 (2.6) E [e^{ηYk+1} | F0 ] ≤ ρ^k e^{ηY0} + De^{ηa}(1 − ρ^k )/(1 − ρ).

Proof.
By the law of total probability, and the conditions (D1) and (D2),

                 E [e^{ηYk+1} | Fk ] ≤ ρe^{ηYk} + De^{ηa}                  (1)

By the law of total expectation, Ineq. (1), and induction on k,

   E [e^{ηYk+1} | F0 ] = E [E [e^{ηYk+1} | Fk ] | F0 ]
                       ≤ ρE [e^{ηYk} | F0 ] + De^{ηa}
                       ≤ ρ^k e^{ηY0} + (1 + ρ + ρ² + · · · + ρ^{k−1} )De^{ηa} .
Proof of (2.8)
   Theorem
    (D1) E [e^{η(Yk+1 −Yk )} | Yk > a ∧ Fk ] ≤ ρ
    (D2) E [e^{η(Yk+1 −a)} | Yk ≤ a ∧ Fk ] ≤ D
   Assume that (D1) and (D2) hold. Then
    (2.6) E [e^{ηYk+1} | F0 ] ≤ ρ^k e^{ηY0} + De^{ηa}(1 − ρ^k )/(1 − ρ).
    (2.8) Pr (Yk ≥ b | F0 ) ≤ ρ^k e^{η(Y0 −b)} + De^{η(a−b)}(1 − ρ^k )/(1 − ρ).

   Proof.
   (2.8) follows from Markov’s inequality and (2.6):

             Pr (Yk+1 ≥ b | F0 ) = Pr (e^{ηYk+1} ≥ e^{ηb} | F0 )
                                  ≤ E [e^{ηYk+1} | F0 ] e^{−ηb}
Proof of (*)
   Theorem
    (D1) E [e^{η(Yk+1 −Yk )} | Yk > a ∧ Fk ] ≤ ρ
    (D2) E [e^{η(Yk+1 −a)} | Yk ≤ a ∧ Fk ] ≤ D
   Assume that (D1) and (D2) hold for D ≥ 1. Then
    (2.8) Pr (Yk ≥ b | F0 ) ≤ ρ^k e^{η(Y0 −b)} + De^{η(a−b)}(1 − ρ^k )/(1 − ρ).
     (*) Pr (τb < B | Y0 < a) ≤ e^{η(a−b)}BD/(1 − ρ)

   Proof.
   By the union bound and (2.8),

     Pr (τb < B | Y0 < a ∧ F0 ) ≤ Σ_{k=1}^B Pr (Yk ≥ b | Y0 < a ∧ F0 )

            ≤ Σ_{k=1}^B De^{η(a−b)} (ρ^k + (1 − ρ^k )/(1 − ρ)) ≤ BDe^{η(a−b)}/(1 − ρ)
Union Bound

   [Figure: events E1 , . . . , E4 as overlapping regions of a sample space Ω.]

      Pr (E1 ∨ E2 ∨ · · · ∨ Ek ) ≤ Pr (E1 ) + Pr (E2 ) + · · · + Pr (Ek )
Proof of (2.9)
   Theorem
    (D1) E [e^{ηYk+1} ρ^{−1} | Yk > a ∧ Fk ] ≤ e^{ηYk}
   Assume that (D1) holds. Then
    (2.9) Pr (τa > k | F0 ) ≤ e^{η(Y0 −a)} ρ^k

   Proof.
   By (D1), Zk := e^{ηY_{k∧τ}} ρ^{−(k∧τ)} is a supermartingale, so

            e^{ηY0} = Z0 ≥ E [Zk | F0 ] = E [e^{ηY_{k∧τ}} ρ^{−(k∧τ)} | F0 ]      (2)

   By (2) and the law of total probability,

        e^{ηY0} ≥ Pr (τa > k | F0 ) E [e^{ηY_{k∧τ}} ρ^{−(k∧τ)} | τa > k ∧ F0 ]
                = Pr (τa > k | F0 ) E [e^{ηYk} ρ^{−k} | τa > k ∧ F0 ]
                ≥ Pr (τa > k | F0 ) e^{ηa} ρ^{−k}
(C1) and (C2) =⇒ (D1)
   (C1) E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −ε0
   (C2) (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D < ∞ for some λ > 0.
   (D1) E [e^{η(Yk+1 −Yk )} | Yk > a ∧ Fk ] ≤ ρ

   Lemma
   Assume (C1) and (C2). Then (D1) holds when ρ ≥ 1 − ηε0 + η²c
   and 0 < η ≤ min{λ, ε0 /c}, where c := Σ_{k=2}^∞ (λ^{k−2}/k!) E [Z^k ].

   Proof.
   Let X := (Yk+1 − Yk | Yk > a ∧ Fk ).
   By (C2) it holds that |X| ⪯ Z, so E [X^k ] ≤ E [|X|^k ] ≤ E [Z^k ].
   From e^x = Σ_{k=0}^∞ x^k /k! and linearity of expectation,

            0 < E [e^{ηX}] = 1 + ηE [X] + Σ_{k=2}^∞ (η^k /k!) E [X^k ]
                           ≤ 1 − ηε0 + η²c ≤ ρ,

   where the last two steps use E [X] ≤ −ε0 , η ≤ λ, and the assumption on ρ.
(C2) =⇒ (D2)

   (C2) (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D < ∞ for some λ > 0.
   (D2) E [e^{η(Yk+1 −a)} | Yk ≤ a ∧ Fk ] ≤ D

   Theorem
   Assume (C2) and 0 < η ≤ λ. Then (D2) holds.

   Proof.
   If Yk ≤ a then Yk+1 − a ≤ Yk+1 − Yk ≤ |Yk+1 − Yk |, so

     E [e^{η(Yk+1 −a)} | Yk ≤ a ∧ Fk ] ≤ E [e^{λ|Yk+1 −Yk |} | Yk ≤ a ∧ Fk ]

   Furthermore, by (C2),

             E [e^{λ|Yk+1 −Yk |} | Yk ≤ a ∧ Fk ] ≤ E [e^{λZ}] = D.
(C1) and (C2) =⇒ (D1) and (D2)
   (C1)   E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −ε0
   (C2)   (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D < ∞ for some λ > 0.
   (D1)   E [e^{η(Yk+1 −Yk )} | Yk > a ∧ Fk ] ≤ ρ
   (D2)   E [e^{η(Yk+1 −a)} | Yk ≤ a ∧ Fk ] ≤ D

   Lemma
   Assume (C1) and (C2). Then (D1) and (D2) hold when

            ρ ≥ 1 − ηε0 + η²c   and   0 < η ≤ min{λ, ε0 /c}

   where c := Σ_{k=2}^∞ (λ^{k−2}/k!) E [Z^k ] = (D − 1 − λE [Z])λ^{−2} > 0.

   Corollary
   Assume (C1), (C2) and 0 < δ < 1. Then (D1) and (D2) hold for

           η := min{λ, δε0 /c}    =⇒ δε0 ≥ ηc
           ρ := 1 − (1 − δ)ηε0 = 1 − ηε0 + ηδε0 ≥ 1 − ηε0 + η²c
Reformulation of Hajek’s Theorem
   Theorem
   If there exist λ, ε0 > 0 and 1 < D < ∞ such that for all k ≥ 0
    (C1) E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −ε0
    (C2) (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D
   then for any δ ∈ (0, 1)
    (2.9) Pr (τa > B | F0 ) ≤ e^{η(Y0 −a)} ρ^B
      (*) Pr (τb < B | Y0 < a) ≤ BD/(1 − ρ) · e^{η(a−b)}
   where η := min{λ, δε0 /c} and ρ := 1 − (1 − δ)ηε0

    1. Note that ln(ρ) ≤ ρ − 1 = −(1 − δ)ηε0 , so

              Pr (τa > B | F0 ) ≤ e^{η(Y0 −a)} ρ^B = e^{η(Y0 −a)+B ln(ρ)}
                                 ≤ e^{η(Y0 −a−B(1−δ)ε0 )} .

    2. c = (D − 1 − λE [Z])λ^{−2} < D/λ², so η ≥ min{λ, δε0 λ²/D}.
Reformulation of Hajek’s Theorem
   Theorem
   If there exist λ, ε0 > 0 and 1 < D < ∞ such that for all k ≥ 0
    (C1) E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −ε0
    (C2) (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D
   then for any δ ∈ (0, 1)
    (2.9) Pr (τa > B | F0 ) ≤ e^{η(Y0 −a−B(1−δ)ε0 )}
      (*) Pr (τb < B | Y0 < a) ≤ BD/((1 − δ)ηε0 ) · e^{η(a−b)}
   for some η ≥ min{λ, δε0 λ²/D}.
Simplified Drift Theorem [17]
   We have already seen that
    (C2) (|Yk+1 − Yk | | Fk ) ⪯ Z and E [e^{λZ}] = D
   implies Pr (|Yk+1 − Yk | ≥ j) ≤ De^{−λj} for all j ∈ N0 .

   The simplified drift theorem replaces (C2) with
     (S) Pr (Yk+1 − Yk ≥ j | Yk < b) ≤ r(n)(1 + δ)^{−j} for all j ∈ N0 ,
   and, with some additional assumptions, provides a bound of the type⁶

                          Pr (τb < 2^{c(b−a)}) ≤ 2^{−Ω(b−a)} .                 (3)

          Until 2008, conditions (D1) and (D2) were used in EC.
          (D1) and (D2) can lead to highly tedious calculations.
          Oliveto and Witt were the first in EC to point out that the
          much simpler to verify condition (C1), together with (S), is sufficient.
     ⁶
         See [17] for the exact statement.
Part 6 - Population Drift
Drift Analysis of Population-based Evolutionary Algorithms

       Evolutionary algorithms generally use populations.
       So far, we have analysed the drift of the (1+1) EA,
       i.e. an evolutionary algorithm with population size one.
       The state aggregation problem makes analysis of
       population-based EAs with classical drift theorems difficult:
       How to define an appropriate distance function?
            It should reflect the progress of the algorithm
            This is often hard even for single-individual algorithms
            and highly non-trivial for population-based algorithms

   =⇒ This part of the tutorial focuses on a drift theorem for
      populations which alleviates the state aggregation problem.
Population-based Evolutionary Algorithms

   [Figure: a population Pt of λ individuals, with one individual x highlighted.]

   Require:
     Finite set X , and initial population P0 ∈ X^λ
     Selection mechanism psel : X^λ × X → [0, 1]
     Variation operator pmut : X × X → [0, 1]

     for t = 0, 1, 2, . . . until termination condition do
       for i = 1 to λ do
          Sample i-th parent x according to psel (Pt , ·)
          Sample i-th offspring Pt+1 (i) according to pmut (x, ·)
       end for
     end for
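The scheme can be sketched in Python with tournament selection as psel and bitwise mutation as pmut (an illustrative implementation with my own parameter choices; OneMax is the example fitness, and tournament size 4 is chosen so that the reproductive rate comfortably exceeds e^χ for mutation rate χ/n = 1/n):

```python
import random

def tournament_select(pop, f, k, rng):
    """psel: return the fittest of k individuals sampled uniformly."""
    return max((rng.choice(pop) for _ in range(k)), key=f)

def bitwise_mutation(x, p, rng):
    """pmut: flip each bit independently with probability p."""
    return [1 - b if rng.random() < p else b for b in x]

def population_ea(f, n, lam, p, generations, rng):
    """Non-elitist population EA following the scheme above."""
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(lam)]
    for _ in range(generations):
        pop = [bitwise_mutation(tournament_select(pop, f, 4, rng), p, rng)
               for _ in range(lam)]
    return max(pop, key=f)

rng = random.Random(6)
n, lam = 20, 40
best = population_ea(sum, n, lam, 1.0 / n, 100, rng)   # OneMax fitness
print("best fitness in final population:", sum(best))
```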
Selection and Variation - Example

   [Figure: a parent is chosen from the population by Tournament Selection
    (wrt the fitness function), and an offspring is created by Bitwise Mutation:

                 parent:    10001110111011
                 offspring: 10010110111001  ]
Population Drift

   Central Parameters
       Reproductive rate of selection mechanism psel :

                 α0 = max_{1≤j≤λ} E [#offspring from parent j]

       Random walk process corresponding to variation operator pmut :

                                Xk+1 ∼ pmut (Xk )
Population Drift [12]

 (C1P) ∀k       E [e^{κ(g(Xk+1 )−g(Xk ))} | a < g(Xk ) < b] < 1/α0

   Theorem
   Define τb := min{k ≥ 0 | g(Pk (i)) > b for some i ∈ [λ]}.
   If there exist constants α0 ≥ 1 and κ > 0 such that
          psel has reproductive rate less than α0
          the random walk process corresponding to pmut satisfies (C1P)
   and some other conditions hold,⁷ then for some constants c, c′ > 0

                           Pr (τb ≤ e^{c(b−a)}) = e^{−c′(b−a)}

     ⁷
         Some details are omitted. See Theorem 1 in [12] for all details.
Population Drift: Decoupling Selection & Variation

  Population drift                        Classical drift [6]
  If there exists a κ > 0 such that       If there exists a κ > 0 such that

            M∆mut (κ) < 1/α0                         M∆ (κ) < 1

  where                                   where

        ∆mut = g(Xk+1 ) − g(Xk )               ∆ = h(Pk+1 ) − h(Pk ),
        Xk+1 ∼ pmut (Xk )

  and

  α0 = max_j E [#offspring from parent j],

  then the runtime is exponential.        then the runtime is exponential.
Conclusion

      Drift analysis is a powerful tool for the analysis of EAs
             Mainly used in EC to bound the expected runtime of EAs
             Useful when the EA has non-monotonic progress,
             e.g. when the fitness value is a poor indicator of progress
      The “art” consists in finding a good distance function
             No simple recipe
      A large number of drift theorems are available
             Additive, multiplicative, variable, population drift...
             Significant related literature from fields other than EC
      Not the only tool in the toolbox; see also
             Artificial fitness levels, Markov Chain theory, Concentration of
             measure, Branching processes, Martingale theory, Probability
             generating functions, ...
Acknowledgements




  Thanks to
       David Hodge
       Carsten Witt, and
       Daniel Johannsen
  for insightful discussions.
References I

   [1]   Benjamin Doerr and Leslie Ann Goldberg.
         Drift analysis with tail bounds.
         In Proceedings of the 11th international conference on Parallel problem solving
         from nature: Part I, PPSN’10, pages 174–183, Berlin, Heidelberg, 2010.
         Springer-Verlag.
   [2]   Benjamin Doerr, Daniel Johannsen, and Carola Winzen.
         Multiplicative drift analysis.
         In GECCO ’10: Proceedings of the 12th annual conference on Genetic and
         evolutionary computation, pages 1449–1456, New York, NY, USA, 2010. ACM.
   [3]   Stefan Droste, Thomas Jansen, and Ingo Wegener.
         On the analysis of the (1+1) Evolutionary Algorithm.
         Theoretical Computer Science, 276:51–81, 2002.
   [4]   Simon Fischer, Lars Olbrich, and Berthold Vöcking.
         Approximating Wardrop equilibria with finitely many agents.
         Distributed Computing, 21(2):129–139, 2008.
   [5]   Oliver Giel and Ingo Wegener.
         Evolutionary algorithms and the maximum matching problem.
         In Proceedings of the 20th Annual Symposium on Theoretical Aspects of
         Computer Science (STACS 2003), pages 415–426, 2003.
References II
   [6]   Bruce Hajek.
         Hitting-time and occupation-time bounds implied by drift analysis with
         applications.
         Advances in Applied Probability, 14(3):502–525, 1982.
   [7]   Jun He and Xin Yao.
         Drift analysis and average time complexity of evolutionary algorithms.
         Artificial Intelligence, 127(1):57–85, March 2001.
   [8]   Jun He and Xin Yao.
         A study of drift analysis for estimating computation time of evolutionary
         algorithms.
         Natural Computing, 3(1):21–35, 2004.
   [9]   Jens Jägersküpper.
         Algorithmic analysis of a basic evolutionary algorithm for continuous
         optimization.
         Theoretical Computer Science, 379(3):329–347, 2007.
   [10] Jens Jägersküpper.
        A Blend of Markov-Chain and Drift Analysis.
        In Proceedings of the 10th International Conference on Parallel Problem Solving
        from Nature (PPSN 2008), 2008.
References III
   [11] Daniel Johannsen.
        Random combinatorial structures and randomized search heuristics.
         PhD thesis, Universität des Saarlandes, 2010.
   [12] Per Kristian Lehre.
        Negative drift in populations.
        In Proceedings of Parallel Problem Solving from Nature - (PPSN XI), volume
        6238 of LNCS, pages 244–253. Springer Berlin / Heidelberg, 2011.
   [13] Per Kristian Lehre and Carsten Witt.
        Black-box search by unbiased variation.
        Algorithmica, pages 1–20, 2012.
   [14] Sean P. Meyn and Richard L. Tweedie.
        Markov Chains and Stochastic Stability.
        Springer-Verlag, 1993.
   [15] B. Mitavskiy, J. E. Rowe, and C. Cannings.
        Theoretical analysis of local search strategies to optimize network communication
        subject to preserving the total number of links.
        International Journal of Intelligent Computing and Cybernetics, 2(2):243–284,
        2009.
   [16] Frank Neumann, Dirk Sudholt, and Carsten Witt.
         Analysis of different MMAS ACO algorithms on unimodal functions and plateaus.
        Swarm Intelligence, 3(1):35–68, 2009.
References IV

   [17] Pietro Oliveto and Carsten Witt.
         Simplified drift analysis for proving lower bounds in evolutionary computation.
        Algorithmica, pages 1–18, 2010.
        10.1007/s00453-010-9387-z.
   [18] Pietro S. Oliveto and Carsten Witt.
        Simplified drift analysis for proving lower bounds in evolutionary computation.
         Technical Report Reihe CI, No. CI-247/08, SFB 531, Technische Universität
        Dortmund, Germany, 2008.
   [19] Galen H. Sasaki and Bruce Hajek.
        The time complexity of maximum matching by simulated annealing.
        Journal of the ACM, 35(2):387–403, 1988.
   [20] Dirk Sudholt.
        General lower bounds for the running time of evolutionary algorithms.
        In Proceedings of Parallel Problem Solving from Nature - (PPSN XI), volume
        6238 of LNCS, pages 124–133. Springer Berlin / Heidelberg, 2010.
   [21] David Williams.
        Probability with Martingales.
        Cambridge University Press, 1991.
References V




   [22] Carsten Witt.
        Optimizing linear functions with randomized search heuristics - the robustness of
        mutation.
         In Christoph Dürr and Thomas Wilke, editors, 29th International Symposium on
        Theoretical Aspects of Computer Science (STACS 2012), volume 14 of Leibniz
        International Proceedings in Informatics (LIPIcs), pages 420–431, Dagstuhl,
        Germany, 2012. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.

Drift Analysis - A Tutorial

  • 10. General Assumptions. Xk is a stochastic process² in some general state space X. Yk := g(Xk ), where g : X → R is a “distance function”. Two stopping times τa and τb : τa := min{k ≥ 0 | Yk ≤ a} and τb := min{k ≥ 0 | Yk ≥ b}, where we assume −∞ ≤ a < b < ∞ and Y0 ∈ (a, b). ² Not necessarily Markovian.
  • 11–17. Overview of Tutorial. Each drift theorem turns a condition on the one-step drift E [Yk+1 − Yk | Fk ] into a statement about a hitting time:³

        Drift condition                        Statement             Note
        E [Yk+1 | Fk ] ≤ Yk − ε0               E [τa ] ≤ ...         Additive drift [7, 10]
                                               Pr (τa > B1 ) ≤ ...   [6]
                                               Pr (τb < B2 ) ≤ ...   Simplified drift [6, 17]
        E [Yk+1 | Fk ] ≥ Yk − ε0               ... ≤ E [τa ]         Additive drift (lower bound) [7, 9]
        E [Yk+1 | Fk ] ≤ Yk                    E [τa ] ≤ ...         Supermartingale [16]
        E [Yk+1 | Fk ] ≤ (1 − δ)Yk             E [τa ] ≤ ...         Multiplicative drift [2, 4]
                                               Pr (τa > B3 ) ≤ ...   [1]
        E [Yk+1 | Fk ] ≥ (1 − δ)Yk             ... ≤ E [τa ]         Multiplicative drift (lower bound) [13]
        E [Yk+1 | Fk ] ≤ Yk − h(Yk )           E [τa ] ≤ ...         Variable drift [11]
        E [e^(λYk+1) | Fk ] ≤ α0 e^(λYk)       Pr (τb < B) ≤ ...     Population drift [12]

    ³ Some drift theorems need additional conditions.
  • 18. Part 1 - Basic Probability Theory
  • 19–23. Basic Probability Theory. Probability triple (Ω, F , Pr): Ω is the sample space, F is a σ-algebra (the family of events), and Pr : F → R is a probability function satisfying the probability axioms. Events are sets E ∈ F . A random variable is a map X : Ω → R with X⁻¹ : B → F , so that X = y ⟺ {ω ∈ Ω | X(ω) = y}. Its expectation is E [X] := Σ_y y Pr (X = y).
  • 24–27. Conditional Expectation. Conditioning on an event: Pr (X = x | E) = Pr (X = x ∧ E) / Pr (E), and E [X | E] := Σ_x x Pr (X = x | E). Conditioning on a random variable: E [X | Z](ω) = E [X | Z = z], where z = Z(ω). Definition: Y = E [X | G ] if (1) Y is G -measurable, i.e. Y⁻¹(A) ∈ G for all A ∈ B; (2) E [|Y |] < ∞; (3) E [Y I_F ] = E [X I_F ] for all F ∈ G .
  • 28. Three Properties. (1) Linearity: E [a1 X1 + a2 X2 | G ] = a1 E [X1 | G ] + a2 E [X2 | G ]. (2) If X is G -measurable, then E [X | G ] = X. (3) Tower property: if H ⊆ G , then E [E [X | G ] | H ] = E [X | H ].
  • 29. Stochastic Processes and Filtration. Definition: a stochastic process is a sequence of random variables Y1 , Y2 , . . .; a filtration is an increasing family of sub-σ-algebras of F , F0 ⊆ F1 ⊆ · · · ⊆ F ; a stochastic process Yk is adapted to a filtration Fk if Yk is Fk -measurable for all k. Informally, Fk represents the information that has been revealed about the process during the first k steps.
  • 30. Stopping Time. Definition: a random variable τ : Ω → N is called a stopping time if {τ ≤ k} ∈ Fk for all k ≥ 0, i.e. the information obtained until step k is sufficient to decide whether the event {τ ≤ k} has occurred. Examples: the smallest k such that Yk < a in a stochastic process; the runtime of an evolutionary algorithm.
  • 31–33. Martingales. Definition (Supermartingale): any process Y such that for all k, (1) Y is adapted to F ; (2) E [|Yk |] < ∞; (3) E [Yk+1 | Fk ] ≤ Yk . Example: let ∆1 , ∆2 , . . . be random variables with −∞ < E [∆k+1 | Fk ] ≤ −ε0 for k ≥ 0. Then Yk := ∆1 + · · · + ∆k is a supermartingale: E [Yk+1 | Fk ] = ∆1 + · · · + ∆k + E [∆k+1 | Fk ] ≤ ∆1 + · · · + ∆k − ε0 < Yk . So is the compensated process Zk := Yk + kε0 : E [Zk+1 | Fk ] = ∆1 + · · · + ∆k + E [∆k+1 | Fk ] + (k + 1)ε0 ≤ ∆1 + · · · + ∆k − ε0 + (k + 1)ε0 = Zk .
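A small simulation illustrates the example above (my own sketch; the Gaussian step distribution is an illustrative choice, not from the slides): when each ∆k has mean exactly −ε0, Yk drifts down linearly while the compensated Zk = Yk + kε0 stays put in expectation.

```python
import random

# Empirical check: with E[Delta_k] = -eps0 exactly, Z_k = Y_k + k*eps0 is a
# martingale, so the sample mean of Z_k should stay near Z_0 = 0, while the
# sample mean of Y_k should be near -k*eps0.
random.seed(1)
eps0, steps, runs = 0.2, 50, 20000

total_y = total_z = 0.0
for _ in range(runs):
    y = 0.0
    for k in range(steps):
        y += random.gauss(-eps0, 1.0)  # one step with drift exactly -eps0
    total_y += y
    total_z += y + steps * eps0        # Z_k = Y_k + k*eps0

mean_y = total_y / runs
mean_z = total_z / runs
print(round(mean_y, 2), round(mean_z, 2))
# mean_y should be near -steps*eps0 = -10, mean_z near 0
```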
  • 34–36. Martingales (cont.). Lemma: if Y is a supermartingale, then E [Yk | F0 ] ≤ Y0 for all fixed k ≥ 0. Proof (by induction on k): E [Yk | F0 ] = E [E [Yk | Fk−1 ] | F0 ] ≤ E [Yk−1 | F0 ] ≤ · · · ≤ E [Y0 | F0 ] = Y0 , using the tower property and the supermartingale condition in each step. Example: where is the process Y in the previous example after k steps? Y0 = Z0 ≥ E [Zk | F0 ] = E [Yk + kε0 | F0 ]. Hence E [Yk | F0 ] ≤ Y0 − ε0 k, which is not surprising...
  • 37. Part 2 - Additive Drift
  • 38–39. Additive Drift. Conditions: (C1+) ∀k E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≤ −ε0 ; (C1−) ∀k E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≥ −ε0 . Theorem ([7, 9, 10]): given a sequence (Yk , Fk ) over an interval [0, b] ⊂ R, define τ := min{k ≥ 0 | Yk = 0} and assume E [τ | F0 ] < ∞. If (C1+) holds for an ε0 > 0, then E [τ | F0 ] ≤ Y0 /ε0 ≤ b/ε0 . If (C1−) holds for an ε0 > 0, then E [τ | F0 ] ≥ Y0 /ε0 .
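A quick numerical sanity check of the additive drift theorem (my own sketch, not from the slides): a process that decreases by 1 with probability p and otherwise stays put has drift exactly −ε0 = −p in every state, so both bounds apply and E [τ] = Y0/p.

```python
import random

# Simulate a process on {0, 1, ..., y0} that moves down by 1 with
# probability p per step; the additive drift theorem (with eps0 = p)
# predicts E[tau] = y0/p here, since the drift is exactly -p.
random.seed(7)
p, y0, runs = 0.25, 20, 5000

total = 0
for _ in range(runs):
    y, t = y0, 0
    while y > 0:
        if random.random() < p:
            y -= 1
        t += 1
    total += t

mean_tau = total / runs
print(round(mean_tau, 1))  # theory: y0/p = 80
```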
  • 40–41. Obtaining Supermartingales from Drift Conditions. (C1) E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −ε0 . Yk is not necessarily a supermartingale, because (C1) only assumes Yk > a. Definition (Stopped Process): let Y be a stochastic process and τ a stopping time; then Yk∧τ := Yk if k < τ , and Yτ if k ≥ τ . The stopped process Yk∧τa is a supermartingale, so for all k, Y0 ≥ E [Yk∧τa | F0 ], and indeed Y0 ≥ E [Yk∧τa + (k ∧ τa )ε0 | F0 ].
  • 42. Dominated Convergence Theorem. Suppose Xk is a sequence of random variables such that for each outcome in the sample space lim_{k→∞} Xk = X, and let Y ≥ 0 be a random variable with E [Y ] < ∞ such that |Xk | ≤ Y for each outcome and each k. Then lim_{k→∞} E [Xk ] = E [lim_{k→∞} Xk ] = E [X].
  • 43–45. Proof of Additive Drift Theorem. Recall (C1+): ∀k E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≤ −ε0 , and (C1−): ∀k E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≥ −ε0 . Upper bound: by (C1+), Zk := Yk∧τ + ε0 (k ∧ τ ) is a supermartingale, so Y0 = E [Z0 | F0 ] ≥ E [Zk | F0 ] for all k. Since Yk is bounded to [0, b] and τ has finite expectation, the dominated convergence theorem applies, and Y0 ≥ lim_{k→∞} E [Zk | F0 ] = E [Yτ + ε0 τ | F0 ] = ε0 E [τ | F0 ]. Lower bound: by (C1−), Zk is a submartingale, so Y0 = E [Z0 | F0 ] ≤ E [Zk | F0 ] for all k, and by the same argument Y0 ≤ lim_{k→∞} E [Zk | F0 ] = E [Yτ + ε0 τ | F0 ] = ε0 E [τ | F0 ].
  • 46. Examples: (1+1) EA.
      1: Sample x(0) uniformly at random from {0, 1}^n .
      2: for k = 0, 1, 2, . . . do
      3:    Set y := x(k) , and flip each bit of y with probability 1/n.
      4:    x(k+1) := y if f (y) ≥ f (x(k) ), otherwise x(k+1) := x(k) .
      5: end for
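The pseudocode translates almost line-for-line into Python. This is an illustrative sketch (the function name and the OneMax fitness in the usage line are mine, not from the slides); the early-stopping check assumes the all-ones string is optimal, which holds for OneMax but is an extra assumption in general.

```python
import random

def one_plus_one_ea(f, n, max_iters=100_000):
    """(1+1) EA maximising f : {0,1}^n -> R. Returns (final x, iterations used)."""
    x = [random.randint(0, 1) for _ in range(n)]      # step 1: uniform x(0)
    optimum = f([1] * n)                              # assumption: all-ones is optimal
    for k in range(max_iters):
        if f(x) == optimum:
            return x, k
        # step 3: flip each bit of a copy of x independently with probability 1/n
        y = [b ^ 1 if random.random() < 1 / n else b for b in x]
        if f(y) >= f(x):                              # step 4: accept if not worse
            x = y
    return x, max_iters

random.seed(2)
best, iters = one_plus_one_ea(sum, 20)                # OneMax: f(x) = number of 1-bits
print(sum(best))
```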
  • 47. Law of Total Probability. E [X] = Pr (E) E [X | E] + Pr (Ē) E [X | Ē].
  • 48–52. Example 1: (1+1) EA on LeadingOnes. Lo(x) := Σ_{i=1..n} Π_{j=1..i} xj counts the leading 1-bits of x, e.g. x = 1111111111111111 0 ∗ ∗ · · · ∗, where the left-most 0-bit separates the leading 1-bits from the remaining bits. Let Yk := n − Lo(x(k) ) be the number of “remaining” bits in step k ≥ 0, and let E be the event that only the left-most 0-bit flipped in y. The sequence Yk is non-increasing, so E [Yk+1 − Yk | Yk > 0 ∧ Fk ] ≤ (−1) Pr (E | Yk > 0 ∧ Fk ) = (−1)(1/n)(1 − 1/n)^(n−1) ≤ −1/en. By the additive drift theorem, E [τ | F0 ] ≤ enY0 ≤ en².
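The en² bound is easy to check empirically (a sketch; `leading_ones` and the EA loop below are illustrative re-implementations, not code from the tutorial).

```python
import math
import random

def leading_ones(x):
    """Number of leading 1-bits of the bitstring x."""
    t = 0
    for b in x:
        if b == 0:
            break
        t += 1
    return t

def runtime_on_leading_ones(n):
    """Iterations of the (1+1) EA until LeadingOnes is maximised."""
    x = [random.randint(0, 1) for _ in range(n)]
    t = 0
    while leading_ones(x) < n:
        y = [b ^ 1 if random.random() < 1 / n else b for b in x]
        if leading_ones(y) >= leading_ones(x):
            x = y
        t += 1
    return t

random.seed(3)
n, runs = 30, 50
avg = sum(runtime_on_leading_ones(n) for _ in range(runs)) / runs
print(avg <= math.e * n * n)  # → True (the drift bound e*n^2 is loose but valid)
```

The observed average is closer to ((e − 1)/2)n², matching the exact constant quoted later in the slides.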
  • 53–57. Example 2: (1+1) EA on Linear Functions. Given constants w1 , . . . , wn ∈ [wmin , wmax ], define f (x) := w1 x1 + w2 x2 + · · · + wn xn , and let Yk := Σ_{i=1..n} wi − Σ_{i=1..n} wi xi(k) = Σ_{i=1..n} wi (1 − xi(k) ) be the function value that “remains” at time k. Let Ei be the event that only bit i flipped in y; then E [Yk+1 − Yk | Ei ∧ Fk ] ≤ wi (xi(k) − 1), so E [Yk+1 − Yk | Fk ] ≤ Σ_{i=1..n} Pr (Ei | Fk ) E [Yk+1 − Yk | Ei ∧ Fk ] ≤ (1/n)(1 − 1/n)^(n−1) Σ_{i=1..n} wi (xi(k) − 1) ≤ −Yk /en ≤ −wmin /en. By the additive drift theorem, E [τ | F0 ] ≤ en² (wmax /wmin ).
  • 58. Remarks on Example Applications. Example 1 ((1+1) EA on LeadingOnes): the upper bound en² is very accurate; the exact expression is c(n)n², where c(n) → (e − 1)/2 [20]. Example 2 ((1+1) EA on Linear Functions): the upper bound en² (wmax /wmin ) is correct, but very loose; the linear function BinVal has wmax /wmin = 2^(n−1) , while the tightest known bound is en ln(n) + O(n) [22]. =⇒ A poor choice of distance function gives an imprecise bound!
  • 59. What is a good distance function? Theorem ([8]): assume Y is a homogeneous Markov chain, and τ the time to absorption. Then the function g(x) := E [τ | Y0 = x] satisfies g(x) = 0 if x is an absorbing state, and E [g(Yk+1 ) − g(Yk ) | Fk ] = −1 otherwise. This distance function g gives the exact expected runtime, but it requires complete knowledge of the expected runtime! Still, it provides insight into what makes a good distance function: a good approximation (or guess) for the remaining runtime.
  • 60. Part 3 - Variable Drift
  • 61–63. Drift may be Position-Dependent. (Figure contrasting constant and variable drift.) Idea: when the drift varies with position, find a function g : R → R such that the transformed stochastic process g(X1 ), g(X2 ), g(X3 ), . . . has constant drift.
  • 64. Jensen’s Inequality. Theorem: if g : R → R is concave, then E [g(X) | F ] ≤ g(E [X | F ]). If g′′(x) < 0, then g is concave.
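A quick numeric illustration of the inequality (my own sketch): for the concave function g(x) = ln(x), E [ln X] ≤ ln(E [X]).

```python
import math
import random

# Monte Carlo check of Jensen's inequality for the concave g(x) = ln(x):
# the average of ln(X) should not exceed ln of the average of X.
random.seed(0)
xs = [random.uniform(1.0, 10.0) for _ in range(100_000)]
lhs = sum(math.log(x) for x in xs) / len(xs)   # E[g(X)]
rhs = math.log(sum(xs) / len(xs))              # g(E[X])
print(lhs <= rhs)  # → True
```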
  • 65–67. Multiplicative Drift. Condition (M): ∀k E [Yk+1 | Yk > a ∧ Fk ] ≤ (1 − δ)Yk , equivalently E [Yk+1 − Yk | Yk > a ∧ Fk ] ≤ −δYk . Theorem ([2, 4]): given a sequence (Yk , Fk ) over an interval [a, b] ⊂ R with a > 0, define τa := min{k ≥ 0 | Yk = a} and assume E [τa | F0 ] < ∞. If (M) holds for a δ > 0, then E [τa | F0 ] ≤ ln(Y0 /a)/δ. Proof: g(s) := ln(s/a) is concave, so by Jensen’s inequality E [g(Yk+1 ) − g(Yk ) | Yk > a ∧ Fk ] ≤ ln(E [Yk+1 | Yk > a ∧ Fk ]) − ln(Yk ) ≤ ln(1 − δ) ≤ −δ, and the additive drift theorem applied to g(Yk ) with ε0 = δ gives the bound g(Y0 )/δ = ln(Y0 /a)/δ.
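A simulation sketch of the multiplicative drift bound (my own example, not from the slides): shrink Y by a random factor with mean 1 − δ each step; the theorem predicts that the expected time to reach [0, a] is at most ln(Y0/a)/δ.

```python
import math
import random

# Y_{k+1} = Y_k * U with U uniform on [1 - 2*delta, 1], so E[U] = 1 - delta
# and condition (M) holds; compare the observed mean hitting time with the
# multiplicative drift bound ln(Y0/a)/delta.
random.seed(11)
delta, a, y0, runs = 0.1, 1.0, 1000.0, 5000

total = 0
for _ in range(runs):
    y, t = y0, 0
    while y > a:
        y *= random.uniform(1 - 2 * delta, 1)  # mean shrink factor 1 - delta
        t += 1
    total += t

mean_tau = total / runs
bound = math.log(y0 / a) / delta
print(mean_tau <= bound)  # → True
```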
  • 68. Example: Linear Functions Revisited. For any c ∈ (0, 1), define the distance at time k as Yk := cwmin + Σ_{i=1..n} wi (1 − xi(k) ). We have already seen that E [Yk+1 − Yk | Fk ] ≤ (1/en) Σ_{i=1..n} wi (xi(k) − 1) = −(Yk − cwmin )/en ≤ −Yk (1 − c)/en. By the multiplicative drift theorem (with a := cwmin and δ := (1 − c)/en), E [τa | F0 ] ≤ (en/(1 − c)) ln (1 + nwmax /(cwmin )).
• 70. Variable Drift Theorem. (V) ∀k: E[Y_{k+1} − Y_k | Y_k > a ∧ F_k] ≤ −h(Y_k). Theorem ([15, 11]). Given a sequence (Y_k, F_k) over an interval [a, b] ⊂ R, a > 0. Define τ_a := min{k ≥ 0 | Y_k = a}, and assume E[τ_a | F_0] < ∞. If there exists a function h : R → R such that h(x) > 0 and h'(x) > 0 for all x ∈ [a, b], and drift condition (V) holds, then E[τ_a | F_0] ≤ ∫_a^{Y_0} dz/h(z). ⇒ The multiplicative drift theorem is the special case h(x) = δx.
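The special-case claim can be verified numerically: for h(z) = δz, the integral in the variable drift theorem evaluates to the multiplicative drift bound ln(Y_0/a)/δ. A small quadrature sketch (step count and parameters are illustrative):

```python
import math

# For h(z) = delta*z, the variable drift bound integral over [a, Y0]
# should equal the multiplicative drift bound ln(Y0/a)/delta.
delta, a, y0 = 0.1, 1.0, 100.0
h = lambda z: delta * z

# midpoint rule for the integral of 1/h(z) over [a, y0]
steps = 200_000
width = (y0 - a) / steps
integral = sum(1.0 / h(a + (i + 0.5) * width) for i in range(steps)) * width

print(abs(integral - math.log(y0 / a) / delta) < 1e-3)  # True
```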
• 74. Variable Drift Theorem: Proof. Define g(x) := ∫_a^x dz/h(z). Proof. The function g is concave (g''(x) = −h'(x)/h(x)² < 0), so by Jensen's inequality E[g(Y_k) − g(Y_{k+1}) | F_k] ≥ g(Y_k) − g(E[Y_{k+1} | F_k]) ≥ ∫_{Y_k − h(Y_k)}^{Y_k} dz/h(z) ≥ h(Y_k)/h(Y_k) = 1, where the last step uses that h is increasing, so 1/h(z) ≥ 1/h(Y_k) on the interval of integration. Hence the transformed process g(Y_k) has additive drift at least 1, and the additive drift theorem gives E[τ_a | F_0] ≤ g(Y_0).
  • 75. Part 4 - Supermartingale
• 78. Supermartingale. (S1) ∀k: E[Y_{k+1} − Y_k | Y_k > 0 ∧ F_k] ≤ 0. (S2) ∀k: Var[Y_{k+1} | Y_k > 0 ∧ F_k] ≥ σ². Theorem (see e.g. [16]). Given a sequence (Y_k, F_k) over an interval [0, b] ⊂ R. Define τ := min{k ≥ 0 | Y_k = 0}, and assume E[τ | F_0] < ∞. If (S1) and (S2) hold for a σ > 0, then E[τ | F_0] ≤ Y_0(2b − Y_0)/σ². Proof. Let Z_k := b² − (b − Y_k)², and note that b − Y_k ≤ E[b − Y_{k+1} | F_k] by (S1). Then E[Z_{k+1} − Z_k | F_k] = −E[(b − Y_{k+1})² | F_k] + (b − Y_k)² ≤ −E[(b − Y_{k+1})² | F_k] + E[b − Y_{k+1} | F_k]² = −Var[b − Y_{k+1} | F_k] = −Var[Y_{k+1} | F_k] ≤ −σ². Hence Z_k ≥ 0 has additive drift at least σ² towards Z_τ = 0, and the additive drift theorem gives E[τ | F_0] ≤ Z_0/σ² = Y_0(2b − Y_0)/σ².
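The supermartingale bound can be tested on a concrete process. The walk below is an illustrative choice (not from the slides): a fair ±1 random walk on {0, …, b} that is lazy at the upper boundary, so that both (S1) and (S2) hold:

```python
import random

# Empirical check of the bound E[tau] <= Y0*(2b - Y0)/sigma^2.
# Illustrative process: fair +-1 walk on {0, ..., b}, lazy at b (stays with
# prob. 1/2, else moves to b-1).  (S1): drift <= 0 everywhere.
# (S2): the smallest one-step variance is 1/4, attained at the boundary.
random.seed(2)
b, y0 = 10, 5
sigma2 = 0.25

def absorption_time():
    y, k = y0, 0
    while y > 0:
        if y == b:
            y -= random.randint(0, 1)     # lazy reflection at b
        else:
            y += random.choice((-1, 1))
        k += 1
    return k

trials = 1000
mean_tau = sum(absorption_time() for _ in range(trials)) / trials
bound = y0 * (2 * b - y0) / sigma2        # 5*15/0.25 = 300
print(mean_tau <= bound)
```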
  • 79. Part 5 - Hajek’s Theorem
• 81. Hajek's Theorem⁴. Theorem: If there exist λ, ε_0 > 0 and D < ∞ such that for all k ≥ 0 (C1) E[Y_{k+1} − Y_k | Y_k > a ∧ F_k] ≤ −ε_0 and (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z with E[e^{λZ}] = D, then for any δ ∈ (0, 1): (2.9) Pr(τ_a > B | F_0) ≤ e^{η(Y_0 − a − B(1−δ)ε_0)}; (*) Pr(τ_b < B | Y_0 < a) ≤ (BD/((1−δ)ηε_0)) · e^{η(a−b)}, for some η ≥ min{λ, δε_0λ²/D} > 0. If λ, ε_0, D ∈ O(1) and b − a ∈ Ω(n), then there exists a constant c > 0 such that Pr(τ_b ≤ e^{cn} | F_0) ≤ e^{−Ω(n)}. ⁴The theorem presented here is a corollary to Theorem 2.3 in [6].
• 84. Stochastic Dominance - (|Y_{k+1} − Y_k| | F_k) ⪯ Z. Definition: Y ⪯ Z if Pr(Z ≤ c) ≤ Pr(Y ≤ c) for all c ∈ R. (figure: the cumulative distribution functions of Y and Z) Example: 1. If Y ≤ Z, then Y ⪯ Z. 2. Let (Ω, d) be a metric space, and V(x) := d(x, x*). Then |V(X_{k+1}) − V(X_k)| ⪯ d(X_k, X_{k+1}), since |d(X_{k+1}, x*) − d(X_k, x*)| ≤ d(X_k, X_{k+1}) by the triangle inequality.
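The definition can be checked exactly for a concrete pair of distributions. An illustrative example (my choice, not from the slides): a binomial with larger success probability stochastically dominates one with smaller success probability:

```python
from math import comb

# Exact check of the dominance definition: Y is dominated by Z
# iff Pr(Z <= c) <= Pr(Y <= c) for all c.
def binom_cdf(n, p, c):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, pY, pZ = 20, 0.3, 0.5          # Y ~ Bin(20, 0.3), Z ~ Bin(20, 0.5)
# small tolerance guards against floating-point rounding in the CDF sums
dominates = all(binom_cdf(n, pZ, c) <= binom_cdf(n, pY, c) + 1e-12
                for c in range(n + 1))
print(dominates)  # True: Y is stochastically dominated by Z
```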
• 85. Condition (C2) implies that "long jumps" must be rare. Assume that (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z and E[e^{λZ}] = D. Then for any j ≥ 0, Pr(|Y_{k+1} − Y_k| ≥ j) = Pr(e^{λ|Y_{k+1} − Y_k|} ≥ e^{λj}) ≤ E[e^{λ|Y_{k+1} − Y_k|}] · e^{−λj} ≤ E[e^{λZ}] · e^{−λj} = D·e^{−λj}. Markov's inequality: If X ≥ 0, then Pr(X ≥ k) ≤ E[X]/k.
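The exponential tail bound can be verified exactly for a concrete jump distribution. A sketch with Z ~ Bin(n, p) and λ = ln 2, using the binomial mgf value (1 + p)^n derived a few slides later (the parameters n, p are illustrative):

```python
from math import comb, exp, log

# Exact check of Pr(Z >= j) <= D * exp(-lambda*j) for Z ~ Bin(n, p),
# with lambda = ln(2) and D = E[e^{lambda*Z}] = (1 + p)^n.
n, p = 50, 0.02
lam = log(2)
D = (1 + p) ** n                     # mgf of Bin(n, p) at t = ln 2

def tail(j):                         # Pr(Z >= j), exact
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(j, n + 1))

ok = all(tail(j) <= D * exp(-lam * j) for j in range(n + 1))
print(ok)  # True
```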
• 87. Moment Generating Function (mgf). Definition: The mgf of a rv X is M_X(λ) := E[e^{λX}] for all λ ∈ R. The n-th derivative at t = 0 is M_X^{(n)}(0) = E[X^n]; hence M_X provides all moments of X, thus the name. If X and Y are independent rvs and a, b ∈ R, then M_{aX+bY}(t) = E[e^{t(aX+bY)}] = E[e^{taX}] · E[e^{tbY}] = M_X(at)·M_Y(bt). Example: Let X := Σ_{i=1}^n X_i where the X_i are independent rvs with Pr(X_i = 1) = p and Pr(X_i = 0) = 1 − p. Then M_{X_i}(λ) = (1 − p)e^{λ·0} + p·e^{λ·1}, and M_X(λ) = M_{X_1}(λ)·M_{X_2}(λ) ··· M_{X_n}(λ) = (1 − p + p·e^λ)^n.
• 89. Moment Generating Functions.
  Bernoulli, Pr(X = 1) = p: M_X(t) = 1 − p + p·e^t
  Binomial, X ~ Bin(n, p): M_X(t) = (1 − p + p·e^t)^n
  Geometric, Pr(X = k) = (1 − p)^{k−1}·p: M_X(t) = p·e^t / (1 − (1 − p)e^t)
  Uniform, X ~ U(a, b): M_X(t) = (e^{tb} − e^{ta}) / (t(b − a))
  Normal, X ~ N(µ, σ²): M_X(t) = exp(tµ + σ²t²/2)
The mgf of X ~ Bin(n, p) at t = ln(2) is (1 − p + p·e^t)^n = (1 + p)^n ≤ e^{pn}.
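The binomial row of the table, and the bound at t = ln 2, can be checked directly from the definition of the mgf (the values of n and p are illustrative):

```python
from math import comb, exp, log

# Check: E[e^{tX}] for X ~ Bin(n, p) equals (1 - p + p*e^t)^n,
# and at t = ln(2) this is (1 + p)^n <= e^{pn}.
n, p, t = 12, 0.1, log(2)

direct = sum(comb(n, k) * p**k * (1 - p)**(n - k) * exp(t * k)
             for k in range(n + 1))        # E[e^{tX}] from the definition
closed = (1 - p + p * exp(t)) ** n          # closed form from the table

print(abs(direct - closed) < 1e-9, closed <= exp(p * n))
```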
• 91. Condition (C2) often holds trivially. Example ((1+1) EA): Choose x uniformly from {0, 1}^n. For k = 0, 1, 2, …: set x' := x^{(k)}, and flip each bit of x' with probability p; if f(x') ≥ f(x^{(k)}), then x^{(k+1)} := x', else x^{(k+1)} := x^{(k)}. Assume the fitness function f has a unique maximum x* ∈ {0, 1}^n, and the distance function is g(x) = H(x, x*) (Hamming distance). Then |g(x^{(k+1)}) − g(x^{(k)})| ⪯ Z where Z := H(x^{(k)}, x'). Z ~ Bin(n, p), so E[e^{λZ}] ≤ e^{np} for λ = ln(2).
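The (1+1) EA pseudocode above translates directly into a few lines of Python. A minimal runnable sketch, exercised here on OneMax as an illustrative fitness function (the stopping test is specific to this choice):

```python
import random

# The (1+1) EA: bitwise mutation with rate p, accept if not worse.
random.seed(3)

def one_plus_one_ea(f, n, p, budget):
    x = [random.randint(0, 1) for _ in range(n)]    # uniform initial point
    for k in range(budget):
        y = [b ^ 1 if random.random() < p else b for b in x]  # flip each bit w.p. p
        if f(y) >= f(x):                             # accept if not worse
            x = y
        if f(x) == n:                                # OneMax-specific stop
            return x, k + 1
    return x, budget

onemax = sum                                         # f(x) = number of ones
x, steps = one_plus_one_ea(onemax, n=30, p=1/30, budget=100_000)
print(onemax(x) == 30)
```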
• 92. Simple Application: (1+1) EA on Needle. The (1+1) EA with mutation rate p = 1/n on Needle(x) := Π_{i=1}^n x_i, with Y_k := H(x^{(k)}, 0^n), a := (3/4)n, and b := n. Condition (C2) is satisfied⁵ with D = E[e^{λZ}] ≤ e where λ = ln(2). Condition (C1) is satisfied for ε_0 := 1/2 because E[Y_{k+1} − Y_k | Y_k > a ∧ F_k] ≤ (n − a)p − ap = −ε_0. Thus, η ≥ min{λ, δε_0λ²/D} > 1/25 when δ = 1/2, and Pr(τ_a > n + k | F_0) ≤ e^{(1/25)(Y_0 − a − (n+k)(1−δ)ε_0)} ≤ e^{−k/100}, and Pr(τ_b < e^{n/200} | F_0) = e^{−Ω(n)}. ⁵See previous slide.
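Condition (C1) in this example can be verified exactly: on the plateau every offspring is accepted, so the drift of the number of ones is just the expected number of 0→1 flips minus the expected number of 1→0 flips. A small check for a concrete n (the value of n is illustrative):

```python
# Exact check of (C1) for the Needle example: with mutation rate p = 1/n and
# Y = number of ones, every offspring on the plateau (Y < n) is accepted, so
# E[Y_{k+1} - Y_k | Y_k = y] = (n - y)*p - y*p.  For a < y < n, a = (3/4)n,
# this drift is at most -1/2 = -eps_0.
n = 100
p = 1 / n
a = 3 * n // 4

drifts = [(n - y) * p - y * p for y in range(a + 1, n)]  # a < y < n
print(all(d <= -0.5 for d in drifts))  # True
```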
• 93. Proof overview. Theorem (2.3 in [6]). Assume that there exist 0 < ρ < 1 and D ≥ 1 such that (D1) E[e^{ηY_{k+1}} | Y_k > a ∧ F_k] ≤ ρ·e^{ηY_k} and (D2) E[e^{ηY_{k+1}} | Y_k ≤ a ∧ F_k] ≤ D·e^{ηa}. Then (2.6) E[e^{ηY_{k+1}} | F_0] ≤ ρ^k·e^{ηY_0} + D·e^{ηa}(1 − ρ^k)/(1 − ρ); (2.8) Pr(Y_k ≥ b | F_0) ≤ ρ^k·e^{η(Y_0−b)} + D·e^{η(a−b)}(1 − ρ^k)/(1 − ρ); (*) Pr(τ_b < B | Y_0 < a) ≤ e^{η(a−b)}·BD/(1 − ρ); (2.9) Pr(τ_a > k | F_0) ≤ e^{η(Y_0−a)}·ρ^k. Lemma: Assume that there exists an ε_0 > 0 such that (C1) E[Y_{k+1} − Y_k | Y_k > a ∧ F_k] ≤ −ε_0 and (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z with E[e^{λZ}] = D < ∞ for a λ > 0. Then (D1) and (D2) hold for some η and ρ < 1.
• 98. Theorem: (D1) E[e^{ηY_{k+1}} | Y_k > a ∧ F_k] ≤ ρ·e^{ηY_k}; (D2) E[e^{ηY_{k+1}} | Y_k ≤ a ∧ F_k] ≤ D·e^{ηa}. Assume that (D1) and (D2) hold. Then (2.6) E[e^{ηY_{k+1}} | F_0] ≤ ρ^k·e^{ηY_0} + D·e^{ηa}(1 − ρ^k)/(1 − ρ). Proof. By the law of total probability and conditions (D1) and (D2), E[e^{ηY_{k+1}} | F_k] ≤ ρ·e^{ηY_k} + D·e^{ηa}. (1) By the law of total expectation, inequality (1), and induction on k, E[e^{ηY_{k+1}} | F_0] = E[E[e^{ηY_{k+1}} | F_k] | F_0] ≤ ρ·E[e^{ηY_k} | F_0] + D·e^{ηa} ≤ ρ^k·e^{ηY_0} + (1 + ρ + ρ² + ··· + ρ^{k−1})·D·e^{ηa}.
• 99. Proof of (2.8). Theorem: (D1) E[e^{η(Y_{k+1}−Y_k)} | Y_k > a ∧ F_k] ≤ ρ; (D2) E[e^{η(Y_{k+1}−a)} | Y_k ≤ a ∧ F_k] ≤ D. Assume that (D1) and (D2) hold. Then (2.6) E[e^{ηY_{k+1}} | F_0] ≤ ρ^k·e^{ηY_0} + D·e^{ηa}(1 − ρ^k)/(1 − ρ), and (2.8) Pr(Y_k ≥ b | F_0) ≤ ρ^k·e^{η(Y_0−b)} + D·e^{η(a−b)}(1 − ρ^k)/(1 − ρ). Proof. (2.8) follows from Markov's inequality and (2.6): Pr(Y_{k+1} ≥ b | F_0) = Pr(e^{ηY_{k+1}} ≥ e^{ηb} | F_0) ≤ E[e^{ηY_{k+1}} | F_0]·e^{−ηb}.
• 101. Proof of (*). Theorem: (D1) E[e^{η(Y_{k+1}−Y_k)} | Y_k > a ∧ F_k] ≤ ρ; (D2) E[e^{η(Y_{k+1}−a)} | Y_k ≤ a ∧ F_k] ≤ D. Assume that (D1) and (D2) hold for D ≥ 1. Then (2.8) Pr(Y_k ≥ b | F_0) ≤ ρ^k·e^{η(Y_0−b)} + D·e^{η(a−b)}(1 − ρ^k)/(1 − ρ), and (*) Pr(τ_b < B | Y_0 < a) ≤ e^{η(a−b)}·BD/(1 − ρ). Proof. By the union bound and (2.8), Pr(τ_b < B | Y_0 < a ∧ F_0) ≤ Σ_{k=1}^B Pr(Y_k ≥ b | Y_0 < a ∧ F_0) ≤ Σ_{k=1}^B D·e^{η(a−b)}(ρ^k + (1 − ρ^k)/(1 − ρ)) ≤ BD·e^{η(a−b)}/(1 − ρ).
• 102. Union Bound. Pr(E_1 ∨ E_2 ∨ ··· ∨ E_k) ≤ Pr(E_1) + Pr(E_2) + ··· + Pr(E_k). (figure: overlapping events E_1, …, E_4 in a sample space Ω)
• 107. Proof of (2.9). Theorem: (D1) E[e^{ηY_{k+1}}·ρ^{−1} | Y_k > a ∧ F_k] ≤ e^{ηY_k}. Assume that (D1) holds. Then (2.9) Pr(τ_a > k | F_0) ≤ e^{η(Y_0−a)}·ρ^k. Proof. By (D1), Z_k := e^{ηY_{k∧τ}}·ρ^{−(k∧τ)} is a supermartingale, so e^{ηY_0} = Z_0 ≥ E[Z_k | F_0] = E[e^{ηY_{k∧τ}}·ρ^{−(k∧τ)} | F_0]. (2) By (2) and the law of total probability, e^{ηY_0} ≥ Pr(τ_a > k | F_0)·E[e^{ηY_{k∧τ}}·ρ^{−(k∧τ)} | τ_a > k ∧ F_0] = Pr(τ_a > k | F_0)·E[e^{ηY_k}·ρ^{−k} | τ_a > k ∧ F_0] ≥ Pr(τ_a > k | F_0)·e^{ηa}·ρ^{−k}.
• 108. (C1) and (C2) ⇒ (D1). (C1) E[Y_{k+1} − Y_k | Y_k > a ∧ F_k] ≤ −ε_0. (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z and E[e^{λZ}] = D < ∞ for a λ > 0. (D1) E[e^{η(Y_{k+1}−Y_k)} | Y_k > a ∧ F_k] ≤ ρ. Lemma: Assume (C1) and (C2). Then (D1) holds when ρ ≥ 1 − ηε_0 + η²c and 0 < η ≤ min{λ, ε_0/c}, where c := Σ_{k=2}^∞ (λ^{k−2}/k!)·E[Z^k]. Proof. Let X := (Y_{k+1} − Y_k | Y_k > a ∧ F_k). By (C2), |X| ⪯ Z, so E[X^k] ≤ E[|X|^k] ≤ E[Z^k]. From e^x = Σ_{k=0}^∞ x^k/k! and linearity of expectation, 0 < E[e^{ηX}] = 1 + ηE[X] + Σ_{k=2}^∞ (η^k/k!)·E[X^k] ≤ 1 − ηε_0 + η²c ≤ ρ.
• 109. (C2) ⇒ (D2). (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z and E[e^{λZ}] = D < ∞ for a λ > 0. (D2) E[e^{η(Y_{k+1}−a)} | Y_k ≤ a ∧ F_k] ≤ D. Theorem: Assume (C2) and 0 < η ≤ λ. Then (D2) holds. Proof. If Y_k ≤ a, then Y_{k+1} − a ≤ Y_{k+1} − Y_k ≤ |Y_{k+1} − Y_k|, so E[e^{η(Y_{k+1}−a)} | Y_k ≤ a ∧ F_k] ≤ E[e^{λ|Y_{k+1}−Y_k|} | Y_k ≤ a ∧ F_k]. Furthermore, by (C2), E[e^{λ|Y_{k+1}−Y_k|} | Y_k ≤ a ∧ F_k] ≤ E[e^{λZ}] = D.
• 111. (C1) and (C2) ⇒ (D1) and (D2). (C1) E[Y_{k+1} − Y_k | Y_k > a ∧ F_k] ≤ −ε_0. (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z and E[e^{λZ}] = D < ∞ for a λ > 0. (D1) E[e^{η(Y_{k+1}−Y_k)} | Y_k > a ∧ F_k] ≤ ρ. (D2) E[e^{η(Y_{k+1}−a)} | Y_k ≤ a ∧ F_k] ≤ D. Lemma: Assume (C1) and (C2). Then (D1) and (D2) hold when ρ ≥ 1 − ηε_0 + η²c and 0 < η ≤ min{λ, ε_0/c}, where c := Σ_{k=2}^∞ (λ^{k−2}/k!)·E[Z^k] = (D − 1 − λE[Z])·λ^{−2} > 0. Corollary: Assume (C1), (C2), and 0 < δ < 1. Then (D1) and (D2) hold for η := min{λ, δε_0/c} and ρ := 1 − (1 − δ)ηε_0, since η ≤ δε_0/c implies δε_0 ≥ ηc, and hence ρ = 1 − ηε_0 + ηδε_0 ≥ 1 − ηε_0 + η²c.
• 113. Reformulation of Hajek's Theorem. Theorem: If there exist λ, ε_0 > 0 and 1 < D < ∞ such that for all k ≥ 0 (C1) E[Y_{k+1} − Y_k | Y_k > a ∧ F_k] ≤ −ε_0 and (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z with E[e^{λZ}] = D, then for any δ ∈ (0, 1): (2.9) Pr(τ_a > B | F_0) ≤ e^{η(Y_0−a)}·ρ^B; (*) Pr(τ_b < B | Y_0 < a) ≤ (BD/(1 − ρ))·e^{η(a−b)}, where η := min{λ, δε_0/c} and ρ := 1 − (1 − δ)ηε_0. 1. Note that ln(ρ) ≤ ρ − 1 = −(1 − δ)ηε_0, so Pr(τ_a > B | F_0) ≤ e^{η(Y_0−a)}·ρ^B = e^{η(Y_0−a)}·e^{B·ln(ρ)} ≤ e^{η(Y_0−a−B(1−δ)ε_0)}. 2. c = (D − 1 − λE[Z])·λ^{−2} < D/λ², so η ≥ min{λ, δε_0λ²/D}.
• 114. Reformulation of Hajek's Theorem. Theorem: If there exist λ, ε_0 > 0 and 1 < D < ∞ such that for all k ≥ 0 (C1) E[Y_{k+1} − Y_k | Y_k > a ∧ F_k] ≤ −ε_0 and (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z with E[e^{λZ}] = D, then for any δ ∈ (0, 1): (2.9) Pr(τ_a > B | F_0) ≤ e^{η(Y_0−a−B(1−δ)ε_0)}; (*) Pr(τ_b < B | Y_0 < a) ≤ (BD/((1−δ)ηε_0))·e^{η(a−b)}, for some η ≥ min{λ, δε_0λ²/D}.
• 116. Simplified Drift Theorem [17]. We have already seen that (C2) (|Y_{k+1} − Y_k| | F_k) ⪯ Z with E[e^{λZ}] = D implies Pr(|Y_{k+1} − Y_k| ≥ j) ≤ D·e^{−λj} for all j ∈ N_0. The simplified drift theorem replaces (C2) with (S) Pr(Y_{k+1} − Y_k ≥ j | Y_k < b) ≤ r(n)·(1 + δ)^{−j} for all j ∈ N_0, and, with some additional assumptions, provides a bound of the type⁶ (3) Pr(τ_b < 2^{c(b−a)}) ≤ 2^{−Ω(b−a)}. Until 2008, conditions (D1) and (D2) were used in EC; they can lead to highly tedious calculations. Oliveto and Witt were the first in EC to point out that the much simpler to verify condition (C1), along with (S), is sufficient. ⁶See [17] for the exact statement.
  • 117. Part 6 - Population Drift
• 118. Drift Analysis of Population-based Evolutionary Algorithms. Evolutionary algorithms generally use populations. So far, we have analysed the drift of the (1+1) EA, i.e. an evolutionary algorithm with population size one. The state aggregation problem makes the analysis of population-based EAs with classical drift theorems difficult: how should an appropriate distance function be defined? It should reflect the progress of the algorithm; this is often hard even for single-individual algorithms, and highly non-trivial for population-based algorithms. ⇒ This part of the tutorial focuses on a drift theorem for populations which alleviates the state aggregation problem.
• 119. Population-based Evolutionary Algorithms. Require: finite set X, initial population P_0 ∈ X^λ, selection mechanism p_sel : X^λ × X → [0, 1], variation operator p_mut : X × X → [0, 1]. for t = 0, 1, 2, … until termination condition do: for i = 1 to λ do: sample the i-th parent x according to p_sel(P_t, ·); sample the i-th offspring P_{t+1}(i) according to p_mut(x, ·); end for; end for.
• 120. Selection and Variation - Example: tournament selection, followed by bitwise mutation (e.g. 10001110111011 → 10010110111001).
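The population-based scheme above can be sketched in a few lines of Python, with 2-tournament selection as p_sel and bitwise mutation as p_mut. The fitness function (OneMax), population size, and generation budget are illustrative choices, not taken from the tutorial:

```python
import random

# Minimal sketch of the population-based EA scheme: each offspring is produced
# by selecting a parent (2-tournament) and applying bitwise mutation.
random.seed(4)

def tournament(pop, f):
    x, y = random.sample(pop, 2)
    return x if f(x) >= f(y) else y

def bitwise_mutation(x, p):
    return [b ^ 1 if random.random() < p else b for b in x]

def population_ea(f, n, lam, p, generations):
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(lam)]
    for _ in range(generations):
        pop = [bitwise_mutation(tournament(pop, f), p) for _ in range(lam)]
    return max(pop, key=f)

best = population_ea(sum, n=20, lam=20, p=1/20, generations=300)
print(sum(best))
```

Note that this scheme is non-elitist: nothing guarantees the best individual survives a generation, which is exactly why the drift analysis of such algorithms is delicate.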
• 121. Population Drift. Central parameters: the reproductive rate of the selection mechanism p_sel, α_0 = max_{1≤j≤λ} E[#offspring from parent j], and the random walk process corresponding to the variation operator p_mut, X_{k+1} ~ p_mut(X_k, ·).
• 125. Population Drift [12]. (C1P) ∀k: E[e^{κ(g(X_{k+1}) − g(X_k))} | a < g(X_k) < b] < 1/α_0. Theorem: Define τ_b := min{k ≥ 0 | g(P_k(i)) > b for some i ∈ [λ]}. If there exist constants α_0 ≥ 1 and κ > 0 such that p_sel has reproductive rate less than α_0, the random walk process corresponding to p_mut satisfies (C1P), and some other conditions hold,⁷ then for some constants c, c' > 0, Pr(τ_b ≤ e^{c(b−a)}) = e^{−c'(b−a)}. ⁷Some details are omitted. See Theorem 1 in [12] for all details.
• 127. Population Drift: Decoupling Selection & Variation. Population drift: if there exists a κ > 0 such that M_{Δmut}(κ) < 1/α_0, where Δmut = g(X_{k+1}) − g(X_k), X_{k+1} ~ p_mut(X_k, ·), and α_0 = max_j E[#offspring from parent j], then the runtime is exponential. Classical drift [6]: if there exists a κ > 0 such that M_Δ(κ) < 1, where Δ = h(P_{k+1}) − h(P_k), then the runtime is exponential.
• 128. Conclusion. Drift analysis is a powerful tool for the analysis of EAs, mainly used in EC to bound the expected runtime of EAs. It is useful when the EA makes non-monotonic progress, e.g. when the fitness value is a poor indicator of progress. The "art" consists in finding a good distance function; there is no simple recipe. A large number of drift theorems are available: additive, multiplicative, variable, population drift, … There is also significant related literature from fields other than EC. Drift analysis is not the only tool in the toolbox; there are also artificial fitness levels, Markov chain theory, concentration of measure, branching processes, martingale theory, probability generating functions, …
• 129. Acknowledgements. Thanks to David Hodge, Carsten Witt, and Daniel Johannsen for insightful discussions.
• 130. References I
[1] Benjamin Doerr and Leslie Ann Goldberg. Drift analysis with tail bounds. In Proceedings of the 11th International Conference on Parallel Problem Solving from Nature: Part I, PPSN'10, pages 174–183, Berlin, Heidelberg, 2010. Springer-Verlag.
[2] Benjamin Doerr, Daniel Johannsen, and Carola Winzen. Multiplicative drift analysis. In GECCO '10: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 1449–1456, New York, NY, USA, 2010. ACM.
[3] Stefan Droste, Thomas Jansen, and Ingo Wegener. On the analysis of the (1+1) Evolutionary Algorithm. Theoretical Computer Science, 276:51–81, 2002.
[4] Simon Fischer, Lars Olbrich, and Berthold Vöcking. Approximating Wardrop equilibria with finitely many agents. Distributed Computing, 21(2):129–139, 2008.
[5] Oliver Giel and Ingo Wegener. Evolutionary algorithms and the maximum matching problem. In Proceedings of the 20th Annual Symposium on Theoretical Aspects of Computer Science (STACS 2003), pages 415–426, 2003.
• 131. References II
[6] Bruce Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, 14(3):502–525, 1982.
[7] Jun He and Xin Yao. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127(1):57–85, March 2001.
[8] Jun He and Xin Yao. A study of drift analysis for estimating computation time of evolutionary algorithms. Natural Computing, 3(1):21–35, 2004.
[9] Jens Jägersküpper. Algorithmic analysis of a basic evolutionary algorithm for continuous optimization. Theoretical Computer Science, 379(3):329–347, 2007.
[10] Jens Jägersküpper. A blend of Markov-chain and drift analysis. In Proceedings of the 10th International Conference on Parallel Problem Solving from Nature (PPSN 2008), 2008.
• 132. References III
[11] Daniel Johannsen. Random combinatorial structures and randomized search heuristics. PhD thesis, Universität des Saarlandes, 2010.
[12] Per Kristian Lehre. Negative drift in populations. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, pages 244–253. Springer Berlin/Heidelberg, 2011.
[13] Per Kristian Lehre and Carsten Witt. Black-box search by unbiased variation. Algorithmica, pages 1–20, 2012.
[14] Sean P. Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, 1993.
[15] B. Mitavskiy, J. E. Rowe, and C. Cannings. Theoretical analysis of local search strategies to optimize network communication subject to preserving the total number of links. International Journal of Intelligent Computing and Cybernetics, 2(2):243–284, 2009.
[16] Frank Neumann, Dirk Sudholt, and Carsten Witt. Analysis of different MMAS ACO algorithms on unimodal functions and plateaus. Swarm Intelligence, 3(1):35–68, 2009.
• 133. References IV
[17] Pietro S. Oliveto and Carsten Witt. Simplified drift analysis for proving lower bounds in evolutionary computation. Algorithmica, pages 1–18, 2010. doi:10.1007/s00453-010-9387-z.
[18] Pietro S. Oliveto and Carsten Witt. Simplified drift analysis for proving lower bounds in evolutionary computation. Technical Report CI-247/08, SFB 531, Technische Universität Dortmund, Germany, 2008.
[19] Galen H. Sasaki and Bruce Hajek. The time complexity of maximum matching by simulated annealing. Journal of the ACM, 35(2):387–403, 1988.
[20] Dirk Sudholt. General lower bounds for the running time of evolutionary algorithms. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, pages 124–133. Springer Berlin/Heidelberg, 2010.
[21] David Williams. Probability with Martingales. Cambridge University Press, 1991.
• 134. References V
[22] Carsten Witt. Optimizing linear functions with randomized search heuristics - the robustness of mutation. In Christoph Dürr and Thomas Wilke, editors, 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012), volume 14 of Leibniz International Proceedings in Informatics (LIPIcs), pages 420–431, Dagstuhl, Germany, 2012. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.