Josu Ceberio
Bayesian Analysis for
Algorithm Performance Comparison
Is it possible to compare optimization
algorithms without hypothesis testing?
Is there a reproducibility crisis?
Source: Monya Baker (2016). Is there a
reproducibility crisis? Nature, 533, 452-454.
Hypothesis
Idea for solving a set
of problems more
efficiently.
Questions
Is my algorithm
better than the state-
of-the-art?
On which problems is
my algorithm better?
Why is my algorithm
better (or worse)?
Experimentation
Compare the performance
of my algorithm with the
state-of-the-art on some
benchmark of problems.
The analysis of the results
should take into account
the associated
uncertainty.
Conclusions
What conclusions do we
draw from the
experimentation?
How do we answer the
formulated questions?
The Questions
How likely is my proposal to
be the best algorithm to solve
a problem?
How likely is my proposal to
be the best algorithm from
the compared ones?
The Point
STATISTICAL ANALYSIS OF
EXPERIMENTAL RESULTS
NULL HYPOTHESIS
STATISTICAL TESTING
WHAT NHST COMPUTES
$p(t(x) > \tau \mid H_0)$
Unknown Behaviour
Observed Sample
The controversy with NHST
We assume the null hypothesis: the average
performance of the compared methods is the same.
Then, the observed difference is computed from the data,
and the probability of observing such a difference (or a
bigger one) under the null hypothesis is estimated: the p-value.
The p-value is often interpreted as the probability of erroneously
assuming that there are differences when actually
there are not. It is also used to measure the magnitude of the
difference, as it decreases when the difference
increases.
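To make concrete what the p-value measures, here is a minimal sketch of a two-sided permutation test on the difference of mean results between two algorithms. The permutation test, the function name `permutation_p_value` and the toy data are assumptions for illustration; the slides do not prescribe a particular test.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Estimate p(|t(x)| >= tau | H0), where t is the difference of means and
    tau is its observed value. Under H0 the algorithm labels are exchangeable,
    so shuffling the pooled results simulates the null distribution of t."""
    rng = random.Random(seed)
    tau = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= tau:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_perm + 1)

print(permutation_p_value(list(range(1, 11)), list(range(11, 21))))
```

Note that the returned value is a probability of the data given $H_0$; it is not $p(H_0 \mid x)$.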
WHAT NHST COMPUTES
$p(t(x) > \tau \mid H_0)$
$1 - p(t(x) > \tau \mid H_0) = p(t(x) < \tau \mid H_0)$
WHAT WE WOULD LIKE TO KNOW
$1 - p(H_0 \mid x) = p(H_1 \mid x)$
$p(H_0 \mid x)$
The Point
Unknown Behaviour
Observed Sample
Many alternatives to handle uncertainty
associated with empirical results:
Statistical Analysis Handbook: A Comprehensive Handbook of Statistical
Concepts, Techniques and Software Tools (Dr Michael J de Smith)
WHAT NHST COMPUTES
$p(t(x) > \tau \mid H_0)$
BAYESIAN STATISTICAL
ANALYSIS
The Bayesian Approach
The method focuses on estimating relevant
information about the underlying (parametric)
performance distribution, represented by a set of
parameters θ.
This method assesses the distribution of θ
conditioned on a sample s drawn from the
performance distribution.
Instead of having a single probability distribution
to model the underlying performance, Bayesian
statistics considers all possible distributions
and assigns a probability to each.
$P(\theta \mid s) \propto P(s \mid \theta)\, P(\theta)$
Posterior distribution of the parameters ∝ likelihood function × prior distribution of the parameters
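To make the proportionality $P(\theta \mid s) \propto P(s \mid \theta) P(\theta)$ concrete, the sketch below approximates a posterior on a grid. It assumes, purely for illustration, that θ is the probability that one algorithm beats another in a single run, that the sample s is a count of wins over several runs (binomial likelihood), and that the prior is uniform; none of these choices come from the slides, and `grid_posterior` is a hypothetical name.

```python
from math import comb

def grid_posterior(wins, runs, grid_size=101, prior=None):
    """Approximate P(theta | s) on an evenly spaced grid of theta in [0, 1]."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = prior or [1.0] * grid_size  # uniform prior P(theta)
    # Binomial likelihood P(s | theta) of observing `wins` out of `runs`.
    likelihood = [comb(runs, wins) * t**wins * (1 - t) ** (runs - wins) for t in thetas]
    unnorm = [l * p for l, p in zip(likelihood, prior)]  # P(s | theta) P(theta)
    z = sum(unnorm)  # normalising constant, hence the "proportional to"
    return thetas, [u / z for u in unnorm]

thetas, post = grid_posterior(wins=8, runs=11)
best = max(range(len(post)), key=post.__getitem__)
print(f"posterior mode ~ {thetas[best]:.2f}")
```

The whole posterior is returned, not just a point estimate, which is what later allows statements such as credible intervals.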
HOW DO WE COMPARE MULTIPLE
ALGORITHMS?
Panels: minimizing a given instance of a problem; minimizing some instances of a problem.
From Results to Rankings
Observed sample (objective values, lower is better):
Algorithm   f1    f2    f3    f4    f5    ...
GA          100   130   37    566   256   ...
PSO         90    80    352   756   125   ...
ILP         135   135   19    101   89    ...
SA          105   30    100   56    369   ...
GP          95    300   10    57    36    ...
Rankings (permutations), one per instance:
Algorithm   σ1    σ2    σ3    σ4    σ5    ...
GA          3     3     3     4     4     ...
PSO         1     2     5     5     3     ...
ILP         5     4     2     3     2     ...
SA          4     1     4     1     5     ...
GP          2     5     1     2     1     ...
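The conversion from results to rankings can be sketched as follows, using the observed values above. The function name `rank_algorithms` is hypothetical, and breaking ties by order of appearance is an assumption (the slides do not say how ties are handled).

```python
def rank_algorithms(results):
    """Map {algorithm: objective value} on one instance to {algorithm: rank}.
    Rank 1 goes to the lowest value, since the problems are minimised.
    sorted() is stable, so ties keep their insertion order (an assumption)."""
    order = sorted(results, key=results.__getitem__)
    return {alg: pos + 1 for pos, alg in enumerate(order)}

observed = {
    "f1": {"GA": 100, "PSO": 90, "ILP": 135, "SA": 105, "GP": 95},
    "f2": {"GA": 130, "PSO": 80, "ILP": 135, "SA": 30, "GP": 300},
    "f3": {"GA": 37, "PSO": 352, "ILP": 19, "SA": 100, "GP": 10},
    "f4": {"GA": 566, "PSO": 756, "ILP": 101, "SA": 56, "GP": 57},
    "f5": {"GA": 256, "PSO": 125, "ILP": 89, "SA": 369, "GP": 36},
}
rankings = {f: rank_algorithms(vals) for f, vals in observed.items()}
for f, r in rankings.items():
    print(f, [r[alg] for alg in ("GA", "PSO", "ILP", "SA", "GP")])
```

Each printed row is one column σ1, ..., σ5 of the rankings table.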
● Each algorithm in the comparison has an associated weight.
● The weights sum to 1.
● The weight associated with an algorithm represents its probability of being ranked first.
Plackett-Luce Model
$P(\sigma) = \prod_{i=1}^{n} \frac{w_{\sigma(i)}}{\sum_{j=i}^{n} w_{\sigma(j)}}$
Plackett-Luce Model: example with n = 4
Weights: $w_1 = 0.3$, $w_2 = 0.03$, $w_3 = 0.17$, $w_4 = 0.6$
Ranking σ: #1 algorithm 4, #2 algorithm 1, #3 algorithm 3, #4 algorithm 2
$P(\sigma) = \frac{w_4}{w_1 + w_2 + w_3 + w_4} \cdot \frac{w_1}{w_1 + w_2 + w_3} \cdot \frac{w_3}{w_2 + w_3} \cdot \frac{w_2}{w_2}$
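The worked example can be checked numerically. The sketch below evaluates the product for this ranking and compares it with the frequency obtained by drawing rankings from the model: at each step, the next algorithm is picked among those not yet ranked with probability proportional to its weight (a standard way to sample from Plackett-Luce, chosen here for illustration; `sample_ranking` is a hypothetical name). The listed weights sum to 1.1 rather than 1, but Plackett-Luce probabilities depend only on weight ratios, so the result is unaffected.

```python
import random

def sample_ranking(weights, rng):
    """Draw one ranking from a Plackett-Luce model by sequential selection."""
    remaining = list(range(len(weights)))
    ranking = []
    while remaining:
        alg = rng.choices(remaining, weights=[weights[a] for a in remaining])[0]
        ranking.append(alg)
        remaining.remove(alg)
    return ranking

w = [0.3, 0.03, 0.17, 0.6]   # w1, w2, w3, w4 (stored 0-based)
sigma = [3, 0, 2, 1]         # algorithm 4 first, then 1, 3 and 2
exact = (w[3] / (w[0] + w[1] + w[2] + w[3])) * (w[0] / (w[0] + w[1] + w[2])) \
    * (w[2] / (w[1] + w[2])) * (w[1] / w[1])

rng = random.Random(0)
draws = [sample_ranking(w, rng) for _ in range(20000)]
empirical = sum(d == sigma for d in draws) / len(draws)
first_alg4 = sum(d[0] == 3 for d in draws) / len(draws)
print(f"P(sigma): exact={exact:.3f}, simulated={empirical:.3f}")
print(f"P(algorithm 4 ranked first) ~ {first_alg4:.3f}, w4/sum(w) = {w[3] / sum(w):.3f}")
```

The second line illustrates why a normalised weight is read as the probability of being ranked first.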
The Bayesian model
Sample of rankings: $R = \{\sigma^{(1)}, \ldots, \sigma^{(N)}\}$
$$P(w \mid R) \propto \prod_{k=1}^{N} \prod_{i=1}^{n} \frac{w_{\sigma^{(k)}(i)}}{\sum_{j=i}^{n} w_{\sigma^{(k)}(j)}} \cdot \frac{1}{B} \prod_{i=1}^{n} w_i^{\alpha_i - 1}, \qquad B = \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{n} \alpha_i\right)}$$
Posterior distribution of the weights ∝ likelihood of the sample (Plackett-Luce) × prior distribution of the weights (Dirichlet with parameters α).
There is no way to sample the posterior
distribution exactly → MCMC
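A minimal Metropolis-Hastings sketch for this posterior, assuming rankings are given as lists of 0-based algorithm indices in rank order and a Dirichlet(α) prior. The random-walk proposal, which moves a uniform amount of weight between two randomly chosen algorithms, is our own choice for illustration and is not necessarily the sampler used in the original work; `log_posterior` and `mh_sample` are hypothetical names.

```python
import math
import random

def log_posterior(w, rankings, alpha):
    """Unnormalised log P(w | R): Plackett-Luce log-likelihood of every ranking
    plus the log density of a Dirichlet(alpha) prior (the constant log B is dropped)."""
    if any(x <= 0.0 for x in w):
        return -math.inf  # outside the simplex: zero posterior density
    lp = sum((a - 1.0) * math.log(x) for a, x in zip(alpha, w))
    for sigma in rankings:  # sigma: algorithm indices from rank 1 to rank n
        for i, alg in enumerate(sigma):
            lp += math.log(w[alg]) - math.log(sum(w[a] for a in sigma[i:]))
    return lp

def mh_sample(rankings, n, alpha=None, steps=20000, burn_in=5000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings on the probability simplex. Each proposal
    keeps the weights summing to 1 and is symmetric, so the acceptance
    probability is just the posterior ratio."""
    rng = random.Random(seed)
    alpha = alpha or [1.0] * n  # Dirichlet(1, ..., 1) is the uniform prior
    w = [1.0 / n] * n
    lp = log_posterior(w, rankings, alpha)
    samples = []
    for t in range(steps):
        i, j = rng.sample(range(n), 2)
        delta = rng.uniform(-step, step)
        prop = list(w)
        prop[i] += delta
        prop[j] -= delta
        lp_prop = log_posterior(prop, rankings, alpha)
        # Accept with probability min(1, P(prop | R) / P(w | R)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            w, lp = prop, lp_prop
        if t >= burn_in:
            samples.append(list(w))
    return samples

# Toy data: three algorithms, ten identical rankings (algorithm 0 always first).
samples = mh_sample([[0, 1, 2]] * 10, n=3, seed=1)
means = [sum(s[k] for s in samples) / len(samples) for k in range(3)]
print([round(m, 3) for m in means])
```

For real studies, a dedicated MCMC tool with convergence diagnostics is a more robust option than this hand-written chain.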
Bayesian inference for algorithm ranking analysis
Run the algorithms (Alg1, Alg2, …, Algn) on instances #1, #2, …, #m → performance matrix.
Rank the algorithms on each instance → ranking matrix.
MCMC sampling → sample of weight vectors (w1, w2, …, wn).
Query the posterior.
The Case Study
23 FUNCTIONS TO OPTIMIZE:
• OneMax (F1) and W-model extensions (F4-F10)
• LeadingOnes (F2) and W-model extensions (F11-F17)
• Harmonic (F3)
• LABS: Low Autocorrelation Binary Sequences (F18)
• Ising-Ring (F19)
• Ising-Torus (F20)
• Ising-Triangular (F21)
• MIVS: Maximum Independent Vertex Set (F22)
• NQP: N-Queens problem (F23)
Problem size: $n \in \{16, 64, 100, 625\}$
11 Metaheuristic algorithms:
• greedy Hill Climber (gHC)
• Randomized Local Search (RLS)
• (1+1) EA
• fast Genetic Algorithm (fGA)
• (1+10) EA
• (1+10) EAr/2,2r
• (1+10) EAnorm
• (1+10) EAvar
• (1+10) EAlog-n
• (1+(λ,λ)) GA
• “vanilla” GA (vGA)
Results of 11,132 runs are collected (23 × 4 × 11 × 11).
• Aggregation of performances across 11 instances.
• Median performance across 11 repetitions.
Estimate the probability of each algorithm being top-ranked
• as its expected weight in the posterior distribution of weights
Analyze the uncertainty about the probabilities
• By estimating the 90% credible intervals of the posterior distribution of weights (5% and 95%)
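Both quantities can be read off posterior samples of the weight vector (for instance, the output of an MCMC sampler). A minimal sketch, assuming the samples are a list of weight vectors indexed like `names`; `summarize_posterior` is a hypothetical name.

```python
from statistics import fmean, quantiles

def summarize_posterior(samples, names):
    """For each algorithm, return (expected weight, 5% quantile, 95% quantile)
    computed from posterior samples of the weight vector."""
    summary = {}
    for i, name in enumerate(names):
        values = [s[i] for s in samples]
        cuts = quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
        summary[name] = (fmean(values), cuts[0], cuts[-1])
    return summary

# Toy posterior samples for two algorithms (each vector sums to 1).
toy = [[i / 100, 1 - i / 100] for i in range(101)]
for name, (mean_w, lo, hi) in summarize_posterior(toy, ["A", "B"]).items():
    print(f"{name}: P(best) ~ {mean_w:.2f}, 90% credible interval [{lo:.2f}, {hi:.2f}]")
```

The expected weight is the point estimate of the probability of being top-ranked; the interval width expresses how uncertain that estimate is.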
Inference analyses & results
QUALITATIVE SUMMARY
Similar performance: (1+(λ,λ)) GA, (1+1)-EA, (1+10)-EAvar, (1+10)-EAlog-n, (1+10)-EAnorm, (1+10)-EAr/2,2r and fGA.
Extreme performance: vGA and gHC.
Easily treated instances are F1-F6, F8, F11-F13 and F15-16.
Best solutions found for n=625
Inference analyses & results
Fixed-target perspective – Record Running-time
(Figure: probability of winning, with credible intervals, for each of the 11 algorithms; panels: F17, n=625, φ=625 and F19, n=100, φ=100.)
Credible Intervals
Only 11 samples to do inference → high uncertainty is expected!
The more samples, the lower the uncertainty → credible intervals are tighter!
(Figure annotations: expected probability; high uncertainty.)
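The effect of the sample size on interval width can be illustrated with a simpler one-parameter analogue. This sketch is not the Plackett-Luce model: it assumes a single "win" probability with a uniform prior and a binomial likelihood, so the posterior is a Beta distribution, and it estimates the 90% credible interval width by Monte Carlo. The 55% win rate and the name `credible_interval_width` are hypothetical.

```python
import random
from statistics import quantiles

def credible_interval_width(wins, runs, draws=20000, seed=0):
    """Width of the 90% credible interval of a Beta(1 + wins, 1 + runs - wins)
    posterior (uniform prior, binomial likelihood), estimated by Monte Carlo."""
    rng = random.Random(seed)
    samples = [rng.betavariate(1 + wins, 1 + runs - wins) for _ in range(draws)]
    cuts = quantiles(samples, n=20, method="inclusive")  # cuts[0]=5%, cuts[-1]=95%
    return cuts[-1] - cuts[0]

for runs in (11, 110, 1100):
    wins = round(0.55 * runs)
    print(runs, round(credible_interval_width(wins, runs), 3))
```

With 11 observations the interval is wide; with ten and a hundred times more data it shrinks markedly.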
INTERPRETABILITY
Inference analyses & results
Fixed-target perspective – Record Running-time – Set of easy functions
(Figure: probability of winning, with credible intervals, for each of the 11 algorithms; panels: n=625, all runs and n=625, median.)
Credible Intervals
Set of functions, two paths → (1) take all the runs, (2) take the median of the runs on each instance.
gHC is the best in both cases → with more samples, the uncertainty is lower.
Inference analyses & results
Fixed-target perspective – Record Running-time – Set of non-easy functions
Credible Intervals
Good estimations → credible intervals narrower than 0.05.
Probabilities are similar → the intervals overlap.
Uncertainty about which algorithm is the best → not due to
limited data, but due to the equivalence of the
algorithms.
(Figure: probability of winning, with credible intervals, for each of the 11 algorithms; n=625, all runs.)
Inference analyses & results
Fixed-budget perspective – evolution of the winning probability – 90% credible intervals
(Figure: winning probability of each of the 11 algorithms as a function of the budget, from 0 to 900; F21, n=100.)
gHC is the best, but its probability decreases while the other algorithms improve.
gHC becomes better as the budget increases.
(Figure: algorithms ranked with average data; rank axis from 3 to 11.)
Wilcoxon test for pairwise comparisons and
Shaffer's method for p-value correction.
BAYESIAN ANALYSIS
ESTIMATED PROBABILITY AND
NOTION OF UNCERTAINTY IN THE
FORM OF CREDIBLE INTERVALS
Inference analyses & results
Impact of the prior distribution – Comparison of three different priors
(Figure: winning probability of each of the 11 algorithms under three priors: uniform, empirical and deceptive; F9, n=100, φ=100.)
Empirical data favours the best-performing
algorithms.
Negligible effect (even when median
values are considered).
Discussion
Bayesian inference using Plackett-Luce for analysis of algorithms’ performance ranking
Include it in the practical EC performance-comparison tool set → IOHProfiler
Strong points
Ability to handle multiple
algorithms
Interpretability
Exact description of the
uncertainty
WEAKNESSES
By aggregating performances into
rankings, we lose information about
the magnitude of the differences.
Limitations of the Plackett-Luce model
→ from n! to n parameters.
How do we deal with ties?
scmamp: Statistical Comparison of Multiple
Algorithms in Multiple Problems
Thank you very much for your attention!

solar wireless electric vechicle charging systemgokuldongala
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Sean Meyn
 
Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...sahb78428
 
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....santhyamuthu1
 
Modelling Guide for Timber Structures - FPInnovations
Modelling Guide for Timber Structures - FPInnovationsModelling Guide for Timber Structures - FPInnovations
Modelling Guide for Timber Structures - FPInnovationsYusuf Yıldız
 
ASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderjuancarlos286641
 
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Apollo Techno Industries Pvt Ltd
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxSAJITHABANUS
 
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdfsdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdfJulia Kaye
 

Recently uploaded (20)

EPE3163_Hydro power stations_Unit2_Lect2.pptx
EPE3163_Hydro power stations_Unit2_Lect2.pptxEPE3163_Hydro power stations_Unit2_Lect2.pptx
EPE3163_Hydro power stations_Unit2_Lect2.pptx
 
Lecture 2 .pptx
Lecture 2                            .pptxLecture 2                            .pptx
Lecture 2 .pptx
 
Gender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 ProjectGender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 Project
 
A Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationA Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software Simulation
 
Nodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxNodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptx
 
Basic Principle of Electrochemical Sensor
Basic Principle of  Electrochemical SensorBasic Principle of  Electrochemical Sensor
Basic Principle of Electrochemical Sensor
 
Lecture 2 .pdf
Lecture 2                           .pdfLecture 2                           .pdf
Lecture 2 .pdf
 
How to Write a Good Scientific Paper.pdf
How to Write a Good Scientific Paper.pdfHow to Write a Good Scientific Paper.pdf
How to Write a Good Scientific Paper.pdf
 
計劃趕得上變化
計劃趕得上變化計劃趕得上變化
計劃趕得上變化
 
Graphics Primitives and CG Display Devices
Graphics Primitives and CG Display DevicesGraphics Primitives and CG Display Devices
Graphics Primitives and CG Display Devices
 
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
 
solar wireless electric vechicle charging system
solar wireless electric vechicle charging systemsolar wireless electric vechicle charging system
solar wireless electric vechicle charging system
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
 
Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...
 
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
 
Modelling Guide for Timber Structures - FPInnovations
Modelling Guide for Timber Structures - FPInnovationsModelling Guide for Timber Structures - FPInnovations
Modelling Guide for Timber Structures - FPInnovations
 
ASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entender
 
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
 
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdfsdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
sdfsadopkjpiosufoiasdoifjasldkjfl a asldkjflaskdjflkjsdsdf
 

Bayesian Performance Analysis for Optimization Algorithm Comparison

  • 1. Josu Ceberio. Bayesian Analysis for Algorithm Performance Comparison: Is it possible to compare optimization algorithms without hypothesis testing?
  • 2. Is there a reproducibility crisis? Source: Monya Baker (2016). Is there a reproducibility crisis? Nature, 533, 452-454.
  • 3. Hypothesis: an idea for solving a set of problems more efficiently. Questions: Is my algorithm better than the state of the art? On which problems is my algorithm better? Why is my algorithm better (or worse)? Experimentation: compare the performance of my algorithm with the state of the art on a benchmark of problems; the analysis of the results should take the associated uncertainty into account. Conclusions: What conclusions do we draw from the experimentation? How do we answer the formulated questions? Is there a reproducibility crisis?
  • 4. The Questions: How likely is my proposal to be the best algorithm for solving a problem? How likely is my proposal to be the best algorithm among the compared ones?
  • 5. The Point: STATISTICAL ANALYSIS OF EXPERIMENTAL RESULTS / NULL HYPOTHESIS STATISTICAL TESTING. WHAT NHST COMPUTES: $p(t(x) > \tau \mid H_0)$. [Diagram: Unknown Behaviour, Observed Sample]
  • 7. The controversy with NHST. We assume the null hypothesis: the average performance of the compared methods is the same. Then the observed difference is computed from the data, and the probability of observing such a difference (or a bigger one) is estimated: the p-value. In practice, the p-value is often read as the probability of erroneously concluding that there are differences when actually there are none, and it is used as a measure of the magnitude of the difference, since it decreases as the difference increases. WHAT NHST COMPUTES: $p(t(x) > \tau \mid H_0)$, whose complement is $1 - p(t(x) > \tau \mid H_0) = p(t(x) < \tau \mid H_0)$. WHAT WE WOULD LIKE TO KNOW: $p(H_0 \mid x)$, and hence $1 - p(H_0 \mid x) = p(H_1 \mid x)$.
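To make "what NHST computes" concrete, here is a minimal exact permutation test in Python. It is a sketch under stated assumptions: the statistic t(x) is taken (hypothetically) as the difference of mean running times between two algorithms, the data are invented, and H0 is taken as "both samples come from the same distribution", which is slightly stronger than "equal means". Under that H0 the labels are exchangeable, so $p(t(x) \ge \tau \mid H_0)$ is the fraction of relabelings whose statistic reaches the observed τ. The number it returns is a probability about the data given H0, not $p(H_0 \mid x)$.

```python
from itertools import combinations


def permutation_p_value(a, b):
    """Exact one-sided permutation p-value, p(t(x) >= tau | H0).

    t(x) is mean(b) - mean(a). Under H0 both samples come from the same
    distribution, so every split of the pooled data into groups of the
    original sizes is equally likely.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    tau = sum(b) / len(b) - sum(a) / len(a)  # observed statistic
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen]
        t = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)
        total += 1
        if t >= tau - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return hits / total


# Hypothetical running times (lower is better) of algorithms A and B.
runs_a = [1, 2, 3, 4, 5]
runs_b = [10, 11, 12, 13, 14]
p = permutation_p_value(runs_a, runs_b)  # only the observed split reaches tau: 1 / C(10, 5)
```

Here p = 1/252: out of the 252 equally likely splits, only the observed one is as extreme. Nothing in this number says how likely H0 itself is.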
  • 8. The Point. [Diagram: Unknown Behaviour, Observed Sample] There are many alternatives for handling the uncertainty associated with empirical results. [Book cover: Statistical Analysis Handbook: A Comprehensive Handbook of Statistical Concepts, Techniques and Software Tools, Dr Michael J. de Smith]
  • 9. The Point: STATISTICAL ANALYSIS OF EXPERIMENTAL RESULTS: NULL HYPOTHESIS STATISTICAL TESTING / BAYESIAN STATISTICAL ANALYSIS. WHAT NHST COMPUTES: $p(t(x) > \tau \mid H_0)$. [Diagram: Unknown Behaviour, Observed Sample]
  • 10. The Bayesian Approach. The method focuses on estimating relevant information about the underlying parametric performance distribution, described by a set of parameters θ. It assesses the distribution of θ conditioned on a sample s drawn from the performance distribution. Instead of committing to a single probability distribution to model the underlying performance, Bayesian statistics considers all the distributions in the parametric family (all values of θ) and assigns a probability to each: $P(\theta \mid s) \propto P(s \mid \theta)\,P(\theta)$, where $P(\theta \mid s)$ is the posterior distribution of the parameters, $P(s \mid \theta)$ is the likelihood function and $P(\theta)$ is the prior distribution of the parameters. HOW DO WE COMPARE MULTIPLE ALGORITHMS?
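A one-parameter example may help with $P(\theta \mid s) \propto P(s \mid \theta)P(\theta)$. The sketch below assumes, hypothetically, that θ is the probability that one algorithm beats another on a single run and that the sample s records 7 wins in 10 runs. It evaluates likelihood × prior on a grid and normalizes. With a uniform prior the exact posterior is Beta(8, 4), whose mean is 8/12 ≈ 0.667, so the grid result can be checked against it.

```python
from math import comb


def grid_posterior(wins, runs, prior, n_grid=10_000):
    """Grid approximation of P(theta | s) ∝ P(s | theta) * P(theta).

    theta: probability that an algorithm wins a single run (hypothetical).
    s: `wins` successes out of `runs` independent runs (binomial likelihood).
    """
    thetas = [(k + 0.5) / n_grid for k in range(n_grid)]  # grid midpoints in (0, 1)
    unnormalized = [
        comb(runs, wins) * t**wins * (1 - t) ** (runs - wins) * prior(t)
        for t in thetas
    ]
    z = sum(unnormalized)  # normalizing constant (up to the grid spacing)
    return thetas, [u / z for u in unnormalized]


thetas, post = grid_posterior(wins=7, runs=10, prior=lambda t: 1.0)  # uniform prior
posterior_mean = sum(t * p for t, p in zip(thetas, post))
prob_better = sum(p for t, p in zip(thetas, post) if t > 0.5)  # P(theta > 0.5 | s)
```

Unlike a p-value, `prob_better` directly answers "how likely is it that this algorithm is the better one?" (about 0.887 under these assumptions). The binomial coefficient cancels in the normalization and is kept only to show the full likelihood.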
  • 11. From Results to Rankings. Panels: minimizing a given instance of a problem; minimizing some instances of a problem. The observed sample gives the performance of each algorithm on functions f1 to f5 (lower is better). Each ranking σ_k assigns a rank to every algorithm on f_k, so the rankings are permutations:
      Algorithm    f1    f2    f3    f4    f5  |  σ1  σ2  σ3  σ4  σ5
      GA          100   130    37   566   256  |   3   3   3   4   4
      PSO          90    80   352   756   125  |   1   2   5   5   3
      ILP         135   135    19   101    89  |   5   4   2   3   2
      SA          105    30   100    56   369  |   4   1   4   1   5
      GP           95   300    10    57    36  |   2   5   1   2   1
      ...         ...   ...   ...   ...   ...  | ...
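The step from raw results to rankings can be sketched in Python with the slide's numbers. It assumes minimization (lower is better), which reproduces the σ columns shown. The helper name `to_ranks` is hypothetical.

```python
def to_ranks(scores):
    """Rank of each algorithm (in input order); rank 1 = lowest score (minimization).

    Ties are broken by input order because sorted() is stable; the
    Plackett-Luce model itself does not represent ties.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


algorithms = ["GA", "PSO", "ILP", "SA", "GP"]
results = {  # observed sample from the slide: one list per function, in `algorithms` order
    "f1": [100, 90, 135, 105, 95],
    "f2": [130, 80, 135, 30, 300],
    "f3": [37, 352, 19, 100, 10],
    "f4": [566, 756, 101, 56, 57],
    "f5": [256, 125, 89, 369, 36],
}
rankings = {f: to_ranks(scores) for f, scores in results.items()}
```

Only the order of the values survives this step. The gap between 56 and 57 on f4 counts the same as the gap between 101 and 566.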
  • 12. Plackett-Luce Model. Each algorithm in the comparison has an associated weight. The weights sum to 1. The weight of an algorithm represents its probability of being ranked first. Writing σ_i for the algorithm at position i, the probability of a ranking σ is $P(\sigma) = \prod_{i=1}^{n} \frac{w_{\sigma_i}}{\sum_{j=i}^{n} w_{\sigma_j}}$.
  • 13. Plackett-Luce Model: positions #1 to #4 are to be filled; the weights are $w_1 = 0.3$, $w_2 = 0.03$, $w_3 = 0.17$, $w_4 = 0.6$.
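The position-by-position construction of a ranking can be sketched as a sampler, assuming the weights shown here. At each step the next algorithm is drawn from those not yet placed, with probability proportional to its weight. Because every step renormalizes over the remaining algorithms, the weights need not sum exactly to 1 (these sum to 1.1). The function name `sample_ranking` is hypothetical.

```python
import random


def sample_ranking(weights, rng):
    """Draw one Plackett-Luce ranking (best first) by sequential selection.

    weights: dict mapping algorithm label -> positive weight.
    """
    remaining = list(weights)
    ranking = []
    while remaining:
        w = [weights[a] for a in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]  # proportional to weight
        ranking.append(pick)
        remaining.remove(pick)
    return tuple(ranking)


weights = {1: 0.3, 2: 0.03, 3: 0.17, 4: 0.6}
rng = random.Random(42)
draws = [sample_ranking(weights, rng) for _ in range(50_000)]
freq_4132 = draws.count((4, 1, 3, 2)) / len(draws)  # empirical P(sigma = (4, 1, 3, 2))
freq_4_first = sum(d[0] == 4 for d in draws) / len(draws)  # close to 0.6 / 1.1
```

The empirical frequencies approach the closed-form Plackett-Luce probabilities: about 0.545 for algorithm 4 being first and about 0.278 for the full ranking (4, 1, 3, 2).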
  • 14. Plackett-Luce Model: positions #1 to #4; after position #1 is filled, the remaining weights are $w_1 = 0.3$, $w_3 = 0.17$, $w_2 = 0.03$.
  • 15. Plackett-Luce Model: positions #2 to #4 are filled in turn from the remaining weights ($w_3 = 0.17$, $w_2 = 0.03$). The probability of the resulting ranking σ = (4, 1, 3, 2) is $P(\sigma) = \frac{w_4}{w_1 + w_2 + w_3 + w_4} \cdot \frac{w_1}{w_1 + w_2 + w_3} \cdot \frac{w_3}{w_2 + w_3} \cdot \frac{w_2}{w_2}$.
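The product can be computed directly. Below is a minimal Python sketch that takes the ranking as the sequence of algorithms from #1 to #n; the helper name `pl_probability` is hypothetical. With the weights from these slides it gives $\frac{0.6}{1.1}\cdot\frac{0.3}{0.5}\cdot\frac{0.17}{0.2}\cdot 1 = \frac{306}{1100} \approx 0.278$. Multiplying all weights by the same positive constant leaves every factor unchanged, so normalizing these weights (which sum to 1.1) to sum to 1 gives the same value.

```python
from itertools import permutations


def pl_probability(ranking, weights):
    """Plackett-Luce probability of a full ranking (best first).

    P(sigma) = prod_i w[sigma_i] / sum_{j >= i} w[sigma_j]
    """
    p = 1.0
    for i, alg in enumerate(ranking):
        remaining = sum(weights[a] for a in ranking[i:])  # algorithms not yet placed
        p *= weights[alg] / remaining
    return p


weights = {1: 0.3, 2: 0.03, 3: 0.17, 4: 0.6}
p = pl_probability((4, 1, 3, 2), weights)

# Sanity check: the probabilities of all n! rankings sum to 1.
total = sum(pl_probability(r, weights) for r in permutations(weights))
```

The model needs only n weights to define a distribution over all n! rankings, which is the "from n! to n parameters" trade-off discussed at the end of the talk.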
  • 16. The Bayesian model. Given a sample of N rankings $R = \{\sigma^{(1)}, \ldots, \sigma^{(N)}\}$, the posterior distribution of the weights is proportional to the likelihood of the sample times the prior distribution of the weights (a Dirichlet distribution with parameters α):
    $$P(w \mid R) \propto \prod_{k=1}^{N} \prod_{i=1}^{n} \left( \frac{w_{\sigma^{(k)}_i}}{\sum_{j=i}^{n} w_{\sigma^{(k)}_j}} \right) \cdot \frac{1}{B} \prod_{i=1}^{n} w_i^{\alpha_i - 1}, \qquad B = \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{n} \alpha_i\right)}$$
    There is no way to sample the posterior distribution exactly → MCMC.
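The slides do not say which MCMC method is used. One possible approach is random-walk Metropolis, sketched below in Python under stated assumptions. It relies on two standard facts. First, if $v_i \sim \mathrm{Gamma}(\alpha_i, 1)$ independently, then $w = v / \sum_i v_i \sim \mathrm{Dirichlet}(\alpha)$. Second, the Plackett-Luce likelihood depends on the weights only up to a common scale. The sampler therefore works on $u = \log v$, with target "log-likelihood plus $\sum_i (\alpha_i u_i - e^{u_i})$" (Gamma prior including the Jacobian of the log transform), and reports normalized weights. The step size, iteration counts, toy data and function names are hypothetical, untuned choices.

```python
import math
import random


def log_pl_likelihood(rankings, v):
    """Sum of log Plackett-Luce probabilities; rankings hold 0-based indices, best first."""
    total = 0.0
    for ranking in rankings:
        for i, alg in enumerate(ranking):
            total += math.log(v[alg]) - math.log(sum(v[a] for a in ranking[i:]))
    return total


def sample_weights(rankings, n_algs, alpha=None, n_iter=20_000, burn_in=5_000,
                   step=0.3, seed=0):
    """Random-walk Metropolis on u = log v; returns posterior samples of w = v / sum(v)."""
    alpha = alpha if alpha is not None else [1.0] * n_algs  # uniform Dirichlet prior
    rng = random.Random(seed)

    def log_target(u):
        v = [math.exp(x) for x in u]
        log_prior = sum(a * x - vi for a, x, vi in zip(alpha, u, v))  # Gamma prior + Jacobian
        return log_pl_likelihood(rankings, v) + log_prior

    u = [0.0] * n_algs
    current = log_target(u)
    samples = []
    for it in range(n_iter):
        proposal = [x + rng.gauss(0.0, step) for x in u]
        candidate = log_target(proposal)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            u, current = proposal, candidate
        if it >= burn_in:
            v = [math.exp(x) for x in u]
            s = sum(v)
            samples.append([vi / s for vi in v])
    return samples


# Toy data (hypothetical): algorithm 0 wins 10 times, algorithm 1 wins 5 times.
toy_rankings = [(0, 1, 2)] * 10 + [(1, 0, 2)] * 5
samples = sample_weights(toy_rankings, n_algs=3)
mean_w = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
```

The Gamma-variable trick lets the sampler move in an unconstrained space while the normalized weights keep exactly the Dirichlet prior. Production analyses would typically use a dedicated probabilistic programming tool and check convergence diagnostics.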
  • 17. Bayesian inference for algorithm ranking analysis. Workflow: run the algorithms (Alg1, Alg2, ..., Algn) on instances #1 to #m to obtain the performance matrix; rank the algorithms on each instance to obtain the ranking matrix; use MCMC sampling to draw samples of the weight vector (w1, w2, ..., wn); query the posterior.
  • 18. Case Study. 23 functions to optimize: • OneMax (F1) and W-model extensions (F4-F10) • LeadingOnes (F2) and W-model extensions (F11-F17) • Harmonic (F3) • LABS: Low Autocorrelation Binary Sequences (F18) • Ising-Ring (F19) • Ising-Torus (F20) • Ising-Triangular (F21) • MIVS: Maximum Independent Vertex Set (F22) • NQP: N-Queens problem (F23). Problem size: $n \in \{16, 64, 100, 625\}$. 11 metaheuristic algorithms: • greedy Hill Climber (gHC) • Randomized Local Search (RLS) • (1+1) EA • fast Genetic Algorithm (fGA) • (1+10) EA • (1+10) EA_r/2,2r • (1+10) EA_norm • (1+10) EA_var • (1+10) EA_log-n • (1+(λ,λ)) GA • "vanilla" GA (vGA). Results of 11,132 runs are collected (23 × 4 × 11 × 11). • Aggregation of performances across 11 instances. • Median performance across 11 repetitions. Estimate the probability of each algorithm being top-ranked • as its expected weight in the posterior distribution of weights. Analyze the uncertainty about these probabilities • by estimating the 90% credible intervals of the posterior distribution of weights (5% and 95% quantiles).
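The two queries listed here, the expected weight and a 90% credible interval from the 5% and 95% quantiles, can be read off any set of posterior weight samples. In the sketch below the samples are placeholders drawn from a Dirichlet distribution via normalized Gamma variates, standing in for real MCMC output. The helper names and the Dirichlet parameters are hypothetical.

```python
import random
from statistics import quantiles


def summarize_weights(samples):
    """Per algorithm: (expected weight, 5% quantile, 95% quantile) of posterior samples."""
    summary = []
    for i in range(len(samples[0])):
        column = [s[i] for s in samples]
        cuts = quantiles(column, n=20, method="inclusive")  # 19 cut points: 5%, 10%, ..., 95%
        summary.append((sum(column) / len(column), cuts[0], cuts[-1]))
    return summary


def dirichlet_sample(alpha, rng):
    """One Dirichlet(alpha) draw as normalized independent Gamma(alpha_i, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


rng = random.Random(1)
placeholder = [dirichlet_sample([8.0, 1.0, 1.0], rng) for _ in range(20_000)]
summary = summarize_weights(placeholder)
for label, (mean, lo, hi) in zip(["alg A", "alg B", "alg C"], summary):
    print(f"{label}: P(top-ranked) ~ {mean:.3f}, 90% CI [{lo:.3f}, {hi:.3f}]")
```

Fewer samples (such as the 11 repetitions per instance mentioned here) yield a more diffuse posterior and therefore wider intervals.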
  • 19. Inference analyses & results. QUALITATIVE SUMMARY: Similar performance: (1+(λ,λ)) GA, (1+1) EA, (1+10) EA_var, (1+10) EA_log-n, (1+10) EA_norm, (1+10) EA_r/2,2r and fGA. Extreme performance: vGA and gHC. Easily solved instances: F1-F6, F8, F11-F13 and F15-F16. Best solutions found for n=625.
  • 20. Inference analyses & results. Fixed-target perspective, record running time. [Figure: probability of winning per algorithm with credible intervals; panels F17, n=625, φ=625 and F19, n=100, φ=100] Credible intervals: only 11 samples are available for inference → high uncertainty is expected. The more samples, the lower the uncertainty → the tighter the credible intervals. Annotations: expected probability; high uncertainty; INTERPRETABILITY.
  • 21. Inference analyses & results. Fixed-target perspective, record running time, set of easy functions. [Figure: probability of winning per algorithm with credible intervals; panels n=625, all runs and n=625, median] Credible intervals: for a set of functions there are two paths: (1) take all the runs, or (2) take the median of the runs on each instance. gHC is the best in both cases → with more samples, the uncertainty is lower.
  • 22. Inference analyses & results. Fixed-target perspective, record running time, set of non-easy functions. Credible intervals: good estimations → credible intervals narrower than 0.05. The probabilities are similar → the intervals overlap. There is uncertainty about which algorithm is best, but it is not due to a lack of data; it is due to the algorithms being equivalent. [Figure: probability of winning per algorithm; n=625, all runs]
  • 23. Inference analyses & results. Fixed-budget perspective: evolution of the winning probability with 90% credible intervals. [Figure: winning probability vs. budget (0 to 900) for each algorithm; F21, n=100] gHC is the best, but its probability decreases while the rest improve. gHC becomes better as the budget increases. [Figure: algorithms ranked with average data (ranks 3 to 11); Wilcoxon test for pairwise comparisons and Shaffer's method for p-value correction] BAYESIAN ANALYSIS: estimated probability and a notion of uncertainty in the form of credible intervals.
  • 24. Inference analyses & results. Impact of the prior distribution: comparison of three different priors (Uniform, Empirical, Deceptive). [Figure: winning probability per algorithm under each prior; F9, n=100, φ=100] Empirical data favours the best-performing algorithms. Negligible effect (even when median values are considered).
  • 25. Discussion. Bayesian inference using Plackett-Luce for the analysis of algorithms' performance rankings. Include it in the practical EC performance-comparison tool set → IOHProfiler. Strong points: ability to handle multiple algorithms; interpretability; exact description of the uncertainty. WEAKNESSES: by aggregating performances into rankings we lose information about the magnitude of the differences; limitations of the Plackett-Luce model → from n! to n parameters; how do we deal with ties?
  • 26. scmamp: Statistical Comparison of Multiple Algorithms in Multiple Problems
  • 27. Josu Ceberio Bayesian Analysis for Algorithm Performance Comparison Thank you very much for your attention!