ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN
BEHAVIOR IMITATION USING STATISTICAL MODELING
TECHNIQUES
By
Osama Salah Eldin Farag
A Thesis submitted to the
Faculty of Engineering at Cairo University in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE
In
Computer Engineering
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2015
ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN
BEHAVIOR IMITATION USING STATISTICAL MODELING
TECHNIQUES
By
Osama Salah Eldin Farag
A Thesis submitted to the
Faculty of Engineering at Cairo University in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE
In
Computer Engineering
Under the Supervision of
Prof. Dr. Magda Bahaa Eldin Fayek
Professor of
Computer Engineering
Faculty of Engineering, Cairo University
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2015
ENHANCING INTELLIGENT AGENTS BY IMPROVING HUMAN
BEHAVIOR IMITATION USING STATISTICAL MODELING
TECHNIQUES
By
Osama Salah Eldin Farag
A Thesis submitted to the
Faculty of Engineering at Cairo University in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE
In
Computer Engineering
Approved by the
Examining Committee
____________________________________________________________
Prof. Dr. Magda Bahaa Eldin Fayek, Thesis Main Advisor
Prof. Dr. Samia Abdulrazik Mashaly,
- Department of Computers and Systems,
Electronics Research Institute
Prof. Dr. Mohamed Moustafa Saleh
- Department of Operations Research &
Decision Support, Faculty of Computers
and Information, Cairo University
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2015
Engineer’s Name: Osama Salah Eldin Farag
Date of Birth: 11 / 2 / 1987
Nationality: Egyptian
E-mail: osamasalah@outlook.com
Phone: +20/ 100 75 34 156
Address: Egypt – Zagazig City
Registration Date: 1 / 10 / 2010
Awarding Date: / / 2015
Degree: Master of Science
Department: Computer Engineering
Supervisors:
Prof. Magda B. Fayek
Examiners:
Prof. Samia Abdulrazik Mashaly
Prof. Mohamed Moustafa Saleh
Prof. Magda B. Fayek
Title of Thesis:
Enhancing intelligent agents by improving human behavior imitation using
statistical modeling techniques
Keywords:
Intelligent agent; Cognitive agent; Human imitation; Evolutionary computation;
Machine Learning
Summary:
This thesis introduces a novel non-neurological method for modeling human
behaviors. It integrates statistical modeling techniques with “the society of mind” theory
to build a system that imitates human behaviors. The introduced Human Imitating
Cognitive Modeling Agent (HICMA) can autonomously change its behavior according
to the situation it encounters.
Acknowledgements
“All the praises and thanks be to Allah, Who has guided us to this, and never could we
have found guidance, were it not that Allah had guided us”
Immeasurable appreciation and deepest gratitude for the help and support are
extended to the following persons who, in one way or another, contributed in making this
work possible.
A sincere gratitude I give to Prof. Magda B. Fayek for her support, valuable
advice, guidance, precious comments, suggestions, and patience that benefited me much
in completing this work. I, heartily, appreciate her effort to impart her experience and
knowledge to my work.
I would also like to acknowledge with much appreciation all participants of
Robocode experiments. Many thanks to Ali El-Seddeek and his fellows; the students of
Computers department at faculty of Engineering - Cairo University. Thanks a lot to
Ahmed Reda and his students at faculty of Engineering - Zagazig University. Also,
thanks a million to my friends and coworkers who kindly participated in these
experiments.
Deep thanks to Mahmoud Ali and Mohammed Hamdy for helping getting
material and information that supported this work.
Finally, I warmly thank my family who has been motivating me to keep moving
forward. My deepest appreciation to all those who helped me complete this work.
Table of Contents
Acknowledgements........................................................................................................ix
Table of Contents...........................................................................................................xi
List of Tables................................................................................................................xiv
List of Figures ...............................................................................................................xv
List of Abbreviations................................................................................................ xviii
Nomenclature...............................................................................................................xix
Abstract ........................................................................................................................xxi
Chapter 1: Introduction...........................................................................................1
Problem Statement......................................................................................1
Literature Review .......................................................................................1
Previous Work............................................................................................2
Contributions of this Work.........................................................................4
Applications................................................................................................4
1.5.1 Brain Model Functions........................................................................5
1.5.2 Artificial Personality ...........................................................................5
1.5.3 Ambient Intelligence and Internet of Things ......................................6
1.5.4 Ubiquitous Computing and Ubiquitous Robotics ...............................7
Techniques..................................................................................................7
1.6.1 Feature Selection.................................................................................7
1.6.2 Modeling ...........................................................................................10
Organization of the Thesis........................................................................11
Chapter 2: Background .........................................................................................13
Introduction ..............................................................................................13
The Society of Mind.................................................................................13
3.2.2 Estimation Agent...............................................................................48
3.2.3 Shooting Agent..................................................................................48
3.2.4 Evolver Agents..................................................................................49
The Operation of HICMA ........................................................................55
Chapter 4: Experiments and Results....................................................................65
Human-Similarity Experiments................................................................66
4.1.1 Human Behavior Imitation................................................................66
4.1.2 Human Performance Similarity.........................................................69
Modeling-Agent Evolution.......................................................................70
Chapter 5: Conclusions and Future Work...........................................................77
Conclusions ..............................................................................................77
Future Work..............................................................................................78
References......................................................................................................................79
Feature Selection.................................................................................85
A.1. Mutual Information ..................................................................................85
A.1.1. Histogram Density Estimation ..........................................................85
A.1.2. Kernel Density Estimation ................................................................87
A.2. Correlation................................................................................................91
A.2.1. Pearson Correlation Coefficient (PCC).............................................91
A.2.2. Distance Correlation..........................................................................92
List of Tables
Table 2-1: A simple CMA-ES code............................................................................... 35
Table 2-2: An example of “fitness” function................................................................. 36
Table 2-3: An example of “sortPop” function ............................................................... 36
Table 2-4: An example of “recomb” function ............................................................... 36
Table 2-5: Simplexes in different dimensions ............................................................... 38
Table 2-6: Nelder-Mead Algorithm............................................................................... 40
Table 2-7: Iteration count for different initial guesses of Nelder-Mead Algorithm ...... 42
Table 3-1: The parameters of modeling agent ............................................................... 47
Table 3-2: The function lexicon..................................................................................... 50
Table 4-1: Robocode simulation parameters ................................................................. 66
Table 4-2: Human behavior interpretation..................................................................... 68
Table 4-3: Description of human behaviors modeled by mathematical functions ........ 68
Table 4-4: The initial state of HICMA .......................................................................... 69
Table 4-5: Human Players Data..................................................................................... 69
Table 4-6: The parameters of Nelder-Mead evolver used in the experiment ................ 75
Table 4-7: Bad parameter values of Nelder-Mead evolver (used for verification)....... 75
Table A-1: Common Kernel density functions.............................................................. 88
Table A-2: Rule-of-thumb constants ............................................................................. 89
Table A-3: Common kernel constants ........................................................................... 90
List of Figures
Fig. 1-1: A simple IoT system..........................................................................................7
Fig. 1-2: A model of battery-decay-rate...........................................................................8
Fig. 1-3: Mathematical modeling techniques utilized or experimented with in this work .......8
Fig. 1-4: Modeling-related technique...............................................................................9
Fig. 2-1: AI Disciplines map .........................................................................................14
Fig. 2-2: A builder agent with its sub-agents..................................................................15
Fig. 2-3: Sub-agents of add agent...................................................................................15
Fig. 2-4: A general scheme of evolutionary algorithms.................................................17
Fig. 2-5: A 2D fitness landscape ....................................................................................17
Fig. 2-6: Basic GA flowchart .........................................................................................18
Fig. 2-7: An objective function ......................................................................................19
Fig. 2-8: Local and global minima .................................................................................20
Fig. 2-9: An example application of mathematical optimization...................................21
Fig. 2-10: Minimization example in a 2D search space ..................................................22
Fig. 2-11: Basic steps of evolution strategies.................................................................24
Fig. 2-12: Visualization of the search process of a (1/1,100)-ES...................................25
Fig. 2-13: One-Sigma ellipse of bivariate normal distribution N(0,I) [µ=0, σ=I]..........26
Fig. 2-14: Two random probability distributions with (a) σ = 1.0 and (b) σ = 3.0. The
circles are the one-sigma ellipses............................................................................27
Fig. 2-15: A 2D normal distribution (a) 2D vector of points and (b) two 1D histograms
.................................................................................................................................28
Fig. 2-16: The principle of Cumulative Step-size Adaptation (CSA)...........................28
Fig. 2-17: A population of (a) Step-size Adaptation and (b) Covariance Matrix
Adaptation...............................................................................................................30
Fig. 2-18: A 2D normal distribution N(0,C) [µ=0, σ=C]................................................31
Fig. 2-19: Optimization of 2D problem using CMA-ES................................................31
Fig. 2-20: Operations of Nelder-Mead algorithm.......................................................... 39
Fig. 2-21: Twelve iterations of a practical run of Nelder-Mead algorithm ................... 42
Fig. 2-22: Nelder-Mead algorithm flowchart................................................................. 43
Fig. 2-23: A Robocode robot anatomy .......................................................................... 44
Fig. 3-1: The structure of HICMA’s Robocode agent ................................................... 45
Fig. 3-2: The block diagram of HICMA’s modeling agent ........................................... 46
Fig. 3-3: The hybrid optimization method..................................................................... 47
Fig. 3-4: The block diagram of HICMA’s modeling agent ........................................... 48
Fig. 3-5: The block diagram of HICMA’s shooting agent ............................................ 48
Fig. 3-6: The operation of Nelder-Mead evolver agent ................................................. 49
Fig. 3-7: A chromosome example.................................................................................. 51
Fig. 3-8: The flowchart of the mutation process............................................................ 54
Fig. 3-9: Operation phases flowchart of HICMA .......................................................... 55
Fig. 3-10: Solving the estimating problem .................................................................... 58
Fig. 3-11: The initialization phase of HICMA............................................................... 59
Fig. 3-12: The operation phase of HICMA.................................................................... 60
Fig. 3-13: The evolution phase of HICMA.................................................................... 61
Fig. 3-14: The evolution sequence of parameters.......................................................... 62
Fig. 3-15: The optimization operation of the modeling agent ....................................... 63
Fig. 3-16: The operation of the estimation agent........................................................... 64
Fig. 4-1: Experiment map with HICMA’s agents.......................................................... 65
Fig. 4-2: The behavior of every function pair against: (a) Shadow and (b) Walls......... 67
Fig. 4-3: Performance similarity with human players ................................................... 70
Fig. 4-4: Performance difference with human players................................................... 70
Fig. 4-5: The evolution of the parameters of the modeling agent.................................. 74
Fig. A-1: A histogram.................................................................................................... 85
Fig. A-2: A 2D histogram with origin at (a) (-1.5, -1.5) and (b) (-1.625, -1.625)..........87
Fig. A-3: Kernel density estimation. The density distribution (dotted curve) is estimated
by the accumulation (solid curve) of Gaussian function curves (dashed curves)...88
Fig. A-4: 2D kernel density estimate (a) individual kernels and (b) the final KDE.......90
Fig. A-5: Pearson Correlations of different relationships between two variables..........91
Fig. A-6: Distance Correlation of linear and non-linear relationships ...........................92
Fig. A-7: Mutual Information vs Distance Correlation as dependence measures...........92
List of Abbreviations
AI Artificial Intelligence
AmI Ambient Intelligence
ANN Artificial Neural Network
CMA-ES Covariance Matrix Adaptation Evolution Strategy
CSA-ES Cumulative Step-size Adaptation Evolution Strategy
EA Evolutionary Algorithm
EC Evolutionary Computation
ES Evolution Strategy
GA Genetic Algorithm
HICMA Human Imitating Cognitive Modeling Agent
IA Intelligent Agent
IoT Internet of Things
LAD Least Absolute Deviations
LRMB Layered Reference Model of the Brain
MFT Modeling Field Theory
PCC Pearson Correlation Coefficient
PIR Passive Infrared
RL Reinforcement Learning
RSS Residual Sum of Squares
SA-ES Step-size Adaptation Evolution Strategy
Ubibot Ubiquitous Robot
Ubicomp Ubiquitous Computing
Nomenclature
A
agency................................................ 14
Ambience............................................. 6
Ambient intelligence ............................ 6
Artificial Neural Network.................... 2
C
comma-selection.......................... 22, 24
cost function....................................... 20
covariance.......................................... 89
D
direct behavior imitation...................... 2
E
elite .................................................... 22
elitist selection................................... 22
Euclidean norm ........................... 28, 32
evaluation function ............................ 15
evolver agent...................................... 45
F
feature selection................................... 7
fitness function ............................ 18, 22
fitness landscape.......................... 16, 22
functional form .................................. 10
G
Genetic Algorithms ............................. 1
global optimum............................ 16, 18
H
heuristic search .................................. 37
Histogram .......................................... 83
I
indicator function ...............................84
indirect behavior imitation...................2
Intelligent Agent ..................................1
Internet of Things (IoT) .......................6
J
joint probability distribution ..............83
K
kernel..................................................85
Kernel Density Estimation.................85
Koza-style GP ......................................2
L
local minimum ...................................37
local optimum ..............................16, 18
loss function.......................................18
M
marginal probability distribution .83, 87
mutation strength ...............................25
N
Nelder-Mead ......................................37
neurons.................................................2
O
object parameter ................................23
objective function.........................18, 20
one-σ ellipse.......................................29
one-σ line ...........................................24
P
plus-selection ...............................22, 24
R
Reinforcement Learning ......................1
reinforcement signal ............................1
Robocode .............................................2
S
sample standard deviation..................87
search costs........................................22
search space .......................................20
search-space ......................................16
Silverman’s rule of thumb .................87
society of mind.....................................1
standard deviation.............................. 89
statistical model................................... 7
stochastic optimization problem........ 15
stochastic search................................ 15
T
training dataset .................................... 2
U
ubibots ................................................. 7
Ubicomp .............................................. 7
Ubiquitous Computing ........................ 7
Abstract
Human intelligence is the greatest source of inspiration for artificial intelligence
(AI). The ambition of AI research is to build systems that behave in a human-like
manner. Toward this goal, researchers follow one of two directions: either studying the
neurological theories of the human brain or investigating how human-like intelligence can
stem from non-biological methods. This research follows the second school. It employs
statistical methods for imitating human behaviors. A Human-Imitating Cognitive
Modeling Agent (HICMA) is introduced. It combines different non-neurological
techniques for building and tuning models of the environment. Every combination of the
parameters of these models represents a typical human behavior. HICMA is a society of
intelligent agents that interact together. HICMA adopts the society of mind theory and
extends it by introducing a new type of agent: the evolver agent. An evolver is a special
agent whose function is to adapt other agents according to the encountered
situations. HICMA’s simple representation of human behaviors allows an evolver agent
to dress the entire system in a suitable human-like behavior (or personality) according to
the given situation.
HICMA is tested on Robocode [1]. Robocode is a game where autonomous tanks
battle in an arena. Every tank is equipped with a gun and a radar. The proposed system
consists of a society of five agents (including two evolver agents) that cooperate to
control a Robocode tank in a human-like behavior. The individuals of the society are
based on statistical and evolutionary methods: CMA-ES, Nelder-Mead algorithm, and
Genetic Algorithms (GA).
Results show that HICMA could develop human-like behaviors in Robocode
battles. Furthermore, it could select the suitable behavior for every Robocode battle.
Chapter 1: Introduction
Problem Statement
An Intelligent Agent (IA) is an entity that interacts with the environment by
observing it via sensors and acting on it via actuators. This interaction aims at achieving
some goals. An IA usually builds and maintains models of the environment and of the
objects of interest within it. Such models represent how the agent sees the
environment and, consequently, how it behaves in it. These models are continuously
adapted to environmental changes, and these adaptations are reflected in the
agent's behavior. The ability of an IA to conform to environmental changes mainly
depends on the mechanisms it uses for building and adapting its internal model(s) of the
environment. These internal models have become the focus of extensive research in the field of
artificial intelligence (AI). As human intelligence is the greatest source of inspiration for
AI, much research has focused on imitating human intelligence by modeling how the human
brain works. However, the operation of the mind is very complicated, so it may be better to
imitate human behavior without emulating the mind's inner workings. This thesis tackles
imitating human behavior with simple mathematical models.
This work integrates the society of mind [2] theory with statistical approaches to
achieve human-like intelligence. An enhanced adaptable model of the society of mind
theory is proposed, where statistical approaches are used to autonomously build and
optimize models of the environment and of the objects of interest within it.
Literature Review
The problem of providing intelligent agents with human-like behavior has been
tackled in many studies using different AI techniques such as Reinforcement
Learning (RL), Genetic Algorithms (GA), and Artificial Neural Networks (ANN).
Reinforcement Learning (RL) addresses the problem of which behavior an agent
should adopt to maximize a reward called the reinforcement signal. The agent learns the
proper behavior through trial-and-error interactions with the dynamic environment. It observes
the state of the environment through its sensors and selects an action to apply to the
environment via its actuators. Each of the available actions has a different effect on the
environment and earns a different reinforcement signal. The agent should
choose actions that maximize the cumulative reinforcement signal. Learning the proper
behavior is a systematic trial-and-error process guided by a wide variety of RL
algorithms.
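The trial-and-error loop described above can be sketched with tabular Q-learning, one of the most common RL algorithms. The toy "chain" environment below is purely illustrative (it is not from this thesis): the agent earns the reinforcement signal only upon reaching the rightmost state.

```python
import random

class ChainEnv:
    """Toy illustrative environment: states 0..3; reward only at state 3."""
    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        # "R" moves right toward the goal state, "L" moves left
        self.s = min(self.s + 1, 3) if action == "R" else max(self.s - 1, 0)
        done = (self.s == 3)
        reward = 1.0 if done else 0.0      # the reinforcement signal
        return self.s, reward, done

def q_learning(env, states, actions, episodes=500, alpha=0.1, gamma=0.9, eps=0.3):
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = env.reset()
        for _ in range(100):               # cap the episode length
            # epsilon-greedy: mostly exploit the current estimate, sometimes explore
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            # move Q(s, a) toward the reward plus the discounted future value
            best_next = max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

random.seed(0)
Q = q_learning(ChainEnv(), states=range(4), actions=["L", "R"])
# the learned greedy policy should move right toward the rewarding state
policy = {s: max(["L", "R"], key=lambda a: Q[(s, a)]) for s in range(3)}
```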
Genetic Algorithms (GA) are a type of evolutionary algorithm that imitates the
natural evolution of generations. A GA encodes the solution of a problem in the form of
chromosomes and generates a population of candidate-solution individuals, each
represented by one or more chromosomes. The individuals (candidate solutions) are then
adapted iteratively in the hope of eventually finding a good one. An overview of GA is given in
section 2.5.
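This generate-select-recombine-mutate cycle can be sketched on the classic OneMax toy problem (maximize the number of 1-bits in a chromosome); the operators and parameter values below are illustrative, not those used in this thesis:

```python
import random

def fitness(bits):
    """OneMax: the fitness of a chromosome is its count of 1-bits."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=60, p_mut=0.05):
    # generate an initial population of random candidate solutions
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # tournament selection: keep the fitter of two random individuals
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            cut = random.randrange(1, n_bits)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation with a small per-gene probability
            child = [b ^ 1 if random.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop                                   # next generation
    return max(pop, key=fitness)

random.seed(1)
best = evolve()   # after 60 generations, close to the all-ones optimum
```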
An Artificial Neural Network (ANN) is an imitation of the biological neural networks
found in the brains of living organisms. Both consist of neurons organized and
connected in a specific way. The neurons are grouped into three kinds of layers: an input
layer, an output layer, and one or more intermediate hidden layers. Some or all neurons
of every layer are connected to the neurons of the next layer via weighted directed
connections. The inputs of the network are received by the input layer and passed to the
hidden and output layers over the weighted connections. The network is trained on a
training dataset that consists of observations of inputs along with their corresponding
outputs. The goal of this learning stage is to adapt the weights of the interconnections so
that the output of the network for a given input is as close as possible to the corresponding
output in the training dataset. ANNs have been utilized in a wide variety of fields
including, but not limited to, computer vision, speech recognition, robotics, control
systems, game playing, and decision making.
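The layered, weighted structure described above can be sketched as a single forward pass; the weights here are hand-picked illustrative constants, not trained values (training by back-propagation would iteratively adjust them against the training dataset):

```python
import math

def sigmoid(x):
    """A common neuron activation function, squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    # hidden layer: each neuron applies the activation to a weighted sum of inputs
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_hidden]
    # output layer: a single neuron over the hidden activations
    return sigmoid(sum(w * h for w, h in zip(w_output, hidden)))

# a 2-input, 2-hidden-neuron, 1-output network with illustrative weights
out = forward([1.0, 0.0], w_hidden=[[1.0, -1.0], [-1.0, 1.0]], w_output=[1.0, 1.0])
```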
This section examines some of the previous works based on RL, GA, and ANN
that made novel contributions to AI and can inspire human-imitation
techniques. The review of each work focuses on how it demonstrates human
behavior imitation, regardless of how efficient (with respect to scoring) it is in
comparison with similar works. The novel contributions of these previous works are
extracted to support the human imitation pursued in this thesis.
Previous Work
An extensive comparison between different human behavior-imitation techniques
is introduced in [3]. They are divided into direct and indirect behavior imitation.
In direct behavior imitation, a controller is trained to output the same actions a
human took when facing the same situation. This means that the performance of the
controller depends on the performance of the human exemplar, which raises a dilemma:
should the human exemplar be skillful or an amateur? If the exemplar is skillful, the IA may
encounter hard situations that the exemplar never faced, precisely because a skillful
player is clever enough to avoid falling into such situations. On the other
hand, if the human exemplar is an amateur, then imitating his behavior will not endow the
IA with good performance.
On the other hand, indirect behavior imitation uses an optimization algorithm to
optimize a fitness function that measures the human similarity of an IA. There is no
human exemplar, so indirect techniques can achieve more generalization than direct ones
[3]. It was found that controllers trained using GA (indirect) performed more similarly to
human players than those trained using back-propagation (direct). All previous works
presented in this section employ indirect techniques.
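The indirect scheme can be sketched as follows: a controller parameter is tuned by a simple hill climber to maximize a human-similarity fitness, with no human exemplar traces replayed. The controller, the measured statistic, and the target value are all hypothetical placeholders, not measurements from [3]:

```python
import random

# Hypothetical statistic assumed to have been measured from aggregate human play
HUMAN_AGGRESSION = 0.35

def controller_aggression(params):
    """Stand-in for running the controller and measuring its observed behavior."""
    return max(0.0, min(1.0, params[0]))

def similarity(params):
    # fitness: the closer the controller's statistic is to the human one, the better
    return -abs(controller_aggression(params) - HUMAN_AGGRESSION)

def hill_climb(steps=300, sigma=0.1):
    best = [random.random()]                       # random initial parameter
    for _ in range(steps):
        candidate = [best[0] + random.gauss(0.0, sigma)]
        if similarity(candidate) > similarity(best):
            best = candidate                       # keep the more human-like setting
    return best

random.seed(2)
params = hill_climb()
```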
Genetic Programming (GP) is used in [4] to build a robot for the Robocode game.
In Robocode, programmed autonomous robots fight against each other; this involves
shooting at the enemy robots and dodging their bullets. The authors used Koza-style GP
where every individual is a program composed of functions and terminals. The functions
they used are arithmetic and logical ones (e.g. addition, subtraction, OR, etc.). Their
evolved robot took third place in a competition against 26 other manually
programmed robots.
The authors of [4] applied the same technique, GP, to evolve car-driving
controllers for the Robot Auto Racing Simulator (RARS) [5]. The top evolved controllers
took second and third place against 14 other hand-coded RARS controllers.
In [6], GP is likewise used for evolving car racing controllers. GP builds a model of
the track and a model of the driving controller. The model of the track is built and stored
in memory. The driving controller then uses this model during the race to output the
driving commands. The driving controller is a two-branch tree of functions and terminals.
The output of the first sub-tree is interpreted as a driving command (gas/brake) and the
output of the second one is the steering command (left/right). The functions of a tree are
mathematical ones in addition to memory functions for reading the model stored in
memory. The terminals of a tree are the readings of some sensors provided by the
simulation environment (e.g. distances to track borders, distance traveled, and speed).
A work similar to [6] is introduced in [7] where virtual car driving controllers for
RARS are evolved using GP. An individual consists of two trees; one controls the
steering angle and the other triggers the gas and brake pedals. The controller of the
steering angle is a simple proportional controller that tries to keep the car as close as
possible to the middle of the road. A proportional controller merely gives a steering angle
proportional to the current deviation from the middle of the road, without considering
previous deviations or the expected future trend of the deviation. The best evolved
controller performed well, but not well enough to compete with other elaborate manually
constructed controllers.
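The proportional steering rule described above can be sketched in a few lines; the gain value is illustrative, not a constant from [7]:

```python
# Illustrative proportional gain (not a value from [7], which evolves its controllers)
K_P = 0.8

def steering_angle(deviation_from_middle):
    """Steer against the current lateral deviation only: no memory of past
    deviations (no integral term) and no anticipation of future trends
    (no derivative term), exactly as a pure proportional controller behaves."""
    return -K_P * deviation_from_middle
```

For example, a car drifting right of center (positive deviation) receives a left-steering command whose magnitude grows with the drift.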
An interesting comparison between GP and artificial neural networks (ANN) in
evolving controllers is made in [8]. Car controllers similar to those of [6] and [7] were
built using GP and ANN, and their performances were compared. It was found that GP
controllers evolve much faster than ANN ones do. However, the ANN controllers
ultimately reach higher fitness. In addition, the ANN controllers outperform the GP
controllers in generality; ANN controllers could perform significantly better on tracks
for which they have not been evolved. Finally, it is found that both GP and ANN could
use the controllers trained for one track as seeds for evolving other controllers trained for
all tracks. Both GP and ANN could generalize these controllers proficiently on the
majority of eight different tracks. However, it is found that ANN controllers generalized
better.
In [9], an ANN is trained using GA to ride simulated motorbikes in a computer
game. It is found that GA could create human-like performance and could even find
solutions that no human player had previously found. GA is then compared with the
back-propagation learning algorithm. As GA requires no training data, it could adapt to
any new track. However, its solutions are not optimal and not as good as those of a
good human player or of the back-propagation algorithm. On the other hand,
back-propagation requires training data recorded from a game played by a good human
player, and cannot be trained to deal with unusual situations.
1.4 Contributions of this Work
This work introduces a flexible and expandable model of IAs. The evolved IA can
autonomously adapt itself to the situation it encounters in order to achieve higher fitness.
Its fitness function behaves as if it implicitly involved an unlimited number of fitness
functions from which it selects the most appropriate one. For example, in the Robocode
game, our IA is required to achieve the highest possible score. It autonomously discovers
that saving its power implicitly increases its score. Consequently, it evolves a behavior
that targets decreasing power consumption.
Furthermore, the proposed IA not only evolves human-like behaviors but also
allows for manual selection among these evolved behaviors. This can be used in
computer games to generate intelligent opponents with various human-like behaviors.
This can make computer games more exciting and challenging. In addition, the evolution
process requires no supervision; it is a form of indirect human imitation.
This thesis introduces the following to the field of AI:
An evolvable model of the society of mind theory by introducing a new type of
agents (evolver agents) that facilitates autonomous behavior adaptation
Automatic unsupervised behavior selection according to the encountered
situation
Simple mathematical representations of different personalities of IAs without
emulating the brain
In addition to the automatic behavior selection, the proposed agent can be
influenced towards a certain behavior. This influence is easily achieved by changing the
fitness function. A suitable fitness function is selected and the agent automatically
changes its behavior to satisfy it. This is similar to a child who acquires the manners and
behaviors of his ideals (e.g. his parents). Also, as the fitness of the agent can be fed back
directly from a human exemplar (e.g. the user), the agent can autonomously learn a
behavior that satisfies that person. Experiments show that a wise selection of the fitness
function guides the agent toward the required behavior.
The simple representation of behaviors enables the agent to survive in different
environments. It can learn different behaviors and autonomously select the suitable
behavior for an environment. For example, a robot can live with different persons and
select the behavior that satisfies each person.
1.5 Applications
Intelligent Agents are used in almost all fields of life due to their flexibility and
autonomy. They contribute to several emerging disciplines such as ambient intelligence
and cognitive robotics. The benefits of intelligent systems are unlimited. They can be
useful in educating children, guiding tourists, and helping disabled and elderly persons.
This section presents how emerging disciplines can benefit from the work introduced in
this thesis.
1.5.1 Brain Model Functions
Several models of the brain have been introduced in the AI literature. These models
can be categorized into two classes: models that imitate the architecture of the brain, and
models that imitate human behavior without imitating the architecture of the brain.
The brain models of the first class develop simple architectures similar to those of
the brain of an organism. They normally adopt Artificial Neural Networks (ANN). An
example of this category is the confabulation theory. It is proposed as the fundamental
mechanism of all aspects of cognition (vision, hearing, planning, language, initiation of
thought and movement, etc.) [10]. This theory has been hypothesized to be the core
explanation for the information processing effectiveness of thought.
The other class of brain-modeling theories develops models that do not necessarily
resemble the brain of an organism but nevertheless rigorously implement its functions.
Such an implementation can rest mainly on non-neurological foundations such as
mathematics, statistics, and probability theory. Examples of this category are: the
Modeling Field Theory (MFT) [11], [12], the Layered Reference Model of the Brain
(LRMB) [13]–[16], Bayesian models of
cognition [17], the society of mind [2] outlined in 2.2, and the Human Imitating Cognitive
Modeling Agent (HICMA) [18] introduced in 2.9. These theories adopt different
approaches to modeling the brain. However, they all have the same target: developing an
artificial brain that behaves like a human (or animal) brain when encountering real-life
situations.
This imitation includes not only the brain’s strong points (e.g. adaptability and learning
capability) but also its weak points such as memory loss over time. Inheriting the weak
points of the brain is not necessarily a drawback. It can be useful in some applications
such as in computer games, where a human-like opponent is more amusing than a genius
one.
1.5.2 Artificial Personality
Personality can be defined as a characteristic way of thinking, feeling, and
behaving. It includes behavioral characteristics, both inherent and acquired, that
distinguish one person from another [19]. When imitating human behaviors, personality
must be taken into account, as personality is the engine of behavior [20]. As an important
human trait, personality has attracted much research. In the genetic-robot approach,
genes are considered key components in defining a creature’s personality [21]. That
is, every robot has its own genome in which each chromosome, consisting of many genes,
contributes to defining the robot’s personality.
In this work, human behaviors are represented by mathematical models. The
parameters of a model define not only a human behavior but also the degree of that
behavior. For example, a robot can have a tricky behavior and another robot can be
learned to be trickier.
Another advantage of mathematical modeling of behaviors is that it opens the door
to powerful mathematical optimization techniques such as CMA-ES and Nelder-Mead,
described in sections 2.7 and 2.8 respectively.
1.5.3 Ambient Intelligence and Internet of Things
Ambience is the character and atmosphere of a place [22]. It is everything
surrounding us in the environment, including lights, doors and windows, TV sets, and
computers. Ambient Intelligence (AmI) provides the ambience with enough intelligence
to understand the user’s preferences and adapt to his needs. It incorporates smartness
into the environment to make it comfortable, safe, secure, healthy, and energy
conserving. The applications of AmI in life are unlimited. They include helping elderly
and disabled persons, nursing children, and guiding tourists.
An example of an AmI application is a smart home that detects the arrival of the
user and automatically takes the actions that the user likely requires. It can switch on
certain lights, turn on the TV and switch to the user’s favorite channel with the preferred
volume, suggest and order a meal from a restaurant, and so on. All of these actions are
taken automatically by the AmI system based on the preferences it has previously
learned about the user.
The evolution of AmI has led to the emergence of the Internet of Things (IoT). IoT
provides the necessary intercommunication system between the smart things in the
environment, including sensors, processors, and actuators. Sensors (e.g. temperature,
clock, and humidity sensors) send their information to a central processor. The processor
receives this information and infers what tasks the user may want done. Finally, the
processor decides what actions need to be taken and sends commands to the concerned
actuators to put these actions into effect.
An example scenario: A PIR (Passive Infra-Red) sensor detects the arrival of the
user and sends a trigger to a processor. The processor receives this trigger along with
information from a clock and a temperature sensor. It reviews its expertise in the user’s
preferences and guesses that he likely desires a shower when he arrives at that time in
such hot weather. Consequently, the processor sends a command to the water heater to
heat the water to the user’s preferred temperature. The same principle can be scaled up
to hundreds or thousands of things connected together via a network. A simple IoT
system is shown in Fig. 1-1.
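The scenario above reduces to a simple rule over sensor readings. The sketch below illustrates the sensor-to-actuator decision step; the sensor names, thresholds, and heater command are illustrative assumptions, not part of any specific IoT platform:

```python
def decide_actions(readings):
    """Map a dict of sensor readings to a list of actuator commands.
    Example input: {"pir": True, "hour": 18, "temp_c": 35}."""
    actions = []
    # Assumed learned preference: heat shower water when the user
    # arrives (PIR trigger) on a hot late afternoon.
    if (readings.get("pir")
            and readings.get("temp_c", 0) > 30
            and 16 <= readings.get("hour", 0) <= 20):
        actions.append(("water_heater", "heat_to_celsius", 40))
    return actions
```

In a real AmI system the rule itself would be learned from the user's history rather than hard-coded; the point here is only the sensor → processor → actuator flow.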
This work proposes a simple mathematical modeling of human behaviors. This
modeling can be useful for endowing the smart environment with human behaviors so
that it interacts with the user in a human-like way. Furthermore, it is simple enough for
the environment to tailor different behaviors for different users.
1.5.4 Ubiquitous Computing and Ubiquitous Robotics
Ambient Intelligence is based on Ubiquitous Computing (ubicomp). The word
“ubiquitous” means existing or being everywhere at the same time [23]. Ubicomp is a
computer-science concept in which computing exists everywhere and is not limited to
personal computers and servers. The world humans will live in is expected to be fully
ubiquitous, with everything networked. Information may flow freely among lamps,
TVs, vehicles, cellular phones, computers, smart watches and glasses, and so on. Amid
this ubiquitous life, ubiquitous robots (ubibots) may live as artificial individuals. More
than other things, they deal directly with their human users. This requires them to
understand and imitate human behaviors. Again, mathematical behavior modeling can
be useful.
1.6 Techniques
The main part of this work is the statistical modeling. Given a set of observations
or samples, a statistical model is the mathematical representation of the process assumed
to have generated these samples. This section outlines the statistical techniques utilized
or experimented with in this work for building and optimizing a statistical model. Fig. 1-3
illustrates how these techniques contribute to the modeling process. An extended chart
of related techniques is also given in Fig. 1-4.
1.6.1 Feature Selection
A model can be thought of as a mapping between a set of input features and one or
more output responses. When the model of a process is unknown, it is usually unknown
which features affect the output of that process. For example, assume that a mobile robot
is equipped with a battery and some sensors: humidity, temperature, gyroscope,
accelerometer, and speedometer. Assume that it is required to find the battery decay rate
as a function of sensor readings as shown in Fig. 1-2. Obviously, the decay rate does not
change with all sensor readings. The purpose of feature selection is to identify which
features (i.e. sensor readings) are relevant to the output (i.e. battery decay rate). Thus, the
modeling process tries to map the output to only a subset of all features. This makes
modeling faster, less complicated, and more accurate. An introduction to feature selection
is provided in [24].
Fig. 1-1: A simple IoT system
In general, any feature-selection method tends to select the features that have:
Maximum relevance with the observed feature (output)
Minimum redundancy to (relation with) other input features
Mutual Information and correlation are detailed more in Appendix A.1 and
Appendix A.2 respectively.
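The two criteria above can be illustrated with a greedy selection loop. The sketch below uses Pearson correlation as a stand-in for both the relevance and the redundancy measures (mutual information, detailed in Appendix A.1, works analogously); it is an illustrative mRMR-style procedure, not the exact method of this thesis:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def select_features(features, output, k):
    """Greedily pick k features with maximum |correlation| to the output
    (relevance) and minimum mean |correlation| to the features already
    selected (redundancy). `features` maps names to value columns."""
    selected = []
    while len(selected) < k:
        best, best_score = None, -float("inf")
        for name, col in features.items():
            if name in selected:
                continue
            relevance = abs(pearson(col, output))
            redundancy = (sum(abs(pearson(col, features[s]))
                              for s in selected) / len(selected)
                          if selected else 0.0)
            score = relevance - redundancy
            if score > best_score:
                best, best_score = name, score
        selected.append(best)
    return selected
```

In the robot example, a speedometer column strongly correlated with the battery decay rate would be selected before a humidity column that barely correlates with it.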
Fig. 1-2: A model of the battery decay rate (sensor readings → battery consumption → decay rate)
Fig. 1-3: Mathematical modeling techniques utilized or experimented with in this work:
feature selection (mutual information, computed via density estimation with histogram
or kernel methods; and correlation: Pearson correlation and distance correlation);
model selection of the functional form (linear, polynomial, logarithmic, exponential);
parameter estimation, i.e. optimization (CMA-ES, Nelder-Mead, BOBYQA, Powell);
and model evaluation (Least Absolute Deviations, LAD)
1.6.2 Modeling
A model of a process is the relation between its inputs and its output. This relation
comprises:
Input features (Independent variables)
Observed output (Dependent variable)
Parameters
For example, in Eq. (1-1), the inputs are x1 and x2, the output is y, and the
parameters are a, b, c, d, and e:

y = a·x1² + b·x2 + c·e^(d·x1) + e    (1-1)
The modeling process consists of three main steps:
1. Model Selection
2. Model Optimization
3. Model Evaluation
The following subsections describe these steps.
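As a concrete sketch, the model of Eq. (1-1) can be evaluated directly as a parameterized function of its input features (here the parameter e is the additive constant, while the exponential uses Euler's number):

```python
import math

def model(x1, x2, params):
    """Evaluate y = a*x1**2 + b*x2 + c*exp(d*x1) + e  (Eq. 1-1).
    `params` holds the five parameters (a, b, c, d, e)."""
    a, b, c, d, e = params
    return a * x1 ** 2 + b * x2 + c * math.exp(d * x1) + e
```

The modeling steps then amount to choosing this functional form (model selection), searching for the parameter tuple (optimization), and scoring the fitted function against observations (evaluation).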
1.6.2.1 Model Selection
The first step in formulating a model equation is selecting its functional form,
namely the form of the function that represents the process. For example, the formula
(1-1) consists of a quadratic function, a linear function, an exponential function, and a
constant. Making such a selection depends on experience and trials. Even with
experience of the modeled process, trials must be conducted to find the most suitable
functional form. In this work, a Genetic Algorithm searches for a good functional form
as described in section 3.2.4.2.
1.6.2.2 Model Optimization
After selecting a suitable functional form of a model, the function of the
optimization stage is to estimate the values of the parameters that best fit the model to
the real process. As this stage is a main part of this work, it is described in detail in section
2.6.
1.6.2.3 Model Evaluation
The function of the model evaluation stage is to evaluate optimized models. This
allows for selecting the fittest model from among the available ones. For example,
different functional forms (linear, polynomial, logarithmic, etc.) can be optimized for
modeling a process, and the best one can then be chosen. The meaning of best depends
on the model-evaluation method. For example, the residual sum of squares (RSS) adds
up the squares of the differences between the observed (real) output and the predicted
(modeled) output. A different method is the Pearson Correlation Coefficient (PCC),
described in Appendix B.1, which calculates the correlation between the real output and
the predicted output. The simplest method is Least Absolute Deviations (LAD),
which is used in this work.
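The evaluation methods above differ only in how the residuals are aggregated. LAD and RSS, for instance, can be sketched as:

```python
def lad(observed, predicted):
    """Least Absolute Deviations: sum of |error| over all samples."""
    return sum(abs(o - p) for o, p in zip(observed, predicted))

def rss(observed, predicted):
    """Residual Sum of Squares: sum of error**2 over all samples."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))
```

RSS penalizes large residuals quadratically, whereas LAD weighs all deviations equally, which makes it more robust to occasional outliers in the observations.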
1.7 Organization of the Thesis
This thesis is organized as follows: Chapter 2 gives a background about the
underlying theories of this work. It overviews the society of mind theory and briefly
summarizes Evolutionary Computation (EC), Evolutionary Algorithms (EA), and
Genetic Algorithms (GA). It next explains in detail the mathematical optimization
problem. Two optimization methods are then explained: Covariance Matrix Adaptation
(CMA) and Nelder-Mead method. Finally, the Robocode game is presented as the
benchmark of this work.
Chapter 3 explains in detail the structure of the proposed agent (HICMA). It
comprises five sections that describe the five agents of HICMA. Chapter 4 introduces the
experiments and results of HICMA as a Robocode agent. Finally, Chapter 5 discusses
the conclusions and possible future work.
Chapter 2: Background
2.1 Introduction
This chapter overviews the basic disciplines behind this work. It is organized as
follows. Section 2.2 briefly overviews the society of mind theory and how it is utilized
and extended in this work. Section 2.3 generally overviews evolutionary computation.
Section 2.4 reviews evolutionary algorithms. Section 2.5 overviews Genetic Algorithms
(GA). Section 2.6 defines the optimization problem. Section 2.7 explains solving
optimization problems using the evolution strategies techniques focusing on the
Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Section 2.8 explains the
Nelder-Mead algorithm combined in this work with CMA-ES to form a hybrid
optimization technique, which is the main engine of the proposed system.
The relations between the aforementioned disciplines and similar ones are depicted in
Fig. 2-1. The disciplines used in this work are bounded by double outlines. This map
provides a good reference to different substitutes that can be used for extending this work
in the future.
2.2 The Society of Mind
2.2.1 Introduction
The society of mind theory was introduced by Marvin Minsky in 1980 [2]. It tries
to explain how minds work and how intelligence can emerge from non-intelligence. It
envisions the mind as a multitude of little parts, each mindless by itself; each part is
called an agent. Each agent by itself can only do some simple thing that needs no mind
or thought at all. Yet when these agents are joined in societies, in certain very special
ways, this leads to true intelligence. The agents of the brain are connected in a lattice
where they cooperate to solve problems.
2.2.2 Example – Building a Tower of Blocks
Imagine that a child wants to build a tower with blocks, and imagine that his mind
consists of a number of mental agents. Assume that a “builder” agent is responsible for
building towers of blocks. The process of building a tower is not that simple. It involves
other sub-processes: choosing a place to erect the tower, adding new blocks to the tower,
and deciding whether the tower is high enough. It may be better to break this complex
task up into simpler ones and dedicate an agent to each, as in Fig. 2-2.
Again, adding new blocks is too complicated for the single agent “add” to
accomplish. It would be helpful to break this process into smaller and simpler sub-
processes such as finding an unused block, getting this block, and putting it onto the
tower as shown in Fig. 2-3.
Fig. 2-3: Sub-agents of the add agent
In turn, the get agent can be broken up into a “grasp” sub-process that grasps a
block and a “move” sub-process that moves it to the top of the tower. Generally, when an
agent is found to have to do something complicated, it is replaced with a sub-society of
agents that do simpler tasks.
It is clear that none of these agents alone can build a tower, and even all of them
together cannot do so unless the interrelations between them are defined, that is, how
every agent is connected with the others. In fact, an agent can be examined from two
perspectives: from
outside and from inside its sub-society. If an agent is examined from the outside with no
idea about its sub-agents, it will appear as if it knows how to accomplish its assigned
task. However, if the agent’s sub-society is examined from the inside, the sub-agents will
appear to have no knowledge about the task they do.
To distinguish these two different perspectives of an agent, the word agency is used
for the system as a black box, and agent is used for every process inside it.
A clock can be given as an example. As an agency, examined from the front, its
dial seems to know the time. However, as an agent, it consists of gears that appear
to move meaninglessly, with no knowledge of time.
To sense the importance of viewing a system of agents as an agency, one can
examine the steering wheel of a car. As an agency, it changes the direction of the car
without any concern for how this works. However, if it is disassembled, it appears as an
agent that turns a shaft that turns a gear to pull a rod that shifts the axle of a wheel.
Bearing this detailed view in mind while driving can cause a crash, because it takes too
long to work through every time the wheels are to be steered.
In summary, to understand a society of agents, the following points must be known:
1. How each separate agent works
2. How each agent interacts with other agents of the society
3. How all agents of the society cooperate to accomplish a complex task
Fig. 2-2: A builder agent with its sub-agents
This thesis extends the society of mind theory to present a novel model of an
intelligent agent that behaves like a human. The previous points are expanded so that the
entire society evolves according to environmental changes. Section 2.9 describes the
proposed model and how it adopts the society of mind theory.
2.3 Evolutionary Computation (EC)
Evolutionary Computation (EC) is a subfield of Artificial Intelligence (AI) that
solves stochastic optimization problems. A stochastic optimization problem is the
problem of finding the best solution from all possible solutions by means of a stochastic
search process, that is, a process that involves some randomness. EC methods are used
for solving black-box problems where there is no information to guide the search process.
An EC method tests some of the possible solutions, trying to target the most promising
ones. EC methods adopt the principle of evolution of generations. They generate a
population of candidate solutions and evaluate every individual solution in this
population. Then, a new generation of, hopefully fitter, individuals is generated. The
evolution process is repeated until a satisfying result is obtained.
2.4 Evolutionary Algorithms (EA)
An Evolutionary Algorithm (EA) is an Evolutionary Computation subfield that
adopts the principle of survival of the fittest. EA methods are inspired by evolution
in nature: a population of candidate solutions is evaluated using an evaluation function,
and the fittest individuals, called parents, are granted a better chance to produce the
offspring of the next generation. Reproduction is done by recombining pairs of the
selected parents to produce offspring. The offspring are then mutated in such a way that
they hold some of the traits inherited from their parents in addition to their own developed
traits. The rates of recombination and mutation are selected to achieve a balance between
utilization of parents’ good traits and exploration of new traits. For improving the fitness
over generations, the process of reproduction ensures that the good traits are not only
inherited over generations but also developed by the offspring. After a predetermined
termination condition is satisfied, the evolution process is stopped. The fittest individual
in the last generation is then selected as the best solution of the given problem. Fig. 2-4
shows the general scheme of evolutionary algorithms.
The search process of an evolutionary algorithm tries to cover the search-space of
the problem, which is the entire range of possible solutions, without exhaustively
trying every solution. The fitness of all individuals in the search-space can be
represented by a fitness landscape as shown in Fig. 2-5. The horizontal axes represent
the domain of candidate solutions (i.e. individuals) and the vertical axis represents their
fitness. The optimum solution within a limited sub-range of the search-space is called a
local optimum, while the absolute optimum solution over the entire search-space is called
the global optimum. An EA tries to find the global optimum and not to fall into one of
the local optima.
2.5 Genetic Algorithms (GA)
The Genetic Algorithm (GA) is a type of evolutionary algorithm that was first
introduced by John Holland in the 1960s and further developed during the 1960s
and the 1970s [25]. It is designed to imitate the natural evolution of generations. It
encodes the solution of a problem into the form of chromosomes. A population of
candidate-solution individuals (also called phenotypes) is generated, where each solution
is represented by one or more chromosomes (also called genotypes). In turn, each
chromosome consists of a number of genes. GA then selects parents to reproduce from
the fittest individuals in the population. Reproduction involves crossover of parents to
produce offspring, and mutation of offspring’s genes. The newly produced offspring
population represents the new generation, which is hopefully fitter, on average, than the
previous one. The evolution process is repeated until a satisfying fitness level is achieved
Fig. 2-4: A general scheme of evolutionary algorithms (a population is initialized and
evaluated by the fitness function; parents are selected, crossover and mutation produce
offspring, and the loop repeats until the termination condition yields the solution)
Fig. 2-5: A 2D fitness landscape, showing a local optimum and the global optimum
or a maximum limit of generations is exceeded. The general GA flowchart is illustrated
in Fig. 2-6. Every block is briefly explained next.
Encoding
Encoding is how a candidate solution is represented by one or more chromosomes.
The most common type of chromosomes is the binary chromosome. The genes of this
chromosome can hold either 0 or 1.
Initial Population
An initial population of candidate solutions is generated to start the evolution
process. It is often generated randomly, but sometimes candidate solutions are seeded
into it.
Evaluation
The candidate solutions, represented by individuals, are evaluated by a fitness
function. The fitness of an individual determines its probability to be selected for
reproduction.
Selection
Selection is the process of choosing the parents of the next generation from among
the individuals of the current generation. It is a critical part of GA, as it must ensure a
good balance between exploitation and exploration. Exploitation is giving the fittest
individuals better chance to survive over generations while exploration is searching for
Fig. 2-6: Basic GA flowchart (encoding; initializing parameters; generating the initial
population; evaluating the population; then selection, crossover, and mutation produce
the offspring of the new generation, repeated until the termination test passes)
new useful individuals. The tradeoff between exploitation and exploration is critical. Too
much exploitation may lead to a local optimum, and too much exploration may greatly
increase the number of generations required to find a good solution to the problem.
Selection methods include roulette-wheel, stochastic universal sampling, rank, and
tournament selection.
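Of these methods, roulette-wheel selection is the simplest to sketch: every individual receives a slice of a wheel proportional to its fitness, and a random spin picks the parent. The implementation below is illustrative and assumes non-negative fitness values:

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness.
    Fitnesses are assumed non-negative and not all zero."""
    total = sum(fitnesses)
    spin = rng.uniform(0, total)          # where the wheel stops
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if spin <= cumulative:
            return individual
    return population[-1]                 # numerical safety net
```

An individual with twice the fitness of another gets twice the wheel area and so is, on average, selected twice as often, which realizes the exploitation half of the tradeoff while still letting weak individuals through occasionally (exploration).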
Crossover
Crossover is the recombination of two, or more, selected parents to produce
offspring. It is performed by dividing parents’ chromosomes into two or more portions,
and randomly copying every portion to either offspring.
Mutation
Mutation is the random alteration of an individual’s genes. For example, a binary gene
is mutated by flipping it with a specified probability.
Termination
The evolution process is repeated until a termination criterion is satisfied.
Common termination criteria are:
1. A satisfying solution is found.
2. A maximum limit of generations is reached.
3. Significant fitness improvement is no longer achieved.
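Putting the blocks above together yields a minimal binary GA. The OneMax fitness (counting 1-genes) and all parameter values below are illustrative stand-ins, not those used in this thesis:

```python
import random

def run_ga(chrom_len=20, pop_size=30, generations=60,
           crossover_rate=0.9, mutation_rate=0.02, rng=None):
    """A minimal binary GA solving OneMax (maximize the number of 1s)."""
    rng = rng or random.Random(0)
    fitness = lambda chrom: sum(chrom)            # OneMax fitness
    pop = [[rng.randint(0, 1) for _ in range(chrom_len)]
           for _ in range(pop_size)]              # initial population
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)       # evaluation + ranking
        if fitness(pop[0]) == chrom_len:          # termination: solved
            break
        parents = pop[:pop_size // 2]             # truncation selection
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            if rng.random() < crossover_rate:     # one-point crossover
                cut = rng.randrange(1, chrom_len)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [g ^ 1 if rng.random() < mutation_rate else g
                     for g in child]              # bit-flip mutation
            offspring.append(child)
        pop = offspring                           # the new generation
    return max(pop, key=fitness)
```

For brevity the sketch uses truncation (rank-based) selection of the top half rather than roulette-wheel selection; any of the listed selection methods could be substituted.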
2.6 Optimization Problem
2.6.1 Introduction
Optimization is the minimization or maximization of a non-linear objective
function (also called a fitness function or loss function). An objective function is a
mapping of an n-dimensional input vector to a single output value as shown in Fig. 2-7.
That is, if y = f(x) is a non-linear function of an n-dimensional vector x ∈ ℝⁿ, then
minimizing f(x) is finding the n components of x that give the minimum value of y ∈ ℝ.
Fig. 2-7: An objective function
The optimum value (maximum or minimum) of a function within a limited range
is called local optimum while the optimum value of the function over its domain is called
global optimum. A function can have several local optima, but only one global optimum.
An example of local and global minima is shown in Fig. 2-8.
Fig. 2-8: Local and global minima
2.6.2 Mathematical Definition
Given a number n of observations (samples) of p variables (e.g. sensor readings)
forming a 2-D matrix M with dimensions n × p, such that column j holds the samples of
the jth variable and row i holds the ith observation (sample):

M = [ m1,1   m1,2   ⋯   m1,p
       ⋮      ⋮           ⋮
      mn,1   mn,2   ⋯   mn,p ]

where mi,j is the ith sample of the jth variable (sensor).
The p variables may correspond to, for example, a number of sensors attached to a
robot such as (thermometers, speedometers, accelerometers … etc.) which represent the
senses of that robot. For example, suppose that a robot plays the goalkeeper role in a
soccer game. Like a skillful human goalkeeper, the robot should predict the future
location of the ball as it approaches the goal to block it in time. Assume that the motion
of the ball over time is modeled by a quadratic function of time, that is:
Location (x, y, z) = f(t)    (2-1)
Equivalently:
Location (x) ≡ fx(t) = ax·t² + bx·t + cx    (2-2)
Location (y) ≡ fy(t) = ay·t² + by·t + cy    (2-3)
Location (z) ≡ fz(t) = az·t² + bz·t + cz    (2-4)
Let the axes of the playground be as illustrated in Fig. 2-9. The robot can use
equation (2-3) to predict the time T at which the ball will arrive at the goal line (this
prediction can be done using optimization). Then the robot can use equations (2-2) and
(2-4) to predict the location of the ball at the goal line (xg, yg, zg) at time T, and can
move there in time.
Solving this problem (blocking the ball) is done as follows:
1. Get n samples (observations) of the locations of the moving ball (x, y, and z) at
fixed intervals of time (t).
2. Store the (x, t) samples in matrix Mx , where x and t represent the p variables
(sensors).
3. Similarly, store (y, t), and (z, t) samples in My and Mz
The three matrices will appear as follows:

Mx = [ x0    t0        My = [ y0    t0        Mz = [ z0    t0
        ⋮     ⋮                ⋮     ⋮                ⋮     ⋮
       xn−1  tn−1 ]           yn−1  tn−1 ]           zn−1  tn−1 ]
The task of the optimization strategy is to find the values of the functions’
parameters (ax, bx, cx, ay, by, …) in equations (2-2), (2-3), and (2-4) that make every
function fit the sampled data (observations) optimally. Finding such values is a
search problem, where the search space is the range of all possible values of the
parameters. This is what the following two steps do. To minimize, for example, the
error of fx(t):
4. Re-organize the given equation so that the residual error is isolated:

x − (ax·t² + bx·t + cx) = ex    (2-5)

where ex is the error of the optimization process. The left-hand side is called
the objective function, cost function, or loss function.
5. Search for the values of the parameters (ax, bx, and cx) that minimize the error ex.
This is a 3-D search problem as the algorithm searches for the optimum values of
three parameters. The optimization (minimization) of the function 𝑓𝑥 is done as
follows:
a. Guess an initial value (ax0, bx0, cx0) for the parameters (ax, bx, cx)
b. Substitute into (2-5): (ax0, bx0, cx0) for (ax, bx, cx), and the first sample
(x0, t0) from the matrix Mx for x and t, then calculate the error ex0:
Fig. 2-9: An example application of mathematical optimization
x0 − (ax0·t0² + bx0·t0 + cx0) = ex0    (2-6)
c. Repeat step (b) for all of the n observations and accumulate the absolute
errors into ex:

ex = Σ (i = 0 … n−1) | xi − (ax·ti² + bx·ti + cx) |    (2-7)

where |X| is the absolute value of X.
The optimization algorithm tries to improve its guesses of the parameters (a, b, and
c) over the iterations to find the optimum values in the search space, those that
minimize the error ex.
Fig. 2-10 visualizes a 2D search space (two parameters), where the horizontal axes
represent the domains of the two parameters and the vertical axis represents the value of
the objective function (the error). The optimization algorithm starts from some point
in the search space and moves iteratively towards the minimum error (i.e. the global
optimum).
Searching the search space for the optimum (minimum or maximum) solution
depends on the optimization algorithm. The next section explains evolution-strategy
optimization algorithms.
After a function is optimized, it is saved for use as a model of an observed
phenomenon or an environmental object. For example, the goalkeeper robot can keep the
optimized functions (2-2), (2-3), and (2-4) for similar future shots (assuming that all
shots have similar paths).
Fig. 2-10 Minimization example in a 2D search space
2.7 Evolution Strategies
Evolution Strategies (ESs) are optimization techniques belonging to the class of
Evolutionary Computation (EC) [26], [27]. An evolution strategy searches for the
optimum solution in a search space similarly to Genetic Algorithms (GAs). It generates a
population of individuals representing candidate solutions (i.e. vectors of the parameters
to be optimized). Every individual in the population is then evaluated by a fitness function
that measures how promising it is for solving the given problem. The fittest individuals
are then selected and mutated to reproduce another generation of offspring. This process
is repeated until a termination condition is met. Mutation represents the search steps that
the algorithm takes in the search-space; it is done by adding normally distributed random
vectors to the individuals.
The fitness of all individuals in the search space can be represented by a fitness landscape as shown in Fig. 2-5. The horizontal axes represent candidate solutions
(individuals) and the vertical axis represents their fitness. The goal of the optimization
algorithm is to converge to the global optimum solution with the minimum search costs
represented by the number of objective function evaluations.
In optimization problems, the search space is the domain of the parameters of the
optimized function (minimized or maximized) which is also used as an objective
function. For example, in maximization problems, the goal is to find the values of the parameters that maximize a function, so the value of that function represents the fitness of the given set of parameters: the higher the value of the function, the fitter the solution.
2.7.1 Basic Evolution Strategies
The basic evolution strategy can be defined by:
(µ/ρ, λ)-ES and (µ/ρ + λ)-ES
Where:
µ is the number of parents (fittest individuals) in the population
ρ is the number of parents that produce offspring
λ is the number of generated offspring
The “,” means that the new µ parents are selected only from the current λ offspring; this is called comma-selection. The “+” means that the new parents are selected from both the current offspring and the current parents; this is called plus-selection.
For example, a (4/2, 10)-ES selects the fittest 4 parents from the current population (10 individuals) and recombines and mutates 2 of them at a time to generate 10 new offspring. The 4 parents are selected from the current 10 offspring only.
On the other hand, a (4/2 + 10)-ES selects the fittest 4 parents from the current
population (10 individuals) along with the 4 parents of the previous generation. This is a
type of elitist selection where the elite individuals are copied to the next generation
without mutation.
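The two selection schemes can be sketched in Python (an illustrative fragment; the fitness values are hypothetical, with lower being better):

```python
# Comma- vs plus-selection: with comma-selection the mu new parents come
# from the offspring only; with plus-selection the old parents compete too
# (elitist selection). The fitness values below are hypothetical
# (lower is better), standing in for evaluated individuals.

def select(parents, offspring, mu, plus=False):
    pool = offspring + parents if plus else offspring
    return sorted(pool)[:mu]

parents, offspring = [1, 2], [5, 0, 3, 4]
print(select(parents, offspring, mu=2))             # comma: [0, 3]
print(select(parents, offspring, mu=2, plus=True))  # plus:  [0, 1]
```

Note how plus-selection keeps the old parent with fitness 1 alive, while comma-selection must accept the best of the offspring even if they are all worse.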
The basic steps of an evolutionary strategy are:
1. Generating candidate solutions (Mutating parent individuals)
2. Selecting the fittest solutions
3. Updating the parameters of the selected solutions
Fig. 2-11 illustrates the previous three steps.
Fig. 2-11: Basic steps of evolution strategies
An ES individual x is defined as follows:
x = [y, s, F(y)] (2-8)
Where:
y is the parameter vector to be optimized; it is called the object parameter vector
s is a set of parameters used by the strategy; it is called the strategy parameter vector
F(y) is the fitness of y
Strategy parameters s are the parameters used by the strategy during the search process; they can be thought of as the tools used by the strategy. They are similar to a torchlight a person may use for finding an object (the optimum solution) in a dark room (the search space). The most important strategy parameter is the step-size, described later.
Notice that an evolution strategy not only searches for the optimum solution of y, but also
searches for the optimum strategy parameter s. This is similar to trying several types of
torchlights to find the best one to use for finding the lost object in the dark room.
Obviously, finding the optimum strategy-parameter vector speeds-up the search process.
The basic algorithm of a (µ/ρ +, λ)-ES is given in Algorithm 2-1:
Algorithm 2-1: A basic ES algorithm
1. Initialize the initial parent population Pµ = {p1, p2 … pµ}
2. Generate initial offspring population Pλ = {x1, x2 … xλ} as follows:
a. Select ρ random parents from Pµ
b. Recombine the selected ρ parents to form an offspring x
c. Mutate the strategy parameter vector s of the offspring x
d. Mutate the object parameter vector y of the offspring x using the
mutated parameter set s
e. Evaluate the offspring x using the given fitness function
f. Repeat for all λ offspring
3. Select the fittest µ parents from either
{Pµ∪ Pλ} if plus-selection (µ/ρ + λ)-ES
{Pλ} if comma-selection (µ/ρ , λ)-ES
4. Repeat 2 and 3 until a termination condition is satisfied
Normally distributed random vectors are used to mutate the strategy-parameter set
s and the object-parameter vector y at steps 2.c and 2.d respectively. The mutation process
is explained in more detail in the next section.
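The loop of Algorithm 2-1 can be sketched for the simplest case µ = ρ = 1, with a fixed mutation strength and without recombination or strategy-parameter mutation (an illustrative Python sketch, not the thesis implementation; the sphere fitness function is hypothetical):

```python
# A minimal (1, lambda)-ES: one parent produces lam offspring by adding
# normally distributed random vectors (mutation); the fittest offspring
# becomes the next parent (comma-selection discards the old parent).
# The sphere function below is a hypothetical fitness (lower is better).
import random

def fitness(y):
    return sum(v * v for v in y)

def one_comma_lambda_es(y0, sigma=0.3, lam=20, generations=200, seed=1):
    rng = random.Random(seed)
    parent = list(y0)
    for _ in range(generations):
        offspring = [[v + sigma * rng.gauss(0, 1) for v in parent]
                     for _ in range(lam)]
        parent = min(offspring, key=fitness)  # select the fittest offspring
    return parent

best = one_comma_lambda_es([5.0, -5.0])
print(fitness(best))  # small residual error near the optimum
```

Because σ is fixed here, the search stalls at a residual error proportional to the step-size; this is exactly the limitation that the step-size adaptation strategies of the following sections remove.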
Fig. 2-12 visualizes the search process described above for solving a 2D problem (i.e. optimizing two parameters, where the object parameter vector y ∈ ℝ²). Both µ and ρ equal 1, and λ equals 100. That is, one parent is selected to produce 100 offspring.
Every circle represents the one-σ line of the normal distribution at a generation. The
center of every circle is the parent that was mutated by the normally distributed random
vectors to produce the rest of the population represented by ‘.’, ‘+’ and ‘*’ marks. The
black solid line represents the direction of search in the search space.
A one-σ line is the horizontal cross-section of the 2D normal-distribution curve at σ (one standard deviation). It is an ellipse (or a circle) surrounding 68.27% of the samples. This ellipse is useful in studying normal distributions. In Fig. 2-12, the one-σ lines are unit circles because both of the two sampled variables have a standard deviation σ = 1.
Fig. 2-12: Visualization of the search process of a (1/1, 100)-ES over generations g, g+1, and g+2
Fig. 2-13 shows the one-σ ellipse (circle in this case) of a 2D normal distribution
represented on a 3D graph.
2.7.2 Step-size Adaptation Evolution Strategy (σSA-ES)
The goal of an optimization algorithm is to take steps towards the optimum; the
faster an optimization algorithm reaches the optimum the better it is. Clearly, if the
optimum is far from the starting point, it is better to take long steps towards the optimum
and vice versa. In an optimization algorithm, the step size is determined by the amount
of mutation of parent individuals. That is, high mutation of a parent causes its offspring
to be highly distinct from it and thus very far from it in the search space.
Usually, there is no detailed knowledge about the good choice of the strategy parameters
including the step-size. Therefore, step-size adaptation evolution strategies adapt the
mutation strength of parents at every generation in order to get the optimum step-size
that quickly reaches the optimum. As mutation is done by adding a normally distributed
random vector to the parent, the standard deviation σ of that random vector represents
the mutation strength. A large value of σ means that the random vector will more likely
hold larger absolute values and thus more mutation strength. The principle of step-size
adaptation is illustrated in Fig. 2-14, where the standard deviation in (a) equals 1.0 and
in (b) equals 3.0. It is clear that the step-size in (b) is larger than in (a).
An individual of a σSA strategy is defined as follows:
a = [y, σ, F(y)]    (2-9)
Fig. 2-13: One-sigma ellipse of a bivariate normal distribution N(0, I) [m = 0, C = I]
Recall the general definition of an ES individual in Eq. (2-8). It is clear that σ is
the only element in the strategy parameter vector s. σ is the mutation strength parameter
that would be used to mutate the parameter vector y if that individual is selected for
generating offspring. An offspring generated from a parent is defined as follows:
xi(t+1) = { σi(t+1) ← σ(t) · exp(τ·Ni(0, 1)),
            yi(t+1) ← yi(t) + σi(t+1) · Ni(0, I),
            Fi ← F(yi(t+1)) }    (2-10)
As shown in Eq. (2-10), the mutation strength parameter σ is self-adapted every generation t. The learning parameter τ controls the amount of self-adaptation of σ per generation; a typical value of τ is 1/√(2n) [28]. Since the step-size σ in Eq. (2-10) is a parameter of every individual in the population, and the fittest individuals are selected to produce the offspring of the next generation, the best step-sizes are inherited by the new offspring. This enables the algorithm to find the optimum step-size that quickly reaches the optimum solution.
The exponential function in Eq. (2-10) is usually used in evolution strategies, but other functions can also be used to mutate the step-size [29].
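The offspring rule of Eq. (2-10) can be sketched in Python as follows (illustrative; the parent values are hypothetical):

```python
# Offspring rule of Eq. (2-10): mutate sigma by a log-normal factor
# exp(tau*N(0,1)) first, then mutate y using the mutated sigma.
# tau = 1/sqrt(2n) as in the text; the parent values below are hypothetical.
import math
import random

def sigma_sa_offspring(parent_y, parent_sigma, rng):
    n = len(parent_y)
    tau = 1.0 / math.sqrt(2 * n)
    sigma = parent_sigma * math.exp(tau * rng.gauss(0, 1))  # strategy parameter first
    y = [v + sigma * rng.gauss(0, 1) for v in parent_y]     # then the object parameters
    return y, sigma

rng = random.Random(0)
y, s = sigma_sa_offspring([0.0, 0.0], 1.0, rng)
print(len(y), s > 0)  # sigma stays positive under the log-normal update
```

The multiplicative exponential update keeps σ strictly positive and makes halving and doubling the step-size equally likely, which is why it is preferred over additive mutation.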
N(0, 1) is a normally distributed random scalar (i.e. a random number sampled from a normal distribution with mean 0 and standard deviation 1). N(0, I) is a normally distributed random vector with the same dimensions as the optimized object parameter vector y. Fig. 2-15 shows (a) a normal distribution of 2D points (i.e. two object parameters) and (b) the probability density function (PDF) of every parameter. The density of samples is high around the mean and decreases as we move away. That is, the number of random samples near the mean is larger than far from it.
Fig. 2-14: Two random probability distributions with (a) σ = 1.0 and (b) σ = 3.0. The circles are the one-sigma ellipses.
Normal distributions are used for the following reasons [27]:
1. Widely observed in nature
2. The only stable distribution with finite variance, that is, the sum of independent
normal distributions is also a normal distribution. This feature is helpful in the
design and the analysis of algorithms
3. Most convenient way to generate isotropic search points, that is, no favor to any
direction in the search space
2.7.3 Cumulative Step-Size Adaptation (CSA)
A CSA-ES updates the step-size depending on the accumulation of all steps the algorithm has made, where the importance of a step decreases exponentially with time [30]. The goal of CSA is to adapt the mutation strength (i.e. step-size) such that the correlations between successive steps are eliminated [30], [31]. Correlation represents how much two vectors agree in direction. Highly correlated steps are replaced with a long step, and weakly correlated steps are replaced with a short step. The concept of Cumulative Step-size Adaptation is illustrated in Fig. 2-16. The thick arrow represents step adaptation according to the cumulation of the previous six steps. Every arrow represents a transition of the mean m of the population.
Fig. 2-15: A 2D normal distribution (a) 2D vector of points and (b) two 1D histograms
Fig. 2-16: The principle of Cumulative Step-size Adaptation (CSA)
The CSA works as follows:
1. Get λ random normally distributed samples around the mean solution (i.e. the parent of the population):
xi = mt + σt·N(0, I)
Equivalently:
xi = N(mt, σt²·I)
where mt is the solution at iteration t, and σt is the standard deviation of the selection (i.e. the step-size). That is, select λ random samples from around the solution mt with probability decreasing as we move away from mt. The samples appear as shown in Fig. 2-15-a.
2. Evaluate the λ samples, and get the fittest µ of them.
3. Calculate the average Zt of the fittest µ samples as follows:
Zt = (1/µ) · Σ(i=1..µ) xi
4. Calculate the cumulative path:
Pc(t+1) = (1 − c)·Pc(t) + √(µ·c·(2 − c))·Zt, where 0 < c ≤ 1    (2-11)
The parameter c is called the cumulation parameter; it determines how rapidly the information stored in Pc(t) fades, that is, how long (over generations) the effect of a step at generation t lasts. The typical value of c is between 1/√n and 1/n. The normalization term √(µ·c·(2 − c)) is chosen such that it normalizes the cumulative path. That is, if Pt follows a normal distribution with a zero mean and a unit standard deviation (i.e. Pt ∼ N(0, I)), Pt+1 also follows N(0, I) [30].
5. Update the mutation strength (i.e. step-size):
σt+1 = σt · exp( (c / (2·dσ)) · (‖Pc(t+1)‖²/n − 1) )    (2-12)
where ‖X‖ is the Euclidean norm of the vector X ∈ ℝⁿ:
‖X‖ = √(x1² + x2² + … + xn²)
The damping parameter dσ determines how much the step-size can change. It is set to 1 as indicated in [30], [31].
The fittest µ parents are then selected and mutated by the new step-size σt+1 to
form a new population of λ offspring.
6. Repeat all steps until a termination condition is satisfied.
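Steps 4 and 5 above can be sketched in Python for a single generation (an illustrative fragment with hypothetical constants, assuming the averaged step Zt has already been computed):

```python
# CSA updates of Eqs. (2-11) and (2-12) for one generation, assuming the
# averaged step Z of the mu fittest samples is already available.
# All constant values below are illustrative, not tuned.
import math

def csa_update(Pc, Z, sigma, n, mu, c, d_sigma=1.0):
    # Eq. (2-11): fade the old path by (1-c), add the normalized new step
    Pc = [(1 - c) * p + math.sqrt(mu * c * (2 - c)) * z for p, z in zip(Pc, Z)]
    # Eq. (2-12): lengthen sigma when the path is longer than expected (||Pc||^2 > n)
    norm_sq = sum(p * p for p in Pc)
    sigma = sigma * math.exp((c / (2 * d_sigma)) * (norm_sq / n - 1))
    return Pc, sigma

Pc, sigma = csa_update([0.0, 0.0], [1.0, 1.0], 1.0, n=2, mu=4, c=0.5)
print(sigma > 1.0)  # correlated (long) steps increase the step-size
```

When successive steps point in the same direction the path grows longer than a random walk would, ‖Pc‖² exceeds n, and σ is increased; anti-correlated steps shorten the path and shrink σ.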
2.7.4 Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
2.7.4.1 Introduction
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [32] is a state-of-the-
art evolution strategy. It extends the CSA strategy described in section 2.7.3, which
adapts the step-size σ every generation and uses the updated step-size to mutate the parent
solution. CMA-ES differs from CSA in that it uses a covariance matrix C, instead of the
identity matrix I, to generate the random mutating vectors. This means that the different
components of the random vector are generated from normal distributions with different
standard deviations. That is, every component has a different step size. For example,
imagine the problem of optimizing a 2D vector, and assume that the optimum solution is
{10, 30} and the initial guess is {0, 0}. It is obvious that the optimum value of the second
parameter is farther than the first one. Therefore, it is better to take larger steps in the
direction of the second one. This is what CMA-ES does and this is why it requires fewer
generations to find the optimum solution. In brief, CMA is more directive than σSA. Fig. 2-17 shows the same population generated from a parent at the origin. In (a) the covariance matrix is the identity matrix I, and in (b) the covariance matrix equals [1 1; 1 3].
Fig. 2-18 shows a 2D normal distribution shaped by a covariance matrix C. The
black ellipse is the one-σ ellipse of the distribution.
Fig. 2-17: A population of (a) Step-size Adaptation and (b) Covariance Matrix Adaptation
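Generating such covariance-shaped mutation vectors can be sketched in Python via a Cholesky factor (an illustrative sketch; the matrix matches the [1 1; 1 3] example above):

```python
# Sampling mutation vectors from N(0, C) with C = [[1, 1], [1, 3]]:
# if A satisfies A*A' = C (a Cholesky factor), then y = A*z maps
# standard normal samples z to N(0, C). Illustrative sketch only.
import math
import random

def cholesky2(c11, c12, c22):
    # Cholesky factor of a 2x2 symmetric positive-definite matrix
    a11 = math.sqrt(c11)
    a21 = c12 / a11
    a22 = math.sqrt(c22 - a21 * a21)
    return a11, a21, a22

rng = random.Random(0)
a11, a21, a22 = cholesky2(1.0, 1.0, 3.0)
samples = [(a11 * z1, a21 * z1 + a22 * z2)
           for z1, z2 in ((rng.gauss(0, 1), rng.gauss(0, 1))
                          for _ in range(10000))]

# The empirical variance of the second component should be close to C[1][1] = 3
var2 = sum(y2 * y2 for _, y2 in samples) / len(samples)
print(var2)
```

The second coordinate thus takes larger steps than the first, and the positive off-diagonal entry correlates the two coordinates, tilting the mutation ellipse as in Fig. 2-17.b.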
In addition to using a covariance matrix to adapt the shape of the mutation
distribution, the covariance matrix itself is set as a strategy parameter. Consequently, it
is adapted every generation so that the mutation distributions could adapt to the shape of
the fitness landscape and converge faster to the optimum. The operation of CMA-ES is
further illustrated in Fig. 2-19, where the population concentrates on the global optimum after six generations. The ‘*’ symbols represent individuals, and the dashed lines represent the distribution of the population. The background color represents the fitness landscape, where a darker color represents lower fitness.
Fig. 2-18: A 2D normal distribution N(0, C) [m = 0, C = [1 1; 1 3]]
Fig. 2-19: Optimization of a 2D problem using CMA-ES over generations 1 to 6
2.7.4.2 CMA-ES Algorithm
The basic (µ/µ, λ) CMA-ES works as follows [27]:
Initialization:
I.1  λ: number of offspring (i.e. population size)
I.2  µ: number of parents / number of solutions involved in updating m, C, and σ
I.3  m ∈ ℝ^(n×1): initial solution (the mean of the population)
I.4  C = I(n×n): initial covariance matrix (the identity matrix)
I.5  σ ∈ ℝ⁺: initial step-size
I.6  cσ ≈ 4/n: decay rate of the cumulation path for the step-size σ
I.7  dσ ≈ 1: damping parameter for σ change
I.8  cc ≈ 4/n: decay rate of the cumulation path of C
I.9  c1 ≈ 2/n²: learning rate for the rank-one update of C
I.10 cµ ≈ µw/n²: learning rate for the rank-µ update of C
I.11 Pσ = 0: step-size cumulation path
I.12 Pc = 0: covariance-matrix cumulation path
The constant n is the number of state parameters (i.e. the parameters of the objective function). The left column maps these parameters to the code given in Table 2-1.
Generation Loop: Repeat until a termination criterion is met:
1. Generate λ offspring by mutating the mean m:
xi = m + σ·yi, 0 < i ≤ λ
where yi is an (n × 1) random vector generated according to a normal distribution with zero mean and covariance C [yi ~ Ni(0, C)], as shown in Fig. 2-17.b and Fig. 2-18.
2. Evaluate the λ offspring by the fitness function
F(xi) = f(xi)
3. Sort the offspring by fitness so that:
f(x1:λ) < f(x2:λ) < … < f(xλ:λ)
Where 𝑥1:𝜆 is the fittest individual in the population.
4. Update the mean m of the population:
m = Σ(i=1..µ) wi·xi:λ
Equivalently:
m = m + σ·yw, where yw = Σ(i=1..µ) wi·yi:λ
Here xi:λ is the i-th best individual in the population, and the constants wi are selected such that [27]:
w1 ≥ w2 ≥ w3 ≥ … ≥ wµ ≥ 0,
Σ(i=1..µ) wi = 1,
µw = 1 / Σ(i=1..µ) wi² ≈ λ/4
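Weighted recombination (step 4) can be sketched in Python as follows (the individuals and weights below are hypothetical but satisfy the constraints above):

```python
# Weighted recombination: the new mean is the weighted sum of the mu best
# individuals (step 4). The individuals and weights below are hypothetical
# but satisfy w1 >= w2 >= 0 and sum(w) = 1.

def recombine(sorted_best, weights):
    n = len(sorted_best[0])
    return [sum(w * x[j] for w, x in zip(weights, sorted_best)) for j in range(n)]

best = [[1.0, 2.0], [3.0, 6.0]]  # the mu = 2 fittest individuals, best first
weights = [0.75, 0.25]
print(recombine(best, weights))  # [1.5, 3.0]
```

Giving the fittest individual the largest weight pulls the new mean towards the most promising region while still averaging out sampling noise.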
5. Update the step-size cumulation path Pσ:
Pσ = (1 − cσ)·Pσ + √(1 − (1 − cσ)²) · √µw · C^(−1/2) · yw
where Pσ ∈ ℝ^(n×1). The square root of the matrix C can be calculated using Matrix Decomposition [32] or Cholesky Decomposition [26].
6. Update the covariance-matrix cumulation path Pc:
Pc = (1 − cc)·Pc + √(1 − (1 − cc)²) · √µw · yw
where Pc ∈ ℝ^(n×1).
7. Update the step-size σ:
σ = σ · exp( (cσ/dσ) · (‖Pσ‖ / E‖N(0, I)‖ − 1) )
According to [30], this formula can be simplified to:
σ = σ · exp( (cσ / (2·dσ)) · (‖Pσ‖²/n − 1) )
where ‖X‖ is the Euclidean norm of the vector X.
8. Update the covariance matrix C:
C = (1 − c1 − cµ)·C + c1·Pc·Pcᵀ + cµ·Σ(i=1..µ) wi·yi:λ·yi:λᵀ
The term c1·Pc·Pcᵀ is called the “rank-one update”. It reduces the number of function evaluations. The constant c1 is called the “rank-one learning rate”.
The term cµ·Σ(i=1..µ) wi·yi:λ·yi:λᵀ is called the “rank-µ update”. It increases the learning rate in large populations and can reduce the number of necessary generations. The constant cµ is the “rank-µ” learning rate [27].
Termination:
Some example termination criteria used in [32] are:
- Stop if the best objective function values of the most recent 10 + 30n/λ generations are zero.
- Stop if the average fitness of the most recent 30% of M generations is not better than the average of the first 30% of M generations, where M is 20% of all generations, such that 120 + 30n/λ ≤ M ≤ 20,000 generations.
- Stop if all of the best objective function values over the last 10 + 30n/λ generations are below a certain limit. A common initial guess of that limit is 10⁻¹².
- Stop if the standard deviations (step-sizes) in all coordinates are smaller than a certain limit. A common limit is 10⁻¹² of the initial σ.
Usually, the algorithm is bounded to a limited search space, but in our experiments
it could find the global optimum even if the search space is unbounded (i.e. the domain
of a component of the solution vector is [-∞, ∞]).
A simple MATLAB/Octave CMA-ES code is given in Table 2-1. The left column of the table maps the given code to the initialization parameters and generation-loop steps given above.
Table 2-1: A simple CMA-ES code

     % Initialization
I.1  lambda = LAMBDA;           % number of offspring
I.2  mu = MU;                   % number of parents
I.3  yParent = INIT_SOL;        % initial solution vector
     n = length(yParent);      % problem dimensions
I.4  Cov = eye(n);              % initial covariance matrix
I.5  sigma = INIT_SIGMA;        % initial sigma (step-size)
I.6  Cs = 1/sqrt(n);            % learning rate of step-size
I.8  Cc = 1/sqrt(n);            % decay rate of Pc
I.10 Cmu = 1/n^2;               % learning rate of C
I.11 Ps = zeros(n,1);           % step-size cumulation path
I.12 Pc = zeros(n,1);           % cov.-matrix cumulation path
     minSigma = 1e-3;           % min. step-size (termination condition)

     % Generation loop: repeat until termination criterion
     while(1)
       SqrtCov = chol(Cov)';                          % square root of cov. matrix
       for l = 1:lambda                               % generate lambda offspring
1        offspr.std = randn(n,1);                     % standard normal sample N(0, I)
         offspr.w = sigma*(SqrtCov*offspr.std);       % sigma*sqrt(C)*N(0,I) ~ N(0, sigma^2*C)
         offspr.y = yParent + offspr.w;               % mutate the parent
2        offspr.F = fitness(offspr.y);                % evaluate the offspring
         offspringPop{l} = offspr;                    % offspring complete
       end;                                           % end for
3      ParentPop = sortPop(offspringPop, mu);         % sort pop. and take mu best individuals
4      yw = recomb(ParentPop);                        % calculate yw
       yParent = yParent + yw.w;                      % new mean (parent)
5      Ps = (1-Cs)*Ps + sqrt(mu*Cs*(2-Cs))*yw.std;    % update Ps
6      Pc = (1-Cc)*Pc + sqrt(mu*Cc*(2-Cc))*yw.w;      % update Pc
7      sigma = sigma*exp((Ps'*Ps - n)/(2*n*sqrt(n))); % update step-size
8      Cov = (1-Cmu)*Cov + Cmu*(Pc*Pc');              % update cov. matrix
       Cov = (Cov + Cov')/2;                          % enforce symmetry
       % Termination
       if (sigma < minSigma)                          % termination condition
         printf("solution=");                         % the solution is ...
         disp(ParentPop{1}.y');                       % ... the first parent
         break;                                       % terminate the loop
       end;                                           % end if
     end                                              % end while
The upper-case words, such as LAMBDA, are predefined constants. The function
fitness evaluates the candidate solutions. The function sortPop sorts the individuals by
fitness and extracts the best µ ones. The function recomb recombines the selected µ
parents to form a new parent for the next generation. A simple recombination is to
average the solution vector and the step-size vector of the selected µ parents.
MATLAB/Octave examples of these three functions are given in Table 2-2, Table 2-3,
and Table 2-4 respectively.
Table 2-2: An example of “fitness” function
function out = fitness(x)
out = norm(x-[5 -5]'); % The global optimum is at [5, -5]
end
Table 2-3: An example of “sortPop” function
function sorted_pop = sortPop(pop, mu);
for i=1:length(pop);
fitnesses(i) = pop{i}.F;
end;
[sorted_fitnesses, index] = sort(fitnesses);
for i=1:mu;
sorted_pop{i} = pop{index(i)};
end;
end
Table 2-4: An example of “recomb” function
function recomb = recomb(pop);
recomb.w = 0; recomb.std = 0;
for i=1:length(pop);
recomb.w = recomb.w + pop{i}.w;
recomb.std = recomb.std + pop{i}.std;
end;
recomb.w = recomb.w/length(pop);
recomb.std = recomb.std/length(pop);
end
The previous code snippets are modifications of the code provided in [26]. Implementations for C, C++, Fortran, Java, MATLAB/Octave, Python, R, and Scilab are also provided in [33].
2.7.4.3 Advantages of CMA-ES
CMA-ES is efficient for solving:
- Non-separable problems¹
- Non-convex² functions
- Multimodal optimization problems, where there are possibly many local optima
- Objective functions with no available derivatives
- High-dimensional problems
2.7.4.4 Limitations of CMA-ES
CMA-ES can be outperformed by other strategies in the following cases:
- Partly separable problems (i.e. the optimization of an n-dimensional objective function can be divided into a series of n optimizations of every single parameter)
- The derivative of the objective function is easily available (Gradient Descent/Ascent is better)
- Low-dimensional problems
- Problems that can be solved using a relatively small number of function evaluations (e.g. < 10n evaluations; Nelder-Mead may be better)
¹ An n-dimensional separable problem can be divided into n 1-dimensional separate problems.
² A function is convex if the line segment between any two points on its graph lies above the graph.
2.8 Nelder-Mead Method
2.8.1 Introduction
The Nelder-Mead method [34] is a non-linear optimization technique that uses a heuristic search; that is, its solution is not guaranteed to be optimal. It is suitable for problems where the derivatives of the objective function are not known or are too costly to compute. Normally, it is faster than CMA-ES, but it easily falls into local optima. This method uses the simplex concept described in the next section.
2.8.2 What is a simplex
A simplex is a geometric structure consisting of n+1 vertices in n dimensions.
Table 2-5 contains examples of simplexes:
Table 2-5: Simplexes in different dimensions
Dim. Shape Graph
0 Point
1 Line
2 Triangle
3 Tetrahedron
4 Pentachoron
2.8.3 Operation
To optimize an n-dimensional function (i.e. one with n parameters), the Nelder-Mead algorithm constructs an initial simplex of n+1 vertices and tries to capture the optimum point inside it while reducing the size of the simplex. A simplex is similar to a team of police officers chasing a criminal: every simplex point represents a police officer, the optimum solution is the criminal, and Nelder-Mead is the plan the police officers follow to catch the criminal. Selecting the initial simplex is critical and problem-dependent, as a very small initial simplex can lead to a local minimum. This is why the Nelder-Mead method is usually used only when a local optimum is satisfactory, such as in the hybrid optimization technique described in section 3.2.4.
After constructing the initial simplex, it is iteratively updated using four types of operations. Fig. 2-20 illustrates these operations on a 2D simplex (triangle). The shaded and the blank regions represent the simplex before and after the operation respectively. P̄ is the mean of all points except for the worst. Ph is the highest (worst) point, Pl is the lowest (best) point, P* is the reflected point, and P** is the expanded or the contracted point. Every operation is described next.
Fig. 2-20: Operations of the Nelder-Mead algorithm: (a) reflection, (b) expansion, (c) contraction, (d) reduction (resizing)
a) Reflection:
If Ph is the worst point, a better point is expected at the reflection of Ph on the other side of the simplex. The reflection P* of Ph is:
P* = (1 + α)·P̄ − α·Ph
where α ∈ ℝ⁺ is the reflection coefficient: the ratio of [P* P̄] to [P̄ Ph].
b) Expansion:
If the reflected point P* is better than the best point Pl:
f(P*) < f(Pl)
then expand P* to the expansion point P**:
P** = γ·P* + (1 − γ)·P̄
where γ > 1 is the expansion coefficient: the ratio of [P** P̄] to [P* P̄].
c) Contraction:
If the reflected point is worse than all points except for the worst (i.e. worse than the second-worst point):
f(P*) > f(Pi), for all i ≠ h
then define a new Ph to be either the old Ph or P*, whichever is better, and contract it to P**:
P** = β·Ph + (1 − β)·P̄
where the contraction coefficient β lies between 0 and 1 and is the ratio of [P** P̄] to [P* P̄].
d) Reduction (Resizing):
If, after contraction, the contracted point P** is found to be worse than the second-worst point, then replace every point Pi with (Pi + Pl)/2. This contracts the entire simplex towards the best point Pl and thus reduces the size of the simplex.
Reduction handles the rare case of a failed contraction, which can happen when one of the simplex points is much farther than the others from the minimum (optimum) value of the function. Contraction may then move the reflected point away from the minimum value, so further contractions are useless. In this case, reduction is the action proposed in [34] to bring all points to a simpler fitness landscape.
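The four point updates can be sketched in Python (an illustrative fragment using the coefficient values from the text; the sample points are hypothetical):

```python
# The four Nelder-Mead point updates with the coefficients used in the text
# (alpha = 1, gamma = 2, beta = 1/2). Points are plain coordinate lists.

def reflect(p_bar, p_h, alpha=1.0):
    return [(1 + alpha) * m - alpha * h for m, h in zip(p_bar, p_h)]

def expand(p_star, p_bar, gamma=2.0):
    return [gamma * s + (1 - gamma) * m for s, m in zip(p_star, p_bar)]

def contract(p_h, p_bar, beta=0.5):
    return [beta * h + (1 - beta) * m for h, m in zip(p_h, p_bar)]

def reduce_toward_best(points, p_l):
    return [[(a + b) / 2 for a, b in zip(p, p_l)] for p in points]

p_bar, p_h = [0.0, 0.0], [2.0, 2.0]
p_star = reflect(p_bar, p_h)   # mirrored through the mean
print(p_star)                  # [-2.0, -2.0]
print(expand(p_star, p_bar))   # [-4.0, -4.0]
print(contract(p_h, p_bar))    # [1.0, 1.0]
```

Reflection probes the opposite side of the simplex, expansion doubles the distance when that probe succeeds, contraction retreats halfway when it fails, and reduction shrinks every vertex towards the best point.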
2.8.4 Nelder-Mead Algorithm
A flowchart of the Nelder-Mead method is illustrated in Fig. 2-22, and the
corresponding MATLAB/Octave code is given in Table 2-6. This algorithm is explained
in detail in [35].
Table 2-6: Nelder-Mead Algorithm
function [x, fmax] = nelder_mead (fun, x)
% Initialization
minVal = 1e-4; % Min. value to achieve
maxIter = length(x)*200; % Max. number of iterations
n = length (x); % Problem dimension
S = zeros(n,n+1); % Empty simplex
y = zeros (n+1,1); % Empty simplex fitness
S(:,1) = x; % The initial guess
y(1) = feval (fun,x); % Evaluate the initial guess
iter = 0; % Iteration counter
for j = 2:n+1
% Build initial simplex: copy the guess, then perturb one coordinate
S(:,j) = x;
S(j-1,j) = x(j-1) + (1/n);
y(j) = feval (fun,S(:,j));
endfor
[y,j] = sort (y,'ascend'); % Sort simplex points
S = S(:,j); % Re-arrange simplex points
alpha = 1; % Reflection coefficient
beta = 1/2; % Contraction coefficient
gamma = 2; % Expansion coefficient
while (1) % Main loop
iter = iter + 1;
if (iter > maxIter)
break; % Stop if exceeded max. iterations
endif
if (abs(y(1)) <= minVal)
break; % Stop if target min. value achieved
endif
mean = (sum (S(:,1:n)')/n)'; % Calculate the mean point
Pr =(1+alpha)*mean - alpha*S(: ,n+1); % Calculate the reflected point
Yr = feval (fun,Pr); % Evaluate the reflected point
if (Yr < y(n)) % Is Reflected better than 2nd worst?
if (Yr < y(1)) % Is reflected better than best?
Pe=gamma*Pr+ (1-gamma)*mean; % Calculate expanded point
Fe = feval (fun,Pe); % Evaluate expanded point
if (Fe < y(1)) % Is expanded better than best?
S(:,n+1) = Pe;
% Replace worst with expanded
y(n+1) = Fe;
else
S(:,n+1) = Pr;
% Replace worst by reflected
y(n+1) = Yr;
endif
else
S(:,n+1) = Pr;
% Replace worst by reflected
y(n+1) = Yr;
endif
else
if (Yr < y(n+1)) % Is reflected better than worst?
S(:,n+1) = Pr;
% Replace worst by reflected
y(n+1) = Yr;
endif
Pc = beta*S(:,n+1) + (1-beta)*mean; % Calculate contracted point
Yc = feval (fun,Pc); % Evaluate contracted point
if (Yc < y(n)) % Is contracted better than 2nd worst?
S(:,n+1) = Pc;
% Replace worst by contracted
y(n+1) = Yc;
else
for j = 2:n+1
% Shrink the simplex (Reduction)
S(:,j) = (S(:,1) + S(:,j))/2;
y(j) = feval (fun,S(:,j));
endfor
endif
endif
[y,j] = sort(y,'ascend');
% Sort the simplex
S = S(:,j);
endwhile
x = S(:,1); % The best solution
fmax = y(1); % The minimum value
endfunction
Fig. 2-21 shows 12 of the 19 iterations of a Nelder-Mead run minimizing the function f(x, y) = (x−5)² + (y−8)⁴ with the starting point (6, 6). The reason for selecting a starting point close to the optimum solution is just to view all simplex updates on the same axes without shifting or scaling the axes at every iteration, which illustrates the operation of the algorithm better. The same problem was re-solved several times using different initial guesses, and the number of algorithm iterations is recorded for every starting point in Table 2-7.
Table 2-7: Iteration count for different initial guesses of Nelder-Mead Algorithm
Initial Guess Number of iterations
(6, 6) 19
(0, 60) 21
(-60, 60) 32
(100, 100) 34
(-200, 300) 38
For all results in Table 2-7, the algorithm terminated after achieving the targeted
minimum value (0.0001). It never exceeded the maximum iteration count (400).
Fig. 2-21: Twelve iterations of a practical run of the Nelder-Mead algorithm: (1) Initial, (2) Contraction, (3) Reflection, (4) Contraction, (5) Contraction, (6) Contraction, (7) Contraction, (8) Contraction, (9) Contraction, (10) Reflection, (11) Contraction, (12) Reflection
Fig. 2-22: Nelder-Mead algorithm flowchart
2.9 Robocode Game
The proposed IA is tested on the Robocode game [1], a Java programming game where programmed tanks compete in a battle arena. The tanks are completely autonomous; that is, programmers have no control over them during the battle. Therefore, Robocode is ideal for testing IAs. Furthermore, a tank has no perfect knowledge of the environment (i.e. the arena); its knowledge is restricted to what its radar can read. The Robocode game has been used as a benchmark in many previous works, including [4], [36], [37]. The following subsections describe the Robocode game in detail.
2.9.1 Robot Anatomy
As shown in Fig. 2-23, every robot consists of a body, a gun, and a radar. The body
carries the gun and the radar. The radar scans for other robots and the gun shoots bullets
with a configurable speed. The energy consumption of a bullet depends on its damage
strength.
Fig. 2-23: A Robocode robot anatomy
2.9.2 Robot Code
Every robot has a main thread and up to five other running threads. The main thread usually runs an infinite loop where actions are taken. The Robocode game also provides listener classes for triggering specific actions at certain events, such as colliding with a robot, detecting a robot, hitting a robot with a bullet, being hit by a bullet, etc.
2.9.3 Scoring
There are several factors for ranking the battling tanks in Robocode. The factor
considered in this work is the “Bullet Damage”; it is the score awarded to a robot when
one of its bullets hits an opponent.