Continuous optimization I - Adaptive operators
Continuous optimization deals with the optimization of functions of real variables. In this lesson, we will use the functions from the BBOB benchmark, which has recently been used to compare continuous optimizers.
However, continuous optimization can be used for more general tasks than just the optimization of real functions. For example, it can be used to optimize the parameters of differential equations that describe a process (in such a case the fitness evaluation involves solving/simulating the equations). It can also be used to evolve shapes by evolving the positions of the control points of Bézier curves. For this lesson, we have chosen the optimization of real functions because they are easier to understand: we know exactly what each function looks like, so we can build an intuition for why the algorithm works or does not work on a particular function.
Important features of continuous functions
The behavior and performance of the optimizer depend closely on the features of the particular optimized function. Functions resembling a hyper-sphere or a multi-dimensional parabola are the easiest to optimize. They are convex, have only one local optimum, which is also the global one, and they respond similarly to similar changes of all parameters. They can also be optimized one parameter at a time (i.e. it is possible to find the optimum with respect to one parameter with the rest of the parameters fixed, then find the optimum with respect to another parameter, and so on). Such functions can also be optimized using simple mathematical methods. Unfortunately, most functions are not so nice and we have to use more complicated heuristic methods, such as simulated annealing or evolutionary algorithms.
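As a concrete example, the sketch below (Python, hypothetical names, not part of the lesson's source code) shows such an easy function, the sphere, and notes why it can be optimized one coordinate at a time:

```python
import numpy as np

def sphere(x):
    """Multi-dimensional parabola: convex, with a single optimum at the origin."""
    return float(np.sum(np.asarray(x) ** 2))

# The function is separable: for any fixed values of the other coordinates,
# the optimum with respect to x_i is always x_i = 0, so optimizing one
# coordinate at a time already finds the global optimum.
```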
Among the features that make the optimization harder are:
- Non-separability - dependencies among the variables that make it impossible to optimize one parameter at a time - an example of such a function is an ellipsoid rotated in such a way that its axes are not parallel to the axes of the coordinate system (e.g. f10 from the BBOB benchmark).
- Multi-modality - the presence of a large number of local optima. Typical examples are functions that contain sines and cosines (e.g. f16 from the BBOB benchmark).
- Ill-conditioning - situations where different variables have a very different effect on the value of the fitness function; we can imagine an ellipsoid with one of its axes much longer than the others (e.g. f10 from the BBOB benchmark again). A toy sketch of such an ellipsoid follows after this list.
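To make the first and last points more concrete, here is a small Python sketch (a hypothetical toy function, not the exact BBOB definition of f10) of an ellipsoid that is both ill-conditioned (the axes differ in length by several orders of magnitude) and non-separable (a rotation mixes the variables):

```python
import numpy as np

def rotated_ellipsoid(x, R, condition=1e6):
    """Toy example of a hard function: ill-conditioned and non-separable."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # axis weights grow geometrically from 1 up to `condition`
    weights = condition ** (np.arange(n) / max(n - 1, 1))
    z = R @ x  # the rotation R makes the variables interact (non-separability)
    return float(np.sum(weights * z ** 2))

# usage with a random orthogonal matrix as the rotation:
# n = 5
# R, _ = np.linalg.qr(np.random.randn(n, n))
# rotated_ellipsoid(np.ones(n), R)
```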
Evolutionary algorithms for continuous optimization
In continuous optimization, the individuals are usually encoded as vectors of real numbers (i.e. of the type float or double). The operators are closely connected with the representation.
Crossover for continuous optimization
We can of course use the n-point crossover for vectors of real numbers, just as in the case of whole numbers. However, we have another possibility. In the arithmetic crossover, the offspring are weighted averages of the parents. One of the offspring is created in such a way that one of the parents has weight w and the other has weight 1-w; for the other offspring the weights of the parents are swapped. The weights can be the same for the whole parent, or they can be different for each position in the vector. We can also combine the n-point crossover and the arithmetic crossover.
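A minimal sketch of the arithmetic crossover might look as follows (Python, hypothetical function and parameter names; the weight can be shared by the whole vector or drawn separately for each position):

```python
import random

def arithmetic_crossover(parent1, parent2, per_position=False):
    """Arithmetic crossover: offspring are weighted averages of the parents.
    One offspring uses weight w for parent1 and 1-w for parent2; the other
    offspring swaps the weights."""
    if per_position:
        weights = [random.random() for _ in parent1]  # a different weight for each position
    else:
        w = random.random()
        weights = [w] * len(parent1)                  # the same weight for the whole vector
    off1 = [wi * a + (1 - wi) * b for wi, a, b in zip(weights, parent1, parent2)]
    off2 = [(1 - wi) * a + wi * b for wi, a, b in zip(weights, parent1, parent2)]
    return off1, off2
```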
There are more complex variants of the arithmetic crossover. For example, the SBX (Simulated Binary Crossover) crossover selects the weights from a probability distribution chosen so that the changes are similar to those observed when crossing binary-encoded individuals, i.e. most of the time the offspring are very close to the parents. On the other hand, if the weights are selected from the uniform distribution, the offspring are often far from the parents.
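The sketch below shows one common form of SBX for a single pair of parents (the distribution index eta and the function name are assumptions of this sketch, not the lesson's code; a larger eta keeps the offspring closer to the parents):

```python
import random

def sbx_crossover(parent1, parent2, eta=15):
    """Simulated Binary Crossover, applied coordinate by coordinate."""
    off1, off2 = [], []
    for x1, x2 in zip(parent1, parent2):
        u = random.random()
        if u <= 0.5:
            beta = (2 * u) ** (1 / (eta + 1))
        else:
            beta = (1 / (2 * (1 - u))) ** (1 / (eta + 1))
        off1.append(0.5 * ((1 + beta) * x1 + (1 - beta) * x2))
        off2.append(0.5 * ((1 - beta) * x1 + (1 + beta) * x2))
    return off1, off2
```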
Mutation for continuous optimization
There are (as usual) several possibilities for mutation (both variants are sketched in code after the list):
- Unbiased mutation - a new random value is generated for some positions in the individual, regardless of the current value at those positions.
- Biased mutation - a random number from a suitable distribution is added to some positions in the individual. The distribution should be symmetric around 0; typically, the normal distribution with a suitable standard deviation is used.
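A minimal Python sketch of both variants (the bounds, the mutation probability, and the function names are assumptions of this sketch):

```python
import random

def unbiased_mutation(individual, prob, lower=-5.0, upper=5.0):
    """Unbiased mutation: selected positions get a completely new random value."""
    return [random.uniform(lower, upper) if random.random() < prob else x
            for x in individual]

def biased_mutation(individual, prob, sigma=1.0):
    """Biased mutation: selected positions get zero-mean Gaussian noise added."""
    return [x + random.gauss(0, sigma) if random.random() < prob else x
            for x in individual]
```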
There is also a more complex version of the biased mutation. In the polynomial mutation, the distribution from which the new value is drawn is chosen in such a way that the changes are similar to those performed by bit-flip mutation on binary-encoded numbers.
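One common way to implement the polynomial mutation is sketched below (the distribution index eta and the bounds are assumptions of this sketch; a larger eta makes small changes much more likely):

```python
import random

def polynomial_mutation(individual, prob, eta=20, lower=-5.0, upper=5.0):
    """Polynomial mutation: small perturbations are much more likely than large ones."""
    result = []
    for x in individual:
        if random.random() < prob:
            u = random.random()
            if u < 0.5:
                delta = (2 * u) ** (1 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
            x = x + delta * (upper - lower)
        result.append(x)
    return result
```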
Adaptive operators
Apart from operators with fixed parameters, the parameters can also be changed adaptively during the evolution.
For example, we can lower the standard deviation of the normal distribution in the biased mutation as the number of generations grows. This should enable the algorithm to explore at the beginning and to find more exact values later.
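One possible schedule is sketched below (the geometric decay and the parameter names are assumptions of this sketch, not the only reasonable choice):

```python
def sigma_schedule(generation, max_generations, sigma_start=1.0, sigma_end=0.01):
    """Decay the mutation standard deviation geometrically from sigma_start
    in the first generation to sigma_end in the last one."""
    fraction = generation / max(max_generations - 1, 1)
    return sigma_start * (sigma_end / sigma_start) ** fraction

# usage together with the biased mutation sketched above:
# sigma = sigma_schedule(gen, MAX_GEN)
# offspring = biased_mutation(parent, prob=0.2, sigma=sigma)
```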
Another possibility is to change the standard deviation or the probability of mutation based on the fitness of each mutated individual. We can also have a different standard deviation for each position in the individual and set it based on some statistics of the whole population. This can help with the optimization of ill-conditioned functions.
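The per-position variant could be sketched as follows (hypothetical helper names; here the standard deviations are derived from the spread of each coordinate in the current population):

```python
import random
import statistics

def per_position_sigmas(population, scale=1.0):
    """One standard deviation per position, proportional to the spread of that
    coordinate in the current population; coordinates with a larger spread get
    larger mutation steps, which can help on ill-conditioned functions."""
    return [scale * statistics.pstdev(values) + 1e-12
            for values in zip(*population)]

def biased_mutation_per_position(individual, prob, sigmas):
    """Biased mutation with a different standard deviation for every position."""
    return [x + random.gauss(0, s) if random.random() < prob else x
            for x, s in zip(individual, sigmas)]
```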
Those who are interested in adaptive evolutionary algorithms, also called evolution strategies, can read a nice introduction to this topic written by Niko Hansen.
Source codes
The source code for this lesson contains implementations of the genetic operators described above. There is also an implementation of all the functions from the BBOB benchmark (in Java; only selected ones in Python), but be aware that there can be bugs in the implementation (there are almost certainly bugs in the Java implementation of the more complex functions after f20). If you want to compare algorithms in a more serious way (e.g. in a master thesis), I recommend using the BBOB benchmark directly instead of this implementation.
The goal of your algorithm should be to minimize these functions. Note that a different number is added to the value of the function in each run, so you do not know the value of the optimum. The functions can return negative values. The objectiveValue/objective field stores the distance to the optimum for logging purposes; DO NOT use it in your operators. The location of the optimum is also different in each run, so that it is not in the center of the search space.
Do not try to optimize all the functions; select only a few of them and experiment with those. You can, for example, select one function from each group of the BBOB benchmark.