So far we have discussed only single-objective optimization (we had one fitness function), but evolutionary algorithms are very often and very successfully also used for multi-objective optimization, i.e. the optimization of multiple functions simultaneously. Imagine we want to develop the shape of a wing. In this case, we want the wing to have the largest possible lift and simultaneously the lowest possible drag. It turns out that evolutionary algorithms are well suited for solving multi-objective problems and are among the best multi-objective optimizers.
Multi-objective problem is thus specified as a set of multiple functions from a search space to real numbers. Given that the functions can contradict each other, it is not usually possible to find a solution that would be optimal for all simultaneously. Thus, the aim is to find a set of Pareto optimal solutions.
In order to define Pareto optimal solution, we must first define the dominance of the two solutions. We say that a solution a dominates a solution b (or solution b is dominated by solution a) if a is in all criteria at least as good as b and at least in one criterion it is better. Pareto optimal solutions are then those that are not dominated by any other possible solution of the search space. The image of the Pareto-optimal set into the space of functional values is called the Pareto-optimal front. The set of Pareto-optimal solutions is often infinite and it is not possible to find all of the solutions, thus we seek some finite approximation of the set. It is interesting that in the case of multi-objective optimization there is not a single best individual in the population. Instead the whole population is used as the solutions and is considered an approximation of the current Pareto optimal set.
With multiple criteria comes a new challenges in the design of evolutionary algorithms. One of the problems is how to determine which algorithms perform better and which perform worse. To do so, we compare the resulting approximations of the Pareto optimal set. Initially, such a comparison was performed mostly visually - both sets were plotted and compared and the authors discussed why one them is better than the other. The main problem of this method is that it is not very objective and it is also difficult to evaluate two sets it they are both similar to each other. Later, various indicators, that can reflect the quality of the set of solutions better, began to emerge. Among the easier indicators is the coverage indicator – it expresses the percentage of solutions in one set which is dominated by solutions from the other set. Today, however, the two most commonly used indicators are the inverse generational distance (IGD) and hypervolume (HV).
IGD indicator measures the average distance from the Pareto optimal front to the closest solution in the approximation set. The indicator thus expresses how close are the solutions to the Pareto-optimal front (convergence), and also if the solutions are distributed uniformly along the front (spread). Smaller values of this indicator are better. Sometimes also generational distance (GD) is used. It calculates how far it is from the set of solutions to the nearest Pareto optimal solution. It expresses only the convergence but not the spread.
Hypervolume expresses the volume of space dominated by a given set (under a selected reference point). In case we have only two objective which we minimize simultaneously, it is the area “above” the non-dominated front from the population and below the reference point (otherwise it would be infinite). This indicator expresses again both convergence and coverage of the Pareto-optimal front. Larger values of this indicator are better. Sometimes, the difference between the hypervolume of the optimal set and the resulting approximation is computed. In this case, of course, smaller values are better.
Inability to directly compare two individuals brings another challenge to evolutionary algorithms. It is necessary to ensure convergence to the Pareto optimal front, and also to provide good coverage of the front. As an example of how these algorithms work we will describe one of the simpler ones: NSGA-II.
The main difference between single-objective and multi-objective algorithms is in the environmental selection (operators may well stay the same). Before the selection, NSGA-II merges the populations of parents and offspring (thus ensuring elitism). The combined population is then divided into non-dominated fronts. Those individuals who are not dominated by any other individual in the population belong to the first non-dominated front. Individuals who are not dominated by any individual in the population after removal of the first front belong to the second front, etc. Individuals from fronts with lower numbers are selected to the next generation. Naturally, at one point the whole front does not fit to the new population. In such a case a secondary sorting criterion is used to select some of the individuals in the front. In the original NSGA-II the secondary criterion is called crowding distance. This is the sum over all the criteria of the difference of the closest worse (in this criterion) and the closest better individual. Larger values are better because such individuals are in sparsely populated areas of the non-dominated front and thus improve the coverage of the Pareto front. However, different secondary sorting criteria are often used, such as hypervolume contribution, i.e. the individual’s contribution to the hypervolume of the non-dominated front. It is calculated as the difference of the hypervolume of the front and the hypervolume of the front without the individual. Individuals with higher contributions are selected first.
The source codes for this lesson (package evolution.multi) contain a simple implementation of NSGA-II (this implementation works only for two-objective problems). The implementation required a few hacks in our library.
There are also five bi-objective problems from the ZDT benchmark (named after the names of the authors). We do not have the ZDT5 problem because it uses binary coding, other problems use real coding. All these problems are relatively simple, separable. The first function of all of them (except ZDT6) is linear. ZDT1 and ZDT2 differ only in the shape of the Pareto-optimal front. ZDT3 has discontinuous Pareto-optimal front (however, the Pareto-optimal set is continuous). ZDT4 has plenty of local Pareto optimal fronts and ZDT6 has both objectives multi-modal. For all the problems both functions should be minimized and the optimum lies on the border of the search space (due to this feature, the ZDT benchmark has earned a lot of criticism and tends to be used less often nowadays).