## [EN] Set Partition I - Fitness and Selection

In today’s lesson, we will solve a more complicated task. We will solve the thieves’ problem (this is its name translated from Czech, I do not think there is such a poetic name for it in English). The thieves have stolen a lot of golden things and they want to divide it among themselves in such a way, that all of them possess things of the same value (they are only interested in the overall weight of gold). They divide the stolen things into several (we will use 10) bins with the same sum of weights. Their goal is to minimize the difference between the lightest and heaviest bin.

From the computer science point of view, this is a generalization of the partition problem (we have 10 subsets instead of 2) and its optimization version is known to be NP-hard.

## Evolutionary algorithm for bins

This task is one of the examples where evolutionary algorithms can be easily used. We have a clearly stated criteria for optimization, and at the same time, we do not have a dedicated algorithm for this kind of problems.

First, we have to think about a few things:

**How will we encode the individual?** We get the list of weights as the input and we have to divide them into 10 bins. One possibility might be to have 10 lists in the individual, each of them would contain the things from one bin. However, such a representation would require rather complex genetic operators. There is an easier encoding - the individual can contain a vector of integers from 0 to 9 of the same dimension as the number of weights in the inputs. On the `i`

-th position, there is the number of bin the i-th things belongs to.

**How to compute the fitness?** It is easy to compute the difference between the lightest and the heaviest bin. However, can we use this number as the fitness function? We want to minimize this difference, but evolutionary algorithms maximize the fitness function. Therefore, we have to compute the fitness in a different way. There are simple methods, how to transform minimization into maximization. The simplest would be to put a minus in front of the value of the difference. However, this would break the roulette wheel selection (it does not work with negative fitness). We need a different transformation, we can, for example, subtract the difference from a large number (so that the results is always positive), or we can take the reciprocal of the difference. Or you can find a completely different way. Isn’t there a better way how to compute the fitness? Is the difference between the lightest and the heaviest bins a good base for the fitness function? Isn’t there something different, more informative?

**Which genetic operators to use?** When we decide the encoding of the individual, we have to choose some genetic operators to work with it. For the integer-vector encoding described above, this is quite simple: we can use the one point crossover we saw last time and a mutation, which changes random positions in individuals to random values from the correct range, it is in fact a generalization of the bit-flip mutation from the last lesson.

**Which selection should we use?** The chosen selection depends on the chosen fitness function. So far, we know only the roulette-wheel selection and, if we think a little about how it works, we can see that it depends a lot on the particular value of the fitness function. An individual with fitness 0.1 is ten times less likely to be selected than an individual with fitness 1.0. However, if we add a 1000 to the values of all the individuals, the likelihood of selecting either of them is almost the same (as they now have fitness 1000.1 and 1001). These are different kinds of selection operators, one of them is the tournament selection. It takes two random individuals, compares their fitness and with high probability (usually around 80%) selects the one with higher fitness. This selection depends only on the order of the individuals and not on the particular values of the fitness function. This does not mean, that it is in any way better than roulette wheel selection, it is just a feature. We can scale the fitness in such a way, that the roulette wheel selection prefers certain individuals. The strength of the selection can also be varied in a similar manner.

## Source codes for this lesson

The source codes for this lesson implement the individual encoded by a vector of integer between 0 and 9 (`evolution.individuals.IntegerIndividual`

). Fitness is computed as the reciprocal of the difference between the lightest and the heaviest bin (see `evolution.binPacking.BinPackingFitness`

).

The main class is called `evolution.binPacking.BinPacking`

and it is very similar to the one from the last week. There is also a helper program, which can process the .best files output by the program and print the difference between the lightest and the heaviest bin.

The weights of things are read from file `resources/packingInput-easier`

or `resources/packingInput-harder`

. They both contain 500 weights of things. For the easier one the minimal difference is 0, for the harder one it is a small integer (I do not remember the exact value, should be less or equal to 2).