Last time, we started talking about the continuous optimization and we discussed how some of the features of the optimized function affect the performance of evolutionary algorithms and cause them to perform poorly. Among these features are high-conditioning and non-separability.
Differential evolution tries to solve these problems with a relatively simple idea. It introduces a mutation, which adds the difference of two randomly chosen individuals to another individual (multiplied by a parameter F). If the optimized function is rotated and elongated in some direction in the search space, the population should be elongated in the same direction (thanks to the selection). Thus, the difference of two individuals is a vector with the direction similar to the one in which the function is elongated and adding it to the individual changes it exactly in the most significant direction. Thanks to this mutation, differential evolution is invariant with respect to the rotation of the search space and the scaling of the coordinate axis.
Differential evolution also uses crossover. It works in such a way, that another individual (parent) is selected from the population and a it is crossed with the mutated individual. The crossover goes through both the individuals and with probability 1-CR rewrites the position in the mutated individual by the value from the new parent. (CR is the probability that the position in the mutated individual does not change). At least one position is always changed.
Finally, differential evolution also contains its own selection. It compares the original individual with the one after mutation and crossover and selects the better of these two to the next generation.
The setting of the parameters F and CR has a huge impact on the performance of differential evolution. The typical values for them are F=0.8 and CR=0.9. For separable functions, it is better to set lower CR, e.g. CR=0.2. In such a case most of the positions are from the last parent and the algorithm performs the search along the coordinate axis. It may also be useful to select F uniformly from the interval [0.5, 1.0] before each mutation.
Sometimes, only the differential mutation is used instead of the whole differential evolution, and the rest of the operators and selection is set differently.