In the lecture, Roman told you about how evolutionary algorithms can evolve rules for classification. In this seminar we will try something similar.
In the source code, you will find a simple implementation inspired by the Pittsburgh approach, i.e. each individual is a set of multiple rules. Our goal is to classify the given data set correctly, so the fitness function computes the classification accuracy (the percentage of correctly classified instances), and we use
1 - accuracy (i.e. the error rate) as the objective function to minimize.
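As a minimal sketch (the function name is illustrative, not taken from the seminar's code), the objective could look like this:

```python
def error_rate(predictions, labels):
    """Return 1 - accuracy, the fraction of misclassified instances."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 1 - correct / len(labels)
```

Minimizing this error rate is equivalent to maximizing accuracy; a perfect classifier has objective 0.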
In our case, individuals are lists of rules (the maximum number of rules can be set), and each rule consists of one condition per attribute. There are three kinds of conditions: less than, greater than, and a universal condition that is always true. When evaluating an individual, we find, for each instance in the data, all the rules that match it (all their conditions are true); these rules then vote on the classification, i.e. the class the matching rules predict most often wins.
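The rule representation and voting described above could be sketched as follows; the class and constant names here are assumptions for illustration, not the seminar's actual code:

```python
from collections import Counter

LESS, GREATER, ANY = 0, 1, 2  # the three kinds of conditions

class Rule:
    def __init__(self, conditions, predicted_class):
        # conditions: one (kind, threshold) pair per attribute
        self.conditions = conditions
        self.predicted_class = predicted_class

    def matches(self, instance):
        # a rule matches an instance when every condition holds
        for (kind, threshold), value in zip(self.conditions, instance):
            if kind == LESS and not value < threshold:
                return False
            if kind == GREATER and not value > threshold:
                return False
            # the ANY condition is always true
        return True

def classify(rules, instance):
    # all matching rules vote; the most frequently predicted class wins
    votes = [r.predicted_class for r in rules if r.matches(instance)]
    if not votes:
        return None  # no rule matched the instance
    return Counter(votes).most_common(1)[0][0]
```

An individual is then simply a list of `Rule` objects, and its fitness is obtained by calling `classify` on every instance of the training data.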
We have three genetic operators implemented for these individuals:
- a crossover that resembles a uniform crossover (random rules are taken from one or the other individual)
- a mutation that changes boundaries in rules
- a mutation that changes the value of the predicted class
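The three operators could be sketched as below, using a simplified rule representation (a dict with a list of `(kind, threshold)` conditions and a predicted class `"cls"`); the function names, parameters, and probabilities are assumptions, and the actual implementation may differ:

```python
import random

def crossover(parent_a, parent_b):
    # uniform-style crossover: each rule slot is filled from one parent
    # or the other at random (assumes equal-length parents for simplicity)
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]

def mutate_boundaries(individual, sigma=0.1, prob=0.2):
    # shift condition thresholds by Gaussian noise with probability `prob`
    for rule in individual:
        rule["conditions"] = [
            (kind, thr + random.gauss(0, sigma)) if random.random() < prob
            else (kind, thr)
            for kind, thr in rule["conditions"]
        ]

def mutate_class(individual, classes, prob=0.2):
    # replace a rule's predicted class with a random one, with probability `prob`
    for rule in individual:
        if random.random() < prob:
            rule["cls"] = random.choice(classes)
```

Note that the boundary mutation explores the condition space while the class mutation changes only what a rule predicts; both are needed, since a rule with good conditions but the wrong class label is useless.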
We will test the algorithm on two datasets. One of them is the well-known iris dataset (
iris.csv); the goal is to classify three species of irises based on the measurements of their petals and sepals. The other file is
winequality-white.csv, where the goal is to predict the quality of a wine on a scale of 1-9 based on its physical and chemical properties. The former file is quite small and therefore suitable for testing; the latter is mostly for those who find the former too simple.
You can find the source code in the