Parsimony Pressure

One common problem with variable length representations is “bloat” whereby genome lengths will gradually increase over time. This may be due to over-fitting or the accumulation of “junk DNA” over time.

LEAP currently provides two approaches to mitigating bloat. First is a very simple “genome tax,” or penalty by genome length, popularized by Koza [Koz92]. The second is lexicographical parsimony, or “tie breaking parsimony,” where the individual with the shortest genome is returned if their respective fitnesses happen to be equivalent Luke and Panait [LP02].

API

Parsimony pressure functions.

These are intended to be used as key parameters for selection operators.

Provided are Koza-style parsimony pressure and lexicographic parsimony key functions.

leap_ec.parsimony.koza_parsimony(ind='__no__default__', *, penalty='__no__default__')

Penalize fitness by genome length times a constant, in the style of Koza [Koz92].

>>> import toolz
>>> from leap_ec.individual import Individual
>>> from leap_ec.decoder import IdentityDecoder
>>> from leap_ec.binary_rep.problems import MaxOnes
>>> import leap_ec.ops as ops
>>> import numpy as np
>>> problem = MaxOnes()
>>> pop = [Individual(np.array([0, 0, 0, 1, 1, 1]), problem=problem),
...        Individual(np.array([0, 0]), problem=problem),
...        Individual(np.array([1, 1]), problem=problem),
...        Individual(np.array([1, 1, 1]), problem=problem)]
>>> pop = Individual.evaluate_population(pop)
>>> best, = ops.truncation_selection(pop, size=1)
>>> print(best.genome, best.fitness)
[0 0 0 1 1 1] 3
>>> best, = ops.truncation_selection(pop, size=1, key=koza_parsimony(penalty=.5))
>>> print(best.genome, best.fitness)
[1 1 1] 3
\[f_p(x) = f(x) - cl(x)\]

Where f(x) is original fitness, c is a penalty constant, and l(x) is the genome length.

Parameters
  • ind – to be compared

  • penalty – for denoting penalty strength

Returns

altered comparison criteria

leap_ec.parsimony.lexical_parsimony(ind)

If two fitnesses are the same, break the tie with the smallest genome

This implements Lexicographical Parsimony Pressure [LP02], which is essentially where if the fitnesses of two individuals are close, then break the tie with the smallest genome.

>>> import toolz
>>> from leap_ec.individual import Individual
>>> from leap_ec.binary_rep.problems import MaxOnes
>>> import leap_ec.ops as ops
>>> import numpy as np
>>> problem = MaxOnes()
>>> pop = [Individual(np.array([0, 0, 0, 1, 1, 1]), problem=problem),
...        Individual(np.array([0, 0]), problem=problem),
...        Individual(np.array([1, 1]), problem=problem),
...        Individual(np.array([1, 1, 1]), problem=problem)]
>>> pop = Individual.evaluate_population(pop)
>>> best, = ops.truncation_selection(pop, size=1)
>>> print(best.genome, best.fitness)
[0 0 0 1 1 1] 3
>>> best, = ops.truncation_selection(pop, size=1, key=lexical_parsimony)
>>> print(best.genome, best.fitness)
[1 1 1] 3
Parameters

ind – to be compared

Returns

altered comparison criteria