Pipeline Operators¶
Overview¶
leap_ec.individual.Individual
, leap_ec.problem.Problem
,
and leap_ec.decoder.Decoder
are passive classes that need an
external framework to make them function. In LEAP Concepts the notion of a
pipeline of evolutionary algorithm (EA) operators that use these classes was
introduced. That is, Individual, Decoder, and Problem are the “nouns”
and the pipeline operators a the verbs that operate on those nouns. The
operator pipeline objective is to create a new set of evaluated individuals
from an existing set of prospective parents that can be in a new set of
prospective parents.
Fig.2 is shown again here to depict a typical set of LEAP pipeline operators. The pipeline generally starts with a “sink”, or a parent population, from which the next operator typically selects for creating offspring. This is followed by a clone operator that ensure the subsequent pertubation operators do not modify the selected parents. (And so it is critically important that users always have a clone operator as a part of the offspring creation pipeline before any mutation, crossover, or other genome altering operators.) The pertubation operators can be mutation or also include a crossover operator. At this point in the pipeline we have a completed offspring with no fitness, so the next operator evaluates the offspring to assign that fitness. Then the evaluated offspring is collected into a pool of offspring. Once the offspring pool reaches a desired size it returns all the offspring to another selection operator to cull the offspring, and optionally the parents, to return the next set of prospective parents.
Or, more explicitly:
Start with a collection of Individuals that are prospective parents
A selection operator for selecting one or more parents to begin the creation of a new offspring
A clone operator that makes a copy of the selected parents to ensure the following operators don’t overwrite those parents
A set of mutation, crossover, or other operators that perturb the cloned individual’s genome, thus (hopefully) giving the new offspring unique values
An operator to evaluate the new offspring
A pool that serves as a “sink” for evaluated offspring; this pool is sent to the next operator, or is returned from the function, once the pool reaches a specified size
Another selection operator to cull the offspring (and optionally parents) to return a population of new prospective parents
This is, the general sequence for most LEAP pipelines, but there will be the occasional variation
on this theme. For example, many of the provided “canned” algorithms take just
snippets of an offspring creation pipeline. E.g., leap_ec.distributed.asynchronous.steady_state()
has an offspring_pipeline parameter that doesn’t have parents explicitly
as part of the pipeline; instead, for steady_state() it’s implied that the parents will be provided during the
run internally.
Implementation Details¶
The LEAP pipeline is implemented using the toolz.functoolz.pipe()
function,
which has arguments comprised of a collection of data followed by an arbitrary
number of functions. When invoked the data is passed as an argument to the first
function, and the output of that function is fed as an argument to the next
function — this repeats for the rest of the functions. The output of the last
function is returned as the overall pipeline output.
(See: https://toolz.readthedocs.io/en/latest/api.html#toolz.functoolz.pipe )
Loose-coupling via generator functions¶
The first “data” argument is a collection of Individuals representing prospective parents, which can be a sequence, such as a list or tuple. The design philosophy for the operator functions that follow was to ensure they were as loosely coupled as possible. This was achieved by implementing some operators as generator functions that accept iterators as arguments. That way, new operators can be spliced into the pipeline and they’d automatically “hook up” to their neighbors.
For example, consider the following snippet:
gen = 0
while gen < max_generation:
offspring = toolz.pipe(parents,
ops.tournament_selection,
ops.clone,
mutate_bitflip,
ops.evaluate,
ops.pool(size=len(parents)))
parents = offspring
gen += 1
The above code snippet is an example of a very basic genetic algorithm implementation that uses a toolz.pipe() function to link together a series of operators to do the following:
binary tournament_selection selection on a set of parents
clone those that were selected
perform mutation bit-flip on the clones
evaluate the offspring
accumulate as many offspring as there are parents
Since we only have mutation in the pipeline, only one parent at a time is selected to be cloned to create an offspring. However, let’s make one change to that pipeline by adding crossover:
gen = 0
while gen < max_generation:
offspring = toolz.pipe(parents,
ops.tournament_selection,
ops.clone,
mutate_bitflip,
ops.uniform_crossover, # NEW OPERATOR
ops.evaluate,
ops.pool(size=len(parents)))
parents = offspring
gen += 1
This does the following:
binary tournament_selection selection on a set of parents
clone those that were selected
perform mutation bitflip on the clones
perform uniform crossover between the two offspring
evaluate the offspring
accumulate as many offspring as there are parents
Adding crossover means that now two parents are selected instead of one. However, note that the tournament_selection selection operator wasn’t changed. It automatically selects two parents instead of one, as necessary.
Let’s take a closer look at uniform_crossover() (this is a simplified version; the actual code has more type checking and docstrings).
def uniform_crossover(next_individual: Iterator,
p_swap: float = 0.5) -> Iterator:
def _uniform_crossover(ind1, ind2, p_swap):
for i in range(len(ind1.genome)):
if random.random() < p_swap:
ind1.genome[i], ind2.genome[i] = ind2.genome[i], ind1.genome[i]
return ind1, ind2
while True:
parent1 = next(next_individual)
parent2 = next(next_individual)
child1, child2 = _uniform_crossover(parent1, parent2, p_swap)
yield child1
yield child2
Note that the argument next_individual is an Iterator that “hooks up” to a previously yielded Individual from the previous pipeline operator. The uniform_crossover operator doesn’t care how the previous Individual is made, it just has a contract that when next() is invoked that it will get another Individual. And, since this is a generator function, it yields the crossed-over Individuals. It also has two yield statements that ensures both crossed-over Individuals are returned, thus eliminating a potential source of genetic drift by arbitrarily only yielding one and discarding the other.
Operators for collections of Individuals¶
There is another class of operators that work on collections of Individuals such as selection and pooling operators. Generally:
- selection pipeline operators
accept a collection of Individuals and yield a selected Individual (and thus are generator functions)
- pooling operators
accept an Iterator from which to get the next() Individual, and returns a collection of Individuals
Below shows an example of a selection operator, which is a simplified version of the tournament_selection() operator:
def tournament_selection(population: List, k: int = 2) -> Iterator:
while True:
choices = random.choices(population, k=k)
best = max(choices)
yield best
(Again, the actual leap_ec.ops.tournament_selection()
has checks and docstrings.)
This depicts how a typical selection pipeline operator works. It accepts a population parameter (plus some optional parameters), and yields the selected individual.
Below is example of a pooling operator:
def pool(next_individual: Iterator, size: int) -> List:
return [next(next_individual) for _ in range(size)]
This accepts an Iterator from which it gets the next individual, and it uses that iterator to accumulate a specified number of Individuals via a list comprehension. Once the desired number of Individuals is accumulated, the list of those Individuals is returned.
Currying Function Decorators¶
Some pipeline operators have user-specified parameters. E.g., leap_ec.ops.pool()
has
the mandatory size parameter. However, given that toolz.pipe() takes
functions as parameters, how do we ensure that we pass in functions that have
set parameters?
Normally we would use the Standard Python Library’s functools.partial to set the function parameters and then pass in the function returned from that call. However, toolz has a convenient function wrapper that does the same thing, toolz.functools.curry. (See: https://toolz.readthedocs.io/en/latest/api.html#toolz.functoolz.curry ) Pipeline operators that take on user-settable parameters are all wrapped with curry to allow functions with parameters set to be passed into toolz.pipe().
Operator Class¶
Most of the pipeline operators are implemented as functions. However, from time to time an operator will need to persist state between invocations. For generator functions, that comes with using yield in that the next time that function is invoked the next individual is returned. However, there are some operators that use closures, such as :py:func:leap_ec.ops.migrate.
In any case, sometimes if one wants persistent state in a pipeline operator a
closure or using yield
isn’t enough. In which case, having a class that
can have objects that persist state might be useful.
To that end, leap_ec.ops.Operator
is an abstract base-class (ABC)
that provides a template of sorts for those kinds of classes. That is, you would
write an Operator sub-class that provides a __call__() member function
that would allow objects of that class to be inserted into a LEAP pipeline
just like any other operator. Presumably during execution the internal object
state would be continually be updated with book-keeping information as
Individuals flow through it in the pipeline.
leap_ec.ops.CooperativeEvaluate
is an example of using this class.
Table of Pipeline Operators¶
Representation Specificity |
Input -> Output |
Operator |
|
---|---|---|---|
Representation Agnostic |
Iterator → Iterator |
clone() |
|
evaluate() |
|||
uniform_crossover() |
|||
n_ary_crossover() |
|||
CooperativeEvaluate |
|||
Iterator → population |
pool() |
||
population → population |
truncation_selection() |
||
const_evaluate() |
|||
insertion_selection() |
|||
migrate() |
|||
population → Iterator |
tournament_selection() |
||
naive_cyclic_selection() |
|||
cyclic_selection() |
|||
random_selection() |
|||
Representation Dependent |
binary_rep |
Iterator → Iterator |
mutate_bitflip() |
real_rep |
Iterator → Iterator |
mutate_gaussian() |
Admittedly it can be confusing when considering the full suite of LEAP pipeline operators, especially in remembering what kind of operators “connect” to what. With that in mind, the above table breaks down pipeline operators into different categories. First, there are two broad categories of pipeline operators — operators that don’t care about the internal representation of Individuals, or “Representation Agnostic” operators; and those operators that do depend on the internal representation, or “Representation Dependent” operators. Most of the operators are “Representation Agnostic” in that it doesn’t matter if a given Individual has a genome of bits, real-values, or some other representation. Only two operators are dependent on representation, and those will be discussed later.
The next category is broken down by what kind of input and output a given
operator takes. That is, generally, an operator takes a population (collection
of Individuals) or an Iterator from which a next Individual can be found.
Likewise, a given operator can return a population or yield an Iterator to
a next Individual. So, operators that return an Iterator can be connected
to operators that expect an Iterator for input. Similarly, an operator that
expects a population can be connected directly to a collection of Individuals
(e.g., be the second argument to toolz.pipe()
) or to an operator that
returns a collection of Individuals.
If you are familiar with evolutionary algorithms, most of these connections are just common sense. For example, selection operators would select from a population.
With regards to “Representation Dependent” operators there currently are only
two: leap_ec.binary_rep.mutate_bitflip()
and
leap_ec.real_rep.mutate_gaussian()
. The former relies on a genome of
all bits, and the latter of real-values. In the future, LEAP will support other
representations that will similarly have their own operators.
Warning
Are all operators really representation agnostic? In reality, most of the operators assume that Individual.genome is a python sequence, which may not always be the case. For example, the user may come up with a representation that employs, say, a sparse matrix. In that case, the crossover operators will fail.
In the future we intend on adding support for other popular representations that will show up as LEAP sub-packages. (I.e., just as binary_rep and real_rep provide support for binary and real-value representations.)
So, in a sense, for where it matters, LEAP currently assumes some sort of sequence for genomes though, again, plans are afoot to add more representation types. In the interim, you will have to add your own operators to support new non-sequence genomic representations.
Type-checking Decorator Functions¶
However, to help minimize the chances that pipeline operators would be mis-used the operators have function decorates that due parameter type-checking to ensure the correct parameters are being passed in. These are:
- iteriter_op
This checks for signatures of type Iterator -> Iterator
- listlist_op
Checks for population -> population type operators
- listiter_op
Checks for population -> population type operators
- iterlist_op
Checks for population -> Iterator type operators
These can be found in leap_ec.ops.
API Documentation¶
Base operator classes and representation agnostic functions¶
Fundamental evolutionary operators.
This module provides many of the most important functions that we string together to create EAs out of operator pipelines. You’ll find many traditional selection and reproduction strategies here, as well as components for classic algorithms like island models and cooperative coevolution.
-
class
leap_ec.ops.
CooperativeEvaluate
(context, num_trials, collaborator_selector, log_stream=None, combine=<function concat_combine>)¶ Bases:
leap_ec.ops.Operator
A simple, non-parallel implementation of cooperative coevolutionary fitness evaluation.
- Parameters
context – the algorithm’s state context. Used to access subpopulation information.
-
class
leap_ec.ops.
Operator
¶ Bases:
abc.ABC
Abstract base class that documents the interface for operators in a LEAP pipeline.
LEAP treats operators as functions of two arguments: the population, and a “context” dict that may be used in some algorithms to maintain some global state or parameters independent of the population.
TODO The above description is outdated. –Siggy
You can inherit from this class to define operators as classes. Classes support operators that take extra arguments at construction time (such as a mutation rate) and maintain some internal private state, and they allow certain special patterns (such as multi-function operators).
But inheriting from this class is optional. LEAP can treat any callable object that takes two parameters as an operator. You may define your custom operators as closures (which also allow for construction-time arguments and internal state), as simple functions ( when no additional arguments are necessary), or as curried functions ( i.e. with the help of toolz.curry(…).
-
leap_ec.ops.
clone
¶ clones and returns the next individual in the pipeline
>>> from leap_ec.individual import Individual
Create a common decoder and problem for individuals.
>>> original = Individual([1,1])
>>> cloned_generator = clone(iter([original]))
- Parameters
next_individual – iterator for next individual to be cloned
- Returns
copy of next_individual
-
leap_ec.ops.
compute_expected_probability
(expected: float, individual_genome: List) → float¶ Computed the probability of mutation based on the desired average expected mutation and genome length.
- Parameters
expected – times individual is to be mutated on average
individual_genome – genome for which to compute the probability
- Returns
the corresponding probability of mutation
-
leap_ec.ops.
concat_combine
(collaborators)¶ Combine a list of individuals by concatenating their genomes.
You can choose whether this or some other function is used for combining collaborators by passing it into the CooperativeEvaluate constructor.
-
leap_ec.ops.
const_evaluate
¶ An evaluator that assigns a constant fitness to every individual.
This is useful for algorithms that need to assign an arbitrary initial fitness value before using their normal evaluation method. Some forms of cooperative coevolution are an example.
-
leap_ec.ops.
cyclic_selection
¶ Deterministically returns individuals in order, then shuffles the sequence, returns the individuals in that new order, and repeats this process.
>>> from leap_ec.individual import Individual >>> from leap_ec.ops import cyclic_selection
>>> pop = [Individual([0, 0]), ... Individual([0, 1])]
>>> cyclic_selector = cyclic_selection(pop)
- Parameters
population – from which to select
- Returns
the next selected individual
-
leap_ec.ops.
evaluate
¶ Evaluate and returns the next individual in the pipeline
>>> from leap_ec.individual import Individual >>> from leap_ec.decoder import IdentityDecoder >>> from leap_ec.binary_rep.problems import MaxOnes
We need to specify the decoder and problem so that evaluation is possible.
>>> ind = Individual([1,1], decoder=IdentityDecoder(), problem=MaxOnes())
>>> evaluated_ind = next(evaluate(iter([ind])))
- Parameters
next_individual – iterator pointing to next individual to be evaluated
kwargs – contains optional context state to pass down the pipeline
in context dictionaries
- Returns
the evaluated individual
-
leap_ec.ops.
insertion_selection
¶ do exclusive selection between offspring and parents
This is typically used for Ken De Jong’s EV algorithm for survival selection. Each offspring is deterministically selected and a random parent is selected; if the offspring wins, then it replaces the parent.
Note that we make a _copy_ of the parents and have the offspring compete with the parent copies so that users can optionally preserve the original parents. You may wish to do that, for example, if you want to analyze the composition of the original parents and the modified copy.
- Parameters
offspring – population to select from
parents – parents that are copied and which the copies are potentially updated with better offspring
- Returns
the updated parent population
-
leap_ec.ops.
iteriter_op
(f)¶ This decorator wraps a function with runtime type checking to ensure that it always receives an iterator as its first argument, and that it returns an iterator.
We use this to make debugging operator pipelines easier in EAs: if you accidentally hook up, say an operator that outputs a list to an operator that expects an iterator, we’ll throw an exception that pinpoints the issue.
- Parameters
function (f) – the function to wrap
-
leap_ec.ops.
iterlist_op
(f)¶ This decorator wraps a function with runtime type checking to ensure that it always receives an iterator as its first argument, and that it returns a list.
We use this to make debugging operator pipelines easier in EAs: if you accidentally hook up, say an operator that outputs a list to an operator that expects an iterator, we’ll throw an exception that pinpoints the issue.
- Parameters
function (f) – the function to wrap
-
leap_ec.ops.
listiter_op
(f)¶ This decorator wraps a function with runtime type checking to ensure that it always receives a list as its first argument, and that it returns an iterator.
We use this to make debugging operator pipelines easier in EAs: if you accidentally hook up, say an operator that outputs an iterator to an operator that expects a list, we’ll throw an exception that pinpoints the issue.
- Parameters
function (f) – the function to wrap
-
leap_ec.ops.
listlist_op
(f)¶ This decorator wraps a function with runtime type checking to ensure that it always receives a list as its first argument, and that it returns a list.
We use this to make debugging operator pipelines easier in EAs: if you accidentally hook up, say an operator that outputs an iterator to an operator that expects a list, we’ll throw an exception that pinpoints the issue.
- Parameters
function (f) – the function to wrap
-
leap_ec.ops.
migrate
(context, topology, emigrant_selector, replacement_selector, migration_gap)¶
-
leap_ec.ops.
n_ary_crossover
¶ Do crossover between individuals between N crossover points.
1 < n < genome length - 1
We also assume that the passed in individuals are clones of parents.
>>> from leap_ec.individual import Individual >>> from leap_ec.ops import n_ary_crossover
>>> first = Individual([0,0]) >>> second = Individual([1,1]) >>> i = iter([first, second]) >>> result = n_ary_crossover(i)
>>> new_first = next(result) >>> new_second = next(result)
- Parameters
next_individual – where we get the next individual from the pipeline
num_points – how many crossing points do we allow?
- Returns
two recombined
-
leap_ec.ops.
naive_cyclic_selection
¶ Deterministically returns individuals, and repeats the same sequence when exhausted.
This is “naive” because it doesn’t shuffle the population between complete tours to minimize bias.
>>> from leap_ec.individual import Individual >>> from leap_ec.ops import naive_cyclic_selection
>>> pop = [Individual([0, 0]), ... Individual([0, 1])]
>>> cyclic_selector = naive_cyclic_selection(pop)
- Parameters
population – from which to select
- Returns
the next selected individual
-
leap_ec.ops.
pool
¶ ‘Sink’ for creating size individuals from preceding pipeline source.
Allows for “pooling” individuals to be processed by next pipeline operator. Typically used to collect offspring from preceding set of selection and birth operators, but could also be used to, say, “pool” individuals to be passed to an EDA as a training set.
>>> from leap_ec.individual import Individual >>> from leap_ec.ops import naive_cyclic_selection
>>> pop = [Individual([0, 0]), ... Individual([0, 1])]
>>> cyclic_selector = naive_cyclic_selection(pop)
>>> pool = pool(cyclic_selector, 3)
print(pool) [Individual([0, 0], None, None), Individual([0, 1], None, None), Individual([0, 0], None, None)]
- Parameters
next_individual – generator for getting the next offspring
size – how many kids we want
- Returns
population of size offspring
-
leap_ec.ops.
random_selection
(population: List) → Iterator¶ return a uniformly randomly selected individual from the population
- Parameters
population – from which to select
- Returns
a uniformly selected individual
-
leap_ec.ops.
tournament_selection
¶ Selects the best individual from k individuals randomly selected from the given population
>>> from leap_ec.individual import Individual >>> from leap_ec.decoder import IdentityDecoder >>> from leap_ec.binary_rep.problems import MaxOnes >>> from leap_ec.ops import tournament_selection
>>> pop = [Individual([0, 0, 0], IdentityDecoder(), problem=MaxOnes()), ... Individual([0, 0, 1], IdentityDecoder(), problem=MaxOnes())]
We need to evaluate them to get their fitness to sort them for truncation.
>>> pop = Individual.evaluate_population(pop)
>>> best = tournament_selection(pop)
- Parameters
population – from which to select
k – are randomly drawn from which to choose the best; by
default this is 2 for binary tournament selection
- Returns
the best of k individuals drawn from population
-
leap_ec.ops.
truncation_selection
¶ return the size best individuals from the given population
This defaults to (mu, lambda) if parents is not given.
>>> from leap_ec.individual import Individual >>> from leap_ec.decoder import IdentityDecoder >>> from leap_ec.binary_rep.problems import MaxOnes >>> from leap_ec.ops import truncation_selection
>>> pop = [Individual([0, 0, 0], decoder=IdentityDecoder(), problem=MaxOnes()), ... Individual([0, 0, 1], decoder=IdentityDecoder(), problem=MaxOnes()), ... Individual([1, 1, 0], decoder=IdentityDecoder(), problem=MaxOnes()), ... Individual([1, 1, 1], decoder=IdentityDecoder(), problem=MaxOnes())]
We need to evaluate them to get their fitness to sort them for truncation.
>>> pop = Individual.evaluate_population(pop)
>>> truncated = truncation_selection(pop, 2)
TODO Do we want an optional context to over-ride the ‘parents’ parameter?
- Parameters
offspring – offspring to truncate down to a smaller population
size – is what to resize population to
second_population – is optional parent population to include with population for downsizing
- Returns
truncated population
-
leap_ec.ops.
uniform_crossover
¶ Generator for recombining two individuals and passing them down the line.
>>> from leap_ec.individual import Individual >>> from leap_ec.ops import uniform_crossover
>>> first = Individual([0,0]) >>> second = Individual([1,1]) >>> i = iter([first, second]) >>> result = uniform_crossover(i)
>>> new_first = next(result) >>> new_second = next(result)
- Parameters
next_individual – where we get the next individual
p_swap – how likely are we to swap each pair of genes
- Returns
two recombined individuals
Pipeline operators for binary representations¶
Binary representation specific pipeline operators.
-
leap_ec.binary_rep.ops.
mutate_bitflip
¶ mutate and return an individual with a binary representation
>>> from leap_ec.individual import Individual >>> from leap_ec.binary_rep.ops import mutate_bitflip
>>> original = Individual([1,1]) >>> mutated = next(mutate_bitflip(iter([original])))
- Parameters
next_individual – to be mutated
expected – the expected number of mutations, on average
- Returns
mutated individual
Pipeline operators for real-valued representations¶
Pipeline operators for real-valued representations
-
leap_ec.real_rep.ops.
mutate_gaussian
¶ mutate and return an individual with a real-valued representation
TODO hard_bounds should also be able to take a sequence —Siggy
- Parameters
next_individual – to be mutated
std – standard deviation to be equally applied to all individuals; this can be a scalar value or a “shadow vector” of standard deviations
expected – the expected number of mutations per individual, on average. If None, all genes will be mutated.
hard_bounds – to clip for mutations; defaults to (- ∞, ∞)
- Returns
a generator of mutated individuals.