Getting Started

This tutorial introduces and steps through a simple program which uses the CGP-Library to solve a symbolic regression problem.

Symbolic regression is the task of fitting a symbolic equation to a series of data points.  If you have not already downloaded and/or installed the CGP-Library please see the Installation tutorial.

Summary
Getting StartedThis tutorial introduces and steps through a simple program which uses the CGP-Library to solve a symbolic regression problem.
The ProgramA simple symbolic regression solving program used to illustrate the basic operation of the CGP-Library.
Stepping Through the CodeA close look at each line of the example code.

The Program

A simple symbolic regression solving program used to illustrate the basic operation of the CGP-Library.

The program below is provided in the CGP-Library download within /examples/gettingStarted.c and is included in the code::blocks project.

#include <stdio.h>
#include "../src/cgp.h"

int main(void){

    struct parameters *params = NULL;
    struct dataSet *trainingData = NULL;
    struct chromosome *chromo = NULL;

    int numInputs = 1;
    int numNodes = 15;
    int numOutputs = 1;
    int nodeArity = 2;

    int numGens = 10000;
    int updateFrequency = 500;
    double targetFitness = 0.1;

    params = initialiseParameters(numInputs, numNodes, numOutputs, nodeArity);

    addNodeFunction(params, "add,sub,mul,div,sin");

    setTargetFitness(params, targetFitness);

    setUpdateFrequency(params, updateFrequency);

    printParameters(params);

    trainingData = initialiseDataSetFromFile("./dataSets/symbolic.data");

    chromo = runCGP(params, trainingData, numGens);

    printChromosome(chromo, 0);

    freeDataSet(trainingData);
    freeChromosome(chromo);
    freeParameters(params);

    return 0;
}

Stepping Through the Code

A close look at each line of the example code.

First cgp.h must be included to use the CGP-Library.  The cgp.h is located in the /src directory.

#include "../src/cgp.h"

The CGP-Library provides a number of structures which describe, among other things, the CGP parameters and CGP chromosomes.  These structures are initialised and free’d using provided functions.  The attributes of these structures should not, and in fact cannot, be accessed directly and must also be accessed using provided getter and setter functions.

Below three CGP-Library structures are defined which will be used later in this tutorial.

  • The parameters structure is used to store general CGP parameters including the size of the chromosomes, the evolutionary strategy to use and the mutation rate.
  • The dataSet structure is used to store input output pairs of data which may be used by the fitness function.  For instance in this example a dataSet structure will be used to store the data points to which an equation shall be fitted.
  • The chromosome structure is used to store the fittest chromosome found after CGP has been applied towards the symbolic regression task.

Note: all of the structures have been initialised to NULL.  This is best practice and many of the CGP library functions check to see if structures have been initialised by seeing of they are not equal to NULL.

struct parameters *params = NULL;
struct dataSet *trainingData = NULL;
struct chromosome *chromo = NULL;

Later when the parameters structure is initialise using initialiseParameters we will need to specify the structure of the chromosomes to be evolved.  This requires specifying the number of chromosome inputs, nodes, outputs and the arity of each node.

int numInputs = 1;
int numNodes = 15;
int numOutputs = 1;
int nodeArity = 2;

Later when CGP is applied towards the symbolic regression task the number of allowed generations before terminating the search will be needed to be specified.

int numGens = 10000;

Later when the parameters structure is initialise using initialiseParameters many CGP-Library parameters will be set to default values.  However all of these CGP-Library parameters can be changed using provided functions.  This will be demonstrated by changing the default target fitness and the frequency at which the user is updated on progress.

double targetFitness = 0.1;
int updateFrequency = 500;

Here the parameters structure is initialised.  As mentioned initialiseParameters requires the user to specify the number of chromosome inputs, nodes, outputs and each nodes arity.

Again as previously mentioned initialiseParameters sets the CGP-Library parameters to typical default values.

params = initialiseParameters(numInputs, numNodes, numOutputs, nodeArity);

One of the roles of the parameters structure is to store a function set which contains the set of possible node functions.  Newly initialised parameters contain an empty function set.  It is therefore up to the user to populate this function set with node functions.  Node functions are added to the function set using addNodeFunction.  For a complete list of possible node functions see addNodeFunction.  To clear the functions currently stored in the function set use clearFunctionSet.

The node functions used in this tutorial are: addition, subtraction, multiplication, division and sine.

addNodeFunction(params, "add,sub,mul,div,sin");

The default target fitness used by the CGP-Library is ‘0’.  This default target fitness can be altered using setTargetFitness.

setTargetFitness(params,targetFitness);

The update frequency defined in parameters specifies the number of generations which elapse between updating the user of the progress made by CGP.

The default update frequency used by the CGP-Library is ‘1’ generation.  This default value is altered using setUpdateFrequency.

setUpdateFrequency(params, updateFrequency);

The values stored in the parameters structure can be displayed to the terminal by calling printParameters.

printParameters(params);

When printed, the parameters should appear in the terminal as below.  Most of the parameters have been left as the defaults but the target fitness and update frequency have been altered.  Additionally the function set contains the the node functions specified.

For details of the parameters stored in parameters see parameters.

----------------------------------------------------
                  Parameters
----------------------------------------------------
Evolutionary Strategy:            (1+4)-ES
Inputs:                           1
Nodes:                            15
Outputs:                          1
Node Arity:                       2
Connection weights range:         +/- 1.000000
Mutation Type:                    probabilistic
Mutation rate:                    0.050000
Recurrent Connection Probability: 0.000000
Fitness Function:                 supervisedLearning
Target Fitness:                   0.100000
Selection scheme:                 selectFittest
Reproduction scheme:              mutateRandomParent
Update frequency:                 500
Function Set: add sub mul div sin (5)
----------------------------------------------------

Now the parameters associated with the CGP-Library are initialised we next need to define the task to be solved.  By default a supervised learning fitness function is used.  The supervised learning fitness function applies sets of user defined input output pairs to each chromosome and assigns a fitness proportional to how closely the chromosomes outputs match the desired target outputs.  The fitness used is the sum of the absolute differences between the generated and correct outputs over all given inputs.

Input output pairs are stored using dataSet structures which also store meta data describing the number of data samples and how many inputs and outputs each sample contains.

One method of initialising a dataSet structure is to load the dataSet from a predefined file using initialiseDataSetFromFileinitialiseDataSetFromFile returns an initialised dataSet structure containing the data in the file specified.

Here we load a data set containing 101 evenly sampled points from y = x^6 - 2x^4 + x^2 within the range x = [-5,5].  The data set is given insymbolic the /dataSets/symbolic.data directory.

trainingData = initialiseDataSetFromFile("./dataSets/symbolic.data");

In order for initialiseDataSetFromFile to parse the given file it must use the following format below.  Where the first line contains the number of inputs for each sample, followed by the number of outputs, followed by the total number of samples.  The remaining lines give each individual sample specifying the input values followed by the output values.  All values must be comma separated.

1,1,101,
-5.000000,14400.000000,
-4.900000,12712.338867,
-4.800000,11191.949219,
-4.700000,9825.367188,

After we have set up a parameters structure and a dataSet structure we now run CGP!  This is achieved using the runCGP function.  runCGP takes as augments our initialised parameters, the training dataSet and the allowed number of generations before terminating the search.  The runCGP function returns an initialised chromosome structure which contains the best found solution to the given task.

Whilst CGP is running the current progress will be displayed to the terminal at the update frequency specified in the parameters structure.

chromo = runCGP(params, trainingData, numGens);

To inspect the best found solution the returned chromosome can be displayed in the terminal using printChromosome.

printChromosome(chromo, 0);

The printed chromosome take the following form.  Each line starting with a number represents a node; where a node can be an input node or a function node.  Each function node is labelled with a textural description of the operation it undertakes i.e.  ‘add’.  This is followed by the inputs to each node.  Additionally the function nodes labelled with an ‘*’ indicate that they are active in calculating the overall program outputs.  The output nodes are given on the last line.

(0):   input
(1):   mul 0 0 *
(2):   sin 1
(3):   mul 0 0 *
(4):   add 1 2
(5):   mul 3 1 *
(6):   sub 1 5 *
(7):   mul 4 3
(8):   sub 5 1 *
(9):   mul 1 8 *
(10):  sin 7
(11):  add 10 5
(12):  sin 11
(13):  add 6 9 *
(14):  div 7 0
(15):  sub 4 13
outputs: 13

Finally at the end of the program the memory associated with the structures should be free’d using freeDataSet, freeChromosome and freeParameters.

freeDataSet(trainingData);
freeChromosome(chromo);
freeParameters(params);

And that’s it, in just 39 lines of code we have applied CGP to a symbolic regression task (and it could easily be done in less).

First download the latest CGP-Library from Download and unzip the file.
struct parameters
Stores general evolutionary and chromosome parameters used by the CGP-Library.
struct dataSet
Stores a data set which can be used by fitness functions when calculating a chromosomes fitness.
struct chromosome
Stores a CGP chromosome instances used by the CGP-Library.
DLL_EXPORT struct parameters *initialiseParameters(const int numInputs,
const int numNodes,
const int numOutputs,
const int arity)
Initialises a parameters structure used throughout the CGP-Library.
DLL_EXPORT void addNodeFunction(struct parameters *params,
char const *functionNames)
Adds pre-defined node function(s) to the set of functions stored by a parameters structure.
DLL_EXPORT void clearFunctionSet(struct parameters *params)
Resets the function set stored by a parameters structure to contain no functions.
DLL_EXPORT void setTargetFitness(struct parameters *params,
double targetFitness)
Sets the target fitness used when running CGP.
DLL_EXPORT void setUpdateFrequency(struct parameters *params,
int updateFrequency)
Sets the frequency of the updates to the user when using runCGP.
DLL_EXPORT void printParameters(struct parameters *params)
Prints the given parameters to the screen in a human readable format.
DLL_EXPORT struct dataSet *initialiseDataSetFromFile(char const *file)
Initialises a dataSet structures using the given file.
DLL_EXPORT struct chromosome* runCGP(struct parameters *params,
struct dataSet *data,
int numGens)
Applies CGP to the given task.
DLL_EXPORT void printChromosome(struct chromosome *chromo,
int weights)
Displays the given chromosome to the terminal / command prompt in a human readable format.
DLL_EXPORT void freeDataSet(struct dataSet *data)
Frees dataSet instance.
DLL_EXPORT void freeChromosome(struct chromosome *chromo)
Frees chromosome instance.
DLL_EXPORT void freeParameters(struct parameters *params)
Frees parameters structure instance.
Close