Multiple Threads

This tutorial introduces and steps through a simple program which instructs the CGP-Library to use multiple threads.

One of the advantages of evolutionary algorithms is that they are “embarrassingly parallel”.  For instance the fitness of each member of a population can be calculated in parallel.  Additionally when conducing multiple runs to assess Average Behaviour each individual run can be conducted in parallel.

Summary
Multiple ThreadsThis tutorial introduces and steps through a simple program which instructs the CGP-Library to use multiple threads.
The ProgramA simple program which demonstrates using multiple threads with the CGP-Library.
Stepping Through the CodeA close look at each line of the example code.

The Program

A simple program which demonstrates using multiple threads with the CGP-Library.  The program is also provided in the CGP-Library download within /examples/MultipleThreads.c and is included in the code::blocks project.

Note: on windows in order to use the CGP library with multiple threads you must link against openMP.  To enable this in code::blocks see the following video https://www.youtube.com/watch?v=p0woqF2XCeg.  If you want to enable multiple threads using the compiled CGP-Library.dll you must first recompile the .dll whilst linking against openMP.  To recompile the CGP .dll select the compile dll build target from within code:blocks and press compile; again code::blocks must be set up to link against openMP.  Under linux ‘make so’ automatically links against openMP.  If you want to compile cgp.c and cgp.h yourself and use multiple threads link against openMP by adding the ‘-fopenmp’ flag to gcc.

#include <stdio.h>
#include <time.h>
#include "../src/cgp.h"

/*
    Custom fitness function which just does some work and returns 10.
*/
double fitnessFunction(struct parameters *params, struct chromosome *chromo, struct dataSet *data) {

    int i;
    double inputs[1] = {0.5};
    int numExec = 10000;

    for (i = 0; i < numExec; i++) {
        executeChromosome(chromo, inputs);
    }

    return 10;
}


int main(void) {

    time_t timeStart, timeEnd;
    double singleThreadTime, multipleThreadTime;
    double seedUp;

    struct parameters *params = NULL;

    int numInputs = 1;
    int numNodes = 100;
    int numOutputs = 1;
    int arity = 2;

    int gens = 100;
    int runs = 10;

    params = initialiseParameters(numInputs, numNodes, numOutputs, arity);
    setRandomNumberSeed(123456789);
    addNodeFunction(params, "add,sub,mul,div,sin");
    setMutationRate(params, 1.0);
    setCustomFitnessFunction(params, fitnessFunction, "FF");

    setNumThreads(params, 1);
    timeStart = time(NULL);
    repeatCGP(params, NULL, gens, runs);
    timeEnd = time(NULL);
    singleThreadTime = difftime(timeEnd, timeStart);

    setNumThreads(params, 4);
    timeStart = time(NULL);
    repeatCGP(params, NULL, gens, runs);
    timeEnd = time(NULL);
    multipleThreadTime = difftime(timeEnd, timeStart);

    seedUp = 100 * (singleThreadTime - multipleThreadTime) / multipleThreadTime;

    printf("Single thread time: %.f seconds\n", singleThreadTime);
    printf("Multiple thread time: %.f seconds\n", multipleThreadTime);
    printf("Speed up = %.f%%\n", seedUp);

    freeParameters(params);

    return 0;
}

Stepping Through the Code

A close look at each line of the example code.  First a number of header files are included including cgp.h for the CGP-Library and time.h so the the speed up of using multiple threads can be measured.

#include <stdio.h>
#include <time.h>
#include "../src/cgp.h"

A fitness function is also defined which executes the given chromosomes 10,000 times in order to simulate a typical computationally intensive fitness evaluation.

double fitnessFunction(struct parameters *params, struct chromosome *chromo, struct dataSet *data) {

    int i;
    double inputs[1] = {0.5};
    int numExec = 10000;

    for (i = 0; i < numExec; i++) {
        executeChromosome(chromo, inputs);
    }

    return 10;
}

In the main function a number of variables are declared and initialised including setting up a CGP parameters structure which is configured to use the custom fitness function.

time_t timeStart, timeEnd;
double singleThreadTime, multipleThreadTime;
double seedUp;

struct parameters *params = NULL;

int numInputs = 1;
int numNodes = 100;
int numOutputs = 1;
int arity = 2;

int gens = 100;
int runs = 10;

params = initialiseParameters(numInputs, numNodes, numOutputs, arity);
setRandomNumberSeed(123456789);
addNodeFunction(params, "add,sub,mul,div,sin");
setMutationRate(params, 1.0);
setCustomFitnessFunction(params, fitnessFunction, "FF");

Next the number of threads used by the CGP-Library is set as one (which is also the default).  This is done using the setNumThreads function.

Once the number of threads has been set as one the time it takes to run CGP 10 times via a call to repeatCGP is recorded in the singleThreadTime variable.

setNumThreads(params, 1);
timeStart = time(NULL);
repeatCGP(params, NULL, gens, runs);
timeEnd = time(NULL);
singleThreadTime = difftime(timeEnd, timeStart);

Next the number of threads is increased to four and the time taken to run CGP 10 times is once again recorded in the multipleThreadTime variable.

setNumThreads(params, 4);
timeStart = time(NULL);
repeatCGP(params, NULL, gens, runs);
timeEnd = time(NULL);
multipleThreadTime = difftime(timeEnd, timeStart);

The speed up in run time due to utilising the number threads is then calculated and displayed.  When this was ran on a CrayPat/X super computer there was a >250% speed up in run time.  However typical improvements are around 100% i.e. twice as fast.

Note: Simply using more threads does not always improve runtime.  A general rule of thumb is to use as many threads as you have logical CPU cores i.e. two on a dual core processor or four on a quad core processor.

seedUp = 100 * (singleThreadTime - multipleThreadTime) / multipleThreadTime;

printf("Single thread time: %.f seconds\n", singleThreadTime);
printf("Multiple thread time: %.f seconds\n", multipleThreadTime);
printf("Speed up = %.f%%\n", seedUp);

Finally the memory allocated to the parameters structure is freed and the program ends.

freeParameters(params);

And that’s it, By simply setting the number of threads via a call to setNumThreads the performance of the CGP-Library can be significantly increased.

This tutorial introduces the use of repeatCGP to assess the average behaviour of CGP towards a given application rather than a one-off instance.
DLL_EXPORT void setNumThreads(struct parameters *params,
int numThreads)
Sets the number of threads in the given parameters.
DLL_EXPORT struct results* repeatCGP(struct parameters *params,
struct dataSet *data,
int numGens,
int numRuns)
Repeatedly applies CGP to the given task.
Close