Scripts
Besides the main PonyGE.py
file that can be found in the source directory, a number of extra scripts are provided with PonyGE2. These are located in the scripts
folder. These extra scripts have been designed to work either as standalone files, or to work in tandem with PonyGE2. Various functions from within these scripts can provide extra functionality to PonyGE2.
A basic experiment manager is provided in the scripts
folder. This experiment manager allows users to execute multiple evolutionary runs across multiple cores using Python's multiprocessing
library. Experiments are saved in results/[EXPERIMENT_NAME]
where [EXPERIMENT_NAME]
is a parameter which specifies the name of the experiment. This can be set with the argument:
--experiment_name [EXPERIMENT_NAME]
or by setting the parameter EXPERIMENT_NAME
to [EXPERIMENT_NAME]
in either a parameters file or in the params dictionary, where [EXPERIMENT_NAME]
is a string which specifies the desired name of the experiment.
NOTE that the EXPERIMENT_NAME
parameter must be set when using the experiment manager.
The number of evolutionary runs to be executed can be set with the argument:
--runs [INT]
or by setting the parameter RUNS
to [INT]
in either a parameters file or in the params dictionary, where [INT]
is an integer which specifies the number of evolutionary runs to be completed. The experiment manager initialises each evolutionary run with a unique random seed. The random seeds for a batch of evolutionary experiments are the indexes of the individual experiments (i.e. the first run will have seed 0, the second will have seed 1, and so on up to seed [INT] - 1).
NOTE that the experiment manager uses Python's multiprocessing library to launch multiple runs simultaneously. As such, it is not possible to use the experiment manager with the MULTICORE parameter set to True. If the MULTICORE parameter is already set to True, it will be turned off automatically.
Since PonyGE2 uses the central algorithm.parameters.params
and stats.stats.stats
dictionaries to manage various aspects of individual runs, it is not currently possible to launch multiple simultaneous runs of PonyGE2 from within a Python environment as the central dictionaries would be overwritten by the concurrent processes. As such, the experiment manager calls individual PonyGE2 runs from the command line using Python's subprocess.call()
function.
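The launch pattern described above can be sketched as follows. This is a minimal illustration, not the experiment manager's actual code: the function names are hypothetical, and the path to the main PonyGE2 script is passed in as an assumption. Only the --experiment_name and --random_seed arguments come from this documentation.

```python
import subprocess
import sys
from multiprocessing import Pool


def build_command(script, experiment_name, seed):
    """Build the command line for one run; the run index doubles as the seed."""
    return [sys.executable, script,
            "--experiment_name", experiment_name,
            "--random_seed", str(seed)]


def run_once(script, experiment_name, seed):
    """Launch one run in its own Python process and return its exit code."""
    return subprocess.call(build_command(script, experiment_name, seed))


def run_experiment(script, experiment_name, runs, cores=None):
    """Execute `runs` runs, seeded 0 .. runs - 1, across a pool of workers."""
    jobs = [(script, experiment_name, seed) for seed in range(runs)]
    with Pool(processes=cores) as pool:
        return pool.starmap(run_once, jobs)
```

Because each run is a separate process started from the command line, every run gets its own copy of the params and stats dictionaries, so concurrent runs cannot overwrite one another.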
NOTE that all functionality available to the main PonyGE file is available to the experiment manager, i.e. all command line arguments can be used including the specification of parameters files.
To run the experiment manager, type:
$ python scripts/experiment_manager.py --experiment_name [EXPERIMENT_NAME] --runs [INT]
where [EXPERIMENT_NAME]
is a string which specifies the desired name of the experiment and where [INT]
is an integer which specifies the number of evolutionary runs to be completed.
NOTE that since the MULTICORE
parameter in PonyGE2 does not work with Windows operating systems, at present the experiment manager will not work with Windows operating systems.
A basic statistics parser is included in the scripts
folder. This statistics parser can be used to generate summary .csv
files and .pdf
graphs for all stats generated by all runs saved in an [EXPERIMENT_NAME]
folder.
The statistics parser extracts the stats.tsv
files from all runs contained in the specified [EXPERIMENT_NAME]
folder. For each stat, a unique .csv
file is generated containing that statistic across all stats files. The average and standard deviation of each stat are calculated, and graphs displaying the average values (with standard deviations) across all generations are generated. Finally, the statistics parser saves a main full_stats.csv
file containing all statistics across all runs in a single file. All .csv
summary files can be used with any numerical statistics package, such as R.
While the experiment manager calls the statistics parser after all experiments have been completed, it is possible to call the statistics parser as a standalone program to generate these files for any given [EXPERIMENT_NAME]
folder. This can be done from the command line by typing:
$ python scripts/parse_stats.py --experiment_name [EXPERIMENT_NAME]
where [EXPERIMENT_NAME]
is a string which specifies the desired name of the experiment contained in the results
folder.
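The per-generation averaging described above can be sketched as follows. This is an illustration under stated assumptions rather than the parser's own code: it assumes each stats.tsv file has a header row of stat names followed by one row per generation, the column names gen and best_fitness are used only as examples, and the population standard deviation is an assumed choice.

```python
import csv
import io
from statistics import mean, pstdev


def read_stat(tsv_text, stat_name):
    """Return one stat's per-generation values from a tab-separated stats table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [float(row[stat_name]) for row in reader]


def summarise(runs, stat_name):
    """Return (mean, standard deviation) of a stat for each generation across runs."""
    columns = [read_stat(text, stat_name) for text in runs]
    # zip(*columns) groups the values of all runs generation by generation.
    return [(mean(values), pstdev(values)) for values in zip(*columns)]


# Two tiny illustrative runs (assumed layout: header row, one row per generation).
run_a = "gen\tbest_fitness\n0\t10\n1\t8\n"
run_b = "gen\tbest_fitness\n0\t14\n1\t6\n"
summary = summarise([run_a, run_b], "best_fitness")
```

Each entry of summary corresponds to one generation, which is the shape needed to plot an average curve with standard deviation bands.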
A powerful script that has been included with PonyGE2 is the deterministic GE LR Parser. This script will parse a given target string using a specified .bnf
grammar and will return a PonyGE2 individual that can be used in PonyGE2. Provided the target string can be fully and correctly represented by the specified grammar, the LR parser uncovers a derivation tree which matches the target string by building the overall tree from the terminals used in the solution. A repository of phenotypically correct sub-trees whose outputs match portions of the target string (termed 'snippets') is compiled. Deterministic concatenation operators are employed to build the desired solution. Provided the grammar remains unchanged, these reverse-engineered solutions can be saved and used in an evolutionary setting.
Since the GE LR Parser is fully deterministic, the same GE individual will be returned every time it is executed.
To run the GE LR Parser, only two parameters need to be specified:
--grammar_file [FILE_NAME.bnf]
--reverse_mapping_target [TARGET_STRING]
where [TARGET_STRING]
is a string specifying the target string to be parsed by the GE LR Parser.
NOTE that the full file extension for the grammar file (e.g. ".bnf") must be specified, but the full file path for the grammar file (e.g. grammars/example_grammar.bnf
) does not need to be specified.
Alternatively, both the GRAMMAR_FILE
and REVERSE_MAPPING_TARGET
can be specified in either the algorithm.parameters.params
dictionary or in a separate parameters file. An example parameters file can be seen in the parameters folder. To run this example, type:
$ python scripts/GE_LR_parse.py --parameters GE_parse.txt
Combining the GE LR Parser with the full PonyGE2 library, it is possible to parse a target string into a GE individual and then to seed an evolutionary run of PonyGE2 with that individual. Provision is made in PonyGE2 to allow for the seeding of as many target individuals as desired into an evolutionary run.
There are two ways to seed individuals into a PonyGE2 run:
If a single target phenotype string is to be included into the initial population, users can specify the argument:
--reverse_mapping_target [TARGET_STRING]
or set the parameter REVERSE_MAPPING_TARGET
to [TARGET_STRING]
in either a parameters file or in the params dictionary, where [TARGET_STRING]
is a phenotype string specifying the target string to be parsed by the GE LR Parser into a GE individual.
NOTE that as with the GE LR Parser described above, a compatible grammar file needs to be specified along with the target string. If the target string cannot be parsed using the specified grammar, an error will occur.
Alternatively, if one or more target individuals are to be seeded into a GE population, a folder has been made available for saving populations of desired individuals for seeding. The root directory contains a seeds
folder. Any number of desired target individuals for seeding can be saved in separate text files within a unique folder in the seeds directory. This target seed folder can be specified with the argument:
--target_seed_folder [TARGET_SEED_FOLDER]
or by setting the parameter TARGET_SEED_FOLDER
to [TARGET_SEED_FOLDER]
in either a parameters file or in the params dictionary, where [TARGET_SEED_FOLDER]
is the name of the target folder within the seeds
directory which contains target seed individuals.
PonyGE2 currently supports four formats for saving and re-loading of such individuals (examples of each are given in the seeds/example_pop
folder):
1. (example_1.txt in seeds/example_pop) PonyGE2 can re-load "best.txt" outputs from previous PonyGE2 runs. These files contain the saved genotypes and phenotypes of the best solution evolved over the course of an evolutionary run. Re-using these output files greatly improves the seeding process, as the genotypes can be quickly used to re-map the exact identical individual evolved by PonyGE2. If possible, this is the preferred option for seeding populations as the use of genomes to re-build previous individuals guarantees the same genetic information will be retained.
2. (example_2.txt in seeds/example_pop) Target phenotypes can be saved as a simple text file with a single header of "Phenotype:", followed by the phenotype string itself on the following line. The phenotype will then be parsed into a PonyGE2 individual using the GE LR Parser.
3. (example_3.txt in seeds/example_pop) Target genotypes can be saved as a simple text file with a single header of "Genotype:", followed by the genotype itself on the following line. The genotype will then be mapped into a PonyGE2 individual using the normal GE mapping process. As with option 1 above, this will result in an identical PonyGE2 individual being re-created from the specified genome.
4. (example_4.txt in seeds/example_pop) Target phenotypes can be saved as a simple text file where the only content of the file is the phenotype string itself (i.e. no descriptive text, headers, comments, etc). The content of these files will then be parsed into PonyGE2 individuals using the GE LR Parser.
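One way to tell the four formats apart is by which headers a file contains. The sketch below is an assumption about how such a check could look, since this documentation does not state how PonyGE2 itself detects the format. The function name and the returned labels are hypothetical, and treating a file with both headers as a "best.txt" output is likewise an assumption.

```python
def classify_seed_file(text):
    """Guess which of the four seed file formats a file's text follows."""
    lines = [line.strip() for line in text.splitlines()]
    has_genotype = "Genotype:" in lines
    has_phenotype = "Phenotype:" in lines
    if has_genotype and has_phenotype:
        return "best_output"       # format 1: rebuild from the saved genotype
    if has_phenotype:
        return "phenotype_header"  # format 2: parse with the GE LR Parser
    if has_genotype:
        return "genotype_header"   # format 3: map the genotype directly
    return "bare_phenotype"        # format 4: whole file is the phenotype
```

Checking for the genotype first reflects the stated preference for re-building individuals from genomes whenever one is available.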
NOTE that the names of individual files contained in a specified target population folder in the seeds
directory do not matter. These files can be named as desired.
NOTE that as with the GE LR Parser described above, a compatible grammar file needs to be specified along with the target seeds
folder. If the target string cannot be parsed using the specified grammar, an error will occur. If the target genotype results in a different phenotype to that specified, an error will occur.
NOTE that at present, phenotypes spanning multiple lines can only be parsed correctly using file format 4 above, i.e. the phenotype string constitutes the sole information in the file. If a genotype exists for such phenotypes, best practice is to use the genotype to seed the solution using file format 3 above, i.e. discard the phenotype string and allow the genotype to re-produce it.
All initialisation techniques existing in PonyGE2 are compatible with seeding evolutionary runs with target individuals. However, an additional initialisation option is included which may be of some use in the case of Genetic Improvement. An option is available to initialise the entire population with only identical copies of the specified seed individual (or individuals). If only one target seed is specified, the initial population will consist of POPULATION_SIZE
copies of that individual. If multiple target seeds are specified, the initial population will consist of an equal number of copies of each specified seed. This option can be specified with the argument:
--initialisation seed_individuals
or by setting the parameter INITIALISATION
to seed_individuals
in either a parameters file or in the params dictionary.
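The effect of this option can be illustrated with a short sketch. It is not PonyGE2's implementation: the function name is hypothetical, plain strings stand in for individuals, and cycling through the seeds is an assumed way of handling a population size that is not an exact multiple of the number of seeds.

```python
from itertools import cycle, islice


def seed_population(seeds, population_size):
    """Fill a population by cycling through the seeds in order."""
    if not seeds:
        raise ValueError("at least one seed individual is required")
    return list(islice(cycle(seeds), population_size))
```

With one seed, the result is population_size copies of it. With several seeds and a population size that divides evenly, each seed appears the same number of times. Under the cycling assumption, any remainder goes to the seeds listed first.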
It is possible to set the random seeds for the various random number generators (RNGs) used by PonyGE2 in order to exactly re-create any given evolutionary run. All PonyGE2 runs by default save their random seeds. A run can be re-created by specifying the argument:
--random_seed [RANDOM_SEED]
or by setting the parameter RANDOM_SEED
to [RANDOM_SEED]
in either a parameters file or in the params dictionary, where [RANDOM_SEED]
is an integer which specifies the desired random seed.
At present, the main branch of PonyGE2 only uses two RNGs:
- The core Python random module, and
- The numpy np.random module.
Both of these RNGs are set using the same seed. When the RANDOM_SEED
parameter is set, and provided the grammar, fitness function, and all parameters remain unchanged, then PonyGE2 will produce identical results to any previous run executed using this random seed.
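Seeding both RNGs with one value can be sketched as below. The helper name set_seeds is hypothetical and this is not PonyGE2's own code. The numpy import is guarded only so that the sketch also runs where numpy is not installed.

```python
import random


def set_seeds(seed):
    """Seed Python's random module and, if installed, numpy's np.random."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # numpy is unavailable; only the core RNG is seeded
    np.random.seed(seed)


# Re-seeding with the same value reproduces the same random sequence.
set_seeds(42)
first = [random.random() for _ in range(3)]
set_seeds(42)
second = [random.random() for _ in range(3)]
```

Here first and second hold the same values, which is the property that makes a run repeatable when everything else is unchanged.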
An example parameters file for seeding runs with a number of individuals has been included in the parameters folder under seed_run_target.txt
. An example folder named example_pop
with a range of compatible formatting types for seeding target solutions is included in the seeds
directory.