SPL Conqueror is a library to identify and visualize the influence of configuration options on non-functional properties such as performance of footprint of configurable software systems. It consists of a set of sub-projects, which we roughly explain in the following. For further details, we refer to the dedicated sections.
-
The core project ("SPLConqueror_Core") provides basic functionalities to model the variability of a software system including their configuration options and constraints among them.
-
The MachineLearning sub-project provides an algorithm to learn a performance-influence model describing the influence of configuration options on non-functional properties. It also specifies interfaces for satisfiability checking of configurations with respect to the variability model and optimization with respect to finding an optimal configuration for a given objective function and non-functional property.
-
The PyML sub-project provides a set of interface to scikit learn, which is a machine-learning framework implemented in python. Using this interface different regression techniques of scikit learn can be used.
-
The CommandLine sub-project offers an interface to automatically execute experiments using different sampling strategies and different machine-learning techniques. To specify the experiments, SPL Conqueror offers a set of commands, which we explain in the dedicated section.
-
The PerformancePrediction_GUI provides a graphical user interface to learn performance-influence models based on a learning set of configurations. Using this GUI, specific sampling strategies can be used.
-
The SPLConqueror_GUI provides a set of different visualisations that can be used to further understand a learned performance-influence model.
-
The ScriptGenerator provides an interface to generate script files that can be used in the CommandLine sub-project.
-
The VariabilityModel_GUI offers the possibility of defining a variability model of the configurable system being considered.
Note: Since mono version 5.16.0, the results between mono and .NET when performing machine learning differ.
Via Docker
We provide a Dockerfile in this repository for an automatic setup of SPL Conqueror in a Docker container. To set up the Docker container, make sure that you have properly installed Docker (https://docs.docker.com/get-started/) and the Docker daemon is running. Afterwards, the image is set up by:sudo docker build -t splconqueror ./
Based on this image, a container is created and an interactive session is started by:
sudo docker run -ti splconqueror
On Ubuntu 16.04
- Clone the git repository and its submodules. Submodules can be cloned on the command line by:
git submodule update --init --recursive
- Install Mono and MonoDevelop(Recommended: Mono-Version 5.4.1.6+ -- description available on https://www.mono-project.com/download/stable/ -- and MonoDevelop-Version 5.10.0+)
sudo apt install mono-complete monodevelop
- Start MonoDevelop and open the root project:
<SPLConquerer-GitRoot>/SPLConqueror/SPLConqueror.sln
-
Perform a right-click on every project of the solution and select the preferred target framework (e.g., .NET4.5) in Options -> Build -> General
-
Perform a right-click on the solution and select Restore NuGet packages Be aware that an internet connection is required to perform this step.
-
Build the root project
-
Optionally: To use the interface to scikit learn install Python3 along with the scikit-learn(1.0.2), numpy(1.22.2) and scipy(1.8.0) packages.
-
Optionally: To include the Variant Generator using CPlex place the required libraries in the folders SPLConqueror/packages/cplex before building the project. For SCIP place the libraries in SPLConqueror/packages/scip.
On a Mac (OS X (10.11.6))
- Clone the git repository and its submodules. Submodules can be cloned on the command line by:
git submodule update --init --recursive
-
Download and install latest Xamarin-IDE from https://www.xamarin.com
-
Start Xamarin-IDE and open the root project:
<SPLConquerer-GitRoot>/SPLConqueror/SPLConqueror.sln
-->
- Build root project
6.Optionally: To use the interface to scikit learn install Python3 along with the scikit-learn(0.20.3), numpy(1.16.2) and scipy(1.2.1) packages.
- Optionally: To include the Variant Generator using CPlex place the required libraries in the folders SPLConqueror/packages/cplex before building the project. For SCIP place the libraries in SPLConqueror/packages/scip.
On Windows 10
- Clone the git repository and its submodules. Submodules can be cloned on the command line by:
git submodule update --init --recursive
-
Install Visual Studio(Recommended: Visual Studio 2015 or newer)
-
Open Visual Studio and open the solution
-
Perform a right-click on the solution and select Restore NuGet Packages
- Build the root project
6.Optionally: To use the interface to scikit learn install Python3 along with the scikit-learn(0.20.3), numpy(1.16.2) and scipy(1.2.1) packages.
- Optionally: To include the Variant Generator using CPlex place the required libraries in the folders SPLConqueror/packages/cplex before building the project. For SCIP place the libraries in SPLConqueror/packages/scip.
Troubleshooting
1. NuGetIf the NuGet is not able to restore the packages, the following packages have to be added to the projects:
- MachineLearning:
- Accord
- Accord.Math
- SPLConqueror_GUI:
- ILNumerics (v3.3.3.0)
- SolverFoundationWrapper:
- Microsoft.Solver.Foundation (>= v3.0.0)
Additionally, if the package Microsoft.Solver.Foundation is needed, the following steps should be performed:
-
Create a directory for the dll:
mkdir "<SPLConquerer-GitRoot>/SPLConqueror/dll"
-
Copy Microsoft.Solver.Foundation.dll (>= v3.0.0) to "/SPLConqueror/dll"
SPL Conqueror provides four different graphical user interfaces.
VariabilityModel_GUI
The VariabilityModel_GUI can be used to define the variability model of a configurable system or to modify existing models. To create a new variability model for a system, fist use File>New Model. Then, an empty model containing only a root configuration option is created. New options can be added to the model by a right click on an existing option that should be the parent option of the new one. In the Create new Feature dialogue, it is possible to define whether the new option is a binary or a numeric one. For numeric options, also a minimal and maximal value of the value domain have to defined. Besides, if only a subset of all values between the minimal and the maximal value of the domain are allowed, a specific step function can be defined. In this function it is possible to use an alias for the numeric option (n). In the following, we give two examples of the step functions:
- n + 2 (using this function, only even or odd values depending on the minimal value of the value domain are allowed)
- n * 2 (using this function, the minimal value is multiplied by two until the maximal value is reached)
Additionally, constraints between different configuration options can be defined using Edit>Edit Constraints. Last, an alternative group of options can be created using Edit>Edit Alternative Groups.
An example for a variability model is given below:
<vm name="exampleVM">
<binaryOptions>
<configurationOption>
<name>xorOption1</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<impliedOptions/>
<excludedOptions>
<option>xorOption2</option>
</excludedOptions>
<optional>False</optional>
</configurationOption>
<configurationOption>
<name>xorOption2</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<impliedOptions/>
<excludedOptions>
<option>xorOption1</option>
</excludedOptions>
<optional>False</optional>
</configurationOption>
</binaryOptions>
<numericOptions>
<configurationOption>
<name>numericExample</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<impliedOptions/>
<minValue>1</minValue>
<maxValue>10</maxValue>
<stepFunction>numericExample + 2</stepFunction>
</configurationOption>
</numericOptions>
</vm>
Tags:
Name | Parent | Descriptions |
---|---|---|
vm | xml root | Variability model node |
binaryOptions | vm | Xml node containing all binary configuration option nodes |
numericOptions | vm | Xml node containing all binary configuration option nodes |
configurationOption | numericOptions/binaryOptions | Node that describes a configuration option. Contains the name, parent option, output string and prefix and postfix string for output. Nodes that describe binary options also contain information about implied or excluded configuration options. Nodes that describe numeric options contain information about min and max value as well as the step function for the value between min and max. |
name | configurationOption | Contains the name of the configuration option |
outputString | configurationOption | String that will be printed when printing the configuration option |
prefix | configurationOption | Prefix that will be attached to the output string |
postfix | configurationOption | Postfix that will be attached to the output string |
parent | configurationOption | Parent configuration option of this configuration options |
impliedOptions | configurationOption | Collection of configuration options wrapped in option nodes that have to be selected if this option is selected |
option | impliedOptions/excludedOptions | Node that wraps a configuration option |
excludedOptions | configurationOption | Collection of configuration options wrapped in option nodes that cant be selected if this option is selected |
optional | configurationOption | Node that contains information whether this option is optional or mandatory |
minValue | configurationOption | Minimum value a numeric option can assume |
maxValue | configurationOption | Maximum value a numeric option can assume |
stepFunction | configurationOption | Mathematical function that describes the values between min and max a numeric option can assume |
deselectedFlag | configurationOption | Value that will be interpreted as the deselection of a numeric configuration options. If the tag is not defined the option will be assumed mandatory |
booleanConstraints | vm | Collection of logical expressions with binary options a configuration of this model has to fulfill |
numericConstraints | vm | Collection of mathematical expressions with numeric options a configuration of this model has to fulfill |
mixedConstraints | vm | Collection of mathematical expressions with configuration options a configuration of this model has to fulfill |
constraint | booleanConstraints/numericConstraints/mixedConstraints | Wrappper for a single constraint can either be a logical expression or mathematical expression(for mixed constraints attribute exists, see below) |
Interactions can also be defined between numeric and binary configuration options in the variability model. As an example:
<mixedConstraints>
<constraint req="all" exprKind="neg">LocalMemory * bs_32x32 * pixelPerThread = 3</constraint>
</mixedConstraints>
The req attribute determines how the expression is evaluated in case not all configurations options are present or partial configurations are evaluated. req="all" results in the constraints always being true if at least one configuration options of the expression is not present in the configuration, otherwise the constraint will be evaluated as is. req="none" results in missing configuration options automatically being treated as deselected and the expression being then evaluated as is. exprKind="neg" negates the result of the evaluation, while exprKind="pos" simply uses the result of the evaluation.
PerformancePrediction_GUI
The PerformancePrediction_GUI provides an interface to learn performance-influence models. To use this GUI, first a variability model and dedicated measurements of the system has to be provided. Afterwards, in the middle are of the GUI, a binary and numeric sampling strategies has to be selected to define a set of configuration used in the learning process. To customize the machine-learning algorithm all of its parameters can be modified. To start the learning process, press the Start learning button.
Note: Please make sure that bagging will be set to false when using this GUI. If bagging is selected, a set of models are learned and all of them are presented in the GUI, which makes understanding the model hard.
After the learning is started, the models, which are learned in an iterative manner are displayed in the lower part of the GUI. Here, the model is split by the different terms, where each term described the identified influence of an individual option or an interaction between options.
SPLConqueror_GUI
This GUI can be used to visualize a learned performance-influence model.
Script generator
The Script generator can be used to define .a-script files that are needed in the CommandLine project.
The CommandLine sub-project provides the possibility to automatically execute experiments using different sampling strategies on different case study systems. To this end, a .a-script file has to be defined. In the following, we explain the different commands in detail.
Basic command-line commands
As SPL Conqueror provides a lot of commands, some of which are vital for an execution of SPL Conqueror. Unless the GUI is not used, knowing the basic command-line commands is crucial for the user.
log <path_to_a_target_file>
Using this command, the output of SPL Conqueror is redirected to the given file.
SPL Conqueror will automatically create this file if it does not existis, otherwise the file will be overwritten. Additionally, an .log_error file is created, which includes the errors during the execution.
Note: If the log
-command is missing, the output will be prompted directly to the console.
For example:
log C:\exampleLog.log
or
log /home/username/exampleLog.log
vm <path_to_model.xml>
To actually perform experiments on a given system, a variability model that covers the variability domain of the system being considered has to be defined. This can be done using the VariabilityModel_GUI.
For example:
vm C:\exampleModel.xml
or
vm /home/username/exampleModel.xml
Such a variability model generally consists of binary and numeric options, with their properties, and optionally boolean and nonBoolean constraints between configuration options and has to be in a .xml-file.
For instance, a variability model with the name exampleVM is defined as follows:
<vm name="exampleVM">
<binaryOptions>
<configurationOption>
<name>xorOption1</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<impliedOptions/>
<excludedOptions>
<option>xorOption2<option>
</excludedOptions>
<defaultValue>Selected</defaultValue>
<optional>False</optional>
</configurationOption>
<configurationOption>
<name>xorOption2</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<impliedOptions/>
<excludedOptions>
<option>xorOption1<option>
</excludedOptions>
<defaultValue>Selected</defaultValue>
<optional>False</optional>
</configurationOption>
</binaryOptions>
<numericOptions>
<configurationOption>
<name>numericExample</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<impliedOptions/>
<minValue>1</minValue>
<maxValue>10</maxValue>
<stepFunction>numericExample + 1</stepFunction>
</configurationOption>
</numericOptions>
</vm>
The nodes outputString, prefix and postfix can be ignored for now. The parent-node can either be empty or have an option-node as child with the name of the option, that is the parent of the current option(similar to excludedOption). The children, impliedOptions and excludedOptions-nodes are analogous with the exception that they can contain several options and define the children and implied options of the current option and the options that are excluded by this option if it is selected. stepFunction defines the function that decides which values the numeric option can have. For further real world examples we refer to Suplemental Material.
all <path_to_a_measurement_file>
This command defines the file containing all measurements of a given system. Exampls for this command are:
all C:\exampleMeasurements.xml
or
all /home/username/exampleMeasurements.xml
Note that you can also enter relative paths (relative to the provided a file).
For this kind of files, two different formats are supported. The first one is a .csv format. Here each line of the file contains one the measurements for one configuration of the system. This file should contain a header that defines the names of the configuration options as well as the non-functional properies being considered. The second format is a .xml format. A short example using this format is provided in the following:
<results>
<row>
<data column="Configuration">xorOption1,</data>
<data column="Variable Features">numericExample;1</data>
<data column="nfp1">1234</data>
<data column="nfp2">2345</data>
</row>
<row>
<data column="Configuration">xorOption2,</data>
<data column="Variable Features">numericExample;10</data>
<data column="nfp1">4321</data>
<data column="nfp2">5432</data>
</row>
</results>
Optionally you, in case you have knowlegde about the relative deviation, you can also provide the deviation values in a coma separated format. The highest rel. deviation value is used as metric of how accurate the learning can and therefore used as the abort error for the learning.
<results deviation="2;3;9.13">
.
.
.
</results>
Further real world examples of measurements in xml format are provided in the Suplemental Material.
Alternatively, the measurements can be provided in a csv-format. Thereby, the first row has to be a header with the name of the binary and numeric options and the names of the non functional properties. In the column of binary options there has to be either true or false, indicating whether the feature was selected in this configuration or not, and in the columns of numeric options the values that were selected in this configuration. In the columns are the values of the properties that were measured for this property. So if we format the above example in csv format:
xorOption1; | xorOption2; | numericExample; | nfp1; | nfp2; |
---|---|---|---|---|
true; | false; | 1; | 1234; | 2345; |
false; | true; | 10; | 4321; | 5432; |
Note: The element separator is ;
, whereas the line separator is \n
.
Typically, SPL Conqueror checks all configurations for its validity (i.e., do they satisfy all constraints?).
Since this is a time-consuming process, the user can disable this validity check using check:False
.
For example, reading in a csv file and disabling the validity check could look as follows:
all check:False ./measurements.csv
Before starting the learning process upon the loaded data, one can adjust the settings used for machine learning. SPL Conqueror supports multiple different settings to refine the learning. A list of all currently supported settings is presented in the following:
Name | Description | Default Value | Value Range |
---|---|---|---|
lossFunction | The loss function on which bases options and interactions are added to the influence model | RELATIVE | RELATIVE, LEASTSQUARES, ABSOLUTE |
epsilon | The epsilon within the error of the loss Function will be 0. A epsilon of 0 is equal to this feature not being present | 0 | int |
parallelization | Turns the parallel execution of model candidates on/off. | true | true, false |
bagging | Turns the bagging functionality (ensemble learning) on. This functionality relies on parallelization (may require a larger amount of memory). | false | true, false |
baggingNumbers | Specifies how often an influence model is learned based on a subset of the measurement data. | 100 | int |
baggingTestDataFraction | Specifies the percentage of data taken from the test set to be used in one learning run. | 50 | int |
useBackward | Terms existing in the model can be removed during the learning procedure if removal leads to a better model. | 50 | int |
abortError | The threshold at which the learning process stops.(abortError can also be set via measurement file, see measurement section for more information) | 1 | double |
limitFeatureSize | Terms considered during the learning procedure can not become arbitrary complex. | false | true, false |
featureSizeThreshold | The maximal number of options participating in one interaction. | 4 | int |
quadraticFunctionSupport | The learner can learn quadratic functions of one numeric option, without learning the linear function apriory, if this property is true. | true | true, false |
crossValidation | Cross validation is used during learning process if this property is true. | false | true, false |
learn-logFunction | If true, the learn algorithm can learn logarithmic functions such as log(soption1). | false | true, false |
learn-accumulatedLogFunction | Allows the creation of logarithmic functions with multiple features such as log(soption1 * soption2). | false | true, false |
learn-asymFunction | Allows the creation of functions with the form 1/soptions. | false | true, false |
learn-ratioFunction | Allows the creation of functions with the form soptions1/soptions2. | false | true, false |
learn-mirrowedFunction | Allows the creation of functions with the form (numericOption.maxValue - soptions). | false | true, false |
numberOfRounds | Defines the number of rounds the learning process have to be performed. | 70 | int |
backwardErrorDelta | Defines the maximum increase of the error when removing a feature from the model. | 1 | double |
minImprovementPerRound | Defines the minimum error in improved a round must reach before either the learning is aborted or the hierarchy is increased for hierarchy learning. In combination with withHierarchy instead of aborting learning, minImprovementPerRound results in increasing the hierachy level. | 0.1 | double |
withHierarchy | Defines whether we learn our model in hierarchical steps. | false | true, false |
bruteForceCandidates | Defines how candidate features are generated. | false | true, false |
ignoreBadFeatures | Enables an optimization: we do not want to consider candidates in the next X rounds that showed no or only a slight improvement in accuracy relative to all other candidates. | false | true, false |
stopOnLongRound | If true, stop learning if the whole process is running longer than 1 hour and the current round runs longer then 30 minutes. | true | true, false |
candidateSizePenalty | If true, the candidate score (which is an average reduction of the prediction error the candidate induces) is made dependent on its size. | true | true, false |
learnTimeLimit | Defines the time limit for the learning process. If 0, no time limit. Format: HH:MM:SS | 0 | TimeSpan |
scoreMeasure | Defines which measure is used to select the best candidate and to compute the score of a candidate. | RELERROR | RELERROR, INFLUENCE |
outputRoundsToStdout | If true, the info about the rounds is output not only to the log file at the end of the learning, but also to the stdout during the learning after each round completion. | false | true, false |
debug | Print model of selected learners in scikit-learn(currently on supports Random Forest, DecisionTree and SVR with linear kernel). | false | true,false |
pythonInfluenceAnalysis | Perform regression on the predictions of algorithms provided by scikit-learn. | false | true,false |
learn-numeric-disabled | Additionally learn the influences of the deselection of optional numeric configuration options. | true | true,false |
Generally, to change the default settings, there are two options, namely:
-
The first is to add the settings in the format
SETTING_NAME:VALUE
to the mlsettings-command. For instance, if the number of learning rounds should be reduced to 25, allow logarithmic functions and don't want to stop on long learning rounds, the associated command would be:mlsettings numberOfRounds:25 learn_logFunction:true stopOnLongRound:false
-
The second option is to define the settings in a separate text file with each line containing a single setting and its value in the format
SETTING_NAME VALUE
. This is useful to use the same machine learning settings across several different runs. Then the content of the text file for the example above should look like this:
numberOfRounds 25
learn_logFunction true
stopOnLongRound false
To load these settings, the command load-mlsettings
can be used with the path to the file with the settings as argument. For example:
load-mlsettings C:\exampleSettings.txt
Please note that all the settings that are not stated will automatically be set to the default values. So if the commands are used to change the settings several times during the same run, the previous settings have no impact on the new settings.
To learn with the data, the non functional property that will be used for the learning algorithm has to be set first. Therefore, any property can be used, which was defined previously in the measurement-file. If we use the previous example, we can either use nfp1 or nfp2. To set nfp1 or nfp2 use the nfp
command. Then the appropriate command with the argument is:
nfp nfp1
or
nfp nfp2
solver <the_solver_to_use>
Using this command, the solver is selected, which should be used to select valid configurations. Currently, the following solver can be selected:
Name | Description | Command |
---|---|---|
Microsoft Solver Foundation | The solver of the Microsoft Solver Foundation. | solver msf |
Z3 | The Z3 solver. | solver z3 |
CPLEX | Use CPLEX solver. Requires that you have a working CPLEX license, manually added the libraries to the project and compiled it with CPLEX support. | solver CPLEX |
By default, the solver from the Microsoft Solver Foundation is used to select valid configurations.
To enable learning with all measurements, use select-all-measurements true
command. After that just use the learn-splconqueror
command for learning.
For example:
log C:\exampleLog.log
vm C:\exampleModel.xml
all C:\exampleMeasurements.xml
mlsettings numberOfRounds:25 learn_logFunction:true stopOnLongRound:false
nfp nfp1
select-all-measurements true
learn-splconqueror
select-all-measurements false
To disable learning with all measurements you can use select-all-measurements false
.
The only thing missing for a very basic usage of SPL Conqueror, is displaying the learning results. For this use the analyze-learning
-command. This will print the current learning history with the learning error into the specified .log-file. Note, that each command for learning overwrites the previous learning history, so analyze-learning should always be the first command after a command for learning.
Finally, a complete basic .a-script file looks like this:
log C:\exampleLog.log
vm C:\exampleModel.xml
all C:\exampleMeasurements.xml
mlsettings numberOfRounds:25 learn_logFunction:true stopOnLongRound:false
nfp nfp1
select-all-measurements true
learn-splconqueror
select-all-measurements false
analyze-learning
SPLConqueror also supports learning on a subset of the data. Therefore, one has to set at least one sampling strategy for the binary options first and at least one for the numeric options. Numeric sampling strategies have to always start with numeric
, while binary sampling strategies have to start with binary
. In the following, we list all sampling strategies:
Binary/Numeric | Name | Description | Command | Example |
---|---|---|---|---|
Binary | allbinary | Uses all available binary options to create configurations. | binary allbinary |
binary allbinary |
Binary | featurewise | Determines all required binary options and then adds options until a valid configuration is reached. | binary featurewise |
binary featurewise |
Binary | pairwise | Generates a configuration for each pair of configuration options. Exceptions: parent-child-relationships, implication-relationships. | binary pairwise |
binary pairwise |
Binary | negfw | Get one variant per feature multiplied with alternative combinations; the variant tries to maximize the number of selected features, but without the feature in question. | binary negfw |
binary negfw |
Binary | random | Get certain number of random valid configurations. Seed sets the seed of the random number generator. The number of configurations that will be produced is set with numConfigs(Can either be an integer, or asOW/asTWX with X being an integer). If the whole population should not be computed but read in from a file, the fromFile-option should be used. | binary random seed:<int> numConfigs:<int/asOW/asTWX> fromFile:<csvFile> |
binary random seed:10 numConfigs:asTW2 |
Binary | distance-based | Creates a sample of configurations, by iteratively adding a configuration that has the maximal manhattan distance to the configurations that were previously selected. Note that this is not the distance-based sampling strategy from the ICSE paper from 2019. The distance-based sampling strategy is called distribution-aware sampling here. | binary distance-based optionWeight:<int> numConfigs:<int/asOW/asTWX> |
binary distance-based optionWeight:1 numConfigs:10 |
Binary | twise | Generates a configuration for each valid combination of a set consisting of t configuration options. Exceptions: parent-child-relationships, implication-relationships. | binary twise t:<int> |
binary twise t:3 |
Numeric | plackettburman | A description of the Plackett-Burman design is provided here. | numeric plackettburman measurements:<measurements> level:<level> |
numeric plackettburman measurements:125 level:5 |
Numeric | centralcomposite | The central composite inscribe design. This design is defined for numeric options that have at least five different values. | numeric centralcomposite |
numeric centralcomposite |
Numeric | random | This design selects a specified number of value combinations for a set of numeric options. The value combinations are created using a random selection of values of the numeric options. | numeric random sampleSize:<size> seed:<seed> |
numeric random sampleSize:50 seed:2 |
Numeric | fullfactorial | This design selects all possible combinations of numeric options and their values. | numeric fullfactorial |
numeric fullfactorial |
Numeric | boxbehnken | This is an implementation of the BoxBehnken Design as proposed in the "Some New Three Level Designs for the Study of Quantitative Variables". | numeric boxbehnken |
numeric boxbehnken |
Numeric | hypersampling | numeric hypersampling precision:<precisionValue> |
numeric hypersampling precision:25 | |
Numeric | onefactoratatime | numeric onefactoratatime distinctValuesPerOption:<values> |
numeric onefactoratatime distinctValuesPerOption:5 | |
Numeric | kexchange | numeric kexchange sampleSize:<size> k:<kvalue> |
numeric kexchange sampleSize:10 k:3 | |
Both | distribution-aware | Uses distribution-aware sampling to generate sample sets from binary and/or numeric options. This sampling strategy is published as distance-based sampling . |
hybrid distribution-aware distance-metric:<manhattan> distribution:<uniform> selection:<RandomSelection/SolverSelection> numConfigs:<number/asTW[n]> onlyNumeric:<true/false> onlyBinary:<true/false> optimization:<none/local/global> seed:<int> |
hybrid distribution-aware numConfigs:asTW3 |
Both | distribution-preserving | Uses distribution-preserving sampling to generate sample sets from binary and/or numeric options. | hybrid distribution-preserving distance-metric:<manhattan> distribution:<uniform> selection:<RandomSelection/SolverSelection> numConfigs:<number/asTW[n]> onlyNumeric:<true/false> onlyBinary:<true/false> optimization:<none/local/global> seed:<int> |
hybrid distribution-preserving numConfigs:asTW3 |
For instance, all binary options and random numeric options with a sample size of 50 and a seed of 3 should be used for learning, the following lines have to be appended to the .a-script:
binary allbinary
numeric random sampleSize:50 seed:3
If you want to use a hybrid sampling strategy instead, the following line has to be appended to the .a-script:
hybrid distribution-aware
Note: Currently, both distribution-aware
and distribution-preserving
sampling only support binary features.
Note: allbinary
in combination with fullfactorial
results in all valid measurements being taken into the sample set.
It also to consider only a subset of the configuration options for aampling. To do this, add the options that should be used in square brackets as additional argument when stating the sampling strategies. For example:
numeric random [numOpt1,numOpt2,numOpt3]
start
To learn only with a subset of the measurements, the command learn-splconqueror
can be used. This command requires having set a binary and a numeric sampling strategy, before executing it.
Note: A numeric sampling strategy is only needed if the variability model contains numeric options.
If, for instance, only a subset of the data should be used for learning, the result looks as follows:
log C:\exampleLog.log
vm C:\exampleModel.xml
all C:\exampleMeasurements.xml
mlsettings numberOfRounds:25 learn_logFunction:true stopOnLongRound:false
nfp nfp1
binary allbinary
numeric random sampleSize:50 seed:3
learn-splconqueror
analyze-learning
learn-splconqueror-opt
can be used to perform parameter optimization.
This command requires the parameter space ,that should be tested, as arguments in the form settingName=[v1,v2,v3,...,vn]
.
Additionally the following arguments are available: randomized
(use random approach instead of exhaustive search), seed:<value>
(seed for random approach) and samples
(number of settings that will be tested during random approach).
Example: learn-splconqueror-opt epsilon=[0.1,0.2,0.3,0.4] numberOfRounds=[10,20,30] randomized samples:2 seed:10
clean-sampling
Due to the different results of the sampling strategies, it is reasonable to try different sampling strategies and parameters for these strategies. To avoid having to start a new run for each sampling strategy combination, SPL Conqueror also supports clearing all strategies.
For this just use the command: clean-sampling
Of course, if someone wants to learn with a subset of the data after clearing the sampling, one has to first set sampling strategies before learning once again.
clean-learning
Under normal circumstances, SPL Conqueror cleans up the learning data itself. So handling this is usually not required, but if someone wants to forcefully clear all machine learning settings and the learned functions, the command clean-learning
could be used.
clean-global
If it is necessary to load different automation scripts in a single run of SPL Conqueror, the command clean-global
can be used, which removes all relevant data.
Note that one has to read in the variability model and the measurements again when using this command.
script <path_to_script>
Sometimes it makes sense to split up the current .a-script into smaller scripts or run a batch of scripts. For this SPL Conqueror has the script
command.
An example would be as follows:
script C:\subScript.a
It is possible to convert measurement files and variability models that contain numeric options to files that only contain binary options.
For this the convert-measurements
and convert-vm
commands are provided. These commands convert all files that match the patterns that are provided.
These commands are used as follows:
convert-[measurements/vm] <base_directory> <search_pattern_to_subdirectories> <pattern_to_find_files> <target_folder>
Note that to convert .csv measurements files, the corresponding needs to be loaded first.
An example for this command would be:
convert-measurements E:\case-studies\ system1-version1.5.* system1*.xml E:\case-studies\system1-bin\
Additional command-line commands
define-python-path <path-to-folder>
To set which python interpreter is used, use the define-python-path
command.
Note: In SPL Conqueror, only python3 is supported. Additionally, the sklearn
module (version >=0.20.3 recommended) is needed. We recommend to set up a virtual environment by using virtualenv.
learn-python <learner>
To learn with an algorithm provided by scikit-learn use the learn-python
command. Currently the SVR, DecisionTreeRegression, LinearDecisionTreeRegression, RandomForestRegressor, BaggingSVR, KNeighborsRegressor and Kernelridge learners are supported. The learning results will be written in the into the folder where the log file is located.
For more information on the algorithms see:Scikit-Learn
Further, machine-learning parameters for the individual strategies can be passed as additional arguments. The parameters have to be separated by whitespaces and each machine-learning paramter has to be passed in the form of
parameter_name:value
. The full list of the machine-learning parameters for each individual algorithm can be found in the Scikit-Learn API documentation
Typically, the final prediction results are written into a file. The path to the file is reported in the log file.
Optionally, writing the final results into a file can be turned off by providing the parameter writeFinalPredictionResults:false
.
learn-python-opt <learner> <file> <parameters>
To to find the optimal parameters for the scikit-learn algorithms use the learn-python-opt
command. Currently the SVR, DecisionTreeRegression, LinearDecisionTreeRegression, RandomForestRegressor, BaggingSVR, KNeighborsRegressor and Kernelridge learners are supported for the <learner>
parameter.
Additionally, a parameter <file>
in the form of file:<pathToCsvFile>
can be provided to write the results of the parameter optimization in a file for further analysis.
You can also provide a list of hyper parameters in <parameters>
of the form of key:[value1,value2]
and separated by spaces to consider only a subset of the hyper parameters.
For instance, if the RandomForestRegressor should only consider max_depth
and n_estimators
, you can provide these as:
learn-python-opt RandomForestRegressor n_estimators:[10,15,20] max_depth:[5,10,20]
After performing the parameter optimization, the optimal configuration is executed afterwards.
In this case, the optimal parameters will be written to the log.
If this is not intended, you can provide performWithOptimal:false
in the list of parameters (the order of the parameters does not matter).
Then, only the parameter optimization is performed.
printconfigs <file> <prefix> <postfix>
With the command printconfigs
, all sampled configurations are printed to a persistent file. The command requires a target file as first argument and optionally a prefix or prefix and postfix, that will be printed at the start or end of each configuration, respectively. A special usage of this command is printing all valid configurations of a variability model, using the allbinary
and fullfactorial
sampling strategies.
A short example using printconfigs to print all valid configurations into a text file:
vm C:\exampleVM.xml
binary allbinary
numeric fullfactorial
printconfigs C:\allConfigurations.txt prefix postfix
Until now, the elements outputString
, prefix
and postfix
of the variability model were ignored. These attributes are used by the printconfigs command and printed if the option in question is selected.
optionorder <firstOption> <secondOption> ...
In the case, that the options of a configuration should be printed in a certain order, e.g., to use the output as argument for the tested applicatin, the optionorder
command should be used, which sorts all options in the specified order and prints them.
For example:
optionorder optionC optionB optionA
<sampling strategy> validation
SPL Conqueror offers the possibility to use the validation set.
This validation set is then used to validate the learning results. In case no validation set is specified, the learning set will also be used to validate the results.
To do so, the command validation
has to be added after the sampling strategies.
For example:
allbinary validation
expdesign random sampleSize:50 seed:3 validation
truemodel <inputfile> [outputfile]
The truemodel
command offers the possibility to perform machine learning on a particular model.
Therefore, fitting is applied on the given model.
If an output path is given, the predictions by using the fitted model are written into the output path.
The model has to be stored in the given file, where each line contains one term of the model.
For example, a model with three features 'A', 'B', and 'C' could look like this:
A
C
A * B
B * C
A * B * C
evaluate-model <inputfile> [outputfile]
If one is interested to evaluate a specific model including coefficients, the evaluate-model
command can be used.
The arguments are the same as for truemodel
.
However, this function supports evaluating multiple models, where each model is written in one line of the input file.
For instance:
3 * A + 2 * A * B
1 * B + 1.5 * A * B
printsettings
Using the printsettings command, the current machine-learning settings are printed into the .log-file or ,in case yotogehteru didn't set a .log-file, into the console.
measurementstocsv <file>
In the case that the measurements should be printed to a .csv-file, the command measurementstocsv
can be used.
For example:
measurementstocsv C:\measurementsAsCSV.csv
Note: The element separator is ;
, whereas the line separator is \n
.
predict-configs-splconqueror
Predicts the nfp
value of all configurations loaded with the all
command and writes them together with the measured nfp
value and the configuration identifier in a file.
evaluationset <file>
By default, SPL Conqueror uses all measurements from the measurements-file for the computation of the error rate.
To change the evaluation set, the command evaluationset
can be used.
The file can be either a .csv-file or a .xml-file.
For example:
evaluationset C:\evaluationMeasurements.xml
Note: The format specified in the evaluation-file is the same as in the measurements-file.
Exemplary Script-Files
A .a-file contains the configuration of SPL Conqueror. If one is interested in using all measurement-data, the following .a-file could be used:# Lines containing a comment begin with '#'
# The log command and the destination file, where the learning progress should be written to
log ./learnOutput.txt
# The machine-learning settings for configuring different options for machine-learning. These are described in the documentation more precisely
mlsettings bagging:False stopOnLongRound:False parallelization:False lossFunction:RELATIVE useBackward:False abortError:10 limitFeatureSize:False featureSizeTreshold:7 quadraticFunctionSupport:True crossValidation:False learn_logFunction:True numberOfRounds:70 backwardErrorDelta:1 minImprovementPerRound:0.25 withHierarchy:False
# The path to the variability model (feature model)
vm ./VariabilityModel.xml
# The file containing all measurements needed for machine-learning
all ./measurements.xml
# The non-functional property, which was measured.
# Note that every configuration in the measurements-file needs a data-row with the attribute 'columname=Performance'
nfp Performance
# Learns with all configurations given in the measurements-file.
select-all-measurements true
learn-splconqueror
select-all-measurements false
# Cleans the sample set.
# Note that this command is needed if multiple different sampling sets are computed in one run of SPL Conqueror
clean-sampling
In SPL Conqueror, multiple different sampling strategies for binary and numeric features are implemented and can be used in the .a-file as follows:
# The first lines are the same as in the previous example
log ./learnOutput.txt
mlsettings bagging:False stopOnLongRound:False parallelization:False lossFunction:RELATIVE useBackward:False abortError:10 limitFeatureSize:False featureSizeTreshold:7 quadraticFunctionSupport:True crossValidation:False learn_logFunction:True numberOfRounds:70 backwardErrorDelta:1 minImprovementPerRound:0.25 withHierarchy:False
vm ./VariabilityModel.xml
all ./measurements.xml
nfp Performance
# Here, the binary sampling strategy FeatureWise (FW) is selected
binary featurewise
# The sampling strategy Plackett-Burman for numeric options is selected
numeric plackettburman measurements:125 level:5
# Print configurations selected by the sampling strategies in the file 'samples.txt'
printconfigs ./samples.txt
# Start learning using the sampled configurations
learn-splconqueror
# Predicts the performance value of all configurations of the measurements file using the learned performance-influence model
predict-configs-splconqueror
If multiple (different) .a-files should be executed, a super-script can be created as follows:
# Calls the first script
script ./scriptA.a
# Removes all variables related to the invocation of the first script
clean-global
# Calls the second script
script ./scriptB.a
clean-global
See the previous chapters for a more detailed description of the commands. For further examples, see the directory 'Example Files'.