Implementation of RETECS and data sets for the paper *Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration*.

Testing in Continuous Integration (CI) involves test case prioritization, selection, and execution at each cycle. Selecting the most promising test cases to detect bugs is hard if there are uncertainties about the impact of committed code changes and traceability links between code and tests are not available. This paper introduces RETECS, a new method for automatically learning test case selection and prioritization in CI, with the goal of minimizing the round-trip time between code commits and developer feedback on failed test cases. RETECS uses reinforcement learning to select and prioritize test cases according to their duration, time since last execution, and failure history. In a constantly changing environment, where new test cases are created and obsolete test cases are deleted, RETECS learns to prioritize error-prone test cases higher, guided by a reward function and by observing previous CI cycles. By applying RETECS to data extracted from three industrial case studies, we show for the first time that reinforcement learning enables fruitful automatic adaptive test case selection and prioritization in CI and regression testing.
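The reward function is central to this learning step. As a minimal sketch (not the actual implementation in `reward.py`), a failure-count-style reward could simply count how many of the scheduled test cases failed, assuming each executed test case carries a binary verdict field:

```python
# Minimal sketch of a failure-count-style reward, assuming each executed test
# case is a dict with a binary 'Verdict' entry (1 = failed, 0 = passed).
# See reward.py for the reward functions actually used by RETECS.
def failure_count_reward(executed_test_cases):
    """Reward the agent with the number of failing test cases it scheduled,
    so schedules that surface failures receive higher rewards."""
    return float(sum(tc['Verdict'] for tc in executed_test_cases))
```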
`retecs.py` is the main file of the project. It provides command line arguments to control the execution. Run `./retecs.py -h` for an overview of all arguments and their descriptions.
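As a rough, hypothetical illustration of how such a command-line interface is typically built with `argparse`, the sketch below uses example flag names that do not necessarily match those of `retecs.py`; run `./retecs.py -h` for the real arguments:

```python
# Hypothetical argparse sketch; the flags shown here are illustrative and may
# not correspond to the actual command line arguments of retecs.py.
import argparse

parser = argparse.ArgumentParser(description='Train and run RETECS agents')
parser.add_argument('--agent', default='network',
                    help='which RL agent to use (example flag)')
parser.add_argument('--reward', default='failcount',
                    help='which reward function to use (example flag)')
args = parser.parse_args()
print(args)
```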
Common configuration is located in `run_experiment_common.py`, but can be overridden from each experiment file (a sketch of the override pattern follows the list below):
- `USE_LATEX`: If True, figures are plotted with xelatex (has to be installed separately); otherwise the standard matplotlib backend is used (default: False)
- `DATA_DIR`: Directory to store raw results of experiments (default: RESULTS)
- `FIGURE_DIR`: Directory to store exported figures from evaluation (default: RESULTS)
- `ITERATIONS`: Number of times the experiment is repeated (default: 30)
- `PARALLEL`: Run iterations in parallel (default: True)
- `PARALLEL_POOL_SIZE`: Number of parallel executions (default: 6)
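As a sketch of the override pattern, assuming these settings are plain module-level constants in `run_experiment_common.py` (check an actual experiment script for the exact style used):

```python
# Hypothetical sketch: overriding the common configuration from an experiment
# file. Assumes the settings are module-level constants of run_experiment_common.
import run_experiment_common as common

common.USE_LATEX = False        # keep the standard matplotlib backend
common.DATA_DIR = 'RESULTS'     # raw experiment results go here
common.FIGURE_DIR = 'RESULTS'   # exported figures go here
common.ITERATIONS = 10          # fewer repetitions for a quick trial run
common.PARALLEL = True          # run iterations in parallel
common.PARALLEL_POOL_SIZE = 4   # number of parallel executions
```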
- (Optional, but recommended) Create a virtual environment and activate it:

  ```
  $ virtualenv2 venv_retecs
  $ source venv_retecs/bin/activate
  ```

- Install the requirements:

  ```
  $ pip install -r requirements.txt
  ```

- Run an experiment, either a) all experiments:

  ```
  $ ./run_all_experiments.sh
  ```

  or b) a specific experiment, by calling its experiment file, e.g.:

  ```
  $ ./run_experiment_rq1.py
  ```
Build the image:

```
$ docker build -t retecs .
```

Run all experiments:

```
$ docker run -it --rm -v "$PWD":/usr/src/app -w /usr/src/app --name run_all_experiments retecs ./run_all_experiments.sh
```

Run a specific experiment (replace `$EXPERIMENT` with the experiment file):

```
$ docker run -it --rm -v "$PWD":/usr/src/app -w /usr/src/app --name $EXPERIMENT retecs ./$EXPERIMENT
```

e.g.

```
$ docker run -it --rm --name run_experiment_rq1.py -v "$PWD":/usr/src/app -w /usr/src/app retecs ./run_experiment_rq1.py
```

Results are placed in the local `RESULTS` directory.
File/Directory | Content |
---|---|
DATA | Industrial data sets: Paint Control, IOF/ROL, and GSDTSR (including file format description) |
RESULTS | Results of experimental runs are placed here, both raw data and visualization |
requirements.txt | Python requirements of RETECS |
retecs.py | Main file for training and running |
reward.py | Reward functions |
run_all_experiments.sh | Run all experiments (calls each experiment script below) |
run_experiment_common.py | Common configuration and functions for experiment runners |
run_experiment_rq0_history_length.py | Evaluate history length influence |
run_experiment_rq0_network_hidden_nodes.py | Evaluate sizes of neural network |
run_experiment_rq0_reward_function.py | Compare reward functions on same data set |
run_experiment_rq0_scheduling_time.py | Evaluate different times for test execution per CI cycle |
run_experiment_rq0_tableau_action_size.py | Evaluate different numbers of actions for the Tableau-based RL agent |
run_experiment_rq1.py | Run all combinations of RL agents and reward functions on three data sets, evaluate their learning performance |
run_experiment_rq2.py | Run all combinations of RL agents and reward functions on three data sets, evaluate their performance compared to basic prioritization functions |
scenarios.py | Main Class 'IndustrialDataSetProvider': Reads data sets, provides CI cycles, evaluates created schedules |
stats.py | Helper functions for evaluation |
`run_experiment_rq1.py` and `run_experiment_rq2.py` run the same experiments, but their evaluation differs. To save computational cost, the actual experiment execution in `run_experiment_rq2.py` is disabled; it therefore requires `run_experiment_rq1.py` to have been run beforehand (see the example below).
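For example, to reproduce both evaluations, the scripts are assumed to be run in this order, so that `run_experiment_rq2.py` can reuse the raw results stored by `run_experiment_rq1.py`:

```
$ ./run_experiment_rq1.py   # runs the experiments and stores raw results in RESULTS
$ ./run_experiment_rq2.py   # reuses the stored results for the comparison evaluation
```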