# Testing
## Introduction
Our testing framework includes several types of tests. Unit tests and functional tests are created for individual functions and modules, respectively: unit tests exercise each of the smallest components, while functional tests ensure that collections of small components function within the requirements. Integration tests verify that modules work well together. Run-time tests are checks integrated into the code to catch errors related to user input. Regression testing and platform compatibility testing are executed before pre-releasing FIMS to ensure that previously tested pieces still perform, after the code has been changed, on all platforms of interest. Beta testing is used to gather feedback from users (i.e., members of the FIMS Implementation Team and other users) during the pre-release stage. Finally, one-off testing is used for replicating and fixing user-reported bugs; some of these one-off tests may eventually be integrated into the permanent testing framework. Likewise, if an interactive test helped you build something while coding, it should be converted into a {testthat} or GoogleTest test.
FIMS uses GoogleTest to build a C++ unit testing framework and {testthat} to build an R testing framework. FIMS uses Google Benchmark to measure the real time and CPU time used for running the produced binaries. All required software for testing can be installed using the instructions in the [Software to install section](#software-to-install).
## C++ unit testing and benchmarking
Inside your cloned version of FIMS, the top-level `CMakeLists.txt` instructs CMake how to create the build files, including setting up Google Test. The Google Test code is in `tests/gtest`. Within this subdirectory is another `CMakeLists.txt` that contains additional specifications on how to register the individual tests.
### Build and run the tests
The following commands, run on the command line (note that Windows users cannot use Git Bash and must use a native Windows shell), execute the outlined steps and are needed to build and run the tests:
1. Generate the build system using `Ninja` as the generator, which creates the `build` directory.
1. Use CMake to build in the `build` directory, in parallel with 16 jobs (`--parallel 16` can be omitted or adjusted).
1. Run the tests using `ctest`, again in parallel with 16 jobs (`--parallel 16` can be omitted or adjusted).
```bash
cmake -S . -B build -G Ninja
cmake --build build --parallel 16
ctest --test-dir build --parallel 16
```
The output from running the tests should look something like the following:
```bash
Internal ctest changing into directory: C:/github_repos/NOAA-FIMS_org/FIMS/build
Test project C:/github_repos/NOAA-FIMS_org/FIMS/build
Start 1: dlognorm.use_double_inputs
1/5 Test #1: dlognorm.use_double_inputs ....... Passed 0.04 sec
Start 2: dlognorm.use_int_inputs
2/5 Test #2: dlognorm.use_int_inputs .......... Passed 0.04 sec
Start 3: modelTest.eta
3/5 Test #3: modelTest.eta .................... Passed 0.04 sec
Start 4: modelTest.nll
4/5 Test #4: modelTest.nll .................... Passed 0.04 sec
Start 5: modelTest.evaluate
5/5 Test #5: modelTest.evaluate ............... Passed 0.04 sec
100% tests passed, 0 tests failed out of 5
```
### Adding a C++ test
Create a file `dlognorm.hpp` within the `src` subfolder that contains a simple function:
```c
#include <cmath>

template<class Type>
Type dlognorm(Type x, Type meanlog, Type sdlog){
  Type resid = (log(x) - meanlog) / sdlog;
  Type logres = -log(sqrt(2 * M_PI)) - log(sdlog) - Type(0.5) * resid * resid - log(x);
  return logres;
}
```
Then, create `dlognorm-unit.cpp` in `tests/gtest` that has a test suite for the `dlognorm` function:
```c
#include "gtest/gtest.h"
#include "../../src/dlognorm.hpp"

// # R code that generates true values for the test
// dlnorm(1.0, 0.0, 1.0, TRUE) = -0.9189385
// dlnorm(5.0, 10.0, 2.5, TRUE) = -9.07679

namespace {

  // TestSuiteName: dlognormTest; TestName: DoubleInput and IntInput

  // Test dlognorm with double input values
  TEST(dlognormTest, DoubleInput) {
    EXPECT_NEAR(dlognorm(1.0, 0.0, 1.0), -0.9189385, 0.0001);
    EXPECT_NEAR(dlognorm(5.0, 10.0, 2.5), -9.07679, 0.0001);
  }

  // Test dlognorm with integer input values
  TEST(dlognormTest, IntInput) {
    EXPECT_NEAR(dlognorm(1, 0, 1), -0.9189385, 0.0001);
  }

}
```
`EXPECT_NEAR(val1, val2, absolute_error)` verifies that the difference between `val1` and `val2` does not exceed the absolute error bound `absolute_error`. `EXPECT_NE(val1, val2)` verifies that `val1` is not equal to `val2`. See GoogleTest [assertions reference](https://google.github.io/googletest/reference/assertions.html) for more `EXPECT_` macros.
Finally, the test must be added to `tests/gtest/CMakeLists.txt` before running the tests. Add the following contents to the end of `tests/gtest/CMakeLists.txt` to enable testing in CMake, declare the C++ test binary you want to build (`dlognorm_test`), and link it to GoogleTest (`gtest_main`):
```cmake
add_executable(dlognorm_test
  dlognorm-unit.cpp
)

target_include_directories(dlognorm_test
  PUBLIC
    ${CMAKE_SOURCE_DIR}/../
)

target_link_libraries(dlognorm_test
  gtest_main
)

include(GoogleTest)
gtest_discover_tests(dlognorm_test)
```
Now you can build and run your test using the [instructions in Build and run the tests](#build-and-run-the-tests). The output when running `ctest` will look something like the following; note that there is a failing test:
```bash
Internal ctest changing into directory: C:/Users/Kathryn.Doering/Documents/testing/FIMS/build
Test project C:/Users/Kathryn.Doering/Documents/testing/FIMS/build
Start 1: dlognorm.use_double_inputs
1/7 Test #1: dlognorm.use_double_inputs ....... Passed 0.04 sec
Start 2: dlognorm.use_int_inputs
2/7 Test #2: dlognorm.use_int_inputs .......... Passed 0.04 sec
Start 3: modelTest.eta
3/7 Test #3: modelTest.eta .................... Passed 0.04 sec
Start 4: modelTest.nll
4/7 Test #4: modelTest.nll .................... Passed 0.04 sec
Start 5: modelTest.evaluate
5/7 Test #5: modelTest.evaluate ............... Passed 0.04 sec
Start 6: dlognormTest.DoubleInput
6/7 Test #6: dlognormTest.DoubleInput ......... Passed 0.04 sec
Start 7: dlognormTest.IntInput
7/7 Test #7: dlognormTest.IntInput ............***Failed 0.04 sec
86% tests passed, 1 tests failed out of 7
Total Test time (real) = 0.28 sec
The following tests FAILED:
7 - dlognormTest.IntInput (Failed)
Errors while running CTest
Output from these tests are in: C:/Users/Kathryn.Doering/Documents/testing/FIMS/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
```
### Debugging a C++ test
There are two ways to debug a C++ test: interactively using `gdb` or via print statements.
To debug C++ code (e.g., a segmentation fault or memory corruption) using `gdb`:
```bash
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Debug
cmake --build build --parallel 16
ctest --test-dir build --parallel 16
gdb ./build/tests/gtest/population_dynamics_population.exe
c                    # continue without paging
run                  # run to see which line of code is broken
print this->log_naa  # for example, print this->log_naa to see the value of log_naa
print i              # for example, print i from the broken for loop
bt                   # backtrace
q                    # quit
```
To debug C++ code with print statements, update the C++ code in the desired `.hpp` file by adding `std::ofstream out("file_name.txt")` and using `out << variable;` to print out values of the variable. The output of the print statements will be in `FIMS/build/tests/gtest/debug.txt` (using the `debug.txt` file name from the example below) after you run the `cmake` and `ctest` calls from the [instructions in Build and run the tests](#build-and-run-the-tests).
```c
nfleets = fleets.size();
std::ofstream out("debug.txt");
out << nfleets;
```
More complex examples include text identifying the quantities:
```c
out << " fleet_index: " << fleet_index << " index_yaf: " << index_yaf << " index_yf: " << index_yf << "\n";
out << " population.Fmort[index_yf]: " << population.Fmort[index_yf] << "\n";
```
### Benchmark example
[Google Benchmark](https://github.com/google/benchmark) measures the real time and CPU time used for running the produced binary. We will continue using the `dlognorm.hpp` example. Create `dlognorm_benchmark.cpp` in `tests/gtest` with the following code, which runs the `dlognorm` function and uses `BENCHMARK` to see how long it takes.
```c
#include "benchmark/benchmark.h"
#include "../../src/dlognorm.hpp"

void BM_dlgnorm(benchmark::State& state)
{
  for (auto _ : state)
    dlognorm(5.0, 10.0, 2.5);
}

BENCHMARK(BM_dlgnorm);
```
Next, the following needs to be added to the end of `tests/gtest/CMakeLists.txt`:
```cmake
FetchContent_Declare(
  googlebenchmark
  URL https://github.com/google/benchmark/archive/refs/tags/v1.6.0.zip
)
FetchContent_MakeAvailable(googlebenchmark)

add_executable(dlognorm_benchmark
  dlognorm_benchmark.cpp
)

target_include_directories(dlognorm_benchmark
  PUBLIC
    ${CMAKE_SOURCE_DIR}/../
)

target_link_libraries(dlognorm_benchmark
  benchmark_main
)
```
To run the benchmark, build with CMake, sending output to the `build` subfolder, and then run the created executable:
```bash
cmake --build build
build/tests/gtest/dlognorm_benchmark.exe
```
The output from `dlognorm_benchmark.exe` might look like this:
```bash
Run on (8 X 2112 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 8192 KiB (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
BM_dlgnorm 153 ns 153 ns 4480000
```
### Clean up after running C++ tests
After running the examples above, the build generates files (i.e., source code, libraries, and executables) and saves them in the `build` subfolder. The example above demonstrates an "out-of-source" build, which puts generated files in a completely separate directory so that the source tree is unchanged after running tests. Using separate source and build trees reduces the need to delete files that differ between builds. If you would still like to delete the CMake-generated files, just delete the `build` folder, then build and run the tests again by repeating the commands in [Build and run the tests](#build-and-run-the-tests). The files in the `build` folder are covered by the FIMS repository's `.gitignore` file, so they should *not* be pushed to the FIMS repository.
For simple C++ functions like the examples above, we do not need to clean up the tests. Clean-up is only necessary in a few situations:
- If memory for an object was allocated during testing and not deallocated, the object needs to be deleted (e.g., `delete object`).
- If you used a GoogleTest test fixture to share the same data configuration across multiple tests, `TearDown()` can be used to clean up after each test before the fixture is deleted. See the [GoogleTest user's guide](https://google.github.io/googletest/primer.html#same-data-multiple-tests) for more details.
- If you do not want to keep any of the files produced by the example and want to completely clear uncommitted changes and files from the git repository, run `git restore .` to discard uncommitted changes to files tracked by git and `git clean -fd` to remove all untracked files and directories.
## Templates for GoogleTest testing
This section includes templates for creating unit tests and benchmarks.
These templates show the code that would go into the `.cpp` files in `tests/gtest`.
### C++ test templates
#### Unit test template
```c
#include "gtest/gtest.h"
#include "../../src/code.hpp"

// # R code that generates true values for the test

namespace {

  // Description of Test 1
  TEST(TestSuiteName, Test1Name) {
    ... test body ...
  }

  // Description of Test 2
  TEST(TestSuiteName, Test2Name) {
    ... test body ...
  }

}
```
#### Benchmark template
```c
#include "benchmark/benchmark.h"
#include "../../src/code.hpp"

void BM_FunctionName(benchmark::State& state)
{
  for (auto _ : state)
    // This code gets timed
    Function();
}

// Register the function as a benchmark
BENCHMARK(BM_FunctionName);
```
#### `tests/gtest/CMakeLists.txt` template
These lines are added each time a new test, e.g., `TestSuiteName1`, is added:
```cmake
# Add test suite 1
add_executable(TestSuiteName1
  test1.cpp
)

target_link_libraries(TestSuiteName1
  gtest_main
)

gtest_discover_tests(TestSuiteName1)
```
These lines are added each time a new benchmark, e.g., `benchmark1`, is added:
```cmake
# Add benchmark 1
add_executable(benchmark1
  benchmark1.cpp
)

target_link_libraries(benchmark1
  benchmark_main
)
```
## R testing
R tests are written using {testthat}, which can be [installed as an R package](https://testthat.r-lib.org/). More details on {testthat} can be found in the [testing chapter of R packages](https://r-pkgs.org/tests.html).
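If {testthat} (and {devtools}, which is used in the next subsection) is not already available, a one-time setup could look like the following sketch:
```r
# One-time installation of the R testing tooling from CRAN.
install.packages(c("testthat", "devtools"))
```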
### Testing FIMS locally
To test FIMS R functions interactively and locally, use `devtools::install()` rather than `devtools::load_all()`, because `load_all()` will turn on the debugger, bloating the `.o` file, and may lead to a compilation error (see [Fixing Fatal Error](#fixing-fatal-error) for more information).
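For example, a minimal sketch of that local workflow, run from the FIMS repository root (the test file name below is a placeholder, not an actual FIMS test file):
```r
# A sketch of a local R testing workflow; the test file name is hypothetical.
devtools::install()   # install FIMS rather than using devtools::load_all()
library(FIMS)         # attach the installed package
testthat::test_file(
  file.path("tests", "testthat", "test-rcpp-example.R")
)
```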
### Testing using `gdbsource`
You can interactively debug C++ code using `TMB::gdbsource()` in RStudio.
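A hedged sketch of how `gdbsource()` might be invoked, where `debug_fims.R` is a hypothetical script you write that loads FIMS and reproduces the crash:
```r
# Run an R script under gdb to obtain a C++ backtrace after a crash.
# "debug_fims.R" is a hypothetical script that loads FIMS and triggers the error.
TMB::gdbsource("debug_fims.R")                      # non-interactive: prints a backtrace
TMB::gdbsource("debug_fims.R", interactive = TRUE)  # step through the session interactively
```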
### Naming and file organization
- Group functions and their helpers together, i.e., the "main function plus helpers" approach.
- {testthat} tests that are a test of Rcpp code should be called `test-rcpp-[description].R`.
- Integration tests that do not have a corresponding `.R` file should use the following convention: `test-integration-[description].R`.
### R template
Naming conventions for {testthat} files follow the [tidyverse test convention](https://style.tidyverse.org/tests.html), in addition to using `test-rcpp-[description].R` for tests of Rcpp code and `test-integration-[description].R` for integration tests without a corresponding R function.
The format for an individual {testthat} test is:
```r
test_that("TestName", {
...test body...
})
```
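As a hypothetical, filled-in example, where `dlognorm()` stands in for an R-facing FIMS function rather than the package's actual interface:
```r
# Hypothetical example following the template above; dlognorm() is a stand-in,
# not the real FIMS API.
test_that("dlognorm matches stats::dlnorm on the log scale", {
  expect_equal(
    dlognorm(1.0, 0.0, 1.0),
    dlnorm(1.0, 0.0, 1.0, log = TRUE),
    tolerance = 1e-4
  )
})
```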
### Random numbers
Simulation results might depend on the order of calls, leading to tests that fail only because different random numbers were used or because the order of the simulation changed during model development (see [FIMS-planning discussion 25](https://github.com/NOAA-FIMS/FIMS-planning/issues/25) for details). Below are some potential solutions.
- Add a TRUE/FALSE parameter in each FIMS simulation module for setting up a testing seed. When testing the module, set the parameter to TRUE to fix the seed number in R and conduct the tests. If adding a TRUE/FALSE parameter does not work as expected, carefully check the simulated data from each component and make sure the discrepancy is not caused by a model coding error.
- Use `set.seed()` from R to set the seed, and investigate using {rstream} to generate multiple streams of random numbers so that distinct streams can be associated with different sources of randomness. {rstream} was specifically designed to address the need for very long streams of pseudo-random numbers in parallel computations. See [the rstream paper](https://www.iro.umontreal.ca/~lecuyer/myftp/papers/rstream.pdf) and [RngStreams](http://www-labs.iro.umontreal.ca/~lecuyer/myftp/streams00/) for more details.
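For example, a sketch of fixing the seed inside a test so that simulated data are reproducible, where `simulate_data()` is a hypothetical simulation function rather than an actual FIMS module:
```r
# A sketch of seeding within a {testthat} test; simulate_data() is hypothetical.
test_that("simulated data are reproducible with a fixed seed", {
  set.seed(123)
  first <- simulate_data(n_years = 10)
  set.seed(123)
  second <- simulate_data(n_years = 10)
  expect_identical(first, second)
})
```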