
Development: Execute only tests for changed modules in GitHub test job for non-default branches #10036

Open
wants to merge 78 commits into base: develop

Conversation

ole-ve
Contributor

@ole-ve ole-ve commented Dec 17, 2024

Checklist

General

Motivation and Context

Most of the time, developers don't change functionality across all of Artemis, but only in some modules. Running the whole test suite for every commit is therefore inefficient. Instead, for pull requests and similar cases, it might be sufficient to run a reduced test suite in the in-memory DB test pipeline on non-default branches. To ensure everything still works, all tests should be run for every push to the default branch.

Description

Matching tests and source code

The module folders containing changed files determine which tests are executed: for each changed module, the tests of that module run. For example, when changing src/main/java/de/tum/cit/aet/artemis/text/service/ABCService.java, all tests within src/test/java/de/tum/cit/aet/artemis/text are executed.
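
As a rough illustration (a minimal sketch, not the actual workflow script; the path prefix, base branch, and function name are assumptions for illustration), the mapping from changed files to module folders could look like this:

```python
# Hypothetical sketch: derive the set of changed modules from the git diff
# against the base branch. Not the script used in the workflow.
import subprocess

MODULE_PREFIX = "src/main/java/de/tum/cit/aet/artemis/"  # assumed source layout


def changed_modules(base_branch: str = "origin/develop") -> set:
    """Return the module folder names touched by the current branch."""
    changed_files = subprocess.run(
        ["git", "diff", "--name-only", f"{base_branch}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    modules = set()
    for path in changed_files:
        if path.startswith(MODULE_PREFIX):
            # The first folder after the prefix is the module name,
            # e.g. ".../text/service/ABCService.java" -> "text".
            modules.add(path[len(MODULE_PREFIX):].split("/", 1)[0])
    return modules


if __name__ == "__main__":
    # Each detected module then selects the corresponding test folder,
    # e.g. src/test/java/de/tum/cit/aet/artemis/<module>.
    print(sorted(changed_modules()))
```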

This is a first step. In the future, the reduced test set could be extended to modules that are related to the changed module through dependencies. This, however, is not trivial because of transitive dependencies.

Per-Module Coverage Report & Verification

When running a reduced set of tests for specific modules, we are interested in per-module reports and in validating coverage thresholds per module. Validating on a module basis is not only more insightful but also necessary: running a reduced set of tests lowers the aggregated coverage metrics, which would otherwise cause verification to fail.
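
For illustration only, here is a minimal sketch of such a per-module check (the report path and the thresholds below are made-up examples; in this PR the actual verification is configured via Gradle/JaCoCo in gradle/jacoco.gradle):

```python
# Hypothetical sketch of a per-module instruction-coverage check against a
# JaCoCo XML report; not the Gradle-based verification used in the PR.
import xml.etree.ElementTree as ET


def instruction_coverage(report_file: str) -> float:
    """Return covered / (covered + missed) instructions from a JaCoCo XML report."""
    root = ET.parse(report_file).getroot()
    counter = root.find("./counter[@type='INSTRUCTION']")
    if counter is None:
        raise ValueError(f"No INSTRUCTION counter found in {report_file}")
    covered = int(counter.get("covered"))
    missed = int(counter.get("missed"))
    return covered / (covered + missed)


# Example thresholds; the real per-module values live in the Gradle configuration.
thresholds = {"text": 0.90, "fileupload": 0.85}

for module, minimum in thresholds.items():
    # Assumed report path layout, for the sketch only.
    coverage = instruction_coverage(f"build/reports/jacoco/{module}/jacocoTestReport.xml")
    status = "OK" if coverage >= minimum else "BELOW THRESHOLD"
    print(f"{module}: {coverage:.1%} (min {minimum:.0%}) -> {status}")
```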

Steps for Testing

These testing steps are a bit different and don't involve Artemis.

  1. Clone this branch locally.
  2. Create a new branch using git checkout -b feat/test-changed-modules-1
  3. Commit some changes mapping to two or three different modules.
  4. Push the branch and create a (draft) pull request. Alternatively, you can add your branch name to the triggering branches in .github/workflows/test.yml.
  5. Wait for the test pipeline to finish.
  6. Only on success: See a summary of the whole test job (including coverage metrics of changed modules).
  7. Only on success: Verify that the MySQL/Postgres test pipelines are started and run all tests.

Screenshots

The job summary generated on a branch where a single file in the fileupload module has been changed.

coverage-summary

Review Progress

Code Review

  • Code Review 1
  • Code Review 2

Manual Tests

  • Test 1
  • Test 2

Summary by CodeRabbit

  • New Features

    • Introduced a new script to identify changed modules in a specified branch.
    • Added a script for processing JaCoCo coverage reports and generating summaries.
    • Registered a new task for generating test reports.
  • Bug Fixes

    • Updated the server start count check in the log extraction script for improved error handling.
  • Chores

    • Modified the pull request template to reflect changes in the code coverage generation path.
    • Enhanced test configuration and dependency management in the build setup.
    • Implemented a new Jacoco configuration for coverage thresholds across multiple modules.

ole-ve added 30 commits December 7, 2024 15:56
This reverts commit 786d172.
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 2, 2025
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 2, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
gradle/jacoco.gradle (1)

190-191: ⚠️ Potential issue

Prevent path traversal vulnerability

The module path construction should validate the module name to prevent path traversal.

+    if (!moduleName.matches("^[a-zA-Z0-9-_]+$")) {
+        throw new GradleException("Invalid module name: $moduleName")
+    }
     def modulePath = "$BasePath/$moduleName/**/*.class"
     task.sourceDirectories.setFrom(project.files("src/main/java/$modulePath"))
🧹 Nitpick comments (4)
gradle/jacoco.gradle (4)

2-5: Consider increasing the aggregated coverage thresholds

The aggregated instruction coverage threshold (0.895) seems reasonable, but allowing 56 uncovered classes might be too permissive for maintaining code quality.

Consider gradually reducing the maximum uncovered classes threshold over time to improve overall test coverage.


7-80: Review low coverage thresholds for critical modules

Several modules have notably low instruction coverage thresholds:

  • core: 65.7%
  • exercise: 64.9%
  • buildagent: 68.6%

These modules might be critical to the application's functionality. Consider:

  1. Setting up a plan to gradually increase these thresholds
  2. Adding more test coverage for these modules
  3. Documenting why these modules have lower thresholds if there are specific technical constraints

170-170: Fix typo in task description

There's a typo in the task description:

-    task.description = "Validates JaCoCo coverage for vioalations for $moduleName"
+    task.description = "Validates JaCoCo coverage for violations for $moduleName"

184-200: Consider adding module existence validation

The prepareJacocoReportTask method should validate that the module directory exists before configuring the task.

 private void prepareJacocoReportTask(JacocoReportBase task, String moduleName, JacocoReportBase rootTask) {
+    def moduleDir = project.file("src/main/java/$BasePath/$moduleName")
+    if (!moduleDir.exists()) {
+        throw new GradleException("Module directory does not exist: $moduleDir")
+    }
     task.group = "Reporting"
     // ... rest of the method
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cacdf7d and 477411e.

📒 Files selected for processing (2)
  • build.gradle (2 hunks)
  • gradle/jacoco.gradle (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • build.gradle
🔇 Additional comments (3)
gradle/jacoco.gradle (3)

98-110: LGTM! Well-structured report configuration

The report configuration:

  • Properly generates both XML and HTML reports
  • Dynamically creates per-module reports
  • Uses clear output paths

112-124: LGTM! Verification configuration aligns with PR objectives

The coverage verification:

  • Correctly handles module-specific vs. full coverage scenarios
  • Properly applies thresholds from configuration
  • Is integrated into the build lifecycle via check task

94-96: Verify JaCoCo version compatibility and security

The JaCoCo version 0.8.12 should be verified for:

  1. Compatibility with the project's Java version
  2. Latest security patches
✅ Verification successful

Let me check the Java version compatibility since we found that build.gradle contains the relevant configuration.


JaCoCo version 0.8.12 is appropriate and up-to-date

The verification confirms that:

  • JaCoCo 0.8.12 is currently the latest available version
  • The project uses Java 21, which is fully compatible with JaCoCo 0.8.12 (JaCoCo 0.8.11+ added support for Java 21)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check latest JaCoCo version and release notes
curl -s https://repo1.maven.org/maven2/org/jacoco/jacoco/maven-metadata.xml | grep -oP '(?<=<version>)0\.8\.[0-9]+(?=</version>)' | sort -V | tail -n 1

# Check if build.gradle specifies Java version
rg -l "sourceCompatibility|targetCompatibility" "build.gradle"

Length of output: 236


Script:

#!/bin/bash
# Check Java version configuration in build.gradle
rg "sourceCompatibility|targetCompatibility" build.gradle -A 2

Length of output: 165

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 2, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
supporting_scripts/code-coverage/per_module_cov_report/parse_module_coverage.py (2)

1-4: Add type hints imports and module documentation.

Add typing imports and a module docstring to improve code maintainability:

+"""
+Process JaCoCo coverage reports and generate a summary in markdown format.
+
+This module provides functionality to:
+- Scan directories for JaCoCo coverage reports
+- Extract coverage metrics from XML reports
+- Generate a markdown summary table
+"""
+from typing import Dict, List, Union
 import os
 import xml.etree.ElementTree as ET
 import argparse

67-76: Enhance command-line interface and add edge case handling.

The main execution block could be improved with better error handling and more descriptive help messages:

 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="Process JaCoCo coverage reports.")
-    parser.add_argument("input_directory", type=str, help="Root directory containing JaCoCo coverage reports")
-    parser.add_argument("output_file", type=str, help="Output file to save the coverage results")
+    parser.add_argument(
+        "input_directory",
+        type=str,
+        help="Root directory containing module-specific JaCoCo coverage reports (jacocoTestReport.xml)"
+    )
+    parser.add_argument(
+        "output_file",
+        type=str,
+        help="Path to the markdown file where coverage results will be saved"
+    )

     args = parser.parse_args()

     reports = get_report_by_module(args.input_directory)
+    if not reports:
+        print("No coverage reports found. Ensure the input directory contains valid JaCoCo XML reports.")
+        exit(1)
+
     coverage_by_module = extract_coverage(reports)
+    if not coverage_by_module:
+        print("Failed to extract coverage data from any of the reports.")
+        exit(1)
+
     write_summary_to_file(coverage_by_module, args.output_file)
+    print(f"Coverage summary written to {args.output_file}")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 477411e and 1fa439c.

📒 Files selected for processing (1)
  • supporting_scripts/code-coverage/per_module_cov_report/parse_module_coverage.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
supporting_scripts/code-coverage/per_module_cov_report/parse_module_coverage.py

11-11: f-string without any placeholders

Remove extraneous f prefix

(F541)


35-35: Comparison to None should be cond is None

Replace with cond is None

(E711)


35-35: Comparison to None should be cond is None

Replace with cond is None

(E711)

🔇 Additional comments (3)
supporting_scripts/code-coverage/per_module_cov_report/parse_module_coverage.py (3)

1-76: Implementation aligns well with PR objectives.

The script effectively supports the goal of executing tests only for changed modules by:

  1. Processing per-module coverage reports
  2. Generating a clear markdown summary
  3. Providing useful feedback about missing or invalid reports

This implementation will help developers quickly understand the coverage impact of their changes when running tests only for modified modules.



5-21: 🛠️ Refactor suggestion

Add error handling and fix f-string usage.

The function needs error handling for OS operations and has an unnecessary f-string:

-def get_report_by_module(input_directory):
+def get_report_by_module(input_directory: str) -> List[Dict[str, str]]:
     results = []
+    if not os.path.isdir(input_directory):
+        raise ValueError(f"Input directory does not exist: {input_directory}")
+
     for module_folder in os.listdir(input_directory):
-        module_path = os.path.join(input_directory, module_folder)
+        try:
+            module_path = os.path.join(input_directory, module_folder)

-        if os.path.isdir(module_path):
-            report_file = os.path.join(module_path, f"jacocoTestReport.xml")
+            if os.path.isdir(module_path):
+                report_file = os.path.join(module_path, "jacocoTestReport.xml")

-            if os.path.exists(report_file):
-                results.append({
-                    "module": module_folder,
-                    "report_file": report_file
-                })
-            else:
-                print(f"No XML report file found for module: {module_folder}. Skipping...")
+                if os.path.exists(report_file):
+                    results.append({
+                        "module": module_folder,
+                        "report_file": report_file
+                    })
+                else:
+                    print(f"No XML report file found for module: {module_folder}. Skipping...")
+        except OSError as e:
+            print(f"Error accessing module {module_folder}: {e}")
+            continue

     return results

Likely invalid or redundant comment.



24-55: 🛠️ Refactor suggestion

Fix None comparisons and add type hints.

The function needs proper None comparisons and type hints:

-def extract_coverage(reports):
+def extract_coverage(reports: List[Dict[str, str]]) -> List[Dict[str, Union[str, float, int]]]:
     results = []

     for report in reports:
         try:
             tree = ET.parse(report['report_file'])
             root = tree.getroot()

             instruction_counter = root.find("./counter[@type='INSTRUCTION']")
             class_counter = root.find("./counter[@type='CLASS']")

-            if instruction_counter == None or class_counter == None:
+            if instruction_counter is None or class_counter is None:
                 continue

Likely invalid or redundant comment.


@ole-ve ole-ve changed the title Development: Execute tests only for changed modules on non-default branchs Development: Execute only tests for changed modules in non-default GitHub test job Jan 2, 2025
@ole-ve ole-ve changed the title Development: Execute only tests for changed modules in non-default GitHub test job Development: Execute only tests for changed modules in GitHub test job for non-default branches Jan 2, 2025
@ole-ve ole-ve added the tests label Jan 2, 2025
Contributor

@SimonEntholzer SimonEntholzer left a comment


Executing tests only for the modules with changes works as described; I tested it with #10101.

Member

@BBesrour BBesrour left a comment


Referenced change in the server-start check:

then
-echo "The number of Server Starts should be equal to 4! Please adapt this check if the change is intended or try to fix the underlying issue causing a different number of server starts!"
+echo "The number of Server Starts should be lower than/equals 4! Please adapt this check if the change is intended or try to fix the underlying issue causing a different number of server starts!"
Contributor Author


When running a reduced set of tests (changed modules only), there might be fewer distinct configurations and therefore fewer server starts. So we cannot compare for equality.

Contributor

@DominikRemo DominikRemo left a comment


The idea of running tests only for changed modules is interesting, but I see some challenges with this approach. While it should work fine with unit tests, Artemis' integration tests often cover multiple modules. For example, an exam integration test might indirectly test parts of text exercises that aren't directly covered by their own tests.

To better assess the impact, I suggest comparing test coverage for a few selected modules. Specifically, compare the coverage of each module when running only its module-specific tests against the coverage for the same module when running the entire test suite. If coverage is significantly lower for the module-specific tests, it could suggest weaker test reliability. In that case, we should carefully weigh the benefit of faster CI feedback against the potential risk of reduced test effectiveness.

That said, exploring this approach further could still be valuable. Generating module-specific coverage reports would offer more targeted insights and, in the long term, could help improve our test suite if module-specific tests become more specialized and testing functionality becomes less dependent on seemingly unrelated tests.

Would you be open to running this coverage comparison?

@ole-ve
Contributor Author

ole-ve commented Jan 3, 2025

@DominikRemo Thanks for your valuable feedback. Maybe it's worth mentioning that the MySQL and Postgres tests are executed once the in-memory ("normal") test pipeline has succeeded. So this gives a faster feedback cycle and serves as a kind of "first check", while all tests are still run once it has succeeded.

> To better assess the impact, I suggest comparing test coverage for a few selected modules. Specifically, compare the coverage of each module when running only its module-specific tests against the coverage for the same module when running the entire test suite. If coverage is significantly lower for the module-specific tests, it could suggest weaker test reliability. In that case, we should carefully weigh the benefit of faster CI feedback against the potential risk of reduced test effectiveness.

The main idea was to keep it simple. For the purpose of running change-affected tests, there are sophisticated approaches like Test Impact Analysis, which was already evaluated as part of a bachelor's thesis; the conclusion was that it is too complex.

In a follow-up, we want to add the possibility to define dependent modules, i.e. modules that use functionality of the changed module.

> That said, exploring this approach further could still be valuable. Generating module-specific coverage reports would offer more targeted insights and, in the long term, could help improve our test suite if module-specific tests become more specialized and testing functionality becomes less dependent on seemingly unrelated tests.

Regardless of whether the reduced set of tests would cover everything: in a future scenario, testing (i.e. covering) a module's functionality should be the responsibility of that module's devs, not of the devs of other modules.

> Would you be open to running this coverage comparison?

In general yes, but I would like to keep things rather simple. I see the benefit of these changes more in getting a fast feedback cycle than in determining and running all related tests in a watertight way.

@DominikRemo DominikRemo self-requested a review January 3, 2025 17:34
Contributor

@DominikRemo DominikRemo left a comment


I've looked through the workflow, and since we still run all tests for MySQL and Postgres as well as on the default branch, I understand that the downsides are limited. I also really like the idea of coverage thresholds per module.

Thanks for the clarification and the improvement. Keep it up so we can see the benefit of module-specific reports manifest :)

Projects
Status: Ready For Review
4 participants