
Development: Execute only tests for changed modules in GitHub test job for non-default branches #10036

Open
wants to merge 78 commits into base: develop

Conversation

ole-ve
Contributor

@ole-ve ole-ve commented Dec 17, 2024

Checklist

General

Motivation and Context

Most of the time, developers don't change functionality across all of Artemis, but only in some modules. Running the whole test suite for every commit is therefore inefficient. Instead, for pull requests and similar cases, it might be sufficient to run a reduced test suite in the in-memory DB test pipeline on non-default branches. To ensure everything still works, all tests should be run for every push to the default branch.

Description

Matching tests and source code

The module folders containing changed files determine which tests are executed: for each changed module, the tests of that module run. For example, when changing src/main/java/de/tum/cit/aet/artemis/text/service/ABCService.java, all tests within src/test/java/de/tum/cit/aet/artemis/text are executed.
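
As a rough illustration (a minimal sketch, not the actual workflow script; the path prefix, base branch, and function name are assumptions for illustration), the mapping from changed files to module folders could look like this:

```python
# Hypothetical sketch: derive the set of changed modules from the git diff
# against the base branch. Not the script used in the workflow.
import subprocess

MODULE_PREFIX = "src/main/java/de/tum/cit/aet/artemis/"  # assumed source layout


def changed_modules(base_branch: str = "origin/develop") -> set:
    """Return the module folder names touched by the current branch."""
    changed_files = subprocess.run(
        ["git", "diff", "--name-only", f"{base_branch}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    modules = set()
    for path in changed_files:
        if path.startswith(MODULE_PREFIX):
            # The first folder after the prefix is the module name,
            # e.g. ".../text/service/ABCService.java" -> "text".
            modules.add(path[len(MODULE_PREFIX):].split("/", 1)[0])
    return modules


if __name__ == "__main__":
    # Each detected module then selects the corresponding test folder,
    # e.g. src/test/java/de/tum/cit/aet/artemis/<module>.
    print(sorted(changed_modules()))
```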

This is a first step. In the future, the reduced test set could be extended to modules that are related to the changed module through dependencies. This, however, is not trivial because of transitive dependencies.

Per-Module Coverage Report & Verification

When running a reduced set of tests for specific modules, we are interested in per-module reports and in validating coverage thresholds per module. Validating on a module basis is not only more insightful but also necessary: running a reduced set of tests lowers the aggregated coverage metrics, which would otherwise cause verification to fail.
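
For illustration only, here is a minimal sketch of such a per-module check (the report path and the thresholds below are made-up examples; in this PR the actual verification is configured via Gradle/JaCoCo in gradle/jacoco.gradle):

```python
# Hypothetical sketch of a per-module instruction-coverage check against a
# JaCoCo XML report; not the Gradle-based verification used in the PR.
import xml.etree.ElementTree as ET


def instruction_coverage(report_file: str) -> float:
    """Return covered / (covered + missed) instructions from a JaCoCo XML report."""
    root = ET.parse(report_file).getroot()
    counter = root.find("./counter[@type='INSTRUCTION']")
    if counter is None:
        raise ValueError(f"No INSTRUCTION counter found in {report_file}")
    covered = int(counter.get("covered"))
    missed = int(counter.get("missed"))
    return covered / (covered + missed)


# Example thresholds; the real per-module values live in the Gradle configuration.
thresholds = {"text": 0.90, "fileupload": 0.85}

for module, minimum in thresholds.items():
    # Assumed report path layout, for the sketch only.
    coverage = instruction_coverage(f"build/reports/jacoco/{module}/jacocoTestReport.xml")
    status = "OK" if coverage >= minimum else "BELOW THRESHOLD"
    print(f"{module}: {coverage:.1%} (min {minimum:.0%}) -> {status}")
```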

Steps for Testing

These testing steps are a bit different and don't involve Artemis.

  1. Clone this branch locally.
  2. Create a new branch using git checkout -b feat/test-changed-modules-1
  3. Commit some changes mapping to two or three different modules.
  4. Push the branch and create a (draft) pull request. Alternatively, you can add your branch name to the triggering branches in .github/workflows/test.yml.
  5. Wait for the test pipeline to finish.
  6. Only on success: See a summary of the whole test job (including coverage metrics of changed modules).
  7. Only on success: Verify that the MySQL/Postgres test pipelines are started and run all tests.

Screenshots

The job summary generated on a branch where a single file in the fileupload module has been changed.

coverage-summary

Review Progress

Code Review

  • Code Review 1
  • Code Review 2

Manual Tests

  • Test 1
  • Test 2

Summary by CodeRabbit

  • New Features

    • Introduced a new script to identify changed modules in a specified branch.
    • Added a script for processing JaCoCo coverage reports and generating summaries.
    • Registered a new task for generating test reports.
  • Bug Fixes

    • Updated the server start count check in the log extraction script for improved error handling.
  • Chores

    • Modified the pull request template to reflect changes in the code coverage generation path.
    • Enhanced test configuration and dependency management in the build setup.
    • Implemented a new Jacoco configuration for coverage thresholds across multiple modules.

ole-ve added 30 commits December 7, 2024 15:56
This reverts commit 786d172.
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 2, 2025
coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 2, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
gradle/jacoco.gradle (1)

190-191: ⚠️ Potential issue

Prevent path traversal vulnerability

The module path construction should validate the module name to prevent path traversal.

+    if (!moduleName.matches("^[a-zA-Z0-9-_]+$")) {
+        throw new GradleException("Invalid module name: $moduleName")
+    }
     def modulePath = "$BasePath/$moduleName/**/*.class"
     task.sourceDirectories.setFrom(project.files("src/main/java/$modulePath"))
🧹 Nitpick comments (4)
gradle/jacoco.gradle (4)

2-5: Consider increasing the aggregated coverage thresholds

The aggregated instruction coverage threshold (0.895) seems reasonable, but allowing 56 uncovered classes might be too permissive for maintaining code quality.

Consider gradually reducing the maximum uncovered classes threshold over time to improve overall test coverage.


7-80: Review low coverage thresholds for critical modules

Several modules have notably low instruction coverage thresholds:

  • core: 65.7%
  • exercise: 64.9%
  • buildagent: 68.6%

These modules might be critical to the application's functionality. Consider:

  1. Setting up a plan to gradually increase these thresholds
  2. Adding more test coverage for these modules
  3. Documenting why these modules have lower thresholds if there are specific technical constraints

170-170: Fix typo in task description

There's a typo in the task description:

-    task.description = "Validates JaCoCo coverage for vioalations for $moduleName"
+    task.description = "Validates JaCoCo coverage for violations for $moduleName"

184-200: Consider adding module existence validation

The prepareJacocoReportTask method should validate that the module directory exists before configuring the task.

 private void prepareJacocoReportTask(JacocoReportBase task, String moduleName, JacocoReportBase rootTask) {
+    def moduleDir = project.file("src/main/java/$BasePath/$moduleName")
+    if (!moduleDir.exists()) {
+        throw new GradleException("Module directory does not exist: $moduleDir")
+    }
     task.group = "Reporting"
     // ... rest of the method
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cacdf7d and 477411e.

📒 Files selected for processing (2)
  • build.gradle (2 hunks)
  • gradle/jacoco.gradle (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • build.gradle
🔇 Additional comments (3)
gradle/jacoco.gradle (3)

98-110: LGTM! Well-structured report configuration

The report configuration:

  • Properly generates both XML and HTML reports
  • Dynamically creates per-module reports
  • Uses clear output paths

112-124: LGTM! Verification configuration aligns with PR objectives

The coverage verification:

  • Correctly handles module-specific vs. full coverage scenarios
  • Properly applies thresholds from configuration
  • Is integrated into the build lifecycle via check task

94-96: Verify JaCoCo version compatibility and security

The JaCoCo version 0.8.12 should be verified for:

  1. Compatibility with the project's Java version
  2. Latest security patches
✅ Verification successful

Let me check the Java version compatibility since we found that build.gradle contains the relevant configuration.


JaCoCo version 0.8.12 is appropriate and up-to-date

The verification confirms that:

  • JaCoCo 0.8.12 is currently the latest available version
  • The project uses Java 21, which is fully compatible with JaCoCo 0.8.12 (JaCoCo 0.8.11+ added support for Java 21)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check latest JaCoCo version and release notes
curl -s https://repo1.maven.org/maven2/org/jacoco/jacoco/maven-metadata.xml | grep -oP '(?<=<version>)0\.8\.[0-9]+(?=</version>)' | sort -V | tail -n 1

# Check if build.gradle specifies Java version
rg -l "sourceCompatibility|targetCompatibility" "build.gradle"

Length of output: 236


Script:

#!/bin/bash
# Check Java version configuration in build.gradle
rg "sourceCompatibility|targetCompatibility" build.gradle -A 2

Length of output: 165

coderabbitai[bot]
coderabbitai bot previously approved these changes Jan 2, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
supporting_scripts/code-coverage/per_module_cov_report/parse_module_coverage.py (2)

1-4: Add type hints imports and module documentation.

Add typing imports and a module docstring to improve code maintainability:

+"""
+Process JaCoCo coverage reports and generate a summary in markdown format.
+
+This module provides functionality to:
+- Scan directories for JaCoCo coverage reports
+- Extract coverage metrics from XML reports
+- Generate a markdown summary table
+"""
+from typing import Dict, List, Union
 import os
 import xml.etree.ElementTree as ET
 import argparse

67-76: Enhance command-line interface and add edge case handling.

The main execution block could be improved with better error handling and more descriptive help messages:

 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="Process JaCoCo coverage reports.")
-    parser.add_argument("input_directory", type=str, help="Root directory containing JaCoCo coverage reports")
-    parser.add_argument("output_file", type=str, help="Output file to save the coverage results")
+    parser.add_argument(
+        "input_directory",
+        type=str,
+        help="Root directory containing module-specific JaCoCo coverage reports (jacocoTestReport.xml)"
+    )
+    parser.add_argument(
+        "output_file",
+        type=str,
+        help="Path to the markdown file where coverage results will be saved"
+    )

     args = parser.parse_args()

     reports = get_report_by_module(args.input_directory)
+    if not reports:
+        print("No coverage reports found. Ensure the input directory contains valid JaCoCo XML reports.")
+        exit(1)
+
     coverage_by_module = extract_coverage(reports)
+    if not coverage_by_module:
+        print("Failed to extract coverage data from any of the reports.")
+        exit(1)
+
     write_summary_to_file(coverage_by_module, args.output_file)
+    print(f"Coverage summary written to {args.output_file}")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 477411e and 1fa439c.

📒 Files selected for processing (1)
  • supporting_scripts/code-coverage/per_module_cov_report/parse_module_coverage.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
supporting_scripts/code-coverage/per_module_cov_report/parse_module_coverage.py

11-11: f-string without any placeholders

Remove extraneous f prefix

(F541)


35-35: Comparison to None should be cond is None

Replace with cond is None

(E711)


35-35: Comparison to None should be cond is None

Replace with cond is None

(E711)

🔇 Additional comments (3)
supporting_scripts/code-coverage/per_module_cov_report/parse_module_coverage.py (3)

1-76: Implementation aligns well with PR objectives.

The script effectively supports the goal of executing tests only for changed modules by:

  1. Processing per-module coverage reports
  2. Generating a clear markdown summary
  3. Providing useful feedback about missing or invalid reports

This implementation will help developers quickly understand the coverage impact of their changes when running tests only for modified modules.



5-21: 🛠️ Refactor suggestion

Add error handling and fix f-string usage.

The function needs error handling for OS operations and has an unnecessary f-string:

-def get_report_by_module(input_directory):
+def get_report_by_module(input_directory: str) -> List[Dict[str, str]]:
     results = []
+    if not os.path.isdir(input_directory):
+        raise ValueError(f"Input directory does not exist: {input_directory}")
+
     for module_folder in os.listdir(input_directory):
-        module_path = os.path.join(input_directory, module_folder)
+        try:
+            module_path = os.path.join(input_directory, module_folder)

-        if os.path.isdir(module_path):
-            report_file = os.path.join(module_path, f"jacocoTestReport.xml")
+            if os.path.isdir(module_path):
+                report_file = os.path.join(module_path, "jacocoTestReport.xml")

-            if os.path.exists(report_file):
-                results.append({
-                    "module": module_folder,
-                    "report_file": report_file
-                })
-            else:
-                print(f"No XML report file found for module: {module_folder}. Skipping...")
+                if os.path.exists(report_file):
+                    results.append({
+                        "module": module_folder,
+                        "report_file": report_file
+                    })
+                else:
+                    print(f"No XML report file found for module: {module_folder}. Skipping...")
+        except OSError as e:
+            print(f"Error accessing module {module_folder}: {e}")
+            continue

     return results

Likely invalid or redundant comment.



24-55: 🛠️ Refactor suggestion

Fix None comparisons and add type hints.

The function needs proper None comparisons and type hints:

-def extract_coverage(reports):
+def extract_coverage(reports: List[Dict[str, str]]) -> List[Dict[str, Union[str, float, int]]]:
     results = []

     for report in reports:
         try:
             tree = ET.parse(report['report_file'])
             root = tree.getroot()

             instruction_counter = root.find("./counter[@type='INSTRUCTION']")
             class_counter = root.find("./counter[@type='CLASS']")

-            if instruction_counter == None or class_counter == None:
+            if instruction_counter is None or class_counter is None:
                 continue

Likely invalid or redundant comment.


@ole-ve ole-ve changed the title Development: Execute tests only for changed modules on non-default branchs Development: Execute only tests for changed modules in non-default GitHub test job Jan 2, 2025
@ole-ve ole-ve changed the title Development: Execute only tests for changed modules in non-default GitHub test job Development: Execute only tests for changed modules in GitHub test job for non-default branches Jan 2, 2025
@ole-ve ole-ve added the tests label Jan 2, 2025
Contributor

@SimonEntholzer SimonEntholzer left a comment


Executing tests only for the modules with changes works as described; I tested it with #10101.

Member

@BBesrour BBesrour left a comment


Referenced change in the server-start check:

then
-echo "The number of Server Starts should be equal to 4! Please adapt this check if the change is intended or try to fix the underlying issue causing a different number of server starts!"
+echo "The number of Server Starts should be lower than/equals 4! Please adapt this check if the change is intended or try to fix the underlying issue causing a different number of server starts!"
Contributor Author


When running a reduced set of tests (changed modules only), there might be fewer distinct configurations and therefore fewer server starts. So we cannot compare for equality.

Contributor

@DominikRemo DominikRemo left a comment


The idea of running tests only for changed modules is interesting, but I see some challenges with this approach. While it should work fine with unit tests, Artemis' integration tests often cover multiple modules. For example, an exam integration test might indirectly test parts of text exercises that aren't directly covered by their own tests.

To better assess the impact, I suggest comparing test coverage for a few selected modules. Specifically, compare the coverage of each module when running only its module-specific tests against the coverage for the same module when running the entire test suite. If coverage is significantly lower for the module-specific tests, it could suggest weaker test reliability. In that case, we should carefully weigh the benefit of faster CI feedback against the potential risk of reduced test effectiveness.

That said, exploring this approach further could still be valuable. Generating module-specific coverage reports would offer more targeted insights and, in the long term, could help improve our test suite if module-specific tests become more specialized and testing functionality becomes less dependent on seemingly unrelated tests.

Would you be open to running this coverage comparison?

@ole-ve
Contributor Author

ole-ve commented Jan 3, 2025

@DominikRemo Thanks for your valuable feedback. Maybe it's worth mentioning that the MySQL and Postgres tests are executed once the in-memory ("normal") test pipeline has succeeded. So this gives a faster feedback cycle and serves as a kind of "first check", while all tests are still run once it has succeeded.

> To better assess the impact, I suggest comparing test coverage for a few selected modules. Specifically, compare the coverage of each module when running only its module-specific tests against the coverage for the same module when running the entire test suite. If coverage is significantly lower for the module-specific tests, it could suggest weaker test reliability. In that case, we should carefully weigh the benefit of faster CI feedback against the potential risk of reduced test effectiveness.

The main idea was to keep it simple. For the purpose of running change-affected tests, there are sophisticated approaches like Test Impact Analysis, which was already evaluated as part of a bachelor's thesis; the conclusion was that it is too complex.

In a follow-up, we want to add the possibility to define dependent modules, i.e. modules that use functionality of the changed module.

> That said, exploring this approach further could still be valuable. Generating module-specific coverage reports would offer more targeted insights and, in the long term, could help improve our test suite if module-specific tests become more specialized and testing functionality becomes less dependent on seemingly unrelated tests.

Regardless of whether the reduced set of tests would cover everything: in a future scenario, testing (i.e. covering) a module's functionality should be the responsibility of that module's devs, not of the devs of other modules.

> Would you be open to running this coverage comparison?

In general yes, but I would like to keep things rather simple. I see the benefit of these changes more in getting a fast feedback cycle than in determining and running all related tests in a watertight way.

@DominikRemo DominikRemo self-requested a review January 3, 2025 17:34
Contributor

@DominikRemo DominikRemo left a comment


I've looked through the workflow, and since we still run all tests for MySQL and Postgres as well as on the default branch, I understand that the downsides are limited. I also really like the idea of coverage thresholds per module.

Thanks for the clarification and the improvement. Keep it up so we can see the benefit of module-specific reports manifest :)

Projects
Status: Ready For Review
4 participants