Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralize and update test documentation - Part 1 Explanation section #4686

Merged
merged 29 commits into from
Aug 9, 2023
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
157bbcd
Centralize and update test documentation
Ndacyayisenga-droid Jul 25, 2023
dff8c5c
draft aqa-explanation.md docs
Ndacyayisenga-droid Jul 26, 2023
e2e2f69
include files
Ndacyayisenga-droid Jul 26, 2023
893f0a8
Delete aqa-explanation.markdown
Ndacyayisenga-droid Jul 26, 2023
cb113d0
Delete aqa-howto.markdown
Ndacyayisenga-droid Jul 26, 2023
875f970
Delete aqa-reference.markdown
Ndacyayisenga-droid Jul 26, 2023
43d3b11
Delete aqa-tutorials.markdown
Ndacyayisenga-droid Jul 26, 2023
3ffb4f6
Edit docs
Ndacyayisenga-droid Jul 27, 2023
2a826ec
Update docs/pages/LayeredDesign.md
Ndacyayisenga-droid Jul 28, 2023
ff43f33
Update docs/pages/LayeredDesign.md
Ndacyayisenga-droid Jul 28, 2023
c72d9d3
Update docs/pages/LayeredDesign.md
Ndacyayisenga-droid Jul 28, 2023
581134d
Update docs/pages/LayeredDesign.md
Ndacyayisenga-droid Jul 28, 2023
09a5454
Update docs/pages/LayeredDesign.md
Ndacyayisenga-droid Jul 28, 2023
5c8c66e
Update docs/pages/Scope.md
Ndacyayisenga-droid Jul 28, 2023
0b7918c
Update docs/pages/Scope.md
Ndacyayisenga-droid Jul 28, 2023
23c7fcd
Update docs/pages/Scope.md
Ndacyayisenga-droid Jul 28, 2023
b3df090
Update docs/pages/Scope.md
Ndacyayisenga-droid Jul 28, 2023
97364a8
Update docs/pages/LayeredDesign.md
Ndacyayisenga-droid Jul 28, 2023
4974071
Update docs/pages/LayeredDesign.md
Ndacyayisenga-droid Jul 28, 2023
1b1db7b
Update docs/pages/Scope.md
Ndacyayisenga-droid Jul 28, 2023
1888420
Update docs/pages/Scope.md
Ndacyayisenga-droid Jul 28, 2023
46d1919
Update docs/pages/Scope.md
Ndacyayisenga-droid Jul 28, 2023
4235fbf
Update docs
Ndacyayisenga-droid Jul 28, 2023
1ffa885
Update docs/pages/Manifesto.md
Ndacyayisenga-droid Jul 31, 2023
f70bcfc
Update docs/pages/LayeredDesign.md
Ndacyayisenga-droid Jul 31, 2023
9c32606
Update docs/pages/Manifesto.md
Ndacyayisenga-droid Jul 31, 2023
65c06f2
Update docs
Ndacyayisenga-droid Jul 31, 2023
57ecdd2
Update docs/pages/Scope.md
Ndacyayisenga-droid Aug 2, 2023
b45548a
Update Scope.md
Ndacyayisenga-droid Aug 2, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added docs/diagrams/LayeredDesign_3LayerCake.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
32 changes: 32 additions & 0 deletions docs/pages/LayeredDesign.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
## Why AQAvit Uses the '3 Layer Cake' Architecture

The Adoptium Quality Assurance (AQA) process plays a crucial role in ensuring the reliability, stability, and security of OpenJDK builds distributed by the Eclipse Adoptium project. AQA employs the '3 Layer Cake' architecture, consisting of three layers - Test Result Summary Service (TRSS), CI Systems, and TaskKitGen (TKG). This documentation explores the rationale behind the adoption of the '3 Layer Cake' architecture and its significance in delivering high-quality Java runtimes.
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

### Overview of the '3 Layer Cake' Architecture:

![LayeredDesign_3LayerCake](../diagrams/LayeredDesign_3LayerCake.jpg)

The '3 Layer Cake' architecture in Adoptium AQA comprises three interconnected layers, each responsible for specific aspects of testing and build validation:
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

- **Test Result Summary Service (TRSS):** TRSS serves as the top layer and is responsible for monitoring multiple CI servers, generating graphical aggregate summaries, providing deep historical data, and offering search, sort, and filter capabilities. It also supports pluggable parsers, forming the basis for deep analytics and deep learning services.
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

- **CI Systems:** The middle layer consists of various Continuous Integration (CI) systems such as Jenkins, Tekton, Azure DevOps (AzDo), GitHub Actions, and others. CI systems schedule regular builds, distribute tests across multiple nodes and platforms, and provide basic GUI views of test results. They also enable basic forms of parallelization.
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

- **TaskKitGen (TKG):** The bottom layer, TKG, categorizes logically, generates test targets based on playlists (level/platform/version/implementation-specific), and executes tests via common command-line or make target patterns. TKG adds tests through auto-detected directories, standardizes exclusion, and dynamically generates test playlists for smart parallelization.
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
### The Rationale Behind '3 Layer Cake' Architecture:

The adoption of the '3 Layer Cake' architecture in Adoptium AQA is driven by several key reasons:
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

- **Modular Testing Approach:** The '3 Layer Cake' architecture provides a modular testing approach, dividing responsibilities among three layers. This modularity allows for easy maintenance, upgrades, and improvements of individual layers, promoting flexibility and adaptability to evolving testing needs.

- **Comprehensive Testing Coverage:** Each layer in the architecture addresses specific aspects of testing. TRSS focuses on summary services and deep analytics, CI Systems handle continuous integration and basic GUI views, and TKG is responsible for test categorization and execution. This layered approach ensures comprehensive testing coverage of OpenJDK builds.

- **Efficient Bug Resolution:** The architecture's layered design allows for efficient bug resolution. Issues identified in a specific layer can be isolated and addressed within that layer, streamlining the bug-fixing process and reducing turnaround time.

- **Scalability and Platform Flexibility:** The '3 Layer Cake' architecture can scale effortlessly to accommodate varying testing requirements and support multiple CI systems. Its platform flexibility ensures compatibility with different operating systems, architectures, and cloud environments, enhancing the build's portability.

Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
- **Data-Driven Decisions:** The TRSS layer provides graphical aggregate summaries and deep historical data, enabling data-driven decisions in the AQA process. This data-driven approach allows the AQA team to monitor trends, identify patterns, and make informed decisions for further improvement.

### In Conclusion:
The '3 Layer Cake' architecture employed in Adoptium AQA is a strategic choice that enhances the reliability, scalability, and efficiency of the testing process for OpenJDK builds. By dividing responsibilities among TRSS, CI Systems, and TKG, Adoptium AQA achieves comprehensive testing coverage, modularity, and platform flexibility. This architecture ensures the delivery of high-quality Java runtimes, reinforcing the Eclipse Adoptium project's commitment to excellence and building trust among developers and users worldwide.
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
190 changes: 190 additions & 0 deletions docs/pages/Manifesto.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,190 @@
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

[1]https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
-->

# Adoptium Quality Assurance (AQA) Definition v1.0

### Table of contents

1. [Introduction](#introduction)
2. [Guidelines for AQA](#guidelines)
1. [Open and Transparent](#openTransparent)
2. [Diverse and Robust](#diverseRobust)
1. [Functional and Regression Tests](#functionalRegression)
2. [External Application Tests](#externalApp)
3. [Performance Benchmarks](#performance)
4. [Security Tests](#security)
3. [Evolution alongside Implementations](#evolution)
1. [Continual Investment](#continualInvest)
2. [Process to Modify](#processToModify)
3. [Codecov and other Metrics](#metrics)
4. [Comparative Analysis](#comparativeAnalysis)
4. [Portable](#portable)
5. [Tag and Publish](#tagPublish)
3. [Summary](#summary)

<a name="introduction"></a>

## 1 Introduction

At the Adoptium project, we are committed to producing high-quality OpenJDK binaries, which we call Eclipse Temurin, for consumption by the community. After speaking with different implementers, and listening to the needs of our community, developers and enterprise consumers alike, we have heard very clearly the desire to guarantee a certain level of quality. We organise this quality assurance work within the Eclipse AQAvit project, which is named by the Adoptium Quality Assurance (AQA) acronym, in combination with 'vit' and its dual meaning for speed and vitality.

Quality Assurance means, “Make quality certain to happen”. Our goal is to make it certain that the project delivers high-quality binaries, that satisfy enterprise-grade requirements. Enterprises and developers alike require that the binaries be:
- Secure – ensure binaries are not susceptible to known security vulnerabilities
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
- Functional – runs the apps they care about, on platforms they care about
- Performant – meets or exceeds a set performance bar for various types of operations
- Scalable – can span and scale to enterprise workloads

<a name="guidelines"></a>

## 2 Guidelines for AQA

<a name="openTransparent"></a>

### 2.1 Open & Transparent
We believe open languages deserve open tests. This means test source should be open and tests should be executed in the open with their results openly available.

Transparent processes strengthen confidence. Consumers get to see test results directly and know a certain quality bar was achieved, rather than just be told some testing was done. Open tests get scrutinized, get fixed, get loved.
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

Open testing engages more people and helps to drive innovation and build community. Being able to see and compare test results across implementations also creates a healthy and competitive ecosystem that ultimately benefits all.
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

<a name="diverseRobust"></a>

### 2.2 Diverse & Robust Set of Test Suites
We want a diverse and robust set of test suites to fulfill enterprise and developer requirements.
These tests should cover different categories including functional/regression, security, performance, scalability, load/stress. The tests also need to span different JDK versions, implementations, platforms, and applications.
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved

<a name="functionalRegression"></a>

#### 2.2.1 Functional and Regression Tests
We currently utilize both the OpenJDK regression test suite and the Eclipse OpenJ9 functional test suite. While there remains some effort to segregate the portions of these test suites that are VM-implementation specific (OpenJDK functional originally written to target Hotspot VM, Openj9 functional originally written to target Openj9 VM), we see there is a great number of tests that are implementation agnostic and can be used to verify different implementations.

Both of these test groups, which we call “openjdk” and “functional” are typically unit tests designed to verify the APIs and features of the JDK. By thorough coverage of the APIs and coverage of the JEPs/JSRs, we identify interoperability issues and consistency results. Both test suites are continuously augmented at their source projects/repositories, OpenJDK and Eclipse OpenJ9 respectively. For this reason, we have chosen to include portions of these as part of AQA, since the tests are being kept current and relevant to the changes to the binaries we test.

While there are different JDK implementations (ranging from different garbage collection policies to different code generation options to different JVM technologies), they all draw from OpenJDK. Functional correctness and consistency can be measured through a common set of quality metrics.
2.2.2 System and Load Tests
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
This category of testing includes tests designed and written from a system-wide perspective. Quality engineers have written these tests from a consumer perspective, designing common customer scenarios based on feedback and observation from service professionals and consumer feedback. Importantly, this test group also includes load and stress testing which is important to enterprise consumers. These tests ask the question, “what level of load can this binary take before it breaks?”.

Some of these load and stress tests fire up thousands of threads and iterate 10’s of thousands of times. The tests can also be tuned, so that as binaries become more resilient, we can increase the load to further stress and push the load bar higher.

<a name="externalApp"></a>

#### 2.2.3 External Application Tests
This test category includes various suites, at present, functional and smoke tests from a set of third-party applications and Microprofile TCKs (run on different implementations of application server).

Current External Applications Suites being run at AQAvit can be found within adoptium/aqa-tests/external and include third-party applications / running Functional and Smoke tests, and application servers / running Microprofile TCKs.

These tests are run in containers based from Temurin and optionally alternate vendor docker images in order to both exercise those images from docker hub, and to verify these third-party applications work well against them. Application suites are selected for both their real-world popularity and because they offer interesting workloads that best exercise an OpenJDK binary. This suite is expandable and there are more applications in the plan captured in adoptium/aqa-tests issue #172.

We are interested in ensuring these suites run well. We want to engage and share with application communities, but more importantly, we aim to demonstrate to enterprise consumers the correct execution of key applications.

<a name="performance"></a>

#### 2.2.4 Performance Tests
Performance benchmarks are important verification tools to compare binaries against an acceptable baseline. Consumers of our binaries require them to be performant. This category of tests includes microbenchmarks and large open-source benchmark suites.

In preparing AQA, we ask how can we run performance benchmarks against Temurin and other OpenJDK binaries and analyze and compare results? What information is coming out of some common open-source benchmarks and why might it be interesting?

The challenge of selecting a set of benchmarks is the varied performance metrics are often in opposition with each other, throughput versus footprint being the classic example. When adding benchmarks, it will be important to clarify what the benchmark measures and how we run the benchmark (how many repetitions, with what command line options was it run, what gc policies were used, etc). We will favour benchmarks that are not implementation-specific, but rather gather a broad set of metrics that enterprise customers may expect in various deployment scenarios.

Along with the output and ‘score’ from the benchmark itself, the metrics gathered from a performance test run should also include the calculated confidence interval along with information about the machine upon which the test was run. We need to be confident that we have a repeatable measure, or at least understand in what ways a particular benchmark score can vary.

We will set baseline scores for the performance benchmarks included in AQA that binaries must meet or exceed in order to pass the performance bar.

<a name="security"></a>

#### 2.2.5 Security Tests
While the regression and functional suites contain many security tests, we intend to increase the level of security tests run against the Temurin and other OpenJDK binaries. We will include open security test suites that test for known vulnerabilities. We also intend to use fuzzers to search for security vulnerabilities.

<a name="evolution"></a>

### 2.3 Evolution Alongside Implementations

<a name="continualInvest"></a>

#### 2.3.1 Continual Investment
Tests (like the products they verify) need to continuously evolve & change. This is not a small effort, so is best achieved when we coordinate our efforts with many like-minded quality professionals and developers on a common goal of quality assurance.
Tests require maintenance and care. We want to continuously improve and become more effective and efficient. This includes:
Refining automation and tools
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
Automate re-inclusions upon fixes
Remove friction, make testing easier
Reduce process, make tools simpler

<a name="processToModify"></a>

#### 2.3.2 Process to Modify
As described in more detail in Section 2.5, the AQA will be a versioned, reproducible set of suites. As new OpenJDK versions with their new features come along, so do new or updated test suites. We may also gather metrics and additional information that will inform us that we should revise the AQA. For these reasons, the test suites that form the AQA need to be reviewed and modified on occasion. This process will be lead by the TSC test leads and should include:
Review/assess existing & new tests on a regular basis (initially as a quarterly review)
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
Community awareness of major additions/modifications/deletions to the AQA
Update the current active set of tests (fixes, levels)
The project should also produce guides for plugging in new suites, the test source plus build file and playlist files, essential instructions for how to build and execute new test material. In this way, interested parties may introduce new test suites for trial and review and possible inclusion into an AQA update.

To summarize this section on the process to modify, the selection criteria would favour test suites that address the secure, functional, performant and scalable requirements best. We aim to improve code coverage in prioritized areas of the heat map (to best reflect real-world API usage), broaden execution of edge cases, test for newly identified vulnerabilities and new features in later releases.

<a name="metrics"></a>

#### 2.3.3 Codecov & Other Metrics
We should continually review the value of the tests and processes that we employ. The project should gather data to measure the effectiveness of tests. This data helps inform our process of improvement and change.

Metrics of interest are:
- Heat maps
- List of most used APIs to evaluate gaps and risk and prioritize testing
- Code coverage
- Set bar to stay above a certain score (especially for priority areas in the heat map)
- Shows gaps, limited, code coverage does not equate to functional coverage
Ndacyayisenga-droid marked this conversation as resolved.
Show resolved Hide resolved
- Bug prediction
- List of most changed files, bugs often to occur in more changed areas
- Bug injection, mutation testing (in the plan, to measure the quality of our testing)
- Comparative analysis
- Test-the-tests (Section 2.3.4)

In summary, we will gather metrics for the purpose of improving test material and our ability to assure quality.

<a name="comparativeAnalysis"></a>

#### 2.3.4 Comparative Analysis
We can employ a “test-the-tests” mechanism, running tests and seeing how they perform across implementations/versions etc. This allows for a repeatable pattern to follow when triaging test results, look first at the failure and look to see if it fails across versions, platforms and implementations to hone in on root cause. We can also employ tools for a ‘diff’ of test results, to compare across the variations that we encounter at the project.

One of the greatest benefits that the work encapulated by AQAvit offers is that we test many different implementations, versions and platforms in one spot, making it easy to compare. This comparison informs stakeholders, enterprises, open-source communities, and developers on the qualities of a particular binary as it compares to others. Stakeholders have answers to questions like:
- how did each implementation fair against particular test criteria?
- how stable/fast/secure is the new release?

<a name="portable"></a>

### 2.4 Portable
Test portability is related to an open and transparent statement. The tests that form the AQA need to be easily run:
- on newly-added platforms
- against new JDK versions
- by any developer on a laptop via command line
- in any distributors' continuous integration (CI) server
Portable tests allow for both easier upkeep and maintenance but also faster evolution of the AQA. This also allows developers to reproduce test failures, make changes and retest, reducing turnaround time when incorporating fixes and features.

<a name="tagPublish"></a>

### 2.5 Tag & Publish
We want consumers of our binaries to be able to see what tests were run against them, and how the binaries scored against a rigorous onslaught of tests. Consumers should be able to get a Bill of Materials (BoM), essentially a listing of the test materials used to verify an OpenJDK binary.

Per test run:
- Set the bar by which binaries are judged (target pass rate)
- Determining which failures/exclude ‘block’ releases

Ability to associate binaries to:
- Test summary (pass/fail/excluded totals)
- BoM - list of SHAs of every test repo used
- Downloadable relevant test artifacts
- Badge/indicator on website marking binaries that pass the AQA
The goal would be the ability for 100% reproducible test results per release. Anyone should be able to grab a binary and the test artifacts including the BoM and reproduce the same test set that was originally run at the project.

<a name="summary"></a>

## 3 Summary
Our goals and intention for AQA is to provide the community with a valuable quality assurance toolkit and set a high bar for OpenJDK binaries being produced and distributed. We believe that working together on this toolkit in an open and collaborative environment will be beneficial to all implementers, developers, consumers, and stakeholders within the open community.
Loading