[MBUILDCACHE-104] Allow multiple cache entries per checksum #175
443d29d to 4ef63e5
import static java.util.Objects.requireNonNull;

/**
having a descriptive class annotation will be helpful
Can you please provide more details on the reasons behind this change? What specific situations or scenarios require introduction of these zones? Personally, I would prefer to initiate a discussion via email with members of the community first in order to gain a better understanding of the problem and to establish a broad consensus on the solution before proceeding with implementation.
@AlexanderAshitkin the use case is already described in MBUILDCACHE-104 (https://issues.apache.org/jira/browse/MBUILDCACHE-104). I paste it here:
Of course, I can still open a discussion on the mailing list for such a large change.
I'm trying to understand what is causing issues and what type of issues they are, in this case with different
Different plugin parameters result in different hash sums, and these different checksums are stored separately. If there are 3 different build configurations (through profiles or command line parameters) with 3 different groups, this yields 3 different cache records under different checksums. This makes different checksums equivalent to different zones. Another point that's unclear to me is the statement:
I don't see any problem with using classifiers or some other method to differentiate artifacts stored in the build. If we split the build into logical parts, we must consider the subsequent plugins, which depend on the "zoned" cache records. They must run consistently with the non-cached build. From what I understand now, this scenario is already supported. There are two options:
We should also consider the case where the plugins in question are not leaf plugins. Overall, I would like to better understand the problem, starting with the build configuration and the use case.
This is only true if the plugin parameter is tracked by the effective pom. By default, it is not the case (I use maven flags to pass those parameters). Plus I track those parameters via the reconciliation tag.
I want a My CI flow is
When I said
Please clarify:
You can move the parameter to the effective pom. One way to do this is by using a profile that defines a literal property value and referencing that value in a plugin. By doing so, the value will be interpolated to the plugin parameters and become part of the effective pom.
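To make that suggestion concrete, here is a minimal sketch of a pom.xml, not taken from this PR. The profile id `fast-tests`, the property name `test.groups`, the group name `fast`, and the project coordinates are all hypothetical; only the surefire `groups` parameter comes from the discussion. The idea, as described in the comment above, is that the literal value ends up in the plugin configuration of the effective pom.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- Placeholder coordinates, for illustration only -->
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <!-- Hypothetical property; assumed empty by default so no group filter applies -->
    <test.groups></test.groups>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <!-- Interpolated at model-building time, so the value is visible in the effective pom -->
          <groups>${test.groups}</groups>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <profiles>
    <profile>
      <!-- Hypothetical profile that pins a literal group name -->
      <id>fast-tests</id>
      <properties>
        <test.groups>fast</test.groups>
      </properties>
    </profile>
  </profiles>
</project>
```

With this layout, running `mvn verify -Pfast-tests` would put `fast` into the surefire configuration of the effective pom, whereas passing `-Dgroups=fast` on the command line would not. Under the behaviour described earlier in the thread, each distinct value would then be expected to produce its own checksum.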
Let me try to be more concrete with a simplified example. I have a Maven project with a single module. In this module, I have many prod classes and many slow tests. My initial naive CI (Jenkins) pipeline is a single step:

My CI can execute jobs across multiple machines. So I decided to distribute my test pipelines among my CI machines. I adjusted my build cache config as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<cache xmlns="http://maven.apache.org/BUILD-CACHE-CONFIG/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://maven.apache.org/BUILD-CACHE-CONFIG/1.0.0 https://maven.apache.org/xsd/build-cache-config-1.0.0.xsd">
    <configuration>
        <remote>
            <url>XXX</url>
        </remote>
        <attachedOutputs>
            <dirNames>
                <dirName>classes</dirName>
                <dirName>test-classes</dirName>
            </dirNames>
        </attachedOutputs>
    </configuration>
    <executionControl>
        <reconcile>
            <plugins>
                <plugin artifactId="maven-surefire-plugin" goal="test">
                    <reconciles>
                        <reconcile propertyName="skipTests" skipValue="true"/>
                        <reconcile propertyName="groups"/>
                    </reconciles>
                </plugin>
            </plugins>
        </reconcile>
    </executionControl>
</cache>

My new pipeline is:
I want:
Yes
(was Yes) Sorry, no.
Not necessarily. You could have a test group spanning across multiple modules.
Yes, I know, and as I said, I don't want a different checksum between all those steps.
4ef63e5 to 001fffb
@reda-alaoui I am currently trying to better understand this request, and I have some conceptual concerns. To improve build time, the traditional approach would be to divide the project into smaller modules and allow tests to run in parallel. The proposed As I see it, this change aims to address the following Maven limitations:
In summary, this approach appears unconventional and is chosen because reworking the existing mono-project is more expensive. I see value in this idea because such "mono-projects" are not uncommon, and zones by themselves look like a good generalization of the "local" cache; it could also be handy to split builds by quality levels. But I need some time to digest it. I will take my time to understand why this is necessary and how to address it consistently. Cheers
I do not agree with these assumptions. I have another project with more than 10 modules, where each module runs its own integration tests, and the constraints are exactly the same.
I guess you mean inside the same reactor? That doesn't allow distributing the load across multiple machines. This need may be rare in the Maven ecosystem, but this extension is probably heavily used by big projects needing exactly that. Please note that I applied all of my PRs to a forked version with the distributed CI pipeline. It has been working correctly for almost a week, and counting. Of course, that doesn't mean we can't do better using better concepts or implementations.
The distribution is a good example. However, what's unusual here is an attempt to distribute a single-module build. The zones are intended to divide the build into smaller, shareable parts, but there is already a mechanism for that - the projects themselves. The projects (modules) exist to make the build more manageable and optimized by splitting the project into smaller parts. Smaller projects allow more efficient results sharing as well. Sharing intermediate builds is commonly achieved through staging repositories or by sharing results at the file system level. I can see the value in this change, but it will take some time to evaluate the idea, tradeoffs, and options.
This PR adds the concept of cache zones. There can be multiple zones per checksum.
This makes it possible to store multiple final cache states and to use some zones as input and others as output.
The first input zone with a cache hit "wins". When the cache entry is saved, it is written to all configured output zones. By default, there is one input/output zone named default-zone.

Following this checklist to help us incorporate your contribution quickly and easily:
Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
Format the pull request title like [MBUILDCACHE-XXX] - Fixes bug in ApproximateQuantiles, where you replace MBUILDCACHE-XXX with the appropriate JIRA issue. Best practice is to use the JIRA issue title in the pull request title and in the first line of the commit message.
Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
If your pull request is about ~20 lines of code you don't need to sign an
Individual Contributor License Agreement if you are unsure
please ask on the developers list.
To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.
I hereby declare this contribution to be licenced under the Apache License Version 2.0, January 2004
In any other case, please file an Apache Individual Contributor License Agreement.