Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hacktoberfest [JENKINS-69731] Enhance @Library annotation to load same version (feature branch name) of library as pipeline script from SCM #19

Open
wants to merge 78 commits into
base: master
Choose a base branch
from

Conversation

jimklimov
Copy link

@jimklimov jimklimov commented Sep 28, 2022

Detailed in https://issues.jenkins.io/browse/JENKINS-69731

Documentation proposed in jenkins-infra/jenkins.io#5537

Short summary is: This issue states the desired ability to load at least same "version" (branch name) of the trusted globally configured library as that of the pipeline being executed, with little hassle to scale it on the pipeline coding and Jenkins configuration side.

The PR aims to (optionally, site-configured) allow literally requesting this:

@Library('libname@${BRANCH_NAME}') _
import my.lib.name.class;
... 

If the "version" requested in pipeline script is a literal string ${BRANCH_NAME}, the implementation would try to learn the "BRANCH_NAME" environment variable set for the current Run object. If that is present, it would validateVersion() as implemented by this or that LibraryRetriever backend (SCM here, HTTP in workflow-cps-global-lib-http-plugin, etc.) and if the string is not rejected, it would be substituted as the version to use. Otherwise the configured default version would be used as fallback.

UPDATE: For good measure, this PR also proposes optional ability to retrieve a library version specified by environment variables, whatever way they get populated (by some plugins, by build agent settings, etc.):

@Library('libname@${env.SOME_VAR_NAME}') _
import my.lib.name.class;
... 

This one is an open-ended story - ability is there, but how to use it may be up to imagination :) e.g. make chosen library version dependent on build agent architecture, dev/qa/stage/prod env, or something.

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

Tests added so far cover (hopefully) non-regression and interaction of code with the new feature for situations where it is expected to be a no-op. They also test related functionality which existed before but I think was not covered explicitly.

More relevant tests directly dealing with the added feature (and non-regression over time) were added, to nail the expected behaviors for Multi-Branch Project pipelines and Single SCM pipelines with mock Git repos, and "inline" pipelines that do not come from an SCM. They cover pipeline and library branches specified as "static" strings like "feature" or "master" (and an absent "bogus"), as well as interactions with expandable-looking patterns that the added logic substitutes (${BRANCH_NAME} and ${env.SOME_VAR_NAME}) when such interactions are permitted or forbidden by configuration of the library.

CAVEATS:

I added a test that I failed to complete in SCMSourceRetrieverTest.java - I hope someone better versed in the ecosystem can find some nuance I missed and complete those tests (Initially posted about several such tests, but managed to finish most of them):

  • checkDefaultVersion_inline_allowVersionEnvvar() - need an acceptable way to inject the environment variable to the build (or build agent); I added an envinject action to no avail... so that part of the test case is fenced away (developers trying to fix the test can enable it by running ENABLE_TEST_VAR_NAME_JOBLEVEL=true mvn test -Dtest='SCMSourceRetrieverTest#checkDefaultVersion_inline_allowVersionEnvvar, but by default it does not break auto-testing).
    • Currently this test does verify interaction with a Jenkins global configuration setting of an envvar, and has TODOs commented for other cases of interest.

NOTE: I've got stuck deciding "what is right" for a MultiBranch Pipeline with a SingleSCMSource (a way to add specific "Single repository & branch" to an MBP normally populated by a query from perhaps some other repository).

In one test I've prepared, there is an MBP with two single-branch sources with different names "feature" and "featurette" backed by the same GitSCM branch. It contains a Jenkinsfile that loads fixed-name @Library('branchylib@feature').

In both leaf job runs I see SingleSCMSource checkout out the "*/feature" source as specified in its GitSCM, and so request the same @Library('branchylib@feature') according to pipeline source text. For the second of those cases the MBP sets BRANCH_NAME='featurette' (and leaf job name "mbp/featurette") per SingleSCMSource "name" attribute, as well as the WorkflowJob BranchJobProperty.

Question: for the context of my PR, loading @Library('branchylib@${BRANCH_NAME}') evaluated at run-time aiming to use "same branch of library as of pipeline from SCM", which of those would be "correct"? The name of SingleSCMSource which might be arbitrary, but could intentionally be requested by user to set a specific BRANCH_NAME (and job name) with some purpose? Or the name of actual SCM branch that contained the pipeline? I can imagine arguing that either of the opposite choices is correct :) So need help picking one as the truer truth :\

For now I sided with the MBP plugin that goes to great lengths to make-believe that the "name" specified in config is the branch name, even if it does not exist in actual SCM.

Technically this is implemented as the choice of whether we first look into envvars or into SCM associated with the job/build, shuffling lego blocks around...


One important point highlighted in review is that one of several ways to learn the "version" (e.g. branch name) of the source SCM associated with the pipeline is not following Java inheritance nicely (that would need more work in SCM class and its descendants), but instead implements Reflection-based queries to "hudson.plugins.git.GitSCM" if the run-time instance of the SCM claims to be it. Code dealing with this is now separated into defaultedVersionSCM() and is a good target for refactoring by someone who knows what they are doing across the board (I don't claim to). On one hand this approach allows the feature to be instantly usable by a majority of practical use-cases (Git being the most popular SCM nowadays) and just to get it tested and validated as the proof-of-concept. On another, it is not nice ideologically and not easily scalable to handle other SCM types.


Another point raised in review (generally about BRANCH_NAME "learned" from SCM context for the pipeline code), is that "it could be misleading because the branch of the SCM used to define the Pipeline script does not necessarily have anything to do with the real sources—the SCM(s) you check out during the build". For example, when a Jenkinsfile is not part of the built/tested codebase and the same devops code is loaded for numerous built/tested projects. This is not a problem for this PRs goals where we want pipeline and library code to be used in sync, but may be a problem for generalization of BRANCH_NAME definition for cases where it currently is not implemented.

@jimklimov jimklimov force-pushed the JENKINS-69731-scm-BRANCH_NAME branch 2 times, most recently from 743b703 to 925cf30 Compare September 28, 2022 19:27
@jimklimov

This comment was marked as outdated.

@jimklimov
Copy link
Author

jimklimov commented Sep 29, 2022

Wondering about a couple more nuances here:

  1. Handle pull requests, where BRANCH_NAME="PR-123" is not really helpful, and would cause fallback to default library branch name. Here we can (separately-optionally?) try CHANGE_BRANCH and/or CHANGE_TARGET to poke whether relevant trusted library branch names exist (same as pipeline PR source or target branches respectively).

  2. How many different cases exist? Are there only WorkflowRun jobs that might have a pipeline from SCM (if not "inline") as an ordinary "Pipeline" or a "Multi-Branch Pipeline", and are these guaranteed to set the envvars which I propose to handle here? Or can there be other job types and/or cases when it does not set envvars?

  3. I've checked that "ordinary" pipelines using a single branch SCM source do not populate BRANCH_NAME envvar. Multi-branch pipelines configured to use a "Single repository and branch" populate it with the user-specified "Name" string unrelated to "Branch specifier" used for the checkout (okay, that might be intentional, and matches the generated MBP job name too).

  4. So, there are cases where relevant SCM source exists, but is not communicated with the envvar. Fallbacks for that seem useful, and some may be viable. But there are some things I am not certain about:

In particular, I am inclined to add support for Java equivalent of common pipeline groovy queries for "first SCM used in the job" (likely - but not certainly - meaning the source of pipeline) like:

BRANCH_NAME = scm.getBranches()[0]

or as I was recently advised, with envvar expansion so shell-templated branch specifications are resolved at run-time:

BRANCH_NAME = scm.getBranches().first().getExpandedName(env.getEnvironment())

The question here is whether the first entry of SCM list is really guaranteed to be the source of the pipeline?

  • I do not (yet) know how this list looks during job start-up, as it goes over script source and finds the @Library lines to handle. It may differ from queries over historic data that I summarize below.
  • If there are "pre-loaded libraries" (implicit JSL, etc) in a Jenkins deployment, can they interfere? Especially if an otherwise "inline" pipeline script is used (not from SCM), would it appear to have SCM sources?..
  • I queried existing build history with Jenkins script console, and the first entry is the library (with component source coming in second), which matches the order of entries on the build overview page as well:
for (def s : Jenkins.instance.getItemByFullName('componentname/master').getBuild('444').getSCMs()) {
  println s.getUserRemoteConfigs()
  println s.getBranches()
}

This may have to do with lightweight checkouts, but if they are not recorded as a source of the job script, it is not helpful for the purpose of this PR. Maybe they are recorded not in the SCM list however?

  • Corner case of an "inline pipeline" job without an SCM source of its own, which requests several libraries or does other SCM checkouts: as it passes the SCM operations, it does (I checked) populate the List<SCM> roster. This way the first entry of the list has no relation to branch name relevant for the pipeline source to which Library should align (if requested to). Looking for a way to determine that the WorkflowJob is inline...
  • Worth mentioning that an edited Replay of a build which originally came from an SCM retains that original BRANCH_NAME and scm.getBranches() even though this particular executed source might completely differ from whatever is in SCM. So far treating this as a curiosity; anyways the Replay ability is limited via Permissions to "those who know".

… @Library('libname@${BRANCH_NAME}') [JENKINS-69731]

To simplify co-existence of feature-branched pipeline scripts with
feature-branched Jenkins Shared Libraries, allow literal use of
    @Library('libname@${BRANCH_NAME}') _
lines to load a variant of a trusted (global) library with one case
of arbitrary branch naming.

This should be permitted by allowBRANCH_NAME checkbox (independent
of generic allowVersionOverride setting) in the first place.

If enabled, the value of BRANCH_NAME environment variable set by
the current build's Run object would be queried, and if resolvable -
verified with retriever.validateVersion() to match the backend
(Legacy SCM, Modern SCM, and other retrievers like HTTP/ZIP from
workflow-cps-global-lib-http-plugin).
…aultedVersion() for JENKINS-69731 related behaviors
…RANCH and CHANGE_TARGET be considered as fallbacks [JENKINS-69731]
…Ms() and handle if it is GitSCM with branches, before looking at envvars [JENKINS-69731]
…; and then it is safer to check getSCMs() not for WorkflowRun, but for the FlowDefinition from WorkflowJob [JENKINS-69731]
…ersionStaticStrings() and limit influence of caller-defined BRANCH_NAME [JENKINS-69731]
…efaultVersion_BRANCH_NAME_notAllowed() cases [JENKINS-69731]
@jimklimov
Copy link
Author

UPDATE: Developing tests for single-branch pipelines to make sure these work as expected.
Got some hiccups so changes to code or tests are anticipated.

Comments welcome in the meanwhile :)

@jimklimov jimklimov force-pushed the JENKINS-69731-scm-BRANCH_NAME branch from 5d65092 to 2d098a5 Compare October 17, 2022 19:30
@jimklimov
Copy link
Author

jimklimov commented Oct 17, 2022

Clean-up/refactor some tests, and address (and test) a problem with lightweight CpsScmFlowDefinition serving its SCM info differently.

Note: even if this PR in whole does not make it, cleanup in d67c2d1 and 786092d should be useful on its own. I tried to keep that separable, but did not check thoroughly.

@jimklimov
Copy link
Author

FWIW, https://github.com/jenkinsci/workflow-cps-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition.java pointed me to https://javadoc.jenkins.io/plugin/scm-api/jenkins/scm/api/SCMFileSystem.html which can getRevision() and that can getHead() for an https://javadoc.jenkins.io/plugin/scm-api/jenkins/scm/api/SCMHead.html that I might have a chance to (in some very roundabout way) generalize the search for branch if the concept exists for that SCM... loose thread in the world sticks out here:

try (SCMFileSystem fs = SCMFileSystem.of(build.getParent(), scm)) {

@jimklimov
Copy link
Author

So that loose thread did not lead far yet, getRevision() returned null (at least for completed builds inspected via script console).

Looking at SCMSourceRetriever.java and SCMRetriever.java (which mostly wraps SCMSourceRetriever to present an interface for plain SCM derived classes serving the library source) was not too encouraging either. Essentially code refers to scm.fetch(thing, ...) and https://github.com/jenkinsci/scm-api-plugin/blob/master/src/main/java/jenkins/scm/api/SCMSource.java#L640 describes the thing as "a specific revision based on some SCM-specific set of permissible syntaxes" (which in assumptions of this plugin is a branch, tag, git commit hash...) and the "thing" is provided by caller from some external knowledge, but is not (currently) extractable from SCM itself.

Updated the PR with further dismemberment of defaultedVersionSCM() to separate the logic which IMHO should stay in this plugin (inspecting different classes of Jobs and Runs to find an SCM instance for pipeline script source) vs. decisions that this is an SCM class that we can inspect, and separately to extractDefaultedVersionSCM() (in particular the hacky logic to extractDefaultedVersionGitSCM()) if we do find one. The latter is what may/should be refactored, but possibly with broadside shot at the whole SCM API ecosystem -- so perhaps in a separate set of PRs (to let any SCM descendant state if it supports "versions" and which name(s) of those it currently considers).

…separate extractDefaultedVersionGitSCM() hack in a scalable manner [JENKINS-69731]
@jimklimov jimklimov force-pushed the JENKINS-69731-scm-BRANCH_NAME branch from 95c9343 to daba570 Compare October 18, 2022 15:41
@jimklimov jimklimov force-pushed the JENKINS-69731-scm-BRANCH_NAME branch from daba570 to ac9f40a Compare October 18, 2022 15:48
@jimklimov
Copy link
Author

jimklimov commented Oct 18, 2022

Also added a method extractDefaultedVersionSCMFS() to look into SCMFileSystem (of the run and an scm instance found in it by defaultedVersionSCM()), but in unit tests with all their job types tried, it never sees a non-null SCMRevision.

Codepath ever gets to it at all in a few test cases:

  • checkDefaultVersion_singleBranch_BRANCH_NAME_after_staticStrings (3 times)
  • checkDefaultVersion_singleBranch_BRANCH_NAME

In others it does not have to look into the SCM sources with this method: either finds the info from MBP context, or finds no SCMs (inline pipeline scripts).

UPDATE: Double-checked by always trying to request defaultedVersionSCM() (even if data was found earlier), the SCMRevision still never appears - even not in MBPs.

@jimklimov
Copy link
Author

jimklimov commented Oct 18, 2022

Leaving the PR "as is" for now, hope for reviews and merge based on merit (added feature usable in many real-life cases) to proceed :)

Scaling this solution beyond GitSCM (hopefully after the PR merge) is possible:

  • (poorly) by cloning extractDefaultedVersionGitSCM() which is now relatively compact (and also updating a couple of lines in isDefaultedVersionSCMSupported() and the stack of callouts from extractDefaultedVersionSCM()), or
  • (preferably) by revamp of SCM API so any SCM class can tell us what we need right away.

@jimklimov

This comment was marked as outdated.

@myusuf3
Copy link

myusuf3 commented Dec 6, 2022

I would love to see this merged and would help in the integration testing of shared libraries.

@jimklimov jimklimov requested a review from a team as a code owner March 12, 2023 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants