Jenkins: make it possible to run notebooks from external repos programmatically #138

tlvu · 2024-05-15T00:12:24Z

This is done via the CONFIG_PARAMETERS_SCRIPT_URL override. See the sample jenkins-params-external-repos.include.sh where I run external notebooks from https://github.com/roocs/rook/tree/master/notebooks.

Changes:

Do not download repos not enabled to run (fixes Unnecessary downloading of other repos when their notebooks are not being run #30).
Do not delete setup.cfg tox.ini and pyproject.toml files.
Do not hardcode the list of archiveArtifacts in Jenkinsfile because the list of repos is dynamic now.
(Breaking) Archived build artifacts of original notebooks moved to under buildout/ dir, together with their corresponding *.output.ipynb files. Previously they were at the same checkout path as their repo.
- This artifact filename format can be override by CONFIG_OVERRIDE_SCRIPT_URL, see demo in test-override/jenkins-params-external-repos.include.sh
downloadrepos is both a script and a library to centralize all helper functions in one place. Otherwise we would have helpers for downloading repos in one file and helpers for other tasks in other files. Backward compatibility is preserved when downloadrepos is used as a script in autodeploy of tutorial notebooks to the Jupyter env.
New ability to run post-processing steps at the end of Jenkins build, see demo in test-override/jenkins-params-external-repos.include.sh
NBVAL_SANITIZE_CFG_FILE and DEFAULT_PRODUCTION_HOST now can also be overridden by the external repo if needed.

This PR is almost fixing #57.

There is no easy way to programmatically change the Jenkins "build request UI" to add on-the-fly the new params for the external repo. I think we will have to have another separate Jenkinsfile.other_repo.

This PR would help testing bird-house/birdhouse-deploy#455.

@mishaschwartz This would be useful to you if you have notebooks for your own WPS services.

However, old "download all default repos" have to be preserved for back-compat with other external scripts. Fixes #30

…METERS_SCRIPT_URL

…b from new repos No need to update 'archiveArtifacts' in Jenkinsfile anymore. Since we only download enabled repos, we could not hardcode artifact repo list anyways. DEFAULT_PRODUCTION_HOST also overridable for other repos.

…g files created by the script

…_SCRIPT_URL

…from being logged

…_HOST

fmigneault · 2024-05-16T15:28:58Z

Testing on http://daccs-jenkins.crim.ca/job/DACCS-iac-birdhouse/2649/
Triggered http://daccs-jenkins.crim.ca/job/PAVICS-e2e-workflow-tests/job/make-it-easier-to-add-new-nb-or-repos/3

fmigneault

The optional download is a nice feature. See comment about its default.
Love the reuse of functions for recurring operations.

Not really sure about some other design choices.
Waiting on tests to complete to see if any other items are problematic.

fmigneault · 2024-05-16T15:33:53Z

downloadrepos

+    if [ -z "$DOWNLOAD_ALL_DEFAULT_REPOS" ]; then
+        # Back-compat with old default behavior, used in binder/reorg-notebook
+        # and other external scripts that autodeploy tutorial notebooks (see
+        # https://github.com/bird-house/birdhouse-deploy/blob/444a7c35a31aa8ad351e47f659383ba5c2919705/birdhouse/deployment/trigger-deploy-notebook#L64-L75)
+        DOWNLOAD_ALL_DEFAULT_REPOS=true
+    fi


I think I would rather have the other script set DOWNLOAD_ALL_DEFAULT_REPOS=true if it needs all of them and have the default behavior of skipping unnecessary downloads.

No, that's the point. If the other scripts have to set this new DOWNLOAD_ALL_DEFAULT_REPOS=true it means I break backward-compatibility with the other scripts.

I want no changes to other scripts outside this repo.

Instead, every CI needs to inject the new variable to take advantage of more intelligent download rather than getting it for free. Since birdhouse-deploy does wget to retrieve those scripts, it makes more sense IMO that it updates with new features, or use an explicit commit hash or tag to ensure the behavior remains consistent.

every CI needs to inject the new variable to take advantage of more intelligent download

Huh, the only one is this Jenkins or more precisely the testall script. If someone deploy this job on another server, the entrypoint is still the testall script, not this downloadrepos directly.

use an explicit commit hash or tag to ensure the behavior remains consistent.

Using exact commit hash is like pinning everything in your requirements.txt. We do not do this because update will be too tedious, same here.

fmigneault · 2024-05-16T15:41:43Z

downloadrepos

+# Presence of setup.cfg, tox.ini, pyproject.toml files confuse py.test execution rootdir discovery.
+# USAGE: delete_files_confusing_pytest "$CHECKOUT_DIR"
+delete_files_confusing_pytest() {
+    for afile in setup.cfg tox.ini pyproject.toml; do
+        if [ -f "$1/$afile" ]; then
+            rm -v "$1/$afile"
+        fi
+    done
+}


If the repository defined pytest options, such as necessary plugins or marker definitions, this will break their execution.

I agree this is weird. I have no other hypothesis other can because I run multiple notebooks from multiple repos at the same time, if each repo has their config, it breaks the full test run.

In that case we should be running a separate py.test process for each repo instead of all at once.

I agree with @fmigneault that there could be crucial definitions in the setup.cfg tox.ini pyproject.toml files that we do not want to lose or the tests may fail unexpectedly.

Is it possible those pytest definitions are for the unit test but here we are running notebook tests so they do not apply? Because for the current list of repos hardcoded in this Jenkins config, missing those pytest definitions are absolutely fine since forever. Keeping them are actually causing trouble.

The step to remove those files setup.cfg tox.ini pyproject.toml are not hardcoded so for external repos, they can choose to keep them if needed. See example in jenkins-params-external-repos.include.sh

PAVICS-e2e-workflow-tests/test-override/jenkins-params-external-repos.include.sh

Line 43 in 0182490

delete_files_confusing_pytest "$ROOK_DIR"

, just do not include that call to delete_files_confusing_pytest.

fmigneault · 2024-05-16T15:43:45Z

Jenkinsfile

-            archiveArtifacts(artifacts: 'esgf-compute-api-*/examples/*.ipynb', fingerprint: true)
-            archiveArtifacts(artifacts: 'PAVICS-landing-*/content/notebooks/climate_indicators/*.ipynb', fingerprint: true)
-            archiveArtifacts(artifacts: 'buildout/*.output.ipynb', fingerprint: true, allowEmptyArchive: true)
+            archiveArtifacts(artifacts: 'buildout/*.ipynb', fingerprint: true, allowEmptyArchive: true)


Why not use a buildout/**/*.ipynb and leave the directory structure as originally defined by each repository? That would avoid all the directory renaming manipulations and potential side effects of removing pytest configs.

I previously thought that too. But actually real life usage, a flat directory is much easier to use, meaning to download the notebooks I want. In a nested layout, if I want multiple notebooks under multiple different sub-folders, I have to click a lot more. It's tiring.

Now with the option to only archive the notebooks enabled for the run, the chance to have name clash is even lower.

That would avoid all the directory renaming manipulations and potential side effects of removing pytest configs.

Not sure what you mean. I do not perform any directory renaming manipulations. The notebooks are run from the original location of the checkout because some will use relative path to get their test data in the same checkout.

I only copy the notebooks only, not the test data, to the buildout/ folder so they are archived by Jenkins, along side their matching "output" notebook so we can download both of them and diff them manually if the diff presented by Jenkins console output is impossible to understand.

As for "potential side effects of removing pytest configs", I also do not understand how that relate to Jenkins archiving config change.

All the notebooks are copied as a flat list, which could lead to conflicts between repositories. I was under the impression that the repository name was used as prefix to the archived notebook names to avoid this, but it is not the case. With the results archived this way, it is very hard to trace back the original source of the notebook.

As for "potential side effects of removing pytest configs", I also do not understand how that relate to Jenkins archiving config change.

It does not affect the archiving, but it affects how the notebooks are executed by pytest in order to generate their outputs. Therefore, if any pytest configurations options are applied to obtain certain effects needed by the tests, this could have bad side effects.

it is very hard to trace back the original source of the notebook.

I never had trouble knowing which nb is from which repo but I agree not everyone know this. Just so it is clear, it is already like this before (flat list of *.output.ipynb), I am just adding the original along-side the corresponding output.ipynb.

However, I'll add the repo name as prefix to the nb filename to help differentiate and avoid name clash.

By the way, I did preserve the already existing name clash prevention:

PAVICS-e2e-workflow-tests/runtest

Lines 88 to 89 in d6fd500

# prevent name clash

filename="${filename}_`date '+%s'`"

It does not affect the archiving

But you had your comment in this Jenkins archiving section so that's why I was mixed up.

Like I said here #138 (comment), it weird but I had to do it, otherwise the test run do not work at all. The only hypothesis I have is mentioned in that comment.

By the way, deleting the extra files is an existing behavior. I did not just add it.

This PR is maximally backward-compatible. I did not add any new processing steps other than making these processing steps available to external repos.

"look at the output in the artifact"? You mean look at the output in the "Console Output"?

No, I open the notebook directly from the artifacts page and look at its contents from the browser.

Because if you only look at the list of saved notebooks in the artifacts, you won't know which notebooks actually failed.

If I went through the artifact to look for a notebook, it's because I already know which test case I'm investigating to debug.

The listing of notebooks in the "Console Output" where any errors are also found, are using the original structure, not flat. This will give you exactly the repo and path of the notebook you want to run manually.

This is exactly why I would rather have the notebooks with the generated outputs located under the same locations as the current artifacts using the same structure as displayed in the console outputs.

This is exactly why I would rather have the notebooks with the generated outputs located under the same locations as the current artifacts using the same structure as displayed in the console outputs.

@fmigneault
Okay, how about I make all the files (original and output) keep the same folder structure as prefix. Then you have your structure to easy search for your file, I have my flat list to easily switch between multiples repos.

Something like buildout/pavics-sdi-master--docs--source--notebooks--WCS_example{,-output]}.ipynb. Good enough compromise so we can merge this?

I don't think it is a matter of compromise. The artifact structure should simply reflect the source repository. There is no need for an alternate naming convention and there is no need to define more code to handle it. The outputs should simply be saved where the notebooks are naturally collected in artifacts.

I don't think it is a matter of compromise.

Agreed, no compromise for usability.

Let me repeat comment #138 (comment) again if not clear enough: if "notebooks are under different nested folders ! You'll have to click to get to the first older, back out, then again to the 2nd folder ! The more folders, the more back and forth it is !!!"

Just a reminder, this artifact archiving just for immediate troubleshooting, it has no long term persistance value. I'd rather focus on usability and productivity than following some standard for something that is ephemeral and not important. Once you manage a lot of notebook repos, a flat list if much faster to deal with.

Even then, I do not agree. If you know the path of the notebook you are interested in after getting an error, you have no reason to go back-and-forth between folders.

I see the addition of code, special character handling and deletion of repo-specific configs as a sign of bigger maintenance burden and potential side effects.

tlvu · 2024-05-16T16:24:10Z

Testing on http://daccs-jenkins.crim.ca/job/DACCS-iac-birdhouse/2649/ Triggered http://daccs-jenkins.crim.ca/job/PAVICS-e2e-workflow-tests/job/make-it-easier-to-add-new-nb-or-repos/3

You should try https://daccs-jenkins.crim.ca/job/PAVICS-e2e-workflow-tests/job/make-it-easier-to-add-new-nb-or-repos/build?delay=0sec

and set CONFIG_PARAMETERS_SCRIPT_URL="https://raw.githubusercontent.com/Ouranosinc/PAVICS-e2e-workflow-tests/25c0abb438c43b6d8de313e98684dca5846b0231/test-override/jenkins-params-external-repos.include.sh"

…po they came from and further avoid name clash

…dd-new-nb-or-repos

…with --rootdir

tlvu · 2024-10-31T18:41:28Z

@fmigneault Apologies for taking more than haft a year to come back to this PR after our discussion. I was completely under the water.

So per our previous discussion, you wanted

to avoid deleting setup.cfg tox.ini and pyproject.toml files, done in: 36fba3d
to allow user override whether the artifacts files are a flat list or nested list, done in c9c8d50
to test how Jenkins will handle the display of a massive flat list of artifact files, done in 7fce738

@mishaschwartz Just FYI, with this PR, you will be able to run any notebooks unknown to this repo, as long as the docker images has the required packages to run the notebook.

As a demo, the sample config file test-override/jenkins-params-external-repos.include.sh runs the notebooks from https://github.com/roocs/rook/tree/master/notebooks which is not hardcoded in this repo like https://github.com/Ouranosinc/pavics-sdi/tree/master/docs/source/notebooks or https://github.com/bird-house/finch/tree/master/docs/source/notebooks.

fmigneault · 2024-11-04T18:12:18Z

@tlvu
I am not sure if I'm interpreting c9c8d50 correctly. If I want to preserve the current behavior that uses the nested file structure as per the original repository references, I have to explicitly override the function?? If that's the case, the implementation is backward. I shouldn't have to update my configuration to avoid introducing side effects. This is not backward compatible. It is fine to allow override by scripts that want to explicitly modify the structure, since that's what overrides are for, modifying behavior.

fmigneault

Wait for answer regarding file path handling vs override function.

tlvu · 2024-11-08T15:55:29Z

f I want to preserve the current behavior that uses the nested file structure

@fmigneault the current behavior for the output file is already a flat list !

This PR basically harmonize also the location of the original file to be next to the output file. Currently the output file is flat but the original file is nested.

The output file is the most important one as it contains the error or the refreshed output to be used when refreshing notebooks output.

fmigneault · 2024-11-08T17:11:16Z

the current behavior for the output file is already a flat list !

I don't know which source you use that makes you think this, but our server definition doesn't override anything and presents the artifacts in a hierarchy:

Given the PR introduces:

choose_artifact_filename() {
    echo "$1" | sed "s@/@__@g"
}

This explicitly replaces the / to generate a flat-list of names rather than the nested hierarchy paths.

In other words, what I want as a result of this PR is that, without any need of an override, the notebooks found in eg. pavics-sdi-master/docs/source are still stored in there. I should have to apply a choose_artifact_filename() { echo "$1" } to preserve the behavior on my end.

tlvu · 2024-11-08T17:46:15Z

our server definition doesn't override anything

@fmigneault I think you guys override SAVE_RESULTING_NOTEBOOK to false, which do not perform a 2nd run of jupyter nbconvert to actually create the output file.

And the reason is to save time since a 2nd run means doubling the time of the CI pipeline and the pass or failure status of the pipeline only depend on the first run (py.test), not the 2nd run (jupyter nbconvert).

If you happen to not override that config, then all the output file are flat under buildout/.

Here is the screenshot of the currently flat output files from our end:

fmigneault · 2024-11-08T18:19:48Z

Ok. As long as it doesn't affect the output structure of the notebooks.

I still find it odd that a flat list would be used.
If we decide to enable SAVE_RESULTING_NOTEBOOK, we'll have mismatching notebook/output structures. Seems like a weird behavior that we must explicitly override the name mangling function to preserve the natural directory structure matching source notebooks.

Why not just dump the <file>.output.ipynb next to the <file>.ipynb by default? That seems like a more sensible default than rewriting paths and dump them in a separate directory.

tlvu · 2024-11-08T18:27:23Z

Why not just dump the <file>.output.ipynb next to the <file>.ipynb by default? That seems like a more sensible default than rewriting paths and dump them in a separate directory.

That's exactly the intend of this change: put the <file>.ipynb (currently nested) next to <file>.output.ipynb (currently flat) !

The screenshot I showed you was on master, switch do not have this change, so you see that the <file>.output.ipynb are flat and the <file>.ipynb are nested in another hierarchy so you do not see them with the <file>.output.ipynb.

fmigneault · 2024-11-08T23:47:25Z

I don't think you got what I meant by next to the <file>.ipynb.
I am talking next to the current hierarchy. As in:

finch-master
  docs
    source
      notebooks
        finch-usage.ipynb
        finch-usage.output.ipynb

No replace of / by __.

If you want to replace the / to generate a flat list on your end, you can do so, and it would modify both results to become:

finch-master__docs__source__notebooks__finch-usage.ipynb
finch-master__docs__source__notebooks__finch-usage.output.ipynb

But in either case, you don't get a mixture of nested and non-nested directories.

I went back to older comments when I did try using that branch, and got a flat list as shown in #138 (comment).
This is NOT what I want by default.
Whether I save notebook outputs or not should not suddenly change all the structure.

tlvu added 10 commits May 14, 2024 15:37

downloadrepos: allow to be used as library as well

e594d24

downloadrepos: only download repos that will be tested

5216bca

However, old "download all default repos" have to be preserved for back-compat with other external scripts. Fixes #30

testall: make processing here available to other repos via downloadrepos

57936f5

testall: make available all functions in downloadrepos to CONFIG_PARA…

cccae70

…METERS_SCRIPT_URL

testall: git clean before CONFIG_PARAMETERS_SCRIPT_URL to avoid wipin…

cfe5f95

…g files created by the script

runtest: allow to override --nbval-sanitize-with from CONFIG_OVERRIDE…

7253e6a

…_SCRIPT_URL

downloadrepos: avoid set +x because it hides all subsequent commands …

d93b229

…from being logged

jenkins sample override: demo runnings notebooks from an external repo

9160053

runtest: CONFIG_PARAMETERS_SCRIPT_URL can override DEFAULT_PRODUCTION…

25c0abb

…_HOST

tlvu requested review from mishaschwartz and fmigneault May 15, 2024 00:12

fmigneault requested changes May 16, 2024

View reviewed changes

runtest: add repo and branch name to archived nbs name to id which re…

0182490

…po they came from and further avoid name clash

tlvu requested a review from fmigneault May 24, 2024 18:32

tlvu added 6 commits June 5, 2024 18:31

Merge remote-tracking branch 'origin/master' into make-it-easier-to-a…

1cf434e

…dd-new-nb-or-repos

Merge remote-tracking branch 'origin/master' into make-it-easier-to-a…

a20ef26

…dd-new-nb-or-repos

testall: try to avoid delete_files_confusing_pytest by using py.test …

36fba3d

…with --rootdir

runtest: allow saved files under buildout/ to have hierarchy

5f13ec9

tests: allow to override artifact nb filename format archived by Jenkins

c9c8d50

runtest: allow overrideable post-processing steps

7fce738

tlvu force-pushed the make-it-easier-to-add-new-nb-or-repos branch from 644e1ec to 7fce738 Compare October 31, 2024 17:22

test artifacts: shorter filename format, name clash already handled

d0265f3

fmigneault assigned tlvu Nov 4, 2024

fmigneault requested changes Nov 4, 2024

View reviewed changes

tlvu requested a review from fmigneault November 8, 2024 22:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Jenkins: make it possible to run notebooks from external repos programmatically #138

Jenkins: make it possible to run notebooks from external repos programmatically #138

tlvu commented May 15, 2024 •

edited

Loading

fmigneault commented May 16, 2024 •

edited

Loading

fmigneault left a comment

fmigneault May 16, 2024

tlvu May 16, 2024

fmigneault May 16, 2024

tlvu May 16, 2024

fmigneault May 16, 2024

tlvu May 16, 2024

mishaschwartz May 23, 2024

tlvu May 23, 2024

fmigneault May 16, 2024

tlvu May 16, 2024

tlvu May 16, 2024

fmigneault May 16, 2024

tlvu May 16, 2024

fmigneault May 24, 2024

tlvu May 28, 2024

fmigneault May 29, 2024

tlvu May 30, 2024

fmigneault May 30, 2024

tlvu commented May 16, 2024

tlvu commented Oct 31, 2024

fmigneault commented Nov 4, 2024 •

edited

Loading

fmigneault left a comment

tlvu commented Nov 8, 2024

fmigneault commented Nov 8, 2024

tlvu commented Nov 8, 2024

fmigneault commented Nov 8, 2024

tlvu commented Nov 8, 2024 •

edited

Loading

fmigneault commented Nov 8, 2024

Jenkins: make it possible to run notebooks from external repos programmatically #138

Are you sure you want to change the base?

Jenkins: make it possible to run notebooks from external repos programmatically #138

Conversation

tlvu commented May 15, 2024 • edited Loading

fmigneault commented May 16, 2024 • edited Loading

fmigneault left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlvu commented May 16, 2024

tlvu commented Oct 31, 2024

fmigneault commented Nov 4, 2024 • edited Loading

fmigneault left a comment

Choose a reason for hiding this comment

tlvu commented Nov 8, 2024

fmigneault commented Nov 8, 2024

tlvu commented Nov 8, 2024

fmigneault commented Nov 8, 2024

tlvu commented Nov 8, 2024 • edited Loading

fmigneault commented Nov 8, 2024

tlvu commented May 15, 2024 •

edited

Loading

fmigneault commented May 16, 2024 •

edited

Loading

fmigneault commented Nov 4, 2024 •

edited

Loading

tlvu commented Nov 8, 2024 •

edited

Loading