WIP: Use HPC for CI #386

Merged
merged 94 commits into from
Apr 22, 2024

Conversation

jakob-fritz
Collaborator

No description provided.

- module load CuPy
- pwd
- ls -lah
- pip install -e .

Contributor

Do you need to run pip install in the CI? As far as I know, a user installation with pip is persistent. Is the GitLab CI not associated with any user?
I would prefer to put the module load commands in the job scripts. But if the pip installs have to be repeated every time, it's better to keep them here.

Collaborator Author

@jakob-fritz jakob-fritz Dec 16, 2023

I'm not sure whether we need pip install each time. The jobs run as a predefined user (the one who triggers the CI in GitLab). Since mirroring is done with a personal access token, that user is impersonated and the CI always runs as this specific user. So if pip install is persistent, the installation will be available in all runs, because they are all executed as the same user.

Regarding module load: we can put the commands in the script. I find it nicer if the scripts are as short as possible, so I moved the module load into the YAML file. Furthermore, these steps (such as module load or pip install) are executed on a login node, whereas the content of the sh file is executed on a compute node. In terms of quota, it is "cheaper" if we don't spend compute time on module load (although it does not take long).

If you want to, feel free to move the pip install and module load into the script. If you would like me to do that, feel free to ping me!
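
To illustrate the split between login node and compute node discussed in this thread, here is a minimal sketch of a GitLab CI job. It assumes that the runner executes the YAML steps on a login node and that `sbatch` is available there. The job name `benchmark`, the use of `before_script`, and the script name `run_benchmark.sh` are hypothetical placeholders, not taken from this PR. A real submission would probably also need site-specific options such as account and partition.

```yaml
# Sketch only: job name, key layout and script name are assumptions.
benchmark:
  before_script:
    # Executed by the runner on the login node, so no compute time is spent here.
    - module load CuPy
    - pip install -e .
  script:
    # Only the content of the sh file runs on a compute node.
    # --wait blocks until the Slurm job has finished, so the CI job
    # reflects the outcome of the batch job.
    - sbatch --wait --output=sbatch.out --error=sbatch.err run_benchmark.sh
  artifacts:
    paths:
      - benchmarks
      - sbatch.err
      - sbatch.out
```

If the pip installation turns out to be persistent for the CI user, the `pip install -e .` line could be dropped or moved without changing the rest of the job.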

paths:
- benchmarks
- sbatch.err
- sbatch.out

Contributor

Could we also have a directory where all job scripts can post their output? It would be neat to allow multiple job scripts. Not really needed, though.

Collaborator Author

Yes, sure, we can have a directory for the outputs. Each job in GitLab runs in a separate directory, so multiple GitLab jobs won't put files in the same directory. If multiple Slurm tasks are spawned from a single GitLab job, having this directory might make sense.
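
A shared output directory for several Slurm tasks spawned from one GitLab job could look roughly like the sketch below. The directory name `output`, the job name `multi_task` and the script names `job_a.sh` and `job_b.sh` are hypothetical. `%x` and `%j` are Slurm filename patterns for the job name and job ID, which keep the files of different tasks apart.

```yaml
# Sketch only: all names are illustrative assumptions.
multi_task:
  script:
    - mkdir -p output
    # Each submission writes its own pair of files into the shared directory.
    - sbatch --wait --output=output/%x-%j.out --error=output/%x-%j.err job_a.sh
    - sbatch --wait --output=output/%x-%j.out --error=output/%x-%j.err job_b.sh
  artifacts:
    when: always   # keep the logs even if a task fails
    paths:
      - output/
```

With this layout the artifact list does not need to change when another job script is added.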

@jakob-fritz
Collaborator Author

For the mirror-to-GitLab job to work, the required information needs to be provided in the GitHub repo.
For the required variables (and where to get them), see the README of the action used.

@brownbaerchen
Contributor

How do we continue? If I understand correctly, we need to add a personal access token from GitLab as a secret to this repository. Does it make sense that @pancetta sets up the GitLab repo now? He has to add the secret anyhow.

@jakob-fritz
Collaborator Author

Yes, it is probably best if @pancetta does this. I can also create the repo (or use an existing one of mine), but pancetta needs to add the secret in the end anyway.

As said, the steps are listed above. If questions come up, I am happy to help, though in person only next year, as I am out of office for the rest of this year.

@pancetta
Member

OK, the repository is https://gitlab.jsc.fz-juelich.de/atml-ati/pysdc and the secret is under GITLAB_SECRET.

@pancetta
Member

pancetta commented Dec 18, 2023

Note that the GitLab repository is empty, so I could not allow force-push. I tried with the wildcard *, but I don't know if that works. @jakob-fritz and @brownbaerchen are now maintainers of the GitLab repository, so you should be able to play around with this.
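
For orientation, the general idea of mirroring with a token stored as a secret can be sketched as a GitHub Actions workflow. This is not the action used in this PR: it is a plain `git push` stand-in. It assumes that `GITLAB_SECRET` holds a GitLab personal access token with write access to the repository named above, and that the target branch allows force-pushes. The workflow and job names are hypothetical.

```yaml
# Sketch only: a plain-git stand-in, not the mirroring action used in this PR.
name: Mirror to GitLab (sketch)
on: push
jobs:
  mirror:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history instead of the default shallow clone
      - name: Push current branch to GitLab
        env:
          # Assumption: the secret contains a GitLab personal access token.
          GITLAB_TOKEN: ${{ secrets.GITLAB_SECRET }}
        run: |
          git push --force \
            "https://oauth2:${GITLAB_TOKEN}@gitlab.jsc.fz-juelich.de/atml-ati/pysdc.git" \
            "HEAD:refs/heads/${GITHUB_REF_NAME}"
```

Passing the token through `env` keeps it out of the workflow file itself. Whether `--force` is needed depends on how the branch protection in GitLab is configured.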

codecov bot commented Dec 18, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.04%. Comparing base (249741b) to head (376b44a).
Report is 25 commits behind head on create_gitlab_ci.

❗ Current head 376b44a differs from pull request most recent head 0cec1f3. Consider uploading reports for the commit 0cec1f3 to get more accurate results

Additional details and impacted files
@@                Coverage Diff                @@
##           create_gitlab_ci     #386   +/-   ##
=================================================
  Coverage             74.04%   74.04%           
=================================================
  Files                   274      274           
  Lines                 23153    23153           
=================================================
  Hits                  17143    17143           
  Misses                 6010     6010           


@jakob-fritz jakob-fritz mentioned this pull request Apr 22, 2024
4 tasks
@jakob-fritz jakob-fritz self-assigned this Apr 22, 2024
@jakob-fritz jakob-fritz merged commit cdb77d4 into Parallel-in-Time:create_gitlab_ci Apr 22, 2024
21 checks passed
brownbaerchen added a commit to brownbaerchen/pySDC that referenced this pull request Apr 24, 2024
commit cbaae05
Author: jakob-fritz <[email protected]>
Date:   Wed Apr 24 10:34:09 2024 +0200

    Make create_gitlab_ci branch up-to-date before merging into master (Parallel-in-Time#418)

    * first working SDC version (M and Minv)

    * Update playground.py

    * cleaning up

    * Added some hyphens in plots (Parallel-in-Time#389)

    * Removed seperate file for GPU Dahlquist implementation (Parallel-in-Time#391)

    Co-authored-by: Thomas <[email protected]>

    * Review (Parallel-in-Time#388)

    * Bug is fixed and added new code

    * new code for the table

    * Edits in markdown file

    * some edits in test

    * Bugs fix

    * Codecov

    * I cleaned up my code and separated classes to make it easier to work with. It is not ready yet; if Codecov fails, I will include more tests.

    * forgot black

    * flake8

    * bug fix

    * Edits codes according to the comments

    * Edited codes according to the comments in the GitHub

    * Defined new function in stability_simulation.py to check stability for
    given points and excluded codecov function that generates a table.

    * small edits for codecov

    * removed no cover

    * NCCL communicators (Parallel-in-Time#392)

    * Added wrapper for MPI communicator to use NCCL under the hood

    * Small fix

    * Moved NCCL communicator wrapper to helpers

    ---------

    Co-authored-by: Thomas <[email protected]>

    * Version bump for new release

    * proper readme and link

    * Started playground for machine learning generated initial guesses for (Parallel-in-Time#394)

    SDC

    * playing with FEniCS

    * blackening

    * Bug fix (Parallel-in-Time#395)

    * readme file changes

    * fixed bugs for stability plots and some edits in README file

    * some edits

    * typo in citation

    * Bump version

    * Bug fix (Parallel-in-Time#396)

    * Clear documentation and some edits in the code

    * forgot black

    * some changes

    * bump version

    * Cosmetic changes (Parallel-in-Time#398)

    * Parallel SDC (Reloaded) project (Parallel-in-Time#397)

    TL: Added efficient diagonal preconditioners and associated project. Coauthored by @caklovicka

    * Generic multi-component mesh  (Parallel-in-Time#400)

    * Generic multicomponent mesh

    * new try

    * Added a test for MultiComponentMesh

    * Test that the type is conserved also after numpy operations

    * Added documentation for how to use `MultiComponentMesh`

    * Changed formatting of the documentation

    * Update ci_pipeline.yml

    * version freak show

    * version freak show II

    * version freak show III

    * version freak show IV

    * Update ci_pipeline.yml

    * version freak show V

    * 2D Brusselator problem (Parallel-in-Time#401)

    * Added 2D Brusselator problem from Hairer-Wanner II. Thanks @grosilho for
    the suggestion!

    * Added forgotten pytest marker

    * Fix brain afk error

    * Added work counter for right hand side evaluations

    * Removed file for running Brusselator from project

    * Retry at removing the file

    * I need to go to git school

    * Datatype `DAEMesh` for DAEs (Parallel-in-Time#384)

    * Added DAE mesh

    * Updated all DAE problems and the SDC-DAE sweeper

    * Updated playgrounds with new DAE datatype

    * Adapted tests

    * Minor changes

    * Black.. :o

    * Added DAEMesh only to semi-explicit DAEs + update for FI-SDC and ProblemDAE.py

    * Black :D

    * Removed unnecessary approx_solution hook + replaced by LogSolution hook

    * Update WSCC9 problem class

    * Removed unnecessary comments

    * Removed test_misc.py

    * Removed registering of newton_tol from child classes

    * Update test_problems.py

    * Rename error hook class for logging global error in differential variable(s)

    * Added MultiComponentMesh - @brownbaerchen + @tlunet + @pancetta Thank ugit add pySDC/implementations/datatype_classes/MultiComponentMesh.py

    * Updated stuff with new version of DAE data type

    * (Hopefully) faster test for WSCC9

    * Test for DAEMesh

    * Renaming

    * ..for DAEMesh.py

    * Bug fix

    * Another bug fix..

    * Preparation for PDAE stuff (?)

    * Changes + adapted first test for PDAE stuff

    * Commented out test_WSCC9_SDC_detection() - too long runtime

    * Minor changes for test_DAEMesh.py

    * Extended test for DAEMesh - credits for @brownbaerchen

    * Test for HookClass_DAE.py

    * Update for DAEMesh + tests

    * 🎉 - speed up test a bit (at least locally..)

    * Forgot to enable other tests again

    * Removed if-else-statements for mesh type

    * View for unknowns in implSysFlatten

    * Fix for RK sweeper - changed nodes in BackwardEuler class (Parallel-in-Time#403)

    * Made aborting the step at growing residual optional (Parallel-in-Time#405)

    * `pySDC`-build-in `LagrangeApproximation` class in `SwitchEstimator` (Parallel-in-Time#406)

    * SE now uses LagrangeApproximation class + removed Lagrange class in SE

    * Removed log message again (not corresponding to PR)

    * version bump

    * Added hook for logging to file (Parallel-in-Time#410)

    * Monodomain project (Parallel-in-Time#407)

    * addded some classes from oldexplicit_stabilized branch. Mainly, the problems description, datatype classes, explicit stabilized classes. Tested for IMEX on simple problems

    * added implicit,explicit,exponential integrator (in electrophysiology aka Rush-Larsen)

    * added exponential imex and mES, added parabolic_system in vec format

    * added new stabilized integrators using multirate, splitting and exponential approaches

    * before adding exponential_runge_kutta as underlying method, instead of the traditional collocation methods

    * added first order exponential runge kutta as underlying collocation method. To be generalized to higher order

    * generalized exponential runge kutta to higher order. Added exponential multirate stabilized method using exponential RK but must tbe checked properly

    * fixed a few things

    * optimized a few things

    * renamed project ExplicitStabilized to Monodomain

    * removed deprecated problems

    * fixed some renaming issues

    * did refactoring of code and put in Monodomain_NEW

    * removed old code and renamed new code

    * added finite difference discretization

    * added many things, cant remember

    * old convergence_controller

    * addded some classes from oldexplicit_stabilized branch. Mainly, the problems description, datatype classes, explicit stabilized classes. Tested for IMEX on simple problems

    * added implicit,explicit,exponential integrator (in electrophysiology aka Rush-Larsen)

    * added exponential imex and mES, added parabolic_system in vec format

    * added new stabilized integrators using multirate, splitting and exponential approaches

    * before adding exponential_runge_kutta as underlying method, instead of the traditional collocation methods

    * added first order exponential runge kutta as underlying collocation method. To be generalized to higher order

    * generalized exponential runge kutta to higher order. Added exponential multirate stabilized method using exponential RK but must tbe checked properly

    * fixed a few things

    * optimized a few things

    * renamed project ExplicitStabilized to Monodomain

    * removed deprecated problems

    * fixed some renaming issues

    * did refactoring of code and put in Monodomain_NEW

    * removed old code and renamed new code

    * added finite difference discretization

    * added many things, cant remember

    * added smooth TTP model for conv test, added DCT for 2D and 3D problems

    * added plot stuff and run scripts

    * fixed controller to original

    * removed explicit stabilized files

    * fixed other files

    * removed obsolete splittings from ionic models

    * removed old sbatch scripts

    * removed mass transfer and sweeper

    * fixed something

    * removed my base transfer

    * removed hook class pde

    * removed FD files

    * fixed some calls to FD stuff

    * removed FEM FEniCSx files

    * renamed FD_Vector to DCT_Vector

    * added hook for output and visualization script

    * removed plot scripts

    * removed run scripts, except convergence

    * removed convergence experiments script

    * fixed TestODE

    * added stability test in run_TestODE

    * added stability test in run_TestODE

    * added stability test in run_TestODE

    * removed obsolete stuff in TestODE

    * removed unneeded stuff from run_MonodomainODE

    * cleaned a bit run_MonodomainODE

    * removed utils/

    * added few comments, cleaned a bit

    * removed schedule from workflow

    * restored tutorial step 7 A which I has modified time ago

    * run black on monodomain project

    * fixed a formatting thing

    * reformatted everything with black

    * Revert "revert formatted with black"

    This reverts commit 82c82e9.

    * added environment file for monodomain project, started to add stuff in workflow

    * added first test

    * added package tqdm to monodomain environment

    * added new TestODE using DCT_vectors instead of myfloat, moved phi_eval_lists from MonodomainODE to the sweeper

    * deleted old TestODE and myfloat stuff

    * renamed TestODEnew to TestODE

    * cleaned a bit

    * added stability, convergence and iterations tests. Changed a bit other scripts as needed

    * reactivated other tests in workflow

    * removed my tests temporarly

    * added monodomain marker to project pyproject.toml

    * changed files and function names for tests

    * fixed convergence test

    * made one test a bit shorter

    * added test for SDC on HH and fixed missing feature in SDC imex sweeper for monodomain

    * reformatted with correct black options

    * fixed a lint error

    * another lint error

    * adding tests with plot

    * modified convergence test

    * added test iterations in parallel

    * removed plot from tests

    * added plots without writing to file

    * added write to file

    * simplified plot

    * new plot

    * fixed plot in iterations parallel

    * added back all tests and plots

    * cleaned a bit

    * added README

    * fixed readme

    * modified comments in controllers

    * try to compute phi every step

    * removed my controllers, check u changed before comuting phis

    * enabled postprocessing in pipeline

    * added comments to data_type classes, removed unnecessary methods

    * added comments to hooks

    * added comments to the problem classes

    * added comments to the run scripts

    * added comments to sweepers and transfer classes

    * fixed the readme

    * decommented if in pipeline

    * removed recv_mprobe option

    * changed back some stuff outiside of monodomain project

    * same

    * again

    * fixed Thomas hints

    * removed old unneeded move coverage folders

    * fixed previously missed Thomas comments

    * begin change datatype

    * changed run_Monodomain

    * added prints

    * fixed prints

    * mod print

    * mod print

    * mod print

    * mod print

    * rading init val

    * rading init val

    * removed prints

    * removed prints

    * checking longer time

    * checking longer time

    * fixed call phi eval

    * trying 2D

    * trying 2D

    * new_data type passing tests

    * removed coverage folders

    * optmized phi eval lists

    * before changing phi type

    * changed eval phi lists

    * polished a bit

    * before switch indeces

    * reformatted phi computaiton to its traspose

    * before changing Q

    * optimized integral of exp terms

    * changed interfate to c++ code

    * moved definition of dtype u f

    * tests passed after code refactoring

    * Generic MPI FFT class (Parallel-in-Time#408)

    * Added generic MPIFFT problem class

    * Fixes

    * Generalized to `xp` in preparation for GPUs

    * Fixes

    * Ported Allen-Cahn to generic MPI FFT implementation

    * Ported Gray-Scott to generic MPI FFT (Parallel-in-Time#412)

    * Ported Gray-Scott to generic MPI FFT class

    * `np` -> `xp`

    * Reverted poor changes

    * Update README.md (Parallel-in-Time#413)

    Added the ExaOcean grant identified and the "Supported by the European Union - NextGenerationEU." clause that they would like us to display.

    * TIME-X Test Hackathon @ TUD: Test for `SwitchEstimator` (Parallel-in-Time#404)

    * Added piecewise linear interpolation to SwitchEstimator

    * Started with test for SwitchEstimator [WIP]

    * Test to proof sum_restarts when event occuring at boundary

    * Started with test to check adapt_interpolation_info [WIP]

    * Added test for SE.adapt_interpolation_info()

    * Update linear interpolation + logging + changing tolerances

    * Test for linear interpolation + update of other test

    * Correction for finite difference + adaption tolerance

    * Added test for DAE case for SE

    * Choice of FD seems to be important for performance of SE

    * Removed attributes from dummy probs (since the parent classes have it)

    * Test for dummy problems + using functions from battery_model.py

    * Moved standard params for test to function

    * Updated hardcoded solutions for battery models

    * Updated hardcoded solutions for DiscontinuousTestODE

    * Updated docu in SE for FDs

    * Lagrange Interpolation works better with baclward FD and alpha=0.9

    * Added test for state function + global error

    * Updated LogEvent hooks

    * Updated hardcoded solutions again

    * Adapted test_problems.py

    * Minor changes

    * Updated tests

    * Speed-up test for buck converter

    * Black..

    * Use msg about convergence info in Newton in SE

    * Moved dummy problem to file

    * Speed up loop using mask

    * Removed loop

    * SDC-DAE sweeper for semi-explicit DAEs (Parallel-in-Time#414)

    * Added SI-SDC-DAE sweeper

    * Starte with test for SemiImplicitDAE

    * Test for SI-SDC sweeper

    * Clean-up

    * Removed parameter from function

    * Removed test + changed range of loop in SI-sweeper

    ---------

    Co-authored-by: Robert Speck <[email protected]>
    Co-authored-by: Thomas Baumann <[email protected]>
    Co-authored-by: Thomas <[email protected]>
    Co-authored-by: Ikrom Akramov <[email protected]>
    Co-authored-by: Thibaut Lunet <[email protected]>
    Co-authored-by: Lisa Wimmer <[email protected]>
    Co-authored-by: Giacomo Rosilho de Souza <[email protected]>
    Co-authored-by: Daniel Ruprecht <[email protected]>

commit 24cdf05
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 24 09:11:38 2024 +0200

    Split installation and running into two jobs

    One of the two jobs often failed during installation while the other one succeeded, so it might be a race condition. Therefore, installation and usage are split into separate jobs.
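
A split like the one this commit message describes could be sketched as follows. The stage and job names and the script names are hypothetical. The sketch relies on the assumption, discussed in the review thread, that all jobs run as the same user, so the installation done in the first job is visible to the later ones.

```yaml
# Sketch only: stage, job and script names are illustrative assumptions.
stages:
  - install
  - run

install_env:
  stage: install
  script:
    - pip install -e .   # done once, so parallel jobs cannot race on it

run_cpu:
  stage: run
  needs: [install_env]
  script:
    - sbatch --wait cpu_job.sh

run_gpu:
  stage: run
  needs: [install_env]
  script:
    - sbatch --wait gpu_job.sh
```

Both run jobs start only after the installation job has finished, which removes concurrent installs as a possible cause of the failures.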

commit 488e7a4
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 24 08:34:59 2024 +0200

    ci_pipeline.yml now more similar to upstream

commit 9ab9b63
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 23 15:36:39 2024 +0200

    Reduced diff to master

commit cdb77d4
Author: jakob-fritz <[email protected]>
Date:   Mon Apr 22 16:46:44 2024 +0200

    WIP: Use HPC for CI (Parallel-in-Time#386)

    Works on Parallel-in-Time#415

    Added sync with Gitlab, now also for pull requests

    ---------

    Co-authored-by: Robert Speck <[email protected]> and Thomas Baumann <[email protected]>

commit fb4b745
Author: Jakob Fritz <[email protected]>
Date:   Mon Apr 22 16:02:49 2024 +0200

    Moved development of action into main branch and added version-tag

commit 7de7187
Author: Jakob Fritz <[email protected]>
Date:   Mon Apr 22 11:19:53 2024 +0200

    Added triggers for workflows again

commit 5f45785
Author: Jakob Fritz <[email protected]>
Date:   Thu Apr 18 14:22:13 2024 +0900

    Updated name of step, as merge is not ff-only anymore

commit 2e9930f
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 17 13:36:44 2024 +0900

    Wrong syntax for if else

commit e33a611
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 17 13:33:40 2024 +0900

    Unshallow repo if needed

commit c7db47a
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 17 13:18:26 2024 +0900

    Add name and email for merge-commit

commit f34de9c
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 17 12:09:20 2024 +0900

    Also allow non-fast-forward merges

commit 82d9233
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 17 11:49:56 2024 +0900

    Don't run mirror on push now (as gitlab-file is incorrect in this branch)

commit d1b7250
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 17 11:49:02 2024 +0900

    Make unshallow before merging to properly compare history

commit 9cfeea3
Author: Jakob Fritz <[email protected]>
Date:   Wed Apr 17 11:17:48 2024 +0900

    Changed way to use variables (set locally and later in github_env)

commit 6961ef3
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 16:31:51 2024 +0900

    Reverted and changed way to store variable

commit d906604
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 16:21:54 2024 +0900

    Redone storing of var again

commit faec097
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 14:50:56 2024 +0900

    Corrected querying of a variable

commit cbf0b5d
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 13:57:31 2024 +0900

    Added more reporting for better debugging

commit efdaa05
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 13:23:08 2024 +0900

    Don't run main CI during development

commit ccd646a
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 13:22:44 2024 +0900

    First fetch, to be able to checkout branch

commit 2712998
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 12:15:13 2024 +0900

    Don't rerun CI on every push during this development

commit 8a316e2
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 12:14:31 2024 +0900

    Moved the check of condition from shell to yaml

commit d347bd3
Author: Jakob Fritz <[email protected]>
Date:   Tue Apr 16 11:52:02 2024 +0900

    Try to merge code (from PR) first

    So that merged state is tested in Gitlab-CI

commit bcd64a5
Author: Jakob Fritz <[email protected]>
Date:   Mon Feb 5 11:25:43 2024 +0100

    Use specific version of github2lab action

commit 28472dc
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 29 15:10:38 2024 +0100

    Uses newer checkout-action to use new node-version (20)

    Version 16 is deprecated

commit fefe88b
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 29 14:53:18 2024 +0100

    Minor formatting updates in README

    to trigger CI

commit 3de1b56
Author: Jakob Fritz <[email protected]>
Date:   Fri Jan 26 16:13:53 2024 +0100

    Formatted md-file to trigger CI

commit ef6a866
Author: Jakob Fritz <[email protected]>
Date:   Thu Jan 18 15:48:25 2024 +0100

    Set sha for checkout properly

commit be3aef7
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 15 16:14:41 2024 +0100

    Using default shallow checkout

    Otherwise, other own action complains

commit f38f0e5
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 15 16:11:05 2024 +0100

    Updated ref to use latest code from PR; not merge

    Previously, the code was used in the state that a merge would produce.
    Now, the code is used as it is in the PR

commit 249741b
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 15 08:47:45 2024 +0100

    Updated workflow for mirroring

commit d8604b7
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 8 16:39:49 2024 +0100

    Try exapnding the predefined variable

commit c49accd
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 8 16:37:10 2024 +0100

    Another attempt to get the action to work

commit 5e0118a
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 8 16:28:51 2024 +0100

    Hopefully now, variable is expanded instead using the name

commit 832e7e5
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 8 16:07:29 2024 +0100

    Exit instead of return needed

    Because exiting the shell instead of a function

commit 5a5de4a
Author: Jakob Fritz <[email protected]>
Date:   Mon Jan 8 12:00:55 2024 +0100

    First version of CI to mirror pull_requests to Gitlab

    If someone with write-permissions triggered the workflow