Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use (but don't require) denv in ldmx env #1269

Closed
wants to merge 9 commits into from

Conversation

tomeichlersmith
Copy link
Member

@tomeichlersmith tomeichlersmith commented Mar 8, 2024

I am updating ldmx-sw, here are the details.

What are the issues that this addresses?

This resolves #1232 directly, resolves #1248 directly, and resolves #1252 indirectly. I do this by moving the complicated container interaction scripting into its own project such that it can be more directly tested and version controlled. The environment scripts here are then just using this other shell program to help replicate the current environment.

The long-term benefit of isolating the container interaction into its own program amounts to allowing users to do the same process we do with ldmx-sw with any of their software. Below I have two examples that I do regularly and one idea that I could see becoming commonplace if we adopt denv within LDMX.

Pin ldmx-sw Version for Production

I can use denv to choose an ldmx-sw version and use it pre-compiled. It is pinned and is not cluttered by my developments happening elsewhere on my system.

denv init ldmx/pro:v3.3.3
denv fire my-config.py
# runs with ldmx-sw v3.3.3

Pin Python and Packages

I often want to develop a python analysis to share with someone else and it is very helpful to be using the same python version and the same version of all the packages. pip and venv offer a good way to pin the python package versions, but often times it is difficult to align the python version across systems. Luckily for us, python builds container images that we can use directly with denv.

# choose a python version
denv init python:3.11
# install the requirements of the analysis into this denv's home directory
denv python3 -m pip install -r requirements.txt
# run your analysis or whatever
denv python3 analysis.py

Sharing Environments

In some cases, the workspace that I use with denv (the directory where I ran denv init) is also a git repository. In these cases, I can commit the config written by denv so that others who want to replicate the work I did with that code can simply

git clone <repo>
cd <repo>
denv <command>

Requested Feedback

Note that I am not removing the current ldmx-env.sh script. It has many peculiarities that we have gotten used to and I have not replicated all of them with denv. This was intentional since many of these peculiarities are born out of just trying to get it to work and I have now figured out a better solution. This is a first PR to make using denv an available path for people looking to develop ldmx-sw. I wish to have reviews of this PR focus on the following

  1. Whether this is a good idea. There was good discussion in [Discussion] Refactor ldmx-env.sh to use denv for container interaction #1232 but I want to re-open the discussion with this more concrete proposal. I also want to point out that I am not committed to the draft roadmap I wrote in that issue, it was just me dumping my thoughts onto GitHub.
  2. Ease the transition (If this is a good idea). How can we design these files to help ease the transition from the current set of bash functions (a strategy called ldmx here as shorthand) to more simply wrapping another program (strategy called denv here as shorthand). Even if we stay with a bash environment script with bash functions wrapping denv indefinitely, there are still some aspects of the workflow that will change.
  3. Feature proposals (If this is a good idea). If we are already transitioning the workflow to a slightly different form, I think it is a good time to try to accommodate other features that people might enjoy. I'm happy to discuss ideas to see what may be helpful.

To Do

  • Check for CVMFS like is suggested in support /cvmfs/unpacked.cern.ch in environment script #1252 (comment)
  • Test bash wrapper functions (in ldmx-denv-transition.sh)
    • config, use, pull, mount, setenv
  • Update README pointing users to denv's docs
  • Change default installation from ldmx-sw/install to $LDMX_BASE/.local. This aligns with denv's design choice to make the workspace (for us, LDMX_BASE) the home directory in the container which means the paths under ~/.local are often already included by the various PATH variables. Taken off list because we need to update PYTHONPATH no matter what and I like how easy it is for folks to delete the build and the install when trying to debug.
  • Update the PATH variables in the ${LDMX_BASE}/.profile file. A template file is written by denv if it doesn't already exist. Copy environment setup to /etc/skel/.profile docker#94 We need to do this even if we changed ldmx-sw's default installation to ${LDMX_BASE}/.local because we do not put our python config modules into a directory that is automatically searched by python (unless it is included in PYTHONPATH).

Context

To provide further context. The ldmx-denv-init.sh file just initilizes denv trying to be POSIX-sh compliant so that it could be used in other POSIX shells. The ldmx-denv-transition.sh file is a bash-specific environment file that wraps denv with various bash functions to try to mimic the behavior of the current ldmx command. Below, I have these translations written out in a more human readable form.

ldmx Command denv Equivalent
help help
config config print
use config image <image>
pull config image pull
mount config mounts <dir>
setenv config env copy
run removed
checkout removed
source removed
list removed
clean removed

Explanations for Removals

All of the ldmx commands that do not have a denv equivalent are not included within denv because they do not directly pertain to interfacing with the container runner. I also am proposing removing them entirely from our ecosystem largely because they are either (a) broken, (b) not used widely, or (c) both.

  • run: In the original bash functions, this avoids the directory movement outside of the container and directly passes the arguments to the entrypoint script inside the container. I am removing this because it is overly complicated to replicate with denv in a robust way especially since the easiest way to replicate it is to tell users to just cd <dir>; denv <cmd>.
  • checkout: this tries to code a more efficient combination of git checkout and git submodule update so that users can more easily switch ldmx-sw branches; however, this is not functional. Moreover, git provides a better description of how to switch between branches with different submodule commits and I think we should simply guide developers to this resource.
  • source: this is here to allow users to run multiple ldmx commands at once intending to have a single file defining the configuration of the environment conducted by ldmx. This command is not necessary in denv since denv stores the environment configuration within a text file at <workspace>/.denv/config (for us, ldmx-sw/../.denv/config since the workspace is the parent of ldmx-sw).
  • list: I tried to write some plain-shell code to list the available tags of the image. Since time of writing, DockerHub has changed its API breaking this code. Nobody else has noticed (or even if they have noticed, it wasn't a big enough deal to make an issue or post on slack), so I think removing it and just pointing people to the GitHub releases and DockerHub is good.
  • clean: This is the messiest command and has three sub-tasks.
    • src: Clean out all untracked files from the source file tree. Just a wrapper around git clean basically and with the moving of the generated files out from Framework and into the build directory, something more complicated that rm -r build install is not necessary anymore.
    • container: Remove the downloaded container images. This is a dangerous action and I feel more comfortable instructing people on how to list the images already downloaded and remove them manually (either with docker image rm or rm depending on the runner).
    • env: reset the environment variables. denv does not store any important information in environment variables preferring instead to write and read the config file.

Sorry, something went wrong.

@EinarElen
Copy link
Contributor

Right, I've tried to look at this as if I was a completely new user
First thing I tried:

  • In zsh (default shell on MacOS)
. scripts/ldmx-denv-init.sh

Gives

ERROR: Unable to deduce a denv workspace!
       Are you in a denv workspace? Do you still need to 'denv init'?
ERROR: Unable to deduce a denv workspace!
       Are you in a denv workspace? Do you still need to 'denv init'?
ERROR [ldmx-denv-init.sh]: 'denv' unable to find a supported container runner.
       Install one of the container runners 'denv' checked for above.

Same on bash, so that's not the issue

The reason for this is of course that i haven't run denv init yet. I would appreciate if the new scripts would check

  1. Is denv installed? If not, offer to do it y/n
  2. Do we have a workspace initialized? If not, offer to do it y/n. You could offer a choice between ldmx/pro and ldmx/dev

It looks like this is what ldmx-denv-init tries to do but it doesn't work for me at least :(

Very minor:
Maybe it is time to patch the typo in: You're default shell '/bin/zsh' isn't bash.

@tomeichlersmith
Copy link
Member Author

Thank you for giving this a try Einar!! 🥇

I have employed shellcheck to start helping debug these scripts and I've already found many patches to make. One of the issues shellcheck found seems to be the primary reason the ldmx-denv-init.sh file is failing for your zsh. I'm frankly surprised it didn't work on bash for you since it does work on my bash but I haven't setup a test VM for play around with different configurations yet.

tom@zuko:~/code/ldmx/ldmx-sw/scripts$ shellcheck ldmx-denv-init.sh 

In ldmx-denv-init.sh line 71:
_default_denv_workspace="$(dirname "${BASH_SOURCE[0]}" )/../../"
                                    ^---------------^ SC3028 (warning): In POSIX sh, BASH_SOURCE is undefined.
                                    ^---------------^ SC3054 (warning): In POSIX sh, array references are undefined.

For more information:
  https://www.shellcheck.net/wiki/SC3028 -- In POSIX sh, BASH_SOURCE is undef...
  https://www.shellcheck.net/wiki/SC3054 -- In POSIX sh, array references are...

Unfortunately, finding the path to the current source file in a POSIX-sh compliant way is hard (or impossible when sourcing a file rather than executing it?) so I'm investigating solutions. We might need to abandon a POSIX init script in favor of documentation if there isn't another way to deduce LDMX_BASE.

@tvami
Copy link
Member

tvami commented Mar 11, 2024

I did the following on UCSB's cluster which uses bash:

module load singularity
git clone https://github.com/LDMX-Software/ldmx-sw.git
git checkout 1232-use-denv-in-ldmx-env
denv --clean-env  --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"

leads to

ERROR: Unable to deduce a denv workspace!
       Are you in a denv workspace? Do you still need to 'denv init'?

Then I run

 .  scripts/ldmx-denv-init.sh

which leads to

ERROR: denv requires the --env flag for singularity which was introduced in v3.6
ldmx/dev:latest
INFO:    Starting build...
FATAL:   While performing build: conveyor failed to get: Error initializing source oci:/home/vamitamas/.singularity/cache/oci:a87552b8a96913893b47298ff754c5b52157d585edbc5c8e016033bc9c3e4d07: Error initializing image from source docker://ldmx/dev:latest: unsupported schema version 2

(now same thing happens with denv --clean-env --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}" )

The singularity version is 3.5.2, so I guess this not working out of the box is expected.

@tomeichlersmith
Copy link
Member Author

🙌 thank you @tvami 🙌

There are two separate "issues" that your comment brings up.

singularity version

UCSB also has apptainer in a separate module (which is the new name after it was adopted by the Linux Foundation), so you should be doing

module load apptainer

to prefer the newer version. The last error you see after . scripts/ldmx-denv-init.sh is related to the old version of singularity. I do not plan on supporting the older singularity versions and instead point people towards the newer apptainer (all clusters LDMX uses have access to apptainer: #1248 (comment))

init typo

The first error you see from running

denv --clean-env  --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"

is because there is a typo in the README I wrote (now fixed). It should be

denv init --clean-env  --name "ldmx" "ldmx/dev:latest" "${LDMX_BASE}"

In this call, we are using LDMX_BASE as a pre-defined environment variable that stores the path to LDMX_BASE.
If printenv LDMX_BASE returns a path, then you are all good! If not, you can just use a relative path when running this directly.
For example, from the ldmx-sw repository

denv init --clean-env --name ldmx ldmx/dev:latest ..

@tvami
Copy link
Member

tvami commented Mar 13, 2024

OK this worked now for the environment, however,

denv fire .github/validation_samples/ecal_pn/config.py 
sh: 1: fire: not found

Doing

denv init ldmx/pro:v3.3.3
denv fire .github/validation_samples/ecal_pn/config.py 

then leads to

---- LDMXSW: Loading configuration --------
Traceback (most recent call last):
  File "/home/vamitamas/Testing1269/ldmx-sw/.github/validation_samples/ecal_pn/config.py", line 1, in <module>
    from LDMX.Framework import ldmxcfg
ModuleNotFoundError: No module named 'LDMX'
Configuration Error [ConfigureError] : Problem loading python script
  at /code/Framework/src/Framework/ConfigurePython.cxx:315 in ConfigurePython
Stack trace: 
    0 /usr/local/lib/libFramework.so(+0x683d4) [0x7fe56f5df3d4]
    1 main + 280 addr2line: 'fire': No such file

    2 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fe56ebded90]
Configuration Error [ConfigureError] : Problem loading python script
  at /code/Framework/src/Framework/ConfigurePython.cxx:315 in ConfigurePython

One minor technical comment I'll have is: now this adds the .denv/ directory and that comes up in git status, I'd suggest to edit the .gitignore file and add that to the list to be ignored.

On the philosophical side, why not get rid off the ldmx env fully, supporting both seems to be some overhead in the maintenance. (I'm sorry if this was discussed and I just didnt follow this close enough). Although ldmx fire does sounds more ldmx-y than denv fire, maybe we could have an ldmxFire built-in alias that runs denv fire? This is truly a cosmetics and sentimental comment tho, feel free to ignore it.

@jmmans
Copy link
Contributor

jmmans commented Mar 13, 2024

While I appreciate the idea of a cleanup, and it makes sense to be able to share useful functionality, I feel a little nervous about making the core LDMX software dependent on a package which is owned/maintained by a specific individual. Does this make sense in a long-term manner?

@tomeichlersmith
Copy link
Member Author

@tvami I think both of your errors make sense in the context that the denv-run container does not run the container's entrypoint. Specifically, this means the PATH is not updated with ${LDMX_BASE}/ldmx-sw/install/bin (in the first case) and the PYTHONPATH is not updated with /usr/local/python (in the second case).

As far as the location of the .denv directory, I intend it to be located in ${LDMX_BASE} (i.e. in the directory above ldmx-sw); however, it was created within ldmx-sw when you ran denv init within ldmx-sw. (If you just want to change the image, you should use denv config image <tag>.) I am unsure on how to enforce this preference so thoughts along these lines are welcome.

I agree with you that supporting both ldmx and denv would be more maintenance. To be clear, I think the long term workflow would be to move ldmx to just be calling denv (like is being done in ldmx-denv-transition.sh) but even in that case, I fear it is too much. Again, I am unsure on how to meet my dueling goals of easy transition and long term maintainability.


@jmmans I think your concern is well founded. I've been trying to think about the long term sustainability of this project especially if LDMX adopts it. I actually presented this project to the Hep Software Foundation last September where I received good feedback but I was unsure on the next steps. Perhaps I reopen discussions with them about transitioning ownership of denv to HSF? Another obvious option is for me to transfer ownership to LDMX-Software or have LDMX-Software keep a fork.

@tomeichlersmith
Copy link
Member Author

In order to get running functional, we (currently) need to edit the .profile file copied into ${LDMX_BASE} by denv. This file is copied from the /etc/skel directory of the image so we can avoid requiring this by updating the skeleton file in the image: LDMX-Software/docker#94

After the first attempt to run denv after it has been init'ed, it copies the skeleton files into the base directory.

tom@nixos:~/code/ldmx$ ls -A
.bash_history  .bash_logout  .bashrc  .denv  ldmx-sw  .profile

We then are free to update them as we see fit since they are only copied into place if they don't exist. I wrote the following into the .profile file above copying the code from our current image's entrypoint script and changing LDMX_BASE to HOME. The LDMX_SW_INSTALL environment variable is defined by the production image when it is being built so we check if that is defined. If it isn't, then we define it to be ~/ldmx-sw/install since the denv workspace (what we call LDMX_BASE) is defined as the HOME directory inside the container. (This basically makes defining the LDMX_BASE environment variable redundant when using denv. It is more familiar terminology than "denv workspace" though so I am keeping it around)

if [ -z "${LDMX_SW_INSTALL}" ]; then 
  export LDMX_SW_INSTALL=$HOME/ldmx-sw/install 
fi 
export LD_LIBRARY_PATH=$LDMX_SW_INSTALL/lib:$LD_LIBRARY_PATH 
export PYTHONPATH=$LDMX_SW_INSTALL/python:$LDMX_SW_INSTALL/lib:$PYTHONPATH 
export PATH=$LDMX_SW_INSTALL/bin:$PATH 
 
#add what we need for GENIE  
export LD_LIBRARY_PATH=$GENIE/lib:/usr/local/pythia6:$LD_LIBRARY_PATH 
export PATH=$GENIE/bin:$PATH 
 
# add externals installed along side ldmx-sw 
# WARNING: No check to see if there is anything in this directory 
for _external_path in $LDMX_SW_INSTALL/external/*/lib 
do 
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$_external_path 
done 
unset _external_path 

This extra shell code updates the necessary environment variables in the container so that fire is found in PATH and the config python modules are found in PYTHONPATH.

tom@nixos:~/code/ldmx/ldmx-sw$ denv fire SimCore/test/basic.py                                                                                                                                                                                
---- LDMXSW: Loading configuration --------                                                                                                                                                                                                   
---- LDMXSW: Configuration load complete  --------                                                                                                                                                                                            
---- LDMXSW: Starting event processing --------  
<omitted running output for brevity>
[ Simulator ] : Started 10 events to produce 10 events.
---- LDMXSW: Event processing complete  --------

The .profile updated with the shell code above also works with production images.

tom@nixos:~/code/ldmx/ldmx-sw$ denv config image ldmx/pro:latest
<omit image downloading printouts for brevity>
tom@nixos:~/code/ldmx/ldmx-sw$ denv which fire
/usr/local/bin/fire
tom@nixos:~/code/ldmx/ldmx-sw$ denv fire SimCore/test/basic.py                                                                                                                                                                                
---- LDMXSW: Loading configuration --------                                                                                                                                                                                                   
---- LDMXSW: Configuration load complete  --------                                                                                                                                                                                            
---- LDMXSW: Starting event processing --------  
<omitted running output for brevity>
[ Simulator ] : Started 10 events to produce 10 events.
---- LDMXSW: Event processing complete  --------

@tomeichlersmith
Copy link
Member Author

I'm going to delay this until after May 1st. It will be a good example for how to update a branch (and what the resulting commit history will look like) and I don't have time to polish this as much as I think is necessary for it to be ready to merge.

@tomeichlersmith tomeichlersmith force-pushed the 1232-use-denv-in-ldmx-env branch from 698cafa to 96526b4 Compare May 1, 2024 21:08
@tomeichlersmith
Copy link
Member Author

This last force push was me following the online instructions for updating a branch from v3.4.0 to v4.0.0.

remove impossible POSIX-sh setup script and instead encourage users to
wrap denv specifically for their shell or use denv directly
avoid code repetition by calling compile from within recompile and fire
@tomeichlersmith
Copy link
Member Author

The more I've thought about it the more I'm displeased with this solution.1 Since denv would take over responsibility of container interaction, this leaves only defining specific "tasks" for our custom shell script. This idea of a "task runner" has already been encountered by many other programmers and there are many solutions available.

I think the long term solution is to pick a task runner to handle these tasks for us. While we wouldn't have control over how this task runner is implemented, it would simplify defining new tasks and allow us to have a single (and short!) file that lists all of the common commands and how they are run.

For this reason, I'm going to close this PR and focus more on writing a just file for developing ldmx-sw. In addition, there is an open issue with denv (tomeichlersmith/denv#80) that I want to resolve before fully adopting denv within LDMX.

Footnotes

  1. Talk on task runners at SW Dev https://indico.fnal.gov/event/64896/

@tomeichlersmith tomeichlersmith deleted the 1232-use-denv-in-ldmx-env branch September 2, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants