feat: Add pixi project configuration (#227)
* Add pixi manifest (pixi.toml) and pixi lockfile (pixi.lock) to fully
  specify the project dependencies. This provides a multi-environment
  multi-platform (Linux, macOS) lockfile.
* In addition to the default feature, add 'latest', 'cms-open-data-ttbar', and
  'local' features and corresponding environments composed from the features.
  The 'cms-open-data-ttbar' feature is designed to be compatible with the
  Coffea Base image which uses SemVer coffea
  (Coffea-casa build with coffea 0.7.21/dask 2022.05.0/HTCondor and cheese).
   - The cms-open-data-ttbar feature has a 'install-ipykernel' task that
     installs a kernel such that the pixi environment can be used on a
     coffea-casa instance from a notebook.
   - The local features have the canonical 'start' task that will launch a
     jupyter lab session inside of the environment.
* Add usage instructions for the pixi environments to the cms-open-data-ttbar README.
matthewfeickert authored Nov 21, 2024
1 parent 62f51e7 commit 9f5bb83
Showing 5 changed files with 23,190 additions and 1 deletion.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -1 +1,3 @@
*.model filter=lfs diff=lfs merge=lfs -text
# GitHub syntax highlighting
pixi.lock linguist-language=YAML linguist-generated=true
4 changes: 4 additions & 0 deletions .gitignore
@@ -33,3 +33,7 @@ analyses/cms-open-data-ttbar/metrics

# dask
dask-worker-space/

# pixi environments
.pixi
*.egg-info
48 changes: 47 additions & 1 deletion analyses/cms-open-data-ttbar/README.md
@@ -20,11 +20,57 @@ This directory is focused on running the CMS Open Data $t\bar{t}$ analysis throu
| utils/config.py | This is a general config file to handle different options for running the analysis. |
| utils/hepdata.py | Function to create tables for submission to the [HEP_DATA website](https://www.hepdata.net) (use `HEP_DATA = True`) |

#### Setting up the environment

##### On Coffea-casa

1. Install [`pixi`](https://pixi.sh/latest/#installation).
2. From the top level of the entire repository run

```
pixi run --environment cms-open-data-ttbar install-ipykernel
```

This will install all of the software and create an `ipykernel` that the Coffea-casa Jupyter Lab instance will be able to see.

3. In the Coffea-casa Jupyter Lab file browser, navigate to and open `analyses/cms-open-data-ttbar/ttbar_analysis_pipeline.ipynb`.
4. Change the kernel of the notebook to be `cms-open-data-ttbar`.
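
If the kernel does not show up in step 4, one way to check whether it was registered is to inspect the output of `jupyter kernelspec list --json`. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes the kernel is registered under the name `cms-open-data-ttbar`, which may differ from the display name shown in the kernel picker.

```python
import json
import subprocess


def kernel_is_registered(kernelspec_json, name="cms-open-data-ttbar"):
    """Return True if `name` is a key under "kernelspecs" in the JSON text.

    `kernelspec_json` is expected to be the output of
    `jupyter kernelspec list --json`. The default kernel name is an
    assumption taken from the README, not verified against the task.
    """
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return name in specs


def list_kernelspecs_json():
    """Run `jupyter kernelspec list --json` and return its raw output."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `kernel_is_registered(list_kernelspecs_json())` from a terminal on the Coffea-casa instance should then report whether Jupyter can see the kernel.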

##### On a local machine

To get a local Python environment that has all the software required for the analysis:

1. Install [`pixi`](https://pixi.sh/latest/#installation) on your machine.
2. Update `analyses/cms-open-data-ttbar/utils/config.py` to use `"local"` for the `"AF"` key.

```
sed -i 's/"AF": "coffea_casa"/"AF": "local"/g' analyses/cms-open-data-ttbar/utils/config.py # Linux
```
```
sed -i '' 's/"AF": "coffea_casa"/"AF": "local"/g' analyses/cms-open-data-ttbar/utils/config.py # macOS
```
3. From the top level of the entire repository run

```
pixi run --environment local-cms-open-data-ttbar start
```

This will install all of the software and launch a Jupyter lab session.
You can then use the file navigator and terminal in Jupyter lab to navigate to this directory to run the analysis.
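
The `sed -i` invocation in step 2 differs between Linux and macOS. As a portable alternative, the same substitution can be sketched in Python. This is an illustrative sketch only: the function names are hypothetical, and it assumes the config file contains the exact text `"AF": "coffea_casa"`, as the `sed` commands do.

```python
from pathlib import Path

# The same literal strings used in the sed commands above.
OLD = '"AF": "coffea_casa"'
NEW = '"AF": "local"'


def switch_af_to_local(text):
    """Replace every occurrence of the coffea_casa AF setting with local."""
    return text.replace(OLD, NEW)


def update_config(path):
    """Rewrite the config file in place; return True if it was changed."""
    path = Path(path)
    original = path.read_text()
    updated = switch_af_to_local(original)
    if updated != original:
        path.write_text(updated)
    return updated != original
```

From the top level of the repository, `update_config("analyses/cms-open-data-ttbar/utils/config.py")` would apply the change. A return value of `False` means the expected text was not found and the file was left untouched.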

**Note**: Given the size of the files, when running locally you will probably want to set the `USE_SERVICEX` global configuration variable in the `analyses/cms-open-data-ttbar/ttbar_analysis_pipeline.ipynb` notebook to `True`:

```python
USE_SERVICEX = True
```

This requires you to have a ServiceX configuration file on your machine.

#### Instructions for paired notebook

If you only care about running the `ttbar_analysis_pipeline.ipynb` notebook, you can completely ignore the `ttbar_analysis_pipeline.py` file.

This notebook (`ttbar_analysis_pipeline.ipynb`) is paired to the file `ttbar_analysis_pipeline.py` via Jupytext (https://jupytext.readthedocs.io/en/latest/). Running `git diff` on the `.py` file instead of the `.ipynb` file is much simpler, as you don't have to deal with notebook metadata or output images. However, the notebook still needs to be version controlled so that its output is preserved. To leave `.ipynb` files out of a diff, append the pathspec `-- . ':(exclude)*.ipynb'` to `git diff`.

The `.py` file can also be run as a Python script.
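
If you want to script the notebook-free diff described above, a minimal sketch follows. The helper names are hypothetical and not part of this repository. Passing the arguments as a list avoids the shell quoting that the `':(exclude)*.ipynb'` pathspec needs on the command line.

```python
import subprocess


def notebook_free_diff_command(*extra_args):
    """Build a `git diff` command that excludes all `.ipynb` files.

    Any `extra_args` (for example "--stat") are placed before the `--`
    separator so that git treats them as options, not paths.
    """
    return ["git", "diff", *extra_args, "--", ".", ":(exclude)*.ipynb"]


def notebook_free_diff(repo_dir="."):
    """Run the diff inside `repo_dir` and return its output as text."""
    result = subprocess.run(
        notebook_free_diff_command(),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `print(notebook_free_diff())` from this directory then shows only the changes to non-notebook files, including the paired `.py` file.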
