feat: Add pixi project configuration #227
@alexander-held My guess is that the list @eguiraud determined in #144 (comment) has changed since then. This PR currently just implements the requirements described in #199 (comment), but I assume there will be more that we will need to test with.
@alexander-held Can the `requirements.txt` be removed?
Okay, I'll want to rebase this to get it into a single commit before merge, but to run the notebook follow the README and then you're good to go, as that will also properly install the environment you need (making sure that you select the `cms-open-data-ttbar` kernel).
@matthewfeickert yes, let's remove the `requirements.txt`.
@alexander-held @oshadura I've managed to get the environment to solve but I need help debugging some issues testing it:
```python
### GLOBAL CONFIGURATION

# input files per process, set to e.g. 10 (smaller number = faster)
N_FILES_MAX_PER_SAMPLE = 5

# enable Dask
USE_DASK = True

# enable ServiceX
USE_SERVICEX = False

### ML-INFERENCE SETTINGS

# enable ML inference
USE_INFERENCE = True

# enable inference using NVIDIA Triton server
USE_TRITON = False
```

During the "Execute the data delivery pipeline" cell of the notebook, things fail with the following traceback:

---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[7], line 29
27 t0 = time.monotonic()
28 # processing
---> 29 all_histograms, metrics = run(
30 fileset,
31 treename,
32 processor_instance=TtbarAnalysis(USE_INFERENCE, USE_TRITON)
33 )
34 exec_time = time.monotonic() - t0
36 print(f"\nexecution took {exec_time:.2f} seconds")
File ~/analysis-grand-challenge-debug/.pixi/envs/cms-open-data-ttbar/lib/python3.9/site-packages/coffea/processor/executor.py:1700, in Runner.__call__(self, fileset, treename, processor_instance)
1679 def __call__(
1680 self,
1681 fileset: Dict,
1682 treename: str,
1683 processor_instance: ProcessorABC,
1684 ) -> Accumulatable:
1685 """Run the processor_instance on a given fileset
1686
1687 Parameters
(...)
1697 An instance of a class deriving from ProcessorABC
1698 """
-> 1700 wrapped_out = self.run(fileset, processor_instance, treename)
1701 if self.use_dataframes:
1702 return wrapped_out # not wrapped anymore
File ~/analysis-grand-challenge-debug/.pixi/envs/cms-open-data-ttbar/lib/python3.9/site-packages/coffea/processor/executor.py:1848, in Runner.run(self, fileset, processor_instance, treename)
1843 closure = partial(
1844 self.automatic_retries, self.retries, self.skipbadfiles, closure
1845 )
1847 executor = self.executor.copy(**exe_args)
-> 1848 wrapped_out, e = executor(chunks, closure, None)
1849 if wrapped_out is None:
1850 raise ValueError(
1851 "No chunks returned results, verify ``processor`` instance structure.\n\
1852 if you used skipbadfiles=True, it is possible all your files are bad."
1853 )
File ~/analysis-grand-challenge-debug/.pixi/envs/cms-open-data-ttbar/lib/python3.9/site-packages/coffea/processor/executor.py:974, in DaskExecutor.__call__(self, items, function, accumulator)
967 # FIXME: fancy widget doesn't appear, have to live with boring pbar
968 progress(work, multi=True, notebook=False)
969 return (
970 accumulate(
971 [
972 work.result()
973 if self.compression is None
--> 974 else _decompress(work.result())
975 ],
976 accumulator,
977 ),
978 0,
979 )
980 except KilledWorker as ex:
981 baditem = key_to_item[ex.task]
File ~/analysis-grand-challenge-debug/.pixi/envs/cms-open-data-ttbar/lib/python3.9/site-packages/distributed/client.py:322, in Future.result(self, timeout)
320 self._verify_initialized()
321 with shorten_traceback():
--> 322 return self.client.sync(self._result, callback_timeout=timeout)
File /opt/conda/lib/python3.9/site-packages/coffea/processor/executor.py:221, in __call__()
220 def __call__(self, *args, **kwargs):
--> 221 out = self.function(*args, **kwargs)
222 return _compress(out, self.level)
File /opt/conda/lib/python3.9/site-packages/coffea/processor/executor.py:1367, in automatic_retries()
1361 break
1362 if (
1363 not skipbadfiles
1364 or any("Auth failed" in str(c) for c in chain)
1365 or retries == retry_count
1366 ):
-> 1367 raise e
1368 warnings.warn("Attempt %d of %d." % (retry_count + 1, retries + 1))
1369 retry_count += 1
File /opt/conda/lib/python3.9/site-packages/coffea/processor/executor.py:1336, in automatic_retries()
1334 while retry_count <= retries:
1335 try:
-> 1336 return func(*args, **kwargs)
1337 # catch xrootd errors and optionally skip
1338 # or retry to read the file
1339 except Exception as e:
File /opt/conda/lib/python3.9/site-packages/coffea/processor/executor.py:1572, in _work_function()
1570 item, processor_instance = item
1571 if not isinstance(processor_instance, ProcessorABC):
-> 1572 processor_instance = cloudpickle.loads(lz4f.decompress(processor_instance))
1574 if format == "root":
1575 filecontext = uproot.open(
1576 {item.filename: None},
1577 timeout=xrootdtimeout,
(...)
1580 else uproot.MultithreadedFileSource,
1581 )
ModuleNotFoundError: No module named 'servicex'

which seems to indicate that the existence of the `servicex` package is assumed even though `USE_SERVICEX = False` is set.
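One way this kind of hard dependency on an optional package can be avoided is to gate the import on the configuration flag. The sketch below is an assumption about how a pipeline could guard the optional import; the helper names `servicex_available` and `choose_delivery_backend` are hypothetical and not from the AGC code:

```python
import importlib.util

USE_SERVICEX = False  # mirrors the notebook's global configuration flag


def servicex_available():
    """Check for the optional servicex package without importing it."""
    return importlib.util.find_spec("servicex") is not None


def choose_delivery_backend(use_servicex):
    # Fall back to plain uproot/xrootd delivery when ServiceX is disabled,
    # so environments without the servicex package can still run the
    # Dask-only path instead of failing with ModuleNotFoundError.
    if use_servicex and servicex_available():
        return "servicex"
    return "uproot"


print(choose_delivery_backend(USE_SERVICEX))
```

With this pattern, the `servicex` package only needs to be present in environments that actually enable `USE_SERVICEX`.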
A follow-up question: Is there an analysis facility where the CMS ttbar open data workflow has been run with ServiceX enabled? The use of `USE_SERVICEX` in the repository

```console
$ git grep --name-only "USE_SERVICEX"
analyses/cms-open-data-ttbar/ttbar_analysis_pipeline.ipynb
analyses/cms-open-data-ttbar/ttbar_analysis_pipeline.py
analyses/cms-open-data-ttbar/utils/metrics.py
docs/facilityinstructions.rst
```

isn't particularly deep.
Now that #225 is merged, we can target the v3 API of the ServiceX frontend.
Should be https://opendataaf-servicex.servicex.coffea-opendata.casa/. As for the other question about importing, that's with your own environment? Not sure what causes this, but perhaps we can update to v3 and then debug that one.
@matthewfeickert The ServiceX instance was upgraded during the last couple of days, and now it works again.
22k lines of changes are coming from the `pixi.lock` lockfile.
I am not sure why we need to remove `requirements.txt`.
I used `requirements.txt` to create a conda environment from scratch to run my I/O tests. I'm not familiar with pixi, but if it can be used for the exact same use case, it should be fine. Otherwise, keeping a `requirements.txt` might be handy.
@sciaba I agree with you :) and I was just telling Alex about your use case.
@matthewfeickert can we keep both environments in sync? prefix-dev/pixi#1410
Okay, let me refactor this to use v3. That will be easier.
@alexander-held Yes. I don't think that having a different version of the library will matter, but we'll see.
Merci @oshadura! 🙏
@oshadura Yes, lock files are long to begin with, and this is a multi-platform and multi-environment lock file. I would suggest not trying to keep around the old `requirements.txt`.
@sciaba Yes, pixi covers that use case.
The suggested idea in that issue is going in the wrong direction.
When I rebase my PR I won't remove the `requirements.txt`.
I am suggesting removing the jupyterlab environment or making it optional. It is very confusing for users, especially power users who want to test the notebook / Python script on a facility or in a particular environment where jupyterlab is not needed.
Okay, I can refactor this into another feature + environment. Why is this confusing for users though? I would think they should be unaware of its existence.
I tried to test it.
@alexander-held @oshadura I've moved this out of draft and this is now ready for review. I've added notes for reviewers in the PR body, but all information should be clear from the additions to the README. If not, then I need to revise it.
Some high-level guiding notes if you're new to how pixi manifest files work. Feel free to ignore.
@alexander-held @oshadura If you have time to review this week that would be great. I'll also note for context here that I went with the idea of having things be at the top level for the whole project, but if it would be of more interest to have each analysis be a separate `pixi` project, let me know.
I tried to run locally and I see the following error:
2024-11-06 14:36:45,504 - distributed.worker - WARNING - Compute Failed
Key: TtbarAnalysis-5c778b8f1e703fd7fe17b7cd2972d7ed
Function: TtbarAnalysis
args: ((WorkItem(dataset='wjets__nominal', filename='https://xrootd-local.unl.edu:1094//store/user/AGC/nanoAOD/WJetsToLNu_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/cmsopendata2015_wjets_20547_PU25nsData2015v1_76X_mcRun2_asymptotic_v12_ext2-v1_10000_0004.root', treename='Events', entrystart=788276, entrystop=985345, fileuuid=b'#\x96\x8fdt\x8a\x11\xed\x8e[\xa6\xef]\x81\xbe\xef', usermeta={'process': 'wjets', 'xsec': 15487.164, 'nevts': 5913030, 'variation': 'nominal'}), b'\x04"M\x18H@{\x02"\x00\x00\x00\x00\x00a\x04\x94\x00\x00a\x80\x05\x95@A\x00\x01\x00\xe7\x8c\x17cloudpickle.\x0c\x00\xf6@\x94\x8c\x14_make_skeleton_class\x94\x93\x94(\x8c\x03abc\x94\x8c\x07ABCMeta\x94\x93\x94\x8c\rTtbarAnalysis\x94\x8c\x1acoffea.processor\n\x00D\x94\x8c\x0cP\x16\x00\xf2@ABC\x94\x93\x94\x85\x94}\x94\x8c\n__module__\x94\x8c\x08__main__\x94s\x8c c4f9f7e4f41d480e87c970e516ebf57a\x94Nt\x94R\x94h\x00\x8c\x0f\xa3\x00\xf2\x15_setstate\x94\x93\x94h\x10}\x94(h\x0ch\r\x8c\x08__init__\x94h\x00\x8c\x0e\xdb\x00\xf5Tfunction\x9
kwargs: {}
Exception: 'AttributeError("module \'setuptools\' has no attribute \'extern\'")'
I think you need to pin setuptools, but I don't know how to do this with pixi.
Also, please add to the README that to run locally you need to update `"AF": "local"` in utils/config.py.
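For what it's worth, a version constraint in a pixi manifest is a one-line dependency entry. A sketch of what pinning `setuptools` could look like in `pixi.toml` follows; the table name assumes the pin belongs to the `cms-open-data-ttbar` feature, and the actual change in this PR may differ:

```toml
# Hypothetical pin: coffea 0.7.x still relies on the setuptools "extern"
# layout, which was removed in setuptools 71.
[feature.cms-open-data-ttbar.dependencies]
setuptools = "<71"
```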
We will need to remove the coffea-casa part for now, since we don't have a solution for how to ship the pixi environment to the workers; we can try to resolve that in the next pull request.
Already running with the new kernel, I see a version mismatch between client, scheduler, and workers:
/home/cms-jovyan/agc-servicex/.pixi/envs/cms-open-data-ttbar/lib/python3.9/site-packages/distributed/client.py:1391: VersionMismatchWarning: Mismatched versions found
+---------+----------------+----------------+---------+
| Package | Client | Scheduler | Workers |
+---------+----------------+----------------+---------+
| lz4 | 4.3.3 | 4.3.2 | None |
| msgpack | 1.1.0 | 1.0.6 | None |
| python | 3.9.20.final.0 | 3.9.18.final.0 | None |
| toolz | 1.0.0 | 0.12.0 | None |
| tornado | 6.4.1 | 6.3.3 | None |
+---------+----------------+----------------+---------+
warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))
Why is that needed? The current workers are using the same coffea-casa environment as the coffea-casa client the user drops into at pod launch, right? You didn't ship the pixi environment to the workers, did you?
Yes, I already evaluated that having the exact scheduler versions pinned here isn't needed. We can of course match things exactly (and I did earlier in this PR), but for runtime evaluation these differences don't seem to matter.
Just to confirm, you don't see this when running locally with an environment created from analyses/cms-open-data-ttbar/requirements.txt?
The versions on client, scheduler, and workers should be exactly the same, otherwise distributed Dask usually crashes (that is why we have a warning). What is happening is that your client environment now has a different version of Python (and other packages) compared to my scheduler and worker environment on coffea-casa.
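The check behind that warning can be illustrated with a small self-contained sketch. This is a hypothetical helper, not the actual `distributed` implementation; the example versions are taken from the warning table above:

```python
def find_mismatches(versions_by_role):
    """Report packages whose versions differ between roles.

    versions_by_role maps a role name ("client", "scheduler", ...) to a
    dict of package -> version string; a package missing from a role is
    treated as None, matching the "None" worker entries in the table.
    """
    packages = set()
    for pkgs in versions_by_role.values():
        packages.update(pkgs)
    mismatched = {}
    for pkg in sorted(packages):
        seen = {role: pkgs.get(pkg) for role, pkgs in versions_by_role.items()}
        if len(set(seen.values())) > 1:
            mismatched[pkg] = seen
    return mismatched


# Versions from the VersionMismatchWarning above (workers omitted).
reported = {
    "client":    {"lz4": "4.3.3", "toolz": "1.0.0",  "python": "3.9.20.final.0"},
    "scheduler": {"lz4": "4.3.2", "toolz": "0.12.0", "python": "3.9.18.final.0"},
}
print(sorted(find_mismatches(reported)))  # all three packages disagree
```

Running this on the reported versions flags `lz4`, `python`, and `toolz`, exactly the packages the warning table shows drifting between client and scheduler.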
0.7.x coffea works only with `setuptools<71`, and I see in your environment you have a higher version.
Honestly, I think the main value of this PR is helping to run the AGC locally, since on a facility the environment is usually customized and not easy to handle in such a way (we have too many components). I would suggest dividing the PR functionality into local setup and facility setup, and following up on the facility setup in a separate PR.
Yes, that is why I tested it to find versions that wouldn't crash, and noted the precautions section

```toml
# coffea-casa precautions: keep the drift from scheduler environment small
pandas = ">=2.1.2, <2.2.4"
lz4 = ">=4.3.2, <4.3.4"
msgpack-python = ">=1.0.6, <1.1.1"
toolz = ">=0.12.0, <1.0.1"
tornado = ">=6.3.3, <6.4.2"
```

but I'll just change these to be exact versions rather than bounds. We can do the same with the CPython version, but as the versions differ only in the patch version (which is for security patches) the language feature set is the same across all the Python 3.9 releases involved.
I think this already does that. This PR is just meant to give people an environment lock file that reproduces the same runtime state as the "Coffea-casa build with coffea 0.7.21/dask 2022.05.0/HTCondor and cheese" instance. It provides a more tractable way to describe the environment than the existing `requirements.txt`.
Okay, thanks to @oshadura's work on the UNL Coffea-casa, things are running there again for me as expected on both the default Coffea-casa environment and the `cms-open-data-ttbar` pixi environment. So @oshadura and @alexander-held, this should be ready for review again. The client environment now matches the scheduler environment fully, cf. the

```toml
# coffea-casa precautions: exactly match scheduler environment
python = "3.9.18.*"
pandas = "2.1.2.*"
lz4 = "4.3.2.*"
msgpack-python = "1.0.6.*"
toolz = "0.12.0.*"
tornado = "6.3.3.*"
```

section. I've run on the UNL Coffea-casa with the `cms-open-data-ttbar` environment.
* Add pixi manifest (pixi.toml) and pixi lockfile (pixi.lock) to fully specify the project dependencies. This provides a multi-environment, multi-platform (Linux, macOS) lockfile.
* In addition to the default feature, add 'latest', 'cms-open-data-ttbar', and 'local' features and corresponding environments composed from the features. The 'cms-open-data-ttbar' feature is designed to be compatible with the Coffea Base image which uses SemVer coffea (Coffea-casa build with coffea 0.7.21/dask 2022.05.0/HTCondor and cheese).
  - The cms-open-data-ttbar feature has an 'install-ipykernel' task that installs a kernel such that the pixi environment can be used on a coffea-casa instance from a notebook.
  - The local features have the canonical 'start' task that will launch a JupyterLab session inside of the environment.
* Add use instructions for the pixi environments to the cms-open-data-ttbar README.
Thanks for all the discussion and work here, I think we're good to merge!
Thanks for all the help on this one, @oshadura and @alexander-held!
Related issues: `requirements.txt` does not work with Python 3.11 (#144); `requirements.txt` (#140)

* Add a pixi manifest (`pixi.toml`) and pixi lockfile (`pixi.lock`) to fully specify the project dependencies. This provides a multi-environment, multi-platform (Linux, macOS) lockfile.
* In addition to the default feature, add `latest`, `cms-open-data-ttbar`, and `local` pixi features and corresponding environments composed from the features. The `cms-open-data-ttbar` feature is designed to be compatible with the Coffea Base image which uses SemVer `coffea` (Coffea-casa build with coffea 0.7.21/dask 2022.05.0/HTCondor and cheese).
  - The `cms-open-data-ttbar` feature has an `install-ipykernel` task that installs a kernel such that the pixi environment can be used on a coffea-casa instance from a notebook.
  - The `local` feature has the canonical `start` task that will launch a JupyterLab session inside of the environment.

This will also be able to support the results of PR #225 after that PR is merged, with just a few updates from `pixi`. 👍

Tip — instructions for reviewers testing the PR:

* Install `pixi` if you haven't already
* Install the `ipykernel` for the `cms-open-data-ttbar` environment
* Run `analyses/cms-open-data-ttbar/ttbar_analysis_pipeline.ipynb` with the `cms-open-data-ttbar` kernel selected
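To give a feel for the manifest structure this PR introduces, here is a heavily trimmed sketch. The feature, environment, and task names come from this PR, but the dependency entries and task commands are illustrative assumptions, not the actual `pixi.toml` contents:

```toml
# Illustrative sketch of a multi-feature pixi manifest; the real
# pixi.toml pins many more packages and platforms.
[project]
name = "analysis-grand-challenge"
channels = ["conda-forge"]
platforms = ["linux-64", "osx-64", "osx-arm64"]

# Feature matching the Coffea-casa scheduler/worker environment
[feature.cms-open-data-ttbar.dependencies]
python = "3.9.18.*"
coffea = "0.7.21.*"

[feature.cms-open-data-ttbar.tasks]
# Register the environment as a Jupyter kernel for use on coffea-casa
install-ipykernel = "python -m ipykernel install --user --name cms-open-data-ttbar"

[feature.local.tasks]
# Canonical task to launch JupyterLab inside the environment
start = "jupyter lab"

# Environments are composed from one or more features
[environments]
cms-open-data-ttbar = ["cms-open-data-ttbar"]
local = ["cms-open-data-ttbar", "local"]
```

Composing environments from features this way is what lets a single lockfile serve both the coffea-casa kernel use case and the local JupyterLab use case discussed above.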