Stable container creation #32

Open
ekourlit opened this issue Feb 21, 2024 · 10 comments · Fixed by #34 or usatlas/analysisbase-dask-uc#7

@ekourlit

Hi! We would like to create the stable version of the ATLAS container following this requirements.txt.

This should be built on top of AnalysisBase 24.2.3 (I found this from here) and include the ColumnarPrototype. @matthewfeickert should know how to do that.

Tagging @mvigl as well.

Thanks in advance!

@matthewfeickert added the enhancement label Feb 21, 2024
@alexander-held

If these versions of coffea + awkward are known to work for the needs here then I think that is fine as a starting point, but I would also be curious to know if there is a reason for that exact coffea commit. Is this intended as a lower bound or an upper bound? Are there already known issues with the latest coffea release? There have been a number of fixes since and I would not be surprised if they affect scaling tests, so I am just raising this to avoid running into the same problems again that may already have been solved in later versions.

@ekourlit
Author

I'm not sure I know the exact reasons why this version works; it has been tuned by @nikoladze and @mvigl.

There are indeed new updates in coffea, awkward and uproot. However, I know that the uproot fixes have not been released yet. Look at scikit-hep/uproot5#1114.

So what I have in mind is a working stable container, and once the fixes are released we will test them via the development container.

@matthewfeickert
Member

matthewfeickert commented Feb 27, 2024

> There are indeed new updates in coffea, awkward and uproot. However, I know that the uproot fixes have not been released yet. Look at scikit-hep/uproot5#1114.

These are now in uproot v5.3.0.

As the requirements file is using old commit hashes for coffea, have there been attempts to run in a more up-to-date environment?

@matthewfeickert
Member

@ekourlit @mvigl can you test the environment from PR #34 that is now on the UChicago AF as AB-stable?

@mvigl

mvigl commented Mar 26, 2024

I get this message

"Dask Server Error
Failed to list Dask clusters: might the server extension not be installed/enabled?"

and can't import coffea or dask_awkward

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 7
      4 from pathlib import Path
      6 import awkward as ak
----> 7 import dask_awkward as dak
      8 import vector; vector.register_awkward()
      9 import numpy as np

File /venv/lib/python3.9/site-packages/dask_awkward/__init__.py:1
----> 1 from dask_awkward import config  # isort:skip; load awkward config
      3 import dask_awkward.lib.core as core
      4 import dask_awkward.lib.describe as describe

File /venv/lib/python3.9/site-packages/dask_awkward/config.py:3
      1 import os
----> 3 import dask.config
      4 import yaml
      6 config = dask.config.config

File /venv/lib/python3.9/site-packages/dask/__init__.py:3
      1 from __future__ import annotations
----> 3 from dask import config, datasets
      4 from dask._version import get_versions
      5 from dask.base import (
      6     annotate,
      7     compute,
   (...)
     12     visualize,
     13 )

File /venv/lib/python3.9/site-packages/dask/config.py:848
    843         _defaults = yaml.safe_load(f)
    845     update_defaults(_defaults)
--> 848 refresh()
    849 _initialize()

File /venv/lib/python3.9/site-packages/dask/config.py:564, in refresh(config, defaults, **kwargs)
    561 for d in defaults:
    562     update(config, d, priority="old")
--> 564 update(config, collect(**kwargs))
    565 rename(deprecations, config)

File /venv/lib/python3.9/site-packages/dask/config.py:525, in collect(paths, env)
    522 if env is None:
    523     env = os.environ
--> 525 configs = [*collect_yaml(paths=paths), collect_env(env=env)]
    526 return merge(*configs)

File /venv/lib/python3.9/site-packages/dask/config.py:242, in collect_yaml(paths, return_paths)
    240 # Parse yaml files
    241 for path in file_paths:
--> 242     config = _load_config_file(path)
    243     if config is not None:
    244         if return_paths:

File /venv/lib/python3.9/site-packages/dask/config.py:186, in _load_config_file(path)
    184     return None
    185 except Exception as exc:
--> 186     raise ValueError(
    187         f"A dask config file at {path!r} is malformed, original error "
    188         f"message:\n\n{exc}"
    189     ) from None
    190 if config is not None and not isinstance(config, dict):
    191     raise ValueError(
    192         f"A dask config file at {path!r} is malformed - config files must have "
    193         f"a dict as the top level object, got a {type(config).__name__} instead"
    194     )

ValueError: A dask config file at '/etc/dask/dask_config.yaml' is malformed, original error message:

mapping values are not allowed here
  in "<unicode string>", line 28, column 60:
     ... d.org/usatlas/analysis-dask-base:

@matthewfeickert
Member

matthewfeickert commented Mar 26, 2024

This is happening downstream in https://github.com/usatlas/analysisbase-dask-uc as this works fine in hub.opensciencegrid.org/usatlas/analysis-dask-base:main (state of this repo post PR #34)

$ docker run --rm -ti hub.opensciencegrid.org/usatlas/analysis-dask-base:main /bin/bash -c "python -c 'import coffea; print(coffea)'"
Configured GCC from: /opt/lcg/gcc/13.1.0-b3d18/x86_64-el9/bin/gcc
Configured AnalysisBase from: /usr/AnalysisBase/25.2.2/InstallArea/x86_64-el9-gcc13-opt
Configured PyColumnarPrototype from: /usr/tools/PyColumnarPrototypeDemo/1.0.0/InstallArea/x86_64-el9-gcc13-opt
<module 'coffea' from '/venv/lib/python3.9/site-packages/coffea/__init__.py'>

but fails in hub.opensciencegrid.org/usatlas/analysis-dask-uc:main

$ docker run --rm -ti hub.opensciencegrid.org/usatlas/analysis-dask-uc:main /bin/bash -c "python -c 'import coffea; print(coffea)'"
Configured GCC from: /opt/lcg/gcc/13.1.0-b3d18/x86_64-el9/bin/gcc
Configured AnalysisBase from: /usr/AnalysisBase/25.2.2/InstallArea/x86_64-el9-gcc13-opt
Configured PyColumnarPrototype from: /usr/tools/PyColumnarPrototypeDemo/1.0.0/InstallArea/x86_64-el9-gcc13-opt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/venv/lib/python3.9/site-packages/coffea/__init__.py", line 49, in <module>
    import dask_awkward
  File "/venv/lib/python3.9/site-packages/dask_awkward/__init__.py", line 1, in <module>
    from dask_awkward import config  # isort:skip; load awkward config
  File "/venv/lib/python3.9/site-packages/dask_awkward/config.py", line 3, in <module>
    import dask.config
  File "/venv/lib/python3.9/site-packages/dask/__init__.py", line 3, in <module>
    from dask import config, datasets
  File "/venv/lib/python3.9/site-packages/dask/config.py", line 848, in <module>
    refresh()
  File "/venv/lib/python3.9/site-packages/dask/config.py", line 564, in refresh
    update(config, collect(**kwargs))
  File "/venv/lib/python3.9/site-packages/dask/config.py", line 525, in collect
    configs = [*collect_yaml(paths=paths), collect_env(env=env)]
  File "/venv/lib/python3.9/site-packages/dask/config.py", line 242, in collect_yaml
    config = _load_config_file(path)
  File "/venv/lib/python3.9/site-packages/dask/config.py", line 186, in _load_config_file
    raise ValueError(
ValueError: A dask config file at '/etc/dask/dask_config.yaml' is malformed, original error message:

mapping values are not allowed here
  in "<unicode string>", line 28, column 60:
     ... d.org/usatlas/analysis-dask-base:

My guess is that the sed command in usatlas/analysisbase-dask-uc@fcbecfe was destructive and broke everything.

edit: Yeah, this broke everything because it tried to mix the build arg that can be used for the special FROM command with other typical build args / environment variables, but they don't mix and need to be defined separately.
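
The error snippet above ends at the colon after `analysis-dask-base`, which is consistent with the image tag having expanded to an empty string. In a Dockerfile, an `ARG` declared before `FROM` is only in scope for the `FROM` line itself and has to be declared again inside the build stage to be used there. The following is a minimal sketch of that failure mode, not the actual build step: the real config line and sed command are not shown in this thread, and `CONFIG_LINE`, `render` and `has_dangling_colon` are hypothetical names used only for illustration.

```python
from string import Template

# Hypothetical config line; the real /etc/dask/dask_config.yaml is not shown in this thread.
CONFIG_LINE = Template("image: hub.opensciencegrid.org/usatlas/analysis-dask-base:$tag")


def render(tag: str) -> str:
    """Substitute the tag into the line, as a templating step such as sed might."""
    return CONFIG_LINE.substitute(tag=tag)


def has_dangling_colon(line: str) -> bool:
    """Return True if the value of a 'key: value' line ends in a bare colon.

    A YAML parser reads a trailing colon in the value as the start of another
    mapping, which is what "mapping values are not allowed here" complains about.
    """
    _key, sep, value = line.partition(": ")
    return bool(sep) and value.rstrip().endswith(":")


good = render("main")
bad = render("")  # what an out-of-scope (empty) build arg would expand to

print(good, has_dangling_colon(good))
print(bad, has_dangling_colon(bad))
```

A check like this could run at image build time so that a broken substitution fails the build instead of surfacing later as an import error in the notebook.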

@matthewfeickert
Member

@mvigl This is now fixed in usatlas/analysisbase-dask-uc#7

$ docker pull hub.opensciencegrid.org/usatlas/analysis-dask-uc:main && docker run --rm -ti hub.opensciencegrid.org/usatlas/analysis-dask-uc:main /bin/bash -c "python -m pip show coffea && python -c 'import coffea; print(coffea)'"
main: Pulling from usatlas/analysis-dask-uc
Digest: sha256:f5ac11e96d3faf0954e1e13109379d8b677fcffbd86f6e4da94b6d7b4385333e
Status: Image is up to date for hub.opensciencegrid.org/usatlas/analysis-dask-uc:main
hub.opensciencegrid.org/usatlas/analysis-dask-uc:main
Total reclaimed space: 0B
Configured GCC from: /opt/lcg/gcc/13.1.0-b3d18/x86_64-el9/bin/gcc
Configured AnalysisBase from: /usr/AnalysisBase/25.2.2/InstallArea/x86_64-el9-gcc13-opt
Configured PyColumnarPrototype from: /usr/tools/PyColumnarPrototypeDemo/1.0.0/InstallArea/x86_64-el9-gcc13-opt
Name: coffea
Version: 2023.7.0rc1.dev83+g52950d1
Summary: Basic tools and wrappers for enabling not-too-alien syntax when running columnar Collider HEP analysis.
Home-page: 
Author: 
Author-email: Lindsey Gray <[email protected]>, Nick Smith <[email protected]>
License: BSD-3-Clause
Location: /venv/lib/python3.9/site-packages
Requires: awkward, cachetools, cloudpickle, correctionlib, dask, dask-awkward, dask-histogram, fsspec, hist, lz4, matplotlib, mplhep, numba, numpy, packaging, pandas, pyarrow, scipy, toml, tqdm, uproot
Required-by: 
<module 'coffea' from '/venv/lib/python3.9/site-packages/coffea/__init__.py'>

(and on UChicago Jupyter Lab AB-stable tag)

[bash][feickert]:analysisbase-dask > python -m pip show coffea
Name: coffea
Version: 2023.7.0rc1.dev83+g52950d1
Summary: Basic tools and wrappers for enabling not-too-alien syntax when running columnar Collider HEP analysis.
Home-page: 
Author: 
Author-email: Lindsey Gray <[email protected]>, Nick Smith <[email protected]>
License: BSD-3-Clause
Location: /venv/lib/python3.9/site-packages
Requires: awkward, cachetools, cloudpickle, correctionlib, dask, dask-awkward, dask-histogram, fsspec, hist, lz4, matplotlib, mplhep, numba, numpy, packaging, pandas, pyarrow, scipy, toml, tqdm, uproot
Required-by: 
[bash][feickert]:analysisbase-dask > python -c 'import coffea; print(coffea); import dask_awkward; print(dask_awkward)'
<module 'coffea' from '/venv/lib/python3.9/site-packages/coffea/__init__.py'>
<module 'dask_awkward' from '/venv/lib/python3.9/site-packages/dask_awkward/__init__.py'>
[bash][feickert]:analysisbase-dask > 
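
The import checks above can also be collected into a small smoke test for use inside the container. This is a sketch only, and `check_imports` is a hypothetical helper, not part of any of the packages discussed here. It catches `Exception` rather than only `ImportError`, because the failure reported earlier in this thread surfaced as a `ValueError` raised while `dask` loaded its config file.

```python
import importlib


def check_imports(module_names):
    """Try to import each module.

    Returns a dict mapping each module name to None on success,
    or to a short "ErrorType: message" string on failure.
    """
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # deliberately broad: import-time errors are not always ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    # The two packages that failed to import earlier in this thread.
    report = check_imports(["coffea", "dask_awkward"])
    for name, error in report.items():
        print(f"{name}: {'ok' if error is None else error}")
```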


@matthewfeickert self-assigned this Mar 26, 2024
@mvigl

mvigl commented Mar 28, 2024

@ekourlit @matthewfeickert Not sure if it's right to reopen this issue or move the discussion somewhere else, since the stable image is the right one and everything runs fine locally (Thanks Matthew!).

So there are two separate issues preventing us from scaling up with dask at the moment. I've added the option to run on multiple files at UChicago to the notebook, along with some comments and instructions for reproducing these two issues. Here I try to summarise:

  1. The first one is probably not important to solve, since it's tied to old releases anyway and does not depend on dask. Even with this stable release, the clusters can be unreadable for some files. If you run on all 50 MC files that are in Chicago, mc_electrons.caloClusters.compute() fails and produces the error below. A workaround is to run only on the first few (2, 5, 20, ...) files:
Details:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /venv/lib/python3.9/site-packages/awkward/_dispatch.py:37, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     36 with OperationErrorContext(name, args, kwargs):
---> 37     gen_or_result = func(*args, **kwargs)
     38     if isgenerator(gen_or_result):

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_buffers.py:97, in from_buffers(form, length, container, buffer_key, backend, byteorder, allow_noncanonical_form, highlevel, behavior)
     35 """
     36 Args:
     37     form (#ak.forms.Form or str/dict equivalent): The form of the Awkward
   (...)
     95 See #ak.to_buffers for examples.
     96 """
---> 97 return _impl(
     98     form,
     99     length,
    100     container,
    101     buffer_key,
    102     backend,
    103     byteorder,
    104     highlevel,
    105     behavior,
    106     allow_noncanonical_form,
    107 )

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_buffers.py:141, in _impl(form, length, container, buffer_key, backend, byteorder, highlevel, behavior, simplify)
    139 getkey = regularize_buffer_key(buffer_key)
--> 141 out = _reconstitute(form, length, container, getkey, backend, byteorder, simplify)
    142 return wrap_layout(out, behavior, highlevel)

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_buffers.py:396, in _reconstitute(form, length, container, getkey, backend, byteorder, simplify)
    395 elif isinstance(form, ak.forms.RecordForm):
--> 396     contents = [
    397         _reconstitute(
    398             content, length, container, getkey, backend, byteorder, simplify
    399         )
    400         for content in form.contents
    401     ]
    402     return ak.contents.RecordArray(
    403         contents,
    404         None if form.is_tuple else form.fields,
    405         length,
    406         parameters=form._parameters,
    407     )

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_buffers.py:397, in <listcomp>(.0)
    395 elif isinstance(form, ak.forms.RecordForm):
    396     contents = [
--> 397         _reconstitute(
    398             content, length, container, getkey, backend, byteorder, simplify
    399         )
    400         for content in form.contents
    401     ]
    402     return ak.contents.RecordArray(
    403         contents,
    404         None if form.is_tuple else form.fields,
    405         length,
    406         parameters=form._parameters,
    407     )

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_buffers.py:374, in _reconstitute(form, length, container, getkey, backend, byteorder, simplify)
    373     next_length = 0 if len(offsets) == 1 else offsets[-1]
--> 374 content = _reconstitute(
    375     form.content, next_length, container, getkey, backend, byteorder, simplify
    376 )
    377 return ak.contents.ListOffsetArray(
    378     ak.index.Index(offsets),
    379     content,
    380     parameters=form._parameters,
    381 )

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_buffers.py:396, in _reconstitute(form, length, container, getkey, backend, byteorder, simplify)
    395 elif isinstance(form, ak.forms.RecordForm):
--> 396     contents = [
    397         _reconstitute(
    398             content, length, container, getkey, backend, byteorder, simplify
    399         )
    400         for content in form.contents
    401     ]
    402     return ak.contents.RecordArray(
    403         contents,
    404         None if form.is_tuple else form.fields,
    405         length,
    406         parameters=form._parameters,
    407     )

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_buffers.py:397, in <listcomp>(.0)
    395 elif isinstance(form, ak.forms.RecordForm):
    396     contents = [
--> 397         _reconstitute(
    398             content, length, container, getkey, backend, byteorder, simplify
    399         )
    400         for content in form.contents
    401     ]
    402     return ak.contents.RecordArray(
    403         contents,
    404         None if form.is_tuple else form.fields,
    405         length,
    406         parameters=form._parameters,
    407     )

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_buffers.py:361, in _reconstitute(form, length, container, getkey, backend, byteorder, simplify)
    360 elif isinstance(form, ak.forms.ListOffsetForm):
--> 361     raw_array = container[getkey(form, "offsets")]
    362     offsets = _from_buffer(
    363         backend.index_nplike,
    364         raw_array,
   (...)
    367         byteorder=byteorder,
    368     )

File /venv/lib/python3.9/site-packages/coffea/nanoevents/mapping/base.py:98, in BaseSourceMapping.__getitem__(self, key)
     94     handle = self.get_column_handle(
     95         self._column_source(uuid, treepath), handle_name
     96     )
     97     stack.append(
---> 98         self.extract_column(
     99             handle, start, stop, use_ak_forth=self._use_ak_forth
    100         )
    101     )
    102 elif node.startswith("!"):

File /venv/lib/python3.9/site-packages/coffea/nanoevents/mapping/uproot.py:161, in UprootSourceMapping.extract_column(self, columnhandle, start, stop, use_ak_forth)
    160 interp._forth = use_ak_forth
--> 161 return columnhandle.array(
    162     interp,
    163     entry_start=start,
    164     entry_stop=stop,
    165     decompression_executor=uproot.source.futures.TrivialExecutor(),
    166     interpretation_executor=uproot.source.futures.TrivialExecutor(),
    167 )

File /venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py:1819, in TBranch.array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library, ak_add_doc)
   1818 interp_options = {"ak_add_doc": ak_add_doc}
-> 1819 _ranges_or_baskets_to_arrays(
   1820     self,
   1821     ranges_or_baskets,
   1822     branchid_interpretation,
   1823     entry_start,
   1824     entry_stop,
   1825     decompression_executor,
   1826     interpretation_executor,
   1827     library,
   1828     arrays,
   1829     False,
   1830     interp_options,
   1831 )
   1833 _fix_asgrouped(
   1834     arrays,
   1835     expression_context,
   (...)
   1839     ak_add_doc,
   1840 )

File /venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py:3147, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets, interp_options)
   3146 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3147     uproot.source.futures.delayed_raise(*obj)
   3149 else:

File /venv/lib/python3.9/site-packages/uproot/source/futures.py:36, in delayed_raise(exception_class, exception_value, traceback)
     33 """
     34 Raise an exception from a background thread on the main thread.
     35 """
---> 36 raise exception_value.with_traceback(traceback)

File /venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py:3116, in _ranges_or_baskets_to_arrays.<locals>.basket_to_array(basket)
   3115 if len(basket_arrays) == branchid_num_baskets[branch.cache_key]:
-> 3116     arrays[branch.cache_key] = interpretation.final_array(
   3117         basket_arrays,
   3118         entry_start,
   3119         entry_stop,
   3120         branch.entry_offsets,
   3121         library,
   3122         branch,
   3123         interp_options,
   3124     )
   3125     # no longer needed, save memory

File /venv/lib/python3.9/site-packages/uproot/interpretation/objects.py:424, in AsObjects.final_array(self, basket_arrays, entry_start, entry_stop, entry_offsets, library, branch, options)
    423 else:
--> 424     output = numpy.concatenate(trimmed)
    426 self.hook_before_library_finalize(
    427     basket_arrays=basket_arrays,
    428     entry_start=entry_start,
   (...)
    433     output=output,
    434 )

File <__array_function__ internals>:200, in concatenate(*args, **kwargs)

File /venv/lib/python3.9/site-packages/awkward/highlevel.py:1445, in Array.__array_function__(self, func, types, args, kwargs)
   1432 """
   1433 Intercepts attempts to pass this Array to those NumPy functions other
   1434 than universal functions that have an Awkward equivalent.
   (...)
   1443 See also #__array_ufunc__.
   1444 """
-> 1445 return ak._connect.numpy.array_function(
   1446     func, types, args, kwargs, behavior=self._behavior
   1447 )

File /venv/lib/python3.9/site-packages/awkward/_connect/numpy.py:87, in array_function(func, types, args, kwargs, behavior)
     86 if function is not None:
---> 87     return function(*args, **kwargs)
     88 # Use NumPy's implementation
     89 else:

File /venv/lib/python3.9/site-packages/awkward/_connect/numpy.py:120, in implements.<locals>.decorator.<locals>.ensure_valid_args(*args, **kwargs)
    117     raise TypeError(
    118         f"Awkward NEP-18 overload was provided with unsupported argument(s): {names}"
    119     )
--> 120 return function(*args, **kwargs)

File /venv/lib/python3.9/site-packages/awkward/_dispatch.py:60, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     59 try:
---> 60     next(gen_or_result)
     61 except StopIteration as err:

File /venv/lib/python3.9/site-packages/awkward/operations/ak_concatenate.py:55, in concatenate(arrays, axis, mergebool, highlevel, behavior)
     54 # Implementation
---> 55 return _impl(arrays, axis, mergebool, highlevel, behavior)

File /venv/lib/python3.9/site-packages/awkward/operations/ak_concatenate.py:76, in _impl(arrays, axis, mergebool, highlevel, behavior)
     75 backend = backend_of(*arrays, default=cpu, coerce_to_common=True)
---> 76 content_or_others = [
     77     x.to_backend(backend) if isinstance(x, ak.contents.Content) else x
     78     for x in (
     79         ak.operations.to_layout(
     80             x, allow_record=False if axis == 0 else True, allow_other=True
     81         )
     82         for x in arrays
     83     )
     84 ]
     86 contents = [x for x in content_or_others if isinstance(x, ak.contents.Content)]

File /venv/lib/python3.9/site-packages/awkward/operations/ak_concatenate.py:76, in <listcomp>(.0)
     75 backend = backend_of(*arrays, default=cpu, coerce_to_common=True)
---> 76 content_or_others = [
     77     x.to_backend(backend) if isinstance(x, ak.contents.Content) else x
     78     for x in (
     79         ak.operations.to_layout(
     80             x, allow_record=False if axis == 0 else True, allow_other=True
     81         )
     82         for x in arrays
     83     )
     84 ]
     86 contents = [x for x in content_or_others if isinstance(x, ak.contents.Content)]

File /venv/lib/python3.9/site-packages/awkward/operations/ak_concatenate.py:79, in <genexpr>(.0)
     75 backend = backend_of(*arrays, default=cpu, coerce_to_common=True)
     76 content_or_others = [
     77     x.to_backend(backend) if isinstance(x, ak.contents.Content) else x
     78     for x in (
---> 79         ak.operations.to_layout(
     80             x, allow_record=False if axis == 0 else True, allow_other=True
     81         )
     82         for x in arrays
     83     )
     84 ]
     86 contents = [x for x in content_or_others if isinstance(x, ak.contents.Content)]

File /venv/lib/python3.9/site-packages/awkward/_dispatch.py:60, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     59 try:
---> 60     next(gen_or_result)
     61 except StopIteration as err:

File /venv/lib/python3.9/site-packages/awkward/operations/ak_to_layout.py:48, in to_layout(array, allow_record, allow_other, regulararray)
     47 # Implementation
---> 48 return _impl(array, allow_record, allow_other, regulararray=regulararray)

File /venv/lib/python3.9/site-packages/awkward/operations/ak_to_layout.py:77, in _impl(array, allow_record, allow_other, regulararray)
     76 elif numpy.is_own_array(array):
---> 77     return ak.operations.from_numpy(
     78         array, regulararray=regulararray, recordarray=True, highlevel=False
     79     )
     81 elif Cupy.is_own_array(array):

File /venv/lib/python3.9/site-packages/awkward/_dispatch.py:37, in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     36 with OperationErrorContext(name, args, kwargs):
---> 37     gen_or_result = func(*args, **kwargs)
     38     if isgenerator(gen_or_result):

File /venv/lib/python3.9/site-packages/awkward/operations/ak_from_numpy.py:43, in from_numpy(array, regulararray, recordarray, highlevel, behavior)
     11 """
     12 Args:
     13     array (np.ndarray): The NumPy array to convert into an Awkward Array.
   (...)
     40 See also #ak.to_numpy and #ak.from_cupy.
     41 """
     42 return wrap_layout(
---> 43     from_arraylib(array, regulararray, recordarray),
     44     highlevel=highlevel,
     45     behavior=behavior,
     46 )

File /venv/lib/python3.9/site-packages/awkward/_layout.py:132, in from_arraylib(array, regulararray, recordarray)
    131 if array.dtype == np.dtype("O"):
--> 132     raise TypeError("Awkward Array does not support arrays with object dtypes.")
    134 if isinstance(array, numpy.ma.MaskedArray):

TypeError: Awkward Array does not support arrays with object dtypes.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 mc_electrons.caloClusters.compute()

File /venv/lib/python3.9/site-packages/dask/base.py:375, in DaskMethodsMixin.compute(self, **kwargs)
    351 def compute(self, **kwargs):
    352     """Compute this dask collection
    353 
    354     This turns a lazy Dask collection into its in-memory equivalent.
   (...)
    373     dask.compute
    374     """
--> 375     (result,) = compute(self, traverse=False, **kwargs)
    376     return result

File /venv/lib/python3.9/site-packages/dask/base.py:661, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    658     postcomputes.append(x.__dask_postcompute__())
    660 with shorten_traceback():
--> 661     results = schedule(dsk, keys, **kwargs)
    663 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /venv/lib/python3.9/site-packages/uproot/_dask.py:912, in _UprootOpenAndRead.__call__(self, file_path_object_path_istep_nsteps_ischunk)
    906         actual_form = self.rendered_form
    908     mapping, buffer_key = self.form_mapping.create_column_mapping_and_key(
    909         ttree, start, stop, self.interp_options
    910     )
--> 912     layout = awkward.from_buffers(
    913         actual_form,
    914         stop - start,
    915         mapping,
    916         buffer_key=buffer_key,
    917         highlevel=False,
    918     )
    919     return awkward.Array(
    920         dask_awkward.lib.unproject_layout.unproject_layout(
    921             self.rendered_form,
   (...)
    924         behavior=self.form_mapping.behavior,
    925     )
    927 array = ttree.arrays(
    928     self.common_keys,
    929     entry_start=start,
    930     entry_stop=stop,
    931     ak_add_doc=self.interp_options["ak_add_doc"],
    932 )

File [/venv/lib/python3.9/site-packages/awkward/_dispatch.py:68](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/awkward/_dispatch.py#line=67), in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     63     else:
     64         raise AssertionError(
     65             "high-level functions should only implement a single yield statement"
     66         )
---> 68 return gen_or_result

File [/venv/lib/python3.9/site-packages/awkward/_errors.py:67](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/awkward/_errors.py#line=66), in ErrorContext.__exit__(self, exception_type, exception_value, traceback)
     60 try:
     61     # Handle caught exception
     62     if (
     63         exception_type is not None
     64         and issubclass(exception_type, Exception)
     65         and self.primary() is self
     66     ):
---> 67         self.handle_exception(exception_type, exception_value)
     68 finally:
     69     # `_kwargs` may hold cyclic references, that we really want to avoid
     70     # as this can lead to large buffers remaining in memory for longer than absolutely necessary
     71     # Let's just clear this, now.
     72     self._kwargs.clear()

File [/venv/lib/python3.9/site-packages/awkward/_errors.py:82](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/awkward/_errors.py#line=81), in ErrorContext.handle_exception(self, cls, exception)
     80     self.decorate_exception(cls, exception)
     81 else:
---> 82     raise self.decorate_exception(cls, exception)

TypeError: Awkward Array does not support arrays with object dtypes.

This error occurred while calling

    ak.from_buffers(
        RecordForm-instance
        100000
        UprootSourceMapping-instance
        buffer_key = partial-instance
        highlevel = False
    )
  1. The second issue is Dask-dependent: if you try to scale the notebook, this error appears when calling the tool with sfreco = efficiencyCorrectionTool_reco(mc_electrons).compute(). It is also tied to the old releases, so things will change once we have the "Athena version" of the tools, but solving it could let us run some benchmarks now. I still don't have a good intuition for how to fix it based on the error message.
Details:
2024-03-28 21:27:46,910 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f4395b17b20>
 0. compute-allow-typetracer-49fca2e327678441852f28b95364cf6a
>.
Traceback (most recent call last):
  File "[/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 63](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py#line=62), in dumps
    result = pickle.dumps(x, **dump_kwargs)
AttributeError: Can't pickle local object 'unpack_collections.<locals>.repack'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 68](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py#line=67), in dumps
    pickler.dump(x)
AttributeError: Can't pickle local object 'unpack_collections.<locals>.repack'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 81](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py#line=80), in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "[/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py", line 1479](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py#line=1478), in dumps
    cp.dump(obj)
  File "[/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py", line 1245](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py#line=1244), in dump
    return super().dump(obj)
TypeError: cannot pickle 'ToolHolder_ElectronEfficiencyCorrection' object
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File [/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py:63](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py#line=62), in dumps(x, buffer_callback, protocol)
     62 try:
---> 63     result = pickle.dumps(x, **dump_kwargs)
     64 except Exception:

AttributeError: Can't pickle local object 'unpack_collections.<locals>.repack'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File [/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py:68](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py#line=67), in dumps(x, buffer_callback, protocol)
     67 buffers.clear()
---> 68 pickler.dump(x)
     69 result = f.getvalue()

AttributeError: Can't pickle local object 'unpack_collections.<locals>.repack'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
File [/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py:353](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py#line=352), in serialize(x, serializers, on_error, context, iterate_collection)
    352 try:
--> 353     header, frames = dumps(x, context=context) if wants_context else dumps(x)
    354     header["serializer"] = name

File [/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py:76](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py#line=75), in pickle_dumps(x, context)
     74     writeable.append(not f.readonly)
---> 76 frames[0] = pickle.dumps(
     77     x,
     78     buffer_callback=buffer_callback,
     79     protocol=context.get("pickle-protocol", None) if context else None,
     80 )
     81 header = {
     82     "serializer": "pickle",
     83     "writeable": tuple(writeable),
     84 }

File [/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py:81](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/pickle.py#line=80), in dumps(x, buffer_callback, protocol)
     80     buffers.clear()
---> 81     result = cloudpickle.dumps(x, **dump_kwargs)
     82 except Exception:

File [/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py:1479](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py#line=1478), in dumps(obj, protocol, buffer_callback)
   1478 cp = Pickler(file, protocol=protocol, buffer_callback=buffer_callback)
-> 1479 cp.dump(obj)
   1480 return file.getvalue()

File [/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py:1245](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/cloudpickle/cloudpickle.py#line=1244), in Pickler.dump(self, obj)
   1244 try:
-> 1245     return super().dump(obj)
   1246 except RuntimeError as e:

TypeError: cannot pickle 'ToolHolder_ElectronEfficiencyCorrection' object

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 sfreco = efficiencyCorrectionTool_reco(mc_electrons).compute()
      2 sfid = efficiencyCorrectionTool_id(mc_electrons).compute()
      3 sfiso = efficiencyCorrectionTool_iso(mc_electrons).compute()

File [/venv/lib/python3.9/site-packages/dask/base.py:375](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/dask/base.py#line=374), in DaskMethodsMixin.compute(self, **kwargs)
    351 def compute(self, **kwargs):
    352     """Compute this dask collection
    353 
    354     This turns a lazy Dask collection into its in-memory equivalent.
   (...)
    373     dask.compute
    374     """
--> 375     (result,) = compute(self, traverse=False, **kwargs)
    376     return result

File [/venv/lib/python3.9/site-packages/dask/base.py:661](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/dask/base.py#line=660), in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    658     postcomputes.append(x.__dask_postcompute__())
    660 with shorten_traceback():
--> 661     results = schedule(dsk, keys, **kwargs)
    663 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File [/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py:379](https://mavigl-notebook-2.notebook.af.uchicago.edu/venv/lib/python3.9/site-packages/distributed/protocol/serialize.py#line=378), in serialize(x, serializers, on_error, context, iterate_collection)
    377     except Exception:
    378         raise TypeError(msg) from exc
--> 379     raise TypeError(msg, str_x) from exc
    380 else:  # pragma: nocover
    381     raise ValueError(f"{on_error=}; expected 'message' or 'raise'")

TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f4395b17b20>\n 0. compute-allow-typetracer-49fca2e327678441852f28b95364cf6a\n>') 
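For anyone reading the traceback above: the innermost failure is that the task graph holds a live tool object, and `distributed` has to pickle the graph to send it to the workers. One common way around this kind of error is to ship only the constructor arguments and build the tool lazily in whichever process ends up calling it. The sketch below uses only the standard library; `FakeTool` and `LazyTool` are hypothetical names invented for illustration, and whether the real efficiency-correction tools can be wrapped this way is an assumption that has not been checked here.

```python
import pickle
import threading


class FakeTool:
    """Hypothetical stand-in for a natively backed tool that cannot be pickled."""

    def __init__(self, working_point):
        self.working_point = working_point
        # A lock cannot be pickled, much like a handle to a native object.
        self._handle = threading.Lock()

    def scale_factor(self, pt):
        # Toy value, only here so the example has something to return.
        return 1.0 if self.working_point == "reco" else 0.9


class LazyTool:
    """Picklable wrapper: carries the constructor arguments, not the tool."""

    def __init__(self, working_point):
        self.working_point = working_point
        self._tool = None  # built on first call, in the calling process

    def __getstate__(self):
        # Drop the live tool so only plain Python data gets serialized.
        return {"working_point": self.working_point, "_tool": None}

    def __call__(self, pt):
        if self._tool is None:
            self._tool = FakeTool(self.working_point)
        return self._tool.scale_factor(pt)


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
```

With this pattern, `can_pickle(FakeTool("reco"))` is false while a `LazyTool("reco")` stays picklable even after it has been called once, because `__getstate__` discards the cached tool and the copy rebuilds its own on first use.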

@matthewfeickert
Member

  1. The second issue is Dask-dependent: if you try to scale the notebook, this error appears when calling the tool with sfreco = efficiencyCorrectionTool_reco(mc_electrons).compute(). It is also tied to the old releases, so things will change once we have the "Athena version" of the tools, but solving it could let us run some benchmarks now. I still don't have a good intuition for how to fix it based on the error message.

@mvigl Have you attempted to update your code at all to use the software in the (current!) AB-dev environment? If not, I think that's probably a better move forward (compared to trying to fix the AB-stable Dask version) as the current AB-stable environment is not going to be used for any analysis.

(Just make sure that you tag your repository at the point where AB-stable works, and then use a new branch for developing against AB-dev.)

@ekourlit
Author

@matthewfeickert we definitely need to move to the AB-dev environment, but isn't atlas-asg/columnar-analysis-benchmarks/#1 what is holding that up?
