unpin ruamel.yaml version #2226

RoyStegeman · 2024-11-25T14:31:25Z

Along with NNPDF/reportengine#67 this PR addresses the dependence on the ruamel.yaml version which is pinned to <0.18 due to changes in the API.

Hopefully this also allows us to resolve the problem with vp-nextfitruncard removing newlines and comments from the runcard 🤞

scarlehoff

Many thanks for this, you are doing god's work here.

One thing, wouldn't it make sense to have the import ruamel.yaml ; yaml = YAML() in one single place (nnpdf_data for instance) and import it from there?

Would make your life easier I think debugging and ensuring that everything is doing the same thing.

scarlehoff · 2024-11-25T19:20:33Z

nnpdf_data/nnpdf_data/utils.py

-    Loader = yaml.CLoader
-except AttributeError:
-    Loader = yaml.Loader
+yaml = YAML(typ='safe')


Using libyaml (CLoader) when available makes a huge difference in time when loading large (or many) files.

CLoader is PyYaml, do we want to keep using both yaml packages?

On that note: all filter.py use pyyaml - do we want to change that or is it fine?

I could not find an explicit mention in the docs but it might be that as long as you use safe (and pure is not True) it will use CLoader under the hood https://sourceforge.net/p/ruamel-yaml/tickets/159/#8f94

but the comment is from 2017... we should check. The difference in performance was enough to think about dropping ruamel if we can no longer use CLoader

RoyStegeman · 2024-11-25T19:29:41Z

One thing, wouldn't it make sense to have the import ruamel.yaml ; yaml = YAML() in one single place (nnpdf_data for instance) and import it from there?

I'm still trying to understand some of the errors. They seem to depend on which settings are used during import, which is a bit concerning because it means the settings affect the way the yaml is parsed rather than just the formatting upon dumping. If this is the case then we may need a different YAML instance per module, though in general you're right and I'll probably clean it up once I understand the issues.

If not then you may be right

scarlehoff · 2024-11-25T20:45:21Z

The documentation of ruamel is worse than our own 🫠 but from https://yaml.readthedocs.io/en/latest/basicuse/#more-examples I think you would need at least two yaml functions in utils.

1. A "fast" one where pure=False and typ=safe for the commondata. See also here for all the combinations you can do; the goal is to arrive at cyaml/cparser 😆 https://sourceforge.net/p/ruamel-yaml/code/ci/default/tree/main.py#l105
2. A slow one with typ=rt (which is the default) for all the fancy stuff.

The fast loader (1) would support only YAML 1.1, while the other one also supports 1.2 (as does safe + pure), which probably explains the differences you see.

scarlehoff · 2024-11-26T10:32:40Z

validphys2/src/validphys/commondataparser.py

-def _quick_yaml_load(filepath):
-    return yaml.load(filepath.read_text(encoding="utf-8"), Loader=Loader)
-
+yaml = YAML(typ='rt')


Suggested change

-yaml = YAML(typ='rt')
+yaml = YAML(typ='safe')

Just benchmarked it: with this change it is (almost) as fast as master. Not quite as fast though, it takes 1.3x longer.
With rt it takes 10x longer.

As I said, I think the best thing at this point is to have in utils safe_load and rt_load or whatever with a comment explaining why. That will make it easier if we need to change it again.

RoyStegeman · 2024-12-03T18:37:56Z

I've added a single initialization in vp.utils for the different options we may want to use in vp and n3fit. In nnpdf_data mostly pyyaml was used and ruamel only in some places where formatting doesn't matter, so I made nnpdf_data pyyaml-only.

The pyproject file still points to the corresponding branch in reportengine so that should be changed if we decide to merge both.

scarlehoff · 2024-12-03T20:16:18Z

commondataparser will be moved to nnpdf_data, can it use pyyaml as well?

scarlehoff · 2024-12-03T20:18:26Z

nnpdf_data/nnpdf_data/utils.py

    try:
        return parse_input(inp, spec)
    except ValidationError as e:
        current_exc = e
        # In order to provide a more complete error information, use round_trip_load
        # to read the .yaml file again (insetad of using the CLoader)
-        current_inp = yaml.round_trip_load(input_yaml.open("r", encoding="utf-8"))
+        current_inp = yaml.load(input_yaml.open("r", encoding="utf-8"), yaml.Loader)


Will this work? The extra information here needs the rt loader, otherwise you could use the load from before.

Also, is yaml.Loader what in ruamel is CLoader? Otherwise it will be slow.

Turns out ruamel is built on top of pyyaml, so CLoader is the same thing in both. I took the passing ci as indication that this was fine, but I'll revert it

Then you need to use CLoader for the other one to speed things up (and you need the try-except since libyaml might not be installed everywhere).
I think all in all it is better to use ruamel also here, pyyaml doesn't really simplify things.

I took the passing ci as indication that this was fine, but I'll revert it

We would need a test that not only checks for errors but also looks at the message... and in this case it is a ValidationError from reportengine. I'll put it in my "probably never happening but who knows" to-do list

RoyStegeman · 2024-12-04T11:58:19Z

I wasn't too sure from the ruamel code, but it seems that ruamel doesn't use CLoader. I don't believe there is any option that allows setting that. These are the timings:

yaml_rt: 0.002154 seconds per load
yaml_fast: 0.000500 seconds per load
yaml_safe: 0.000504 seconds per load
CLoader: 0.000153 seconds per load

Here is the script for reference:

import pathlib
from timeit import timeit

from ruamel.yaml import YAML
import yaml

import nnpdf_data

yaml_rt = YAML(typ="rt")
yaml_fast = YAML(typ="safe", pure=False)
yaml_safe = YAML(typ="safe")

file_path = nnpdf_data.theory_cards / "410.yaml"


def load_with_yaml_rt():
    input_yaml = pathlib.Path(file_path)
    return yaml_rt.load(input_yaml.read_text(encoding="utf-8"))


def load_with_yaml_fast():
    input_yaml = pathlib.Path(file_path)
    return yaml_fast.load(input_yaml.read_text(encoding="utf-8"))


def load_with_CLoader():
    input_yaml = pathlib.Path(file_path)
    return yaml.load(input_yaml.read_text(encoding="utf-8"), Loader=yaml.CLoader)


def load_with_yaml_safe():
    input_yaml = pathlib.Path(file_path)
    # Time the YAML(typ="safe") instance
    return yaml_safe.load(input_yaml.read_text(encoding="utf-8"))

num_iterations = 1000
time_yaml_rt = timeit(load_with_yaml_rt, number=num_iterations)
time_yaml_fast = timeit(load_with_yaml_fast, number=num_iterations)
time_yaml_safe = timeit(load_with_yaml_safe, number=num_iterations)
time_yaml_CLoader = timeit(load_with_CLoader, number=num_iterations)

print(f"yaml_rt: {time_yaml_rt / num_iterations:.6f} seconds per load")
print(f"yaml_fast: {time_yaml_fast / num_iterations:.6f} seconds per load")
print(f"yaml_safe: {time_yaml_safe / num_iterations:.6f} seconds per load")
print(f"CLoader: {time_yaml_CLoader / num_iterations:.6f} seconds per load")
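
The timings reported above can be put in relative terms with a few lines of arithmetic. The numbers are copied from that output; nothing is re-measured here.

```python
# Seconds per load, copied from the timings reported above
timings = {
    "yaml_rt": 0.002154,
    "yaml_fast": 0.000500,
    "yaml_safe": 0.000504,
    "CLoader": 0.000153,
}

baseline = timings["CLoader"]
ratios = {name: seconds / baseline for name, seconds in timings.items()}
for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:5.1f}x the CLoader time")
```

With these numbers the rt loader is roughly 14 times slower than CLoader, and the two safe variants roughly 3 times slower.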

scarlehoff reviewed Nov 25, 2024

scarlehoff reviewed Nov 26, 2024

RoyStegeman force-pushed the update_ruamel branch from fad96a6 to 39741da Compare December 3, 2024 16:36

RoyStegeman requested a review from scarlehoff December 3, 2024 18:45

scarlehoff reviewed Dec 3, 2024

RoyStegeman added 20 commits December 4, 2024 14:38

0dffb82 update for ruamel version, some issues still to be fixed
03b942d remove ruamel.yaml pin
57860e0 uncomment load weights in quickard_qed
3236706 move error to be raised one step later..
2dc8c41 remove useless pass statement
2aa8aa8 pass stream to yaml.dump, not a path as string
0a840e3 correctly yaml.dump dict to file
2e68972 fix(?) mc2hessian test
e897201 pass default_flow_style at header level isntead of function
4263338 prevent file from closing while writing to buffer
c7d9c72 fix vp-nextfitruncard
f06d78a clean up yaml imports
310d9d0 replace ruamel with pyyaml in nnpdf_data
abfc5eb use yaml_rt for .info file
3a09960 use yaml_rt to deal with error messages, otherwise yaml_fast
7549143 update inline comment
0e0741a make test pass
0622c49 use CLoader in nnpdf_data utils
418ecf2 don't say yaml_fast uses CLoader
2e4096d point reportengine to master again

RoyStegeman force-pushed the update_ruamel branch from 91c2cde to 2e4096d Compare December 4, 2024 14:38

scarlehoff approved these changes Dec 4, 2024

scarlehoff mentioned this pull request Dec 4, 2024

Implementation of ATLAS_WCHARM_7TEV in the new format #2236

Merged

RoyStegeman merged commit 08e9384 into master Dec 4, 2024
6 checks passed

RoyStegeman deleted the update_ruamel branch December 4, 2024 15:23
