Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swap apfel with eko #1490

Closed
2 of 4 tasks
scarlehoff opened this issue Dec 16, 2021 · 20 comments
Closed
2 of 4 tasks

Swap apfel with eko #1490

scarlehoff opened this issue Dec 16, 2021 · 20 comments
Assignees

Comments

@scarlehoff
Copy link
Member

scarlehoff commented Dec 16, 2021

Currently the fit workflow requires running apfel in order to evolve the PDF from the initial scale to create the LHAPDF grid.
In principle eko should already be able to do that, at least for the standard settings.

Doing this within the framework require a few steps though:

  • Add eko as conda package so that nnpdf can depend on it.
  • Create a new eko_operator object which is the output of eko and will be stored in the server so that once we have computed one it doesn't need to be computed again.
  • Add an action that can read an n3fit PDF at the initial scale, apply the eko_operator to it and write down a PDF at any scale.
  • Add a script to do both previous steps automatically in the same fashion as evolven3fit

I will start on this after the LHAPDFSet thing is done in pure python.

Mentioning you two so you can keep track of the progress on this
@alecandido @felixhekhorn @giacomomagni @andreab1997

@siranipour
Copy link
Contributor

siranipour commented Dec 16, 2021

It'd be nice if, while we're at it, we could make the evolution happen automatically (or with an optional flag) after the fit is done.

Currently when I submit a fit to the cluster I always need to remember to submit a job to evolve the fit too.This can often take longer than if it happened for each replica automatically, because I forget that I have to do this (or that I actually have to be in front of my laptop to physically submit the job).

I always liked the "fire and forget" strategy nnfit had. I could submit a fit in the evening and have it fully ready for me by the morning.

@scarlehoff
Copy link
Member Author

It'd be nice if, while we're at it, we could make the evolution happen automatically (or with an optional flag) after the fit is done.

It might make more sense to make it at the level of the postfit since it is part of the operation that creates the LHAPDF file. That way it can still be done to all PDFs at once.

@siranipour
Copy link
Contributor

But postfit is an operation that takes <1min and can be done on my laptop. The evolution is another hour of compute time and I usually submit it to the cluster.

@Zaharid
Copy link
Contributor

Zaharid commented Dec 16, 2021

It seems that the benefit of doing it at once would be small if the operator is stored. So I’d say it should be done in the fit job again.

@scarlehoff
Copy link
Member Author

But postfit is an operation that takes <1min

The application of the eko as well.

Anyway, when we get to the river we cross that bridge depending on how long it actually takes.

So I’d say it should be done in the fit job again.

Only if you still don't have it suddently you have 100 jobs running the same thing and only one is useful.

@Zaharid
Copy link
Contributor

Zaharid commented Dec 16, 2021

What would be the same thing? Afaict it would be applying once stored operator to different replicas. So we would only be missing on cache performance.

@alecandido
Copy link
Member

[ ] Add eko as conda package so that nnpdf can depend on it.

@scarlehoff is this actually relevant? Can't conda packages depend on PyPI ones?

In case the answer is "it has to be conda", then I'd suggest that someone with more conda experience can help here...

@scarlehoff
Copy link
Member Author

scarlehoff commented Mar 9, 2022

I'll deal with this in due time I guess. It is a problem as well (for a given value of problem) for pineappl in the other PR since pineappl depends on eko (and there's no conda for the latest pineappl? or maybe that was fixed)

On second thought, if there's no need for apfel and c++ there's no need for conda

@alecandido
Copy link
Member

On second thought, if there's no need for apfel and c++ there's no need for conda

That sounds great 🎉.

There will be still compiled pineappl, that is a problem as much as numpy is a problem (and actually somehow it is a problem for the M1 thingies, but I hope the --universal wheels for Mac at some point will start to work without caveats...).

@alecandido
Copy link
Member

@scarlehoff is this actually relevant? Can't conda packages depend on PyPI ones?

Just for my personal record: the answer to this question seems to be no, conda packages can only depend on other conda packages...
conda/conda-build#548

@Zaharid
Copy link
Contributor

Zaharid commented Mar 9, 2022

For now I'd strongly prefer having and eko conda package (which is really easy if it doesn't use any compiled code), as we have a fair amount of infrastructure depending on it (CI, server package repo, documentation, data file paths, LHAPDF, docker, frozen environments), that I don't think we need to be touching for the moment concurrently with everything else. After we have had the code with no c++ required working for a while, we could start thinking on alternatives for all of the above, and only then see if it makes sense for us to use wheels.

@alecandido
Copy link
Member

Ok, fine, but I would scope this to the fit only, and keep instead theory generation wheel based, since it's actually already working that way (and without C++ that's the trend in any case, I would say).

Concerning conda fit ecosystem, you'll need to:

  • conda package eko
  • fix the package for pineappl, that is currently broken (take into account that pineappl_py has to be compiled from Rust)

Instead, concerning wheels:

  • I already made a proposal for an LHAPDF wheel, but this will take time, of course
  • our wheels are all generated in CD, even compiled once, so the effort there has already been put

To make it explicit: I'm not advocating for stepping immediately into wheels (that I'm not against as well), for me having a transition period is also fine.

@Zaharid
Copy link
Contributor

Zaharid commented Mar 9, 2022

fix the package for pineappl, that is currently broken (take into account that pineappl_py has to be compiled from Rust)

Indeed this need to be in working order in the immediate future if we are going to depend on pineappl.

@felixhekhorn
Copy link
Contributor

fix the package for pineappl, that is currently broken (take into account that pineappl_py has to be compiled from Rust)

Indeed this need to be in working order in the immediate future if we are going to depend on pineappl.

if I remember correctly @scarrazza volunteered for this at some point 🙃 (since he also did this in the past https://github.com/conda-forge/pineappl-feedstock )

@scarrazza
Copy link
Member

@Zaharid
Copy link
Contributor

Zaharid commented Mar 10, 2022

Here some similar package that might be useful to look at https://github.com/conda-forge/datafusion-feedstock/blob/master/recipe/meta.yaml

@alecandido
Copy link
Member

alecandido commented Mar 10, 2022

I would remove --pyargs pineappl from https://github.com/regro-cf-autotick-bot/pineappl-feedstock/blob/ad529bdb0be666c97f31b75c40a1618cc2c1282b/recipe/meta.yaml#L37 since at the moment is not able to find tests, but without it is.

@alecandido
Copy link
Member

P.S.: take into account that v0.5.1 is a broken release for the python package, even if you're not yet hitting the bug (so I'd rather try with v0.5.0 or wait for v0.5.2)

@Zaharid
Copy link
Contributor

Zaharid commented Mar 10, 2022

As an aside (to the aside) it would probably make sense if these things followed the same philosophy as nnpdf whereby each commit to master makes a new version.

@alecandido
Copy link
Member

Effectively, this is what is happening in eko and yadism, since we are using a branching style with two parallel long-standing branches.

But this is a bit heavy, and there is actually one branch (master) fully dedicated to tags, that is a bit of nonsense, since tags exist as a concept, so I'm not any longer sure that has been the best choice ever. And in any case, it's not what you're doing for nnpdf.

But since tags are tags, I don't think that there is much need for syncing master and tags. A tag is coming with extra guarantees, since we are taking the time to properly review it (unless when we are failing, as I did...).
All in all, if someone uses master instead of a tag, it's on his own responsibility (and pain, since packages on registries only exist for tags).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants