Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New implementation of CMS_WCHARM_13TEV_WPWM-TOT-UNNORM #2244

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

achiefa
Copy link
Contributor

@achiefa achiefa commented Dec 9, 2024

This PR implements CMS_WCHARM_13TEV_WPWM-TOT-UNNORM in the new format.

General comments

This dataset delivers the differential distribution in function of the absolute rapidity of the lepton pair. Each data point is accompanied by a (symmetric) statistical uncertainty and a (asymmetric) systematical uncertainty. The latter is the sum in quadrature of the different sources of uncertainty. The breakdown of these systematic sources is not delivered in the HepData format, but it is given in Table 1 of the paper.

The legacy version has the variant sys_10, which should not be implemented because it was meant to account for the 3pt prescription.

$(x,Q^2)$-map and data-theory comparison

Legacy: [default],
New: [default w/ shifts], [default w/o shifts]

@achiefa achiefa requested a review from RoyStegeman December 9, 2024 17:06
@achiefa achiefa marked this pull request as draft December 9, 2024 17:06
@achiefa achiefa mentioned this pull request Dec 9, 2024
5 tasks
@achiefa achiefa self-assigned this Dec 9, 2024
@achiefa
Copy link
Contributor Author

achiefa commented Dec 10, 2024

Hi @RoyStegeman , I think I'm done here. Please, see the following comments:

  • In this dataset, uncertainties must be symmetrised. This means that the central data must be shifted accordingly. However, I suspect that the old implementation didn't have the data shifted. I produced two comparisons (see description), w/ and w/o shift, respectively. The one that does not implement the shift gives the same chi2 as the legacy version. You can also check by naked eye that the non-shifted data are closer to the legacy data.
  • I also believe that the old implementation rounded some data, and indeed the central data in HepData do not match with the legacy implementation.
  • I can't reproduce the legacy covmat, but it might be that the cause is one or both of the differences above. Below, you can find the two matrices.
    output

Honestly, I can't judge whether these differences are relevant or not. The difference in chi2 is not negligible if one accounts for the shifts. On the other hand, the difference in the t0 matrices does not really worry me as I was able to reproduce the chi2 of the legacy implementation provided shifts were removed. @RoyStegeman, what do you think? Maybe it is worth asking @enocera.

@achiefa achiefa marked this pull request as ready for review December 10, 2024 11:43
@RoyStegeman
Copy link
Member

Do you know why the fktables of this dataset only exist in theories 704 (0.5,0.5) and 705 (0.5,1)?

@achiefa
Copy link
Contributor Author

achiefa commented Dec 10, 2024

Do you know why the fktables of this dataset only exist in theories 704 (0.5,0.5) and 705 (0.5,1)?

No, maybe @enocera does.

Copy link
Member

@RoyStegeman RoyStegeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so familiar with dataset implementations so this is going to take me some time to figure out...

For now I just have a question regarding the Extractor class. There are also a lot of unused imports, are you using an lsp?

Comment on lines 8 to 9
import numpy as np
import yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import numpy as np
import yaml

import os

import numpy as np
import pandas as pd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import pandas as pd

import os

import numpy as np
import pandas as pd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import pandas as pd

SYS_UNC_by_bin = [{}]


class Extractor:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of this stuff seems pretty universal. Would it be worth defining a base class that is shared between datasets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about making this class more universal. However, I gave up because there are many differences between datasets, even within the same experiment. If we want to use a more universal extractor, then we should all agree on standard common specifics amongst datasets. For now, the extractor class is rather specialised to the datasets that I implemented.

@@ -0,0 +1,304 @@
import logging
import os
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import os

Comment on lines -31 to +34
kinematic_coverage:
- k1
- k2
- k3
plot_x: abs_eta
kinematic_coverage: [abs_eta, m_W2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do the legacy kinematics have three variables and the new implementation just two? And does the legacy dataset not use this metadatafile and thus cause issues if the kinematics don't match?

From the value it seems the removed variable was just supposed to indicate the 13TeV beam energy, which I don't think would be used anywhere. But this question is more about how the code deals with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The third variable in the old implementation was the beam energy. However, this is optional in the new common data parser as the $\sqrt{s}$ is automatically read from the name of the dataset. In other words, using sqrts here would be redundant.

The legacy dataset does use this metadata file. However, since the kinematics is the same between the two versions, the legacy file can be removed (and I will).

@@ -0,0 +1,143 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import numpy as np

@achiefa
Copy link
Contributor Author

achiefa commented Dec 10, 2024

There are also a lot of unused imports, are you using an lsp?
These were copy and paste. My bad. BTW, what is a lsp?

@RoyStegeman
Copy link
Member

RoyStegeman commented Dec 10, 2024

language server protocol. It's the software that highlights tokens based on their role in the python syntax. Including unused imports

I'm pretty sure you are using it, but just in case you're not

@achiefa
Copy link
Contributor Author

achiefa commented Dec 10, 2024

Oh, then I am. But I just forgot to delete the unused imports.

@enocera
Copy link
Contributor

enocera commented Dec 10, 2024

Do you know why the fktables of this dataset only exist in theories 704 (0.5,0.5) and 705 (0.5,1)?

No, maybe @enocera does.

I think that the reason is as follows: W+c data were not included in NNDPF4.0 because, at that time, NNLO corrections to the matrix elements were not known. When the MHOU, QED, and aN3LO determinations were produced, the leitmotiv was to put them on the same grounds as NNPDF4.0. Therefore W+c did not go into them. At some point I raised the question whether we should include it (in the same way as we do, e.g. for LHC data in the N3LO fit). Initially their answer was yes, but then they retracted. So I suspect that Andrea started to compute the FK tables, but then stopped.

@achiefa achiefa force-pushed the new_CMS_WCHARM_13TEV_WPWM-TOT-UNNORM branch from f054206 to 342e1f5 Compare December 12, 2024 15:07
@achiefa
Copy link
Contributor Author

achiefa commented Dec 12, 2024

According to what ERN said in the last code meeting, this one is also ready for review.

@RoyStegeman RoyStegeman force-pushed the new_CMS_WCHARM_13TEV_WPWM-TOT-UNNORM branch from 5ce9c42 to 0eb48b5 Compare December 24, 2024 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants