Add CDF W mass determination runcards #134

Open · cschwan wants to merge 11 commits into base: master

cschwan
Contributor

@cschwan cschwan commented May 13, 2022

We need the following distributions:

  • transverse momentum of the W boson
  • transverse momentum of the lepton
  • transverse momentum of the neutrino

@cschwan cschwan changed the title from "Add CDF transverse mass runcards" to "Add CDF W mass determination runcards" May 20, 2022
@cschwan cschwan self-assigned this May 20, 2022
@cschwan
Contributor Author

cschwan commented May 20, 2022

There's a missing cut on 'pt(W) < 15 GeV', which probably won't change much given that most events have a small transverse momentum.

@cschwan cschwan requested a review from Radonirinaunimi May 23, 2022 15:04
@Radonirinaunimi
Member

I double-checked the cuts and they look fine to me. I will also try to run this on my side.

@cschwan
Contributor Author

cschwan commented May 24, 2022

What I'm not sure of is the final value for `set req_acc_FO ...`. I just know that the current value, 0.0005, is not enough. I'm currently running with 0.0001; once that has finished we'll see whether it is enough.

@cschwan
Contributor Author

cschwan commented May 24, 2022

The runs for MT and PTL are finished now; they ran with `set req_acc_FO 0.0001` in roughly 1.5 days on my Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with four physical cores. There are a few problems:

  • the first bin of PTL and the first two bins of MT are negative, because the (negative) NLO EW + NLO QCD corrections are larger in magnitude than the LO. I wonder if I'm using a good scale, which I've set to MW in all cases;
  • the MC uncertainty is OK for most bins, but some are at the 1% level and one extreme bin is at 38% (see the next point for an explanation);
  • for the PTL distribution the LO is zero above 50 GeV because at LO mt = 2 ptl and thus the cut mt < 100 GeV translates into pt < 50 GeV. @Radonirinaunimi could you double-check that this cut is indeed imposed for PTL?
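
The relation quoted in the last bullet follows from the usual definition of the transverse mass. A short sketch, assuming massless leptons and writing $\Delta\phi$ for the azimuthal angle between lepton and neutrino (notation not used elsewhere in this thread):

$$ m_T = \sqrt{2\, p_T^{\ell}\, p_T^{\nu} \left(1 - \cos\Delta\phi\right)} \quad\xrightarrow{\; p_T^{W} = 0 \;}\quad m_T = \sqrt{2\, p_T^{\ell}\, p_T^{\ell} \cdot 2} = 2\, p_T^{\ell} $$

At LO the W boson has no transverse momentum, so lepton and neutrino are back to back ($\Delta\phi = \pi$ and $p_T^{\nu} = p_T^{\ell}$), and the cut $m_T < 100$ GeV is equivalent to $p_T^{\ell} < 50$ GeV.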

@cschwan
Contributor Author

cschwan commented May 24, 2022

Concerning my last point: this is probably right and OK, because the fit ranges are actually smaller than the distributions shown. I will adjust the distribution ranges to the fit ranges so that it gets easier to integrate.

@Radonirinaunimi
Member

> * for the `PTL` distribution the LO is zero above 50 GeV because at LO mt = 2 ptl and thus the cut mt < 100 GeV translates into pt < 50 GeV. @Radonirinaunimi could you double-check that this cut is indeed imposed for `PTL`?

The cut on the transverse momentum of the lepton (be it $Z$ or $W$) is always $30 < p_{T}^{\ell} < 55~\mathrm{GeV}$. So, indeed, this is probably fine.

Btw @cschwan, if it is not too much of a request, for the record, could you perhaps post the numbers/results here?

@cschwan
Contributor Author

cschwan commented May 24, 2022

@Radonirinaunimi here you go: MT, PTL (PTNU is missing, but should be very similar to PTL).

Note that only the second and third columns (the MC result and the MC uncertainty) are correct and relevant, because the PineAPPL result is wrong; I fixed this with commit bb3fb45 (the grid was convoluted with two proton PDFs instead of a proton and an anti-proton PDF). Depending on how you upgrade/start, you might run into the same problem. Maybe try running with small statistics first.

@Radonirinaunimi
Member

> @Radonirinaunimi here you go: MT, PTL (PTNU is missing, but should be very similar to PTL).
>
> Note that only the second and third columns (the MC result and the MC uncertainty) are correct and relevant, because the PineAPPL result is wrong; I fixed this with commit bb3fb45 (the grid was convoluted with two proton PDFs instead of a proton and an anti-proton PDF). Depending on how you upgrade/start, you might run into the same problem. Maybe try running with small statistics first.

Thanks a lot!

@cschwan
Contributor Author

cschwan commented May 24, 2022

@Radonirinaunimi could you run one of the distributions on the cluster? Just to see how long it would take. I've increased the statistics by a lot, but it could be too much.

@Radonirinaunimi
Member

Radonirinaunimi commented May 24, 2022

> @Radonirinaunimi could you run one of the distributions on the cluster? Just to see how long it would take. I've increased the statistics by a lot, but it could be too much.

That is indeed quite a lot of statistics 😃 I will run them tomorrow (today I tried to fix some issues with the installation of the rr script on our cluster).

@cschwan
Contributor Author

cschwan commented May 25, 2022

Alright, if these problems persist simply open a new Issue and we'll help you with them!

@Radonirinaunimi
Member

@cschwan: here are the results for MT: results.log. They look more or less reasonable, except for the still larger uncertainties in some of the bins; the negative bins have also disappeared. It took about 100 hours to get them on 12 threads at 3.8 GHz.

@cschwan
Contributor Author

cschwan commented Jun 9, 2022

@Radonirinaunimi great, the MC uncertainties look very good; all of them are sub-per-mille.

@Radonirinaunimi
Member

> @Radonirinaunimi great, the MC uncertainties look very good; all of them are sub-per-mille.

Ok! I will keep `req_acc_FO` at 0.000025.

@cschwan
Contributor Author

cschwan commented Jun 9, 2022

Now we need to agree on a range of MW we'd like to scan:

  • PDG 2022's result is 80.377 GeV,
  • CDF's result is 80.434 GeV (rounded to three digits),
  • and the SM result is 80.357 GeV.

Should we maybe scan in steps of 4 MeV? The uncertainty of the CDF result is 9 MeV, and the PDF shifts reported by the ResBos2 authors are of a similar magnitude. I'd start the scan at the CDF result and shift MW down in steps of 4 MeV until we reach the SM result. That should be roughly 20 runs for MT and PTL.

Could you check that this is what Juan was suggesting/OK? If it is, I'll prepare the runcards.

@Radonirinaunimi
Member

@juanrojochacon What would be the appropriate choice here?

@juanrojochacon

@cschwan @Radonirinaunimi a scan in steps of 4 MeV sounds good to begin with. We can then assess the stability of the template fit results and, if needed, generate more templates so that the spacing is reduced to 2 MeV, and so on. But it is better to start with high-stat templates, even with a coarse MW spacing, than with a very fine MW spacing and lower statistics.

@cschwan
Contributor Author

cschwan commented Jun 9, 2022

@Radonirinaunimi the last commit adds all templates for the MT distribution, which is the most important one. Could you please do the following on the cluster:

  1. Run `./rr install` so that the PineAPPL Python dependency is updated. This will fix the results table shown at the end of each run (see also pinefarm#10, "Add support for PDFs other than protons").

  2. Before submitting any jobs, run `./rr run TEST_RUN_SH theories/theory_200_1.yaml` locally to check that everything works and that every dependency is installed. That should be very quick.

  3. Finally start the templates on the cluster similar to

    ./rr run CDF_WP_1960GEV_88FB_MT_80434 theories/theory_200_1.yaml
    ./rr run CDF_WP_1960GEV_88FB_MT_80430 theories/theory_200_1.yaml
    ./rr run CDF_WP_1960GEV_88FB_MT_80426 theories/theory_200_1.yaml
    # ...
    

    so that the masses closest to the CDF result are prioritized first.
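
The full list of submission commands can be generated rather than typed by hand. The sketch below assumes that the dataset names for the remaining masses follow the same pattern as the three shown above, and it takes 80354 as the lowest mass (the lowest value in the results table later in this thread); the function names are hypothetical.

```python
# Template masses in MeV: start at the CDF value and step down by 4 MeV.
CDF_MW_MEV = 80434
LOWEST_MW_MEV = 80354  # assumption: lowest mass in the results table of this thread
STEP_MEV = 4


def template_masses(start=CDF_MW_MEV, stop=LOWEST_MW_MEV, step=STEP_MEV):
    """Return the masses in MeV, ordered from `start` downwards (both ends included)."""
    return list(range(start, stop - 1, -step))


def run_commands(masses, theory="theories/theory_200_1.yaml"):
    """Build one `./rr run` command line per template mass."""
    # Assumption: every dataset is named CDF_WP_1960GEV_88FB_MT_<mass in MeV>.
    return [f"./rr run CDF_WP_1960GEV_88FB_MT_{mass} {theory}" for mass in masses]


if __name__ == "__main__":
    # Masses closest to the CDF result come first.
    for command in run_commands(template_masses()):
        print(command)
```

The printed lines can be pasted into a job script; because the list is ordered from the CDF value downwards, submitting them in order gives the prioritization described above.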

Once we have two or more grids I can finalize the chi-square script. Once we know this is working we can also run PTL and PTNU.

@Radonirinaunimi
Member

@cschwan, perfect!

@Radonirinaunimi
Member

Radonirinaunimi commented Jun 27, 2022

@cschwan By telling madevent and the batch system to use the same 32 cores, the runs were (much) faster. Some of the grids are already available and I am attaching the logs below. The remaining ones are still running, as they were delayed by the queue. In the meantime, I will also upload the grids to NNPDF/pineapplgrids.

| $M_W~[\mathrm{GeV}]$ | Results |
|---|---|
| 80.354 | results.log |
| 80.358 | results.log |
| 80.362 | results.log |
| 80.366 | results.log |
| 80.370 | results.log |
| 80.374 | results.log |
| 80.378 | results.log |
| 80.382 | results.log |
| 80.386 | results.log |
| 80.390 | results.log |
| 80.394 | results.log |
| 80.398 | results.log |
| 80.402 | results.log |
| 80.406 | results.log |
| 80.410 | results.log |
| 80.414 | results.log |
| 80.418 | |
| 80.422 | results.log |
| 80.426 | results.log |
| 80.430 | results.log |
| 80.434 | results.log |

@cschwan
Contributor Author

cschwan commented Jun 27, 2022

@Radonirinaunimi that's great. Are those the ones with `set req_acc_FO 0.000025`, or the ones where the statistics were lowered and 16 runs were started for each template?

@Radonirinaunimi
Member

> @Radonirinaunimi that's great. Are those the ones with `set req_acc_FO 0.000025`, or the ones where the statistics were lowered and 16 runs were started for each template?

Those are already with set req_acc_FO 0.000025.

@juanrojochacon

Can we check, looking at the plots, whether the statistical precision of the calculation is sufficient?

@cschwan
Contributor Author

cschwan commented Jun 27, 2022

@Radonirinaunimi great!

@cschwan
Contributor Author

cschwan commented Jun 27, 2022

@juanrojochacon that's a very good question. In statistical terms it would be: are the pulls between neighbouring templates $j$ and $j+1$ for each bin $i$ significant? So is

$$ \frac{\sigma^i_{j+1} - \sigma^i_j}{\sqrt{(\delta \sigma^i_{j+1})^2 + (\delta \sigma^i_j)^2}} > 3 $$

where $\sigma^i_j $ is the cross section and $\delta \sigma^i_j$ the corresponding Monte Carlo uncertainty.
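
The check can be sketched in a few lines of Python. The function names and all numbers below are hypothetical (the real cross sections and MC uncertainties are in the results.log files), and the absolute value of the pull is used so that bins whose cross section decreases are flagged as well, which is an assumption not spelled out in the formula.

```python
import math


def pulls(sigma_a, delta_a, sigma_b, delta_b):
    """Per-bin pull between templates a (index j) and b (index j+1).

    The two MC uncertainties are added in quadrature, i.e. they are
    treated as uncorrelated, exactly as in the formula above.
    """
    return [
        (sb - sa) / math.hypot(da, db)
        for sa, da, sb, db in zip(sigma_a, delta_a, sigma_b, delta_b)
    ]


def significant_bins(pull_values, threshold=3.0):
    """Indices of the bins whose absolute pull exceeds the threshold."""
    return [i for i, pull in enumerate(pull_values) if abs(pull) > threshold]


# Hypothetical three-bin example: cross sections and MC uncertainties.
sigma_j, delta_j = [10.0, 20.0, 30.0], [0.3, 0.4, 0.3]
sigma_j1, delta_j1 = [10.1, 22.0, 28.0], [0.4, 0.3, 0.4]

pull_values = pulls(sigma_j, delta_j, sigma_j1, delta_j1)
print(pull_values)
print(significant_bins(pull_values))
```

In this made-up example only the second and third bins separate the two templates by more than three standard deviations.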

@cschwan
Contributor Author

cschwan commented Jun 27, 2022

This check isn't always fulfilled, see the comparison: pulls.txt. Only bins 31 to 38 have a large pull; those are the bins in the range between 80 and 83.5 GeV. We could increase the statistics, but only up to the point where the interpolation error is of a similar size as the MC uncertainty. We've already reached that for some bins: see the results files, where the column 'central/sigma' is larger than one. This column shows the difference between PineAPPL and the MC (the interpolation error) divided by the MC uncertainty. If this number is one, the MC uncertainty is of the same size as the interpolation error.
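
As a minimal illustration of how the 'central/sigma' column is to be read, the following sketch recomputes it from the description above. The function names and the numbers are hypothetical, and whether the real column is signed or an absolute value is not stated here, so the sketch keeps the sign and compares the magnitude.

```python
def central_over_sigma(pineappl, mc, mc_uncertainty):
    """(PineAPPL - MC) / MC uncertainty for every bin."""
    return [(p - m) / u for p, m, u in zip(pineappl, mc, mc_uncertainty)]


def interpolation_limited_bins(ratios):
    """Bins in which the interpolation error exceeds the MC uncertainty."""
    return [i for i, ratio in enumerate(ratios) if abs(ratio) > 1.0]


# Hypothetical three-bin example.
pineappl_result = [100.2, 50.0, 10.3]
mc_result = [100.0, 50.1, 10.0]
mc_unc = [0.4, 0.2, 0.1]

ratios = central_over_sigma(pineappl_result, mc_result, mc_unc)
print(ratios)
print(interpolation_limited_bins(ratios))
```

For bins flagged this way, more statistics would not help, because the interpolation error already dominates.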

If we take a step size of MW of 8 MeV (only use every second template), then it looks a bit better: pulls.txt.

@Radonirinaunimi How long does one template need to finish?

@Radonirinaunimi
Member

> This check isn't always fulfilled, see the comparison: pulls.txt. Only bins 31 to 38 have a large pull; those are the bins in the range between 80 and 83.5 GeV. We could increase the statistics, but only up to the point where the interpolation error is of a similar size as the MC uncertainty. We've already reached that for some bins: see the results files, where the column 'central/sigma' is larger than one. This column shows the difference between PineAPPL and the MC (the interpolation error) divided by the MC uncertainty. If this number is one, the MC uncertainty is of the same size as the interpolation error.
>
> If we take a step size of MW of 8 MeV (only use every second template), then it looks a bit better: pulls.txt.
>
> @Radonirinaunimi How long does one template need to finish?

With 32 cores, one template takes about 60 hours to finish.
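
For a rough idea of the total cost, the numbers quoted in this thread can be combined as follows. This is only back-of-the-envelope arithmetic, assuming every template needs the same 60 hours and keeps all 32 cores busy; the constant and function names are hypothetical.

```python
# Numbers quoted in this thread.
HOURS_PER_TEMPLATE = 60  # wall-clock hours for one template
CORES = 32               # cores used per template
N_TEMPLATES = 21         # masses listed in the results table above


def core_hours(n_templates, hours=HOURS_PER_TEMPLATE, cores=CORES):
    """Total core-hours, assuming identical cost for every template."""
    return n_templates * hours * cores


total_core_hours = core_hours(N_TEMPLATES)
print(total_core_hours)
```

This covers the MT templates only; running PTL and PTNU as well would scale the number accordingly.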

@cschwan
Contributor Author

cschwan commented Jun 28, 2022

There's another possibility: use `bias_weight_function` in `cuts.f`. I'll have a look at that; ideally we can redistribute phase-space points from the most populated regions into the less populated ones without increasing the statistics.

@Radonirinaunimi
Member

> There's another possibility: use `bias_weight_function` in `cuts.f`. I'll have a look at that; ideally we can redistribute phase-space points from the most populated regions into the less populated ones without increasing the statistics.

I have a recollection that we tried this approach when studying the negativity of the PDFs for the HIVM CC DY but had not seen significant improvements.

I might be missing an obvious argument but why is it that the requirement below is crucial? Isn't it enough to just make sure that the (MC) uncertainty of the individual template is under control?

$$ \frac{\sigma^i_{j+1} - \sigma^i_j}{\sqrt{(\delta \sigma^i_{j+1})^2 + (\delta \sigma^i_j)^2}} > 3 $$

@cschwan
Contributor Author

cschwan commented Jun 28, 2022

> I have a recollection that we tried this approach when studying the negativity of the PDFs for the HIVM CC DY but had not seen significant improvements.

True, but I realize now that these grids were suffering from the same problem as described in #138 (comment). The PDFs are negative and that's why the MC can't properly integrate it. I hope this will be different here.

@cschwan
Contributor Author

cschwan commented Jun 28, 2022

> I might be missing an obvious argument but why is it that the requirement below is crucial? Isn't it enough to just make sure that the (MC) uncertainty of the individual template is under control?

If our MC uncertainty is larger than the differences between the templates, we can't see a difference. However, it's probably less of a problem than I initially thought:

  • the phase-space points probably have a non-negligible correlation (the MC should run with the same seed, so changing MW slightly probably generates very similar phase-space points), so that the true pulls are much larger;
  • on the other hand, we might get away with a few bins showing larger pulls, because they will be the important ones.
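
The effect of such a correlation can be made explicit. As a sketch, assume a correlation coefficient $\rho$ between the MC estimates of neighbouring templates in a given bin ($\rho$ is introduced here for illustration and has not been measured in this thread). The uncertainty of the difference then becomes

$$ \delta\left(\sigma^i_{j+1} - \sigma^i_j\right) = \sqrt{(\delta \sigma^i_{j+1})^2 + (\delta \sigma^i_j)^2 - 2 \rho\, \delta \sigma^i_{j+1}\, \delta \sigma^i_j} $$

which is smaller than the uncorrelated expression for $\rho > 0$. For equal uncertainties the pull grows by a factor $1/\sqrt{1-\rho}$, so a strong positive correlation can turn an apparently insignificant pull into a significant one.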

@Radonirinaunimi
Member

> True, but I realize now that these grids were suffering from the same problem as described in #138 (comment). The PDFs are negative and that's why the MC can't properly integrate it. I hope this will be different here.

This might indeed be true.

> If our MC uncertainty is larger than the differences in the templates we don't see a difference.

Right, that's correct. But the point below is indeed what I thought.

> the phase-space points probably have a non-negligible correlation (MC should run with the same seed so that changing MW slightly probably generates very similar phase-space points) so that the pulls are really much larger.

As you can see in the table above, we are now only missing two templates. As soon as these are finished, I could run the templates again with the bias_weight_function.

@cschwan
Contributor Author

cschwan commented Jun 28, 2022

> As you can see in the table above, we are now only missing two templates. As soon as these are finished, I could run the templates again with the bias_weight_function.

Alright, they will be very useful insofar as we can already run the analysis using the templates. In the meantime I will also write the bias function.

@Radonirinaunimi
Member

For some strange reason, the last template is still running. I have not gotten any output logs yet (in principle, some logs should be written to disk around step 3). I have just asked the admin to check whether something is wrong.

@cschwan
Contributor Author

cschwan commented Jun 30, 2022

Strange...
