Merge pull request #27 from LDMX-Software/26-start-a-physics-section-of-the-online-docs

Showing 6 changed files with 424 additions and 0 deletions.
# ECal

Software-related physics topics for the LDMX Electromagnetic Calorimeter (ECal).
# Layer Weights
As a sampling calorimeter, the ECal needs to apply weights to all of the energies
it measures in order to account for the energy "lost" to particles traversing
material that is not active or sensitive. The ECal has sensitive silicon layers
that actually make the energy measurements, while the rest of the material
(tungsten plates, cooling planes, and printed circuit boards, "PCBs") is treated
here as "absorber".
Reconstructing the amount of energy deposited within a cell inside the sensitive
silicon is another topic, largely a question of interpreting the electronic signals
read out using our knowledge of the detector. This document starts from the assumption
that we already have an estimate for the amount of energy deposited in the silicon.
~~~admonish tip title="Energy Deposited in Silicon"
I am being intentionally vague here because the estimate for the energy deposited
in the silicon can be retrieved either from the `EcalRecHits` or the `EcalSimHits`.
The simulated hits exclude any of the electronic estimation and so are, in some
sense, less realistic; however, they can be used to assign blame to energies within
the ECal, which is helpful for some studies.
The estimated energy deposited in the silicon is stored under different names in these
two classes.
- `amplitude_` member variable of `EcalHit` (`getAmplitude()` accessor in C++)
- `edep_` member variable of `EcalSimHit` (`getEdep()` accessor in C++)
~~~
Our strategy to account for the absorbing material is to calculate a scaling factor \\(S\\)
that increases the energy deposited in the silicon \\(E_\text{silicon}\\) up to a value that estimates the
energy deposited in all of the absorbing material in front of this silicon layer,
up to the next sensitive silicon layer, \\(E_\text{absorber}\\). This strategy allows us to keep the relatively
accurate estimate of the energy deposited in the silicon and avoids double
counting absorber layers by uniquely assigning each of them to a single sensitive silicon layer.

\\[
E_\text{absorber} \approx S E_\text{silicon}
\\]
We can estimate \\(S\\) by using our knowledge of the detector design and measurements of how
much energy particles lose inside these materials. Many measurements of energy loss
are focused on "minimum-ionizing particles" (MIPs), so we factor this
out by first estimating the number of MIPs within the silicon and then multiplying that number
by the estimated energy loss per MIP in the absorbing material.

\\[
S = L / E_\text{MIP in Si}
\quad\Rightarrow\quad
E_\text{absorber} \approx L \frac{E_\text{silicon}}{E_\text{MIP in Si}}
\\]
The \\(L\\) in this equation is what the layer weights stored within our software represent.
They are the estimated energy loss of a single MIP passing through the absorbing layers in
front of a silicon layer, up to the next silicon layer.
~~~admonish note collapsible=true title="Calculating the Layer Weights"
The [`Detectors/util/ecal_layer_stack.py`](https://github.com/LDMX-Software/ldmx-sw/blob/trunk/Detectors/util/ecal_layer_stack.py)
Python script calculates these layer weights as well as other aspects of the ECal detector design.
It copies the \\(dE/dx\\) values estimated within the PDG for the materials in our design
and then uses a coded definition of the material layers and their thicknesses to calculate
these layer weights as well as the positions of the sensitive layers.
~~~
Using this estimate for the energy deposited into the absorbing material, we can combine it
with our estimate of the energy deposited into the sensitive silicon to obtain a "total energy"
represented by a single hit.

\\[
E = E_\text{absorber}+E_\text{silicon}
\approx L E_\text{silicon} / E_\text{MIP in Si} + E_\text{silicon}
= \left(1 + \frac{L}{E_\text{MIP in Si}}\right) E_\text{silicon}
\\]
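As a quick worked example, the formula above can be checked with a couple of toy numbers: 0.130 MeV per MIP in 0.5 mm of silicon and the 2.312 MeV layer-0 weight, both quoted later in this document; the silicon deposit itself is made up.

```python
# Toy check of the total-energy formula E = (1 + L / E_MIP) * E_silicon.
# 0.130 MeV and 2.312 MeV are the v14 values quoted later in this document;
# the 0.2 MeV silicon deposit is an illustrative made-up number.
mip_si_energy = 0.130   # MeV deposited by one MIP in the silicon
layer_weight = 2.312    # MeV lost by one MIP in the absorber for this layer
e_silicon = 0.2         # MeV, an example reconstructed silicon deposit

n_mips = e_silicon / mip_si_energy   # estimated number of MIPs in the silicon
e_absorber = layer_weight * n_mips   # estimated energy lost in the absorber
e_total = e_silicon + e_absorber     # total energy represented by this hit

# equivalent single-expression form used in the reconstruction
assert abs(e_total - (1 + layer_weight / mip_si_energy) * e_silicon) < 1e-12
```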
~~~admonish note collapsible=true title="Second Order Corrections"
From simulations of the detector design, we found that this underestimated the energy of showers
within the ECal by a few percent, so we also introduced another scaling factor \\(C\\)
(called `secondOrderEnergyCorrection` in the reconstruction code) that multiplies this total
energy.
\\[
E = C \left(1 + \frac{L}{E_\text{MIP in Si}}\right) E_\text{silicon}
\\]
This is helpful to keep in mind when editing the reconstruction code itself, but,
for simplicity and understandability, if you are manually applying the layer weights (e.g.
to use them with sim hits), you should not use this second order correction (i.e. just keep \\(C=1\\)).
~~~
Now that we have an estimate for the total energy represented by a single hit,
we can combine energies from multiple hits to obtain shower or cluster "features".
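As a sketch of that combination step (toy data in plain NumPy, not the ldmx-sw event model), summing the hit energies within each event yields one simple shower feature, the total reconstructed energy per event:

```python
import numpy as np

# Toy sketch: hit energies from all events stored flat with per-event
# offsets, mirroring how jagged (event, hit) data is laid out in memory.
# All numbers are made up for illustration.
hit_energy = np.array([3.8, 1.2, 0.4, 2.9, 5.1, 0.7])  # MeV, 6 hits total
offsets = np.array([0, 3, 3, 6])  # event boundaries: 3, 0, and 3 hits

# sum the hits between consecutive offsets to get one total per event;
# the empty middle event naturally sums to zero
starts, stops = offsets[:-1], offsets[1:]
total_energy = np.array([hit_energy[a:b].sum() for a, b in zip(starts, stops)])
print(total_energy)  # one total energy per event
```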
## Applying the Layer Weights
In many cases, one can just use the `energy_` member variable of the `EcalHit` class to get
this energy estimate (including the layer weights) already calculated by the reconstruction;
however, in some studies it is helpful to apply them manually. This could be used to study
different values of the layer weights or to apply the layer weights to the `EcalSimHits`
branch in order to estimate the geometric acceptance (i.e. neglecting any electronic noise and
readout effects).
This section focuses on how to apply the layer weights within a Python-based analysis
which uses the `uproot` and `awkward` packages. We are going to assume that you (the
user) have already loaded some data into an `ak.Array` with a form matching
```
N * var * {
    id: int32,
    amplitude: float32,
    ... other stuff
}
```
The `N` represents the number of events and the `var` signals that this array has a
variable-length sub-array. I am using the `EcalHit` terminology (i.e. `amplitude`
represents the energy deposited in just the silicon), but one could use the following
code with sim hits as well after a name change. You can view this form within
a Jupyter notebook or by calling `show()` on the array of interest.
~~~admonish tip collapsible=true title="Loading Data into `ak.Array`"
Just as an example, here is how I load the `EcalRecHits` branch from a pass named
`example`.
```python
import uproot
import awkward as ak
f = uproot.open('path/to/file.root')
ecal_hits = f['LDMX_Events'].arrays(expressions = 'EcalRecHits_example')
# shorten names for easier access; just renaming columns, not doing any data manipulation
ecal_hits = ak.zip({
    m[m.index('.')+1:].strip('_') : ecal_hits.EcalRecHits_example[m]
    for m in ecal_hits.EcalRecHits_example.fields
})
```
~~~
From here on, I assume that the variable `ecal_hits` is an `ak.Array` of data
with a matching form.
### Obtain Layer Number
First, we need to obtain the layer number corresponding to each hit. This can be
extracted from the detector ID that accompanies each hit.
```python
ecal_hits["layer"] = (ecal_hits.id >> 17) & 0x3f
```
~~~admonish warning title="Bit Shifting Nonsense"
This bit shifting nonsense was manually deduced from the
[`EcalID` source code](https://github.com/LDMX-Software/ldmx-sw/blob/trunk/DetDescr/include/DetDescr/EcalID.h)
and should be double checked.
The layers should range in value from `0` up to the number
of sensitive layers minus one (currently `33`).
If you are running from within the ldmx-sw development or production container
(i.e. have access to an installation of `fire`), you could also do
```python
from libDetDescr import EcalID
import numpy as np
@np.vectorize
def to_layer(rawid):
    return EcalID(int(rawid)).layer()
```
to use the C++ code directly. This is less performant but is more likely to
be correct.
~~~
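A quick, self-contained sanity check of that shift-and-mask, assuming (as deduced above) that the layer occupies the 6 bits starting at bit 17; the surrounding bit values below are made up:

```python
# Build a raw ID with a known layer and junk in the other (made-up) bit
# fields, then confirm the shift-and-mask recovers the layer. This only
# tests the assumed bit layout, not the real EcalID packing.
def layer_of(rawid: int) -> int:
    return (rawid >> 17) & 0x3F

for layer in (0, 12, 33):
    junk_low = 0x1ABC        # pretend cell/module bits below bit 17
    junk_high = 0x5 << 23    # pretend subdetector bits above the layer field
    rawid = junk_high | (layer << 17) | junk_low
    assert layer_of(rawid) == layer
print("layer field round-trips")
```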
### Efficiently Broadcasting Layer Weights
Now, the complicated bit. We want to efficiently broadcast the layer weights
to all of the hits without having to copy around a bunch of data. `awkward`
has a solution for this: we can use an
[IndexedArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.IndexedArray.html?highlight=indexedarray#ak.contents.IndexedArray)
with the layer indices as the index and a
[ListOffsetArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.ListOffsetArray.html?highlight=listoffsetarray#ak.contents.ListOffsetArray)
with the offsets of the hit lists to get an `ak.Array` that presents the layer
weights for each hit without having to copy any data around.
First, store the layer weights in order by the layer they correspond to.
This is how the layer weights are stored within
[the python config](https://github.com/LDMX-Software/Ecal/blob/trunk/python/digi.py)
and so I have copied the v14 geometry values below.
```python
layer_weights = ak.Array([
    2.312, 4.312, 6.522, 7.490, 8.595, 10.253, 10.915, 10.915, 10.915, 10.915, 10.915,
    10.915, 10.915, 10.915, 10.915, 10.915, 10.915, 10.915, 10.915, 10.915, 10.915,
    10.915, 10.915, 14.783, 18.539, 18.539, 18.539, 18.539, 18.539, 18.539, 18.539,
    18.539, 18.539, 9.938
])
```
Then we use the special arrays linked above to present these layer
weights as if there were a copy of the correct one for each hit.
```python
layer_weight_of_each_hit = ak.Array(
    ak.contents.ListOffsetArray(
        ecal_hits.layer.layout.offsets,
        ak.contents.IndexedArray(
            ak.index.Index(ecal_hits.layer.layout.content.data),
            layer_weights.layout
        )
    )
)
```
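The one-weight-per-hit lookup may be easier to see in plain NumPy with integer ("fancy") indexing on toy, flattened hits; the hit layers below are made up and this is only an analogy for the awkward construction above:

```python
import numpy as np

# Toy analogy: each hit's layer number indexes the weight table, presenting
# one weight per hit without an explicit loop or any per-hit copies in
# user code. Only the first few v14 layer weights are used here.
layer_weights = np.array([2.312, 4.312, 6.522, 7.490])  # weight per layer
hit_layer = np.array([0, 0, 2, 1, 3])                   # made-up layer of each hit
weight_of_each_hit = layer_weights[hit_layer]           # one weight per hit, in hit order
print(weight_of_each_hit)
```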
### Applying Layer Weights
Now that we have an array that has a layer weight for each hit, we can go
through the calculation described above to apply these layer weights and
get an estimate for the total energy of each hit.
```python
mip_si_energy = 0.130 # MeV - corresponds to ~3.5 eV per e-h pair <- derived from 0.5mm thick Si
(1 + layer_weight_of_each_hit / mip_si_energy)*ecal_hits.amplitude
```
If you are re-applying the layer weights used in reconstruction, you can
also confirm that your code is functioning properly by comparing the array
calculated above with the energies that the reconstruction provides alongside
the `amplitude` member variable.
```python
# we need to remove the second order correction that the reconstruction applies
secondOrderEnergyCorrection = 4000. / 3940.5 # extra correction factor used within ldmx-sw for v14
ecal_hits.energy / secondOrderEnergyCorrection
```
# Different Kinds of Averages
There are many more helpful and more detailed resources available.
Honestly, the [Wikipedia page on Average](https://en.wikipedia.org/wiki/Average)
is a great place to start.
### mean
The "arithmetic mean" (or "mean" for short; there are different kinds of means as well)
is the most common average: simply add up all the values of the samples and divide by
the number of samples.

**Code**: [`np.mean`](https://numpy.org/doc/stable/reference/generated/numpy.mean.html)
How do we calculate the "error" on the mean? While the standard deviation shows the
width of a normal distribution, what shows the "uncertainty" in how well we know what
the center of that distribution is? The value we use for this is the "standard error of the
mean" (or just "standard error" for short). The
[wikipedia page](https://en.wikipedia.org/wiki/Standard_error) gives a description
in all its statistical glory, but for our purposes it's helpful to remember that the
error of the mean is the standard deviation divided by the square root of the number
of samples.
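A small sketch with made-up numbers showing these quantities side by side:

```python
import numpy as np

# Toy illustration (made-up numbers): the standard deviation describes the
# spread of the samples, while the standard error of the mean describes how
# well the mean itself is known, shrinking with more samples.
samples = np.array([1.0, 2.0, 3.0, 4.0])
mean = np.mean(samples)
stdd = np.std(samples)                # spread of the samples
merr = stdd / np.sqrt(len(samples))   # error of the mean
print(mean, stdd, merr)
```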
### median
The middle number of the group of samples when ranked in order. If there is an even
number of samples, then take the arithmetic mean of the two samples in the middle.

**Code**: [`np.median`](https://numpy.org/doc/stable/reference/generated/numpy.median.html)
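A tiny example with made-up numbers, covering both the odd and even cases:

```python
import numpy as np

# Toy illustration: with an odd number of samples the median is the middle
# value after sorting; with an even number it is the mean of the two middle values.
odd = np.median([5.0, 1.0, 3.0])        # sorted: 1, 3, 5 -> middle is 3
even = np.median([4.0, 1.0, 3.0, 2.0])  # sorted: 1, 2, 3, 4 -> (2+3)/2 = 2.5
print(odd, even)
```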
### weighted mean
Sometimes, the different samples should be weighted differently. In the arithmetic mean,
each sample carries the same weight (I think of it as importance). We can slightly augment
the arithmetic mean to consider weights by multiplying each sample by its weight, adding these
up, and then dividing by the sum of weights. This is a nice definition since it simplifies
down to the regular arithmetic mean if all the weights are one.

Confusingly, NumPy has decided to implement the weighted mean in their `average` function.
This is just due to a difference in vocabulary.

**Code**: [`np.average`](https://numpy.org/doc/stable/reference/generated/numpy.average.html)
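A short check with made-up numbers that the manual definition matches `np.average`, and that unit weights reduce it to the arithmetic mean:

```python
import numpy as np

# Toy illustration: multiply each sample by its weight, sum, and divide
# by the sum of weights.
values = np.array([1.0, 2.0, 3.0])
weights = np.array([3.0, 1.0, 1.0])
manual = (values * weights).sum() / weights.sum()   # (3 + 2 + 3) / 5 = 1.6
assert abs(np.average(values, weights=weights) - manual) < 1e-12
# with unit weights it reduces to the regular arithmetic mean
assert abs(np.average(values, weights=np.ones(3)) - np.mean(values)) < 1e-12
print(manual)
```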
### iterative mean
In a lot of the distributions we look at, there
is a "core" distribution that is Gaussian, but the tails are distorted by other effects.
Sometimes, we only care about the core distribution and we want to intentionally cut away
the distorted tails so that we can study the "bulk" of the distribution. This is where
an iterative approach is helpful. We continue to take means and cut away outliers until
there are no outliers left. The main question then becomes: how do we define an outlier?
A simple definition that works well in our case, since we usually have many samples, is
to consider any sample that is further from the mean than X standard deviations to be
an outlier.
NumPy does not have a function for this type of mean to my knowledge, so we will construct
our own. The key aspect of NumPy I use is
[boolean array indexing](https://numpy.org/doc/stable/user/basics.indexing.html#boolean-array-indexing),
which is one of many different ways NumPy allows you to access the contents of an array.
In this case, I access certain elements of an array by constructing another array of True/False
values (in the code below, this boolean array is called `selection`); when I use
that array as the "index" (the stuff between the square brackets), only the values of the
array that correspond to `True` values will be returned in the output array. I construct
this boolean array by [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)
a certain comparison against every element in the array. Using broadcasting shortens the
code by removing the manual `for` loop _and_ makes the code faster since NumPy can move that
`for` loop into its under-the-hood, faster operations in C.
**Code**:
```python
import numpy as np

# first I write my own mean calculation that includes the possibility of weights
# and returns the mean, standard deviation, and the error of the mean
def weightedmean(values, weights = None) :
    """calculate the weighted mean and standard deviation of the input values

    This function isn't /super/ necessary, but it is helpful for the itermean
    function below where the same code needs to be called multiple times.
    """
    mean = np.average(values, weights=weights)
    stdd = np.sqrt(np.average((values-mean)**2, weights=weights))
    # without weights, every sample carries weight one
    n_eff = len(values) if weights is None else weights.sum()
    merr = stdd/np.sqrt(n_eff)
    return mean, stdd, merr

# now I can write the iterative mean
def itermean(values, weights = None, *, sigma_cut = 3.0) :
    """calculate an iterative mean and standard deviation

    If no weights are provided, then we assume they are all one.
    The sigma_cut parameter is what defines what an outlier is.
    If a sample is further from the mean than sigma_cut times
    the standard deviation, it is removed.
    """
    mean, stdd, merr = weightedmean(values, weights)
    num_included = len(values)+1 # just to get loop started
    # first selection is all non-zero weighted samples
    selection = (weights > 0) if weights is not None else np.full(len(values), True)
    while np.count_nonzero(selection) < num_included :
        # update number included for this mean
        num_included = np.count_nonzero(selection)
        # calculate mean and std dev
        mean, stdd, merr = weightedmean(values[selection], weights[selection] if weights is not None else None)
        # determine new selection; since this variable was defined outside
        # the loop, we can use it in the `while` line and it will just be updated
        selection = (values > (mean - sigma_cut*stdd)) & (values < (mean + sigma_cut*stdd))

    # left loop, meaning we settled into a state where nothing is outside sigma_cut standard deviations
    # from our mean
    return mean, stdd, merr
```
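To see the trimming behavior concretely, here is a self-contained toy run of the same idea (unweighted samples, made-up data, and an aggressive `sigma_cut = 2` chosen so the single outlier actually falls outside the cut on the first pass):

```python
import numpy as np

# Toy demonstration of iterative trimming: a tight core near 1.0 plus one
# far outlier. Each pass recomputes the mean/std on the current selection
# and keeps only samples within sigma_cut standard deviations of the mean.
values = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])
sigma_cut = 2.0

selection = np.full(len(values), True)
num_included = len(values) + 1  # just to get the loop started
while np.count_nonzero(selection) < num_included:
    num_included = np.count_nonzero(selection)
    mean = values[selection].mean()
    stdd = values[selection].std()
    selection = np.abs(values - mean) < sigma_cut * stdd

print(num_included, round(mean, 3))  # the outlier at 10.0 was cut away
```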
# Statistics and Calculations
This chapter gives explanations of various calculations and statistical methods common in our experiment.
This assorted group of topics is not a complete reference and should be treated only as an introduction rather
than a full source.