Commit

Merge pull request #576 from SamuelBorden/refactor
gipert authored Apr 26, 2024
2 parents 9b2e556 + ebfbe4c commit 7950c34
Showing 52 changed files with 2,281 additions and 1,321 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -3,7 +3,7 @@ ci:
autoupdate_schedule: "quarterly"
autofix_commit_msg: "style: pre-commit fixes"

-exclude: ^(attic|src/pygama/math|src/pygama/flow/datagroup.py)
+exclude: ^(attic|src/pygama/flow/datagroup.py)
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: "v4.5.0"
44 changes: 22 additions & 22 deletions docs/source/notebooks/MathTutorial.ipynb
@@ -703,10 +703,10 @@
"metadata": {},
"outputs": [],
"source": [
"from pygama.math.functions.pygama_continuous import pygama_continuous\n",
"from pygama.math.functions.pygama_continuous import PygamaContinuous\n",
"\n",
"\n",
"class cauchy_gen(pygama_continuous):\n",
"class cauchy_gen(PygamaContinuous):\n",
" def _pdf(self, x: np.ndarray) -> np.ndarray:\n",
" x.flags.writeable = True\n",
" return nb_cauchy_pdf(x, 0, 1)\n",
@@ -906,7 +906,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding distributions with `sum_dists`\n",
"## Adding distributions with `SumDists`\n",
"\n",
"In the business of fitting data, adding two distributions together is our bread-and-butter. This part of the notebook will show you how to create your own distribution that is an instance of `pygama.math.sum_dists`. There are a couple of different use cases for adding distributions together, and we will look at each in turn, but here is a summary:\n",
"\n",
@@ -949,16 +949,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A little about `sum_dists`\n",
"## A little about `SumDists`\n",
"The first important thing to note about this class is that all methods in this class are of the form `method(x, *parameter_array)` or `method(x, parameters, ...)`. \n",
"\n",
"`sum_dists` works by adding two --- and only two --- distributions together. The first thing a user does in creating a new instance is define the order of elements in a `parameter_array`. Then, `sum_dists` works by grabbing elements at different indices in this `parameter_array` according to rules that the user provides in the instantiation. Here's how to write a generic class: \n",
"`SumDists` works by adding two --- and only two --- distributions together. The first thing a user does in creating a new instance is define the order of elements in a `parameter_array`. Then, `sum_dists` works by grabbing elements at different indices in this `parameter_array` according to rules that the user provides in the instantiation. Here's how to write a generic class: \n",
"\n",
"1. Create a `parameter_index_array` that holds the indices of what will eventually come in the `parameter_array`. If the user will eventually pass `parameters = [frac1, mu, sigma]` then we just take `parameter_index_array=[frac1, mu, sigma]=range(3)`\n",
"\n",
"2. `sum_dists` takes an alternating pattern of distributions and distribution-specific parameter_index_arrays. Each par array can contain `[mu, sigma, shape]`. These par arrays are placed in a tuple with their distribution like `(dist1, [mu1, sigma1, shape1])`. Finally, a list of these tuples is fed to the constructor as `sum_dists([(dist1, [mu1, sigma1, shape1]), (dist2, [mu2, sigma2])])`\n",
"2. `SumDists` takes an alternating pattern of distributions and distribution-specific parameter_index_arrays. Each par array can contain `[mu, sigma, shape]`. These par arrays are placed in a tuple with their distribution like `(dist1, [mu1, sigma1, shape1])`. Finally, a list of these tuples is fed to the constructor as `sum_dists([(dist1, [mu1, sigma1, shape1]), (dist2, [mu2, sigma2])])`\n",
"\n",
"3. The `sum_dists` constructor then takes an array corresponding the index locations of where either fraction or area values will be passed in the ultimate `parameter_index_array`.\n",
"3. The `SumDists` constructor then takes an array corresponding the index locations of where either fraction or area values will be passed in the ultimate `parameter_index_array`.\n",
"\n",
"4. We pass one of the 4 flag options, to be described below. \n",
"\n",
@@ -978,26 +978,26 @@
"\n",
"Finally, we would intitalize (with the `fracs` flag in this case, more on that later) \n",
"\n",
"`sum_dists(args, area_frac_idxs = [frac_1], frac_flag = \"fracs\", parameter_names=[\"mu\", \"sigma\", \"tau\", \"frac_1\"])`\n",
"`SumDists(args, area_frac_idxs = [frac_1], frac_flag = \"fracs\", parameter_names=[\"mu\", \"sigma\", \"tau\", \"frac_1\"])`\n",
"\n",
"\n",
"### So... What is `sum_dists` actually doing? \n",
"Under the hood, `sum_dists` is applying a set of rules so that the following *is always* computed, regardless of the flag sent to the constructor. \n",
"### So... What is `SumDists` actually doing? \n",
"Under the hood, `SumDists` is applying a set of rules so that the following *is always* computed, regardless of the flag sent to the constructor. \n",
"\n",
"`area1*frac1*dist1(x, mu, sigma, shape) + area2*frac2*dist_2(x, mu2, sigma2, shape2)`\n",
"\n",
"It computes this by first grabbing the relevant areas or fraction values from their position in the `parameter_index_array`. Then, at the time of method call, `sum_dists` grabs the values from `parameter_index_array` that correspond to each distribution via slicing the `parameter_index_array` with the individual par arrays passed in the instantiation. In our example above, `sum_dists` knows to grab the values at indices 0, 1, 2 for the first distribution because it is instantiated with the tuple `(dist1, [mu, sigma, tau])`.\n",
"It computes this by first grabbing the relevant areas or fraction values from their position in the `parameter_index_array`. Then, at the time of method call, `SumDists` grabs the values from `parameter_index_array` that correspond to each distribution via slicing the `parameter_index_array` with the individual par arrays passed in the instantiation. In our example above, `SumDists` knows to grab the values at indices 0, 1, 2 for the first distribution because it is instantiated with the tuple `(dist1, [mu, sigma, tau])`.\n",
"\n",
"There's also some work done to determine which of `area` and `frac` are present. That's the purpose of the `flag`. Let's take some time and learn a little more about what each flag does. \n",
" \n",
"\n",
"\n",
"### The flag in the `sum_dists` constructor\n",
"### The flag in the `SumDists` constructor\n",
"Let's say we are interested in knowing the amount of counts present in a signal and a background in our total spectrum, i.e. we want to create and fit a function that looks like\n",
"\n",
"$pdf= A\\cdot gauss\\_pdf + B\\cdot cauchy\\_pdf$\n",
"\n",
"Because we are interested in fitting the areas, we send the `areas` keyword to `flag`. This causes `sum_dists` to look for an `area_frac_idxs` array of length 2 in the instantiation: this array contains the indices that the area values will be located at in the `parameter_index_array`. The `areas` keyword causes the fractions to be set to `1`. \n",
"Because we are interested in fitting the areas, we send the `areas` keyword to `flag`. This causes `SumDists` to look for an `area_frac_idxs` array of length 2 in the instantiation: this array contains the indices that the area values will be located at in the `parameter_index_array`. The `areas` keyword causes the fractions to be set to `1`. \n",
"\n",
"In our example above for constructing $new\\_pdf(x, \\mu, \\sigma, \\tau, frac_1) = frac_1\\cdot dist_1(x, \\mu, \\sigma, \\tau) + (1-frac1)\\cdot dist_2(x, \\mu, \\sigma)$, the instantiation takes the `fracs` keyword. This causes `sum_dists` to look for an `area_frac_idxs` array of length 1 in the instantiation: this array contains the index that the fraction values will be located at in the `parameter_index_array`. `sum_dists` works by multiplying `frac` times the first distribution, and `1-frac` times the second distribution.\n",
"\n",
@@ -1011,7 +1011,7 @@
"metadata": {},
"outputs": [],
"source": [
"from pygama.math.functions.sum_dists import sum_dists\n",
"from pygama.math.functions.sum_dists import SumDists\n",
"\n",
"# we first create an array contains the indices of the parameter array\n",
"# that will eventually be input to function calls\n",
@@ -1020,7 +1020,7 @@
"# we now create an array containing the tuples of the distributions and their shape parameters\n",
"args = [(gaussian, [mu, sigma]), (cauchy, [mu2, sigma2])]\n",
"# we initialize with the flag = \"areas\" to let the constructor know we are sending area parameter idxs only\n",
"cauchy_on_gauss = sum_dists(\n",
"cauchy_on_gauss = SumDists(\n",
" args,\n",
" area_frac_idxs=[area1, area2],\n",
" flag=\"areas\",\n",
@@ -1097,12 +1097,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The `fracs` flag in the `sum_dists` constructor\n",
"To get a feel for how `sum_dists` works, let's create a function that creates the following distribution:\n",
"## The `fracs` flag in the `SumDists` constructor\n",
"To get a feel for how `SumDists` works, let's create a function that creates the following distribution:\n",
"\n",
"$pdf = f_1\\cdot gauss\\_{pdf}+(1-f_1)\\cdot cauchy\\_{pdf}$\n",
"\n",
"The `fracs` keyword allows `sum_dists` to look for the parameter index corresponding to the value of one fraction, and then automatically calculates `f*dist1+(1-f)*dist2`"
"The `fracs` keyword allows `SumDists` to look for the parameter index corresponding to the value of one fraction, and then automatically calculates `f*dist1+(1-f)*dist2`"
]
},
{
@@ -1111,7 +1111,7 @@
"metadata": {},
"outputs": [],
"source": [
"from pygama.math.functions.sum_dists import sum_dists\n",
"from pygama.math.functions.sum_dists import SumDists\n",
"\n",
"\n",
"# we first create an array contains the indices of the parameter array\n",
@@ -1121,7 +1121,7 @@
"# we now create an array containing the distributions and their shape parameters\n",
"args = [(gaussian, [mu, sigma]), (cauchy, [mu2, sigma2])]\n",
"# we initialize with the flag = \"areas\" to let the constructor know we are sending area parameters idxs only\n",
"cauchy_on_gauss = sum_dists(\n",
"cauchy_on_gauss = SumDists(\n",
" args,\n",
" area_frac_idxs=[frac1],\n",
" flag=\"fracs\",\n",
@@ -1169,10 +1169,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### By using `sum_dists` on an instance of `sum_dists`, you can even sum three distributions together\n",
"### By using `SumDists` on an instance of `SumDists`, you can even sum three distributions together\n",
"See the `hpge_peak` function to see an example of this. \n",
"\n",
"There is also one other `flag` keyword that `sum_dists` can take: `one_area`. This special keyword is used if we have an odd number of distributions that we want to add together and fit their areas. The first two distributions can be an instance of `sum_dists` with the `areas` flag; however, to add a third distribution to this, we need a way to pass only one area idx to the instantiation of `sum_dists`. See the `triple_gauss_on_double_step` function for an example of this."
"There is also one other `flag` keyword that `SumDists` can take: `one_area`. This special keyword is used if we have an odd number of distributions that we want to add together and fit their areas. The first two distributions can be an instance of `SumDists` with the `areas` flag; however, to add a third distribution to this, we need a way to pass only one area idx to the instantiation of `SumDists`. See the `triple_gauss_on_double_step` function for an example of this."
]
},
{
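Note on the tutorial text touched by this diff: the index-based weighting it describes for the `areas` and `fracs` flags can be sketched without pygama. The class `TwoDistSum`, the helpers `gauss_pdf` and `cauchy_pdf`, and the parameter layout below are hypothetical names chosen for illustration. This is a rough stand-in under those assumptions, not the `SumDists` implementation, and it does not cover the `one_area` flag or nested sums.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Normal density at x (illustrative helper, not a pygama function)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def cauchy_pdf(x, mu, sigma):
    """Cauchy density at x (illustrative helper, not a pygama function)."""
    z = (x - mu) / sigma
    return 1.0 / (math.pi * sigma * (1.0 + z * z))


class TwoDistSum:
    """Hypothetical stand-in for illustration; this is NOT pygama's SumDists."""

    def __init__(self, args, area_frac_idxs, flag):
        # Exactly two (distribution, parameter-index list) tuples are summed.
        if len(args) != 2:
            raise ValueError("exactly two (dist, index_list) tuples are required")
        # "areas" expects two area indices, "fracs" expects one fraction index.
        expected = {"areas": 2, "fracs": 1}
        if flag not in expected:
            raise ValueError(f"unsupported flag: {flag!r}")
        if len(area_frac_idxs) != expected[flag]:
            raise ValueError(f"flag {flag!r} needs {expected[flag]} index value(s)")
        self.args = args
        self.area_frac_idxs = list(area_frac_idxs)
        self.flag = flag

    def pdf(self, x, *params):
        # Pick the two weights out of the flat parameter array.
        if self.flag == "areas":
            w1, w2 = (params[i] for i in self.area_frac_idxs)
        else:
            frac = params[self.area_frac_idxs[0]]
            w1, w2 = frac, 1.0 - frac
        (dist1, idxs1), (dist2, idxs2) = self.args
        # Each distribution only sees the parameters at its own indices.
        return (w1 * dist1(x, *(params[i] for i in idxs1))
                + w2 * dist2(x, *(params[i] for i in idxs2)))


# Index layout chosen for this sketch only.
mu, sigma, mu2, sigma2, area1, area2 = range(6)
by_area = TwoDistSum(
    [(gauss_pdf, [mu, sigma]), (cauchy_pdf, [mu2, sigma2])],
    area_frac_idxs=[area1, area2],
    flag="areas",
)
# Parameters in index order: mu, sigma, mu2, sigma2, area1, area2.
area_value = by_area.pdf(0.0, 0.0, 1.0, 0.0, 1.0, 2.0, 3.0)

frac1 = 4
by_frac = TwoDistSum(
    [(gauss_pdf, [mu, sigma]), (cauchy_pdf, [mu2, sigma2])],
    area_frac_idxs=[frac1],
    flag="fracs",
)
# Parameters in index order: mu, sigma, mu2, sigma2, frac1.
frac_value = by_frac.pdf(0.0, 0.0, 1.0, 0.0, 1.0, 0.25)
```

Here `area_value` is `2*gauss_pdf + 3*cauchy_pdf` and `frac_value` is `0.25*gauss_pdf + 0.75*cauchy_pdf`, both evaluated at `x = 0`. Keeping the weights in the same flat array as the shape parameters is what lets a fitter treat the summed function as one callable of the form `method(x, *parameter_array)`.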