Updates following review
brynpickering committed Oct 24, 2024
1 parent ec3bd32 commit f706ebf
Showing 6 changed files with 155 additions and 157 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -2,7 +2,9 @@

### User-facing changes

|new| `where(array, where_array)` math helper function to apply a where array _inside_ an expression, to enable extending component dimensions on-the-fly, and applying filtering to different components within the expression (#604, #679).
|changed| Helper functions are now documented on their own page within the "Defining your own math" section of the documentation (#698).

|new| `where(array, condition)` math helper function to apply a where array _inside_ an expression, to enable extending component dimensions on-the-fly, and applying filtering to different components within the expression (#604, #679).

|new| Data tables can inherit options from `templates`, like `techs` and `nodes` (#676).

3 changes: 3 additions & 0 deletions docs/reference/api/helper_functions.md
@@ -4,3 +4,6 @@ search:
---

::: calliope.backend.helper_functions
    options:
      docstring_options:
        ignore_init_summary: true
121 changes: 121 additions & 0 deletions docs/user_defined_math/helper_functions.md
@@ -0,0 +1,121 @@

# Helper functions

For [`where` strings](syntax.md#where-strings) and [`expression` strings](syntax.md#expression-strings), many helper functions are available to allow more complex operations to be undertaken within the string.
Their functionality is detailed in the [helper function API page](../reference/api/helper_functions.md).
Here, we give a brief summary.
Some of these helper functions require a good understanding of their functionality to apply, so make sure you are comfortable with them before using them.

## inheritance

Using `inheritance(...)` in a `where` string allows you to grab a subset of technologies / nodes that all share the same [`template`](../creating/templates.md) in the technology's / node's `template` key.
If a `template` also inherits from another `template` (chained inheritance), you will get all `techs`/`nodes` that are children along that inheritance chain.

So, for the definition:

```yaml
templates:
  techgroup1:
    template: techgroup2
    flow_cap_max: 10
  techgroup2:
    base_tech: supply
techs:
  tech1:
    template: techgroup1
  tech2:
    template: techgroup2
```
`inheritance(techgroup1)` will give the `[tech1]` subset and `inheritance(techgroup2)` will give the `[tech1, tech2]` subset.
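
As a rough illustration of the chained lookup, the subsets above can be reproduced by walking each tech's `template` chain. This is a plain-Python sketch, not Calliope's implementation; `TEMPLATES`, `TECHS`, `inherits_from` and the Python `inheritance` function are hypothetical names that only mirror the YAML definition.

```python
# Hypothetical stand-ins for the YAML definition above.
TEMPLATES = {
    "techgroup1": {"template": "techgroup2", "flow_cap_max": 10},
    "techgroup2": {"base_tech": "supply"},
}
TECHS = {
    "tech1": {"template": "techgroup1"},
    "tech2": {"template": "techgroup2"},
}


def inherits_from(item: dict, target: str, templates: dict) -> bool:
    """Return True if `target` appears anywhere in the item's template chain."""
    current = item.get("template")
    seen = set()  # guard against circular template references
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)
        # Step up the chain: a template may itself name a parent template.
        current = templates.get(current, {}).get("template")
    return False


def inheritance(target: str) -> list:
    """Names of all techs whose template chain includes `target`."""
    return [
        name for name, tech in TECHS.items()
        if inherits_from(tech, target, TEMPLATES)
    ]
```

Under these assumptions, `tech1` reaches `techgroup2` through `techgroup1`, which is why it appears in both subsets.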

## any

Parameters are indexed over multiple dimensions.
Using `any(..., over=...)` in a `where` string allows you to check if there is at least one non-NaN value in a given dimension (akin to [xarray.DataArray.any][]).
So, `any(cost, over=[nodes, techs])` will check, for each item in the `costs` dimension (the other dimension that `cost` is indexed over), whether there is at least one non-NaN tech+node value.
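
The reduction can be pictured with a small plain-Python sketch. It is only an analogy to the array operation: the `cost` values, the cost class names and the `any_over_nodes_techs` function are invented for illustration and are not part of Calliope.

```python
import math

# Hypothetical `cost` values keyed by (node, tech, cost class); NaN marks "not defined".
cost = {
    ("node1", "tech1", "monetary"): 5.0,
    ("node1", "tech2", "monetary"): math.nan,
    ("node1", "tech1", "emissions"): math.nan,
    ("node1", "tech2", "emissions"): math.nan,
}


def any_over_nodes_techs(array: dict) -> dict:
    """For each item in the remaining (`costs`) dimension, is any value non-NaN?"""
    result = {}
    for (_node, _tech, cost_class), value in array.items():
        has_value = not math.isnan(value)
        result[cost_class] = result.get(cost_class, False) or has_value
    return result


mask = any_over_nodes_techs(cost)
```

Here `mask` holds one boolean per cost class: the `nodes` and `techs` dimensions have been reduced away, and only the `costs` dimension remains.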

## defined

Similar to [any](#any), using `defined(..., within=...)` in a `where` string allows you to check for non-NaN values along dimensions.
In the case of `defined`, you can check if e.g., certain technologies have been defined within the nodes or certain carriers are defined within a group of techs or nodes.

So, for the definition:

```yaml
techs:
  tech1:
    base_tech: conversion
    carrier_in: electricity
    carrier_out: heat
  tech2:
    base_tech: conversion
    carrier_in: [coal, biofuel]
    carrier_out: electricity
nodes:
  node1:
    techs: {tech1}
  node2:
    techs: {tech1, tech2}
```

`defined(carriers=electricity, within=techs)` would yield a list of `[True, True]` as both technologies define electricity.

`defined(techs=[tech1, tech2], within=nodes)` would yield a list of `[True, True]` as both nodes define _at least one_ of `tech1` or `tech2`.

`defined(techs=[tech1, tech2], within=nodes, how=all)` would yield a list of `[False, True]` as only `node2` defines _both_ `tech1` and `tech2`.
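
The difference between the default behaviour and `how=all` can be sketched in plain Python. This is an analogy under stated assumptions, not the library's code: `NODE_TECHS` and `defined_within_nodes` are hypothetical names that mirror the `nodes` definition above.

```python
# Hypothetical stand-in for the `nodes` definition above.
NODE_TECHS = {
    "node1": {"tech1"},
    "node2": {"tech1", "tech2"},
}


def defined_within_nodes(techs: list, how: str = "any") -> list:
    """One boolean per node: are any/all of `techs` defined at that node?"""
    check = all if how == "all" else any
    return [
        check(tech in present for tech in techs)
        for present in NODE_TECHS.values()
    ]


any_result = defined_within_nodes(["tech1", "tech2"])
all_result = defined_within_nodes(["tech1", "tech2"], how="all")
```

With these inputs, `any_result` is `[True, True]` and `all_result` is `[False, True]`, matching the two `within=nodes` examples above.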

## sum

Using `sum(..., over=...)` in an expression allows you to sum over one or more dimensions of your component array (be it a parameter, decision variable, or global expression).

## select_from_lookup_arrays

Some of our arrays in [`model.inputs`][calliope.Model.inputs] are not data arrays, but "lookup" arrays.
These arrays are used to map the array's index items to other index items.
For instance when using [time clustering](../advanced/time.md#time-clustering), the `lookup_cluster_last_timestep` array is used to get the timestep resolution and the stored energy for the last timestep in each cluster.
Using `select_from_lookup_arrays(..., dim_name=lookup_array)` allows you to apply this lookup array to your data array.

## get_val_at_index

If you want to access an integer index in your dimension, use `get_val_at_index(dim_name=integer_index)`.
For example, `get_val_at_index(timesteps=0)` will get the first timestep in your timeseries, `get_val_at_index(timesteps=-1)` will get the final timestep.
This is mostly used when conditionally applying a different expression in the first / final timestep of the timeseries.

It can be used in the `where` string (e.g., `timesteps=get_val_at_index(timesteps=0)` to mask all other timesteps) and in the `expression` string (via [slices](syntax.md#slices): `storage[timesteps=$first_timestep]`, with the `first_timestep` slice expression being `get_val_at_index(timesteps=0)`).

## roll

We do not use for-loops in our math.
This can be difficult to get your head around initially, but it means that defining expressions of the form `var[t] == var[t-1] + param[t]` requires shifting all the data in your component array by N places.
Using `roll(..., dimension_name=N)` allows you to do this.
For example, `roll(storage, timesteps=1)` will shift all the storage decision variable objects by one timestep in the array.
Then, `storage == roll(storage, timesteps=1) + 1` is equivalent to applying `storage[t] == storage[t - 1] + 1` in a for-loop.
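
To make the shift concrete, here is a minimal plain-Python sketch of rolling a one-dimensional sequence. It assumes wrap-around behaviour like that of `xarray.DataArray.roll`; the list-based `roll` function and the `storage` labels are illustrative stand-ins, not Calliope objects.

```python
def roll(values: list, shift: int) -> list:
    """Shift values by `shift` places, wrapping items that fall off the end."""
    if not values:
        return []
    shift %= len(values)
    if shift == 0:
        return list(values)
    return values[-shift:] + values[:-shift]


# Hypothetical labels standing in for storage[t] at t = 0..3.
storage = ["s0", "s1", "s2", "s3"]
rolled = roll(storage, 1)

# Pair each position t with the item that the rolled array holds at t.
pairs = list(zip(storage, rolled))
```

In `pairs`, each `storage[t]` sits next to `storage[t - 1]`. Under the wrap-around assumption the first item is paired with the last one, which is one reason the first timestep often needs its own expression.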

## default_if_empty

We work with quite sparse arrays in our models.
So, although your arrays are indexed over e.g., `nodes`, `techs` and `carriers`, a decision variable or parameter might only have one or two values in the array, with the rest being NaN.
This can play havoc with defining math, with `nan` values making their way into your optimisation problem and then killing the solver or the solver interface.
Using `default_if_empty(..., default=...)` in your `expression` string allows you to put a placeholder value in, which will be used if the math expression unavoidably _needs_ a value.
Usually you shouldn't need to use this, as your `where` string will mask those NaN values.
But if you're having trouble setting up your math, it is a useful function for getting it over the line.
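
The effect on a sparse array can be pictured with a few lines of plain Python. This is only a sketch of the idea: the list-based `default_if_empty` below is a hypothetical stand-in that fills NaN entries, not the helper's actual implementation.

```python
import math


def default_if_empty(values: list, default: float) -> list:
    """Replace NaN entries with `default`, leaving defined values untouched."""
    return [default if math.isnan(value) else value for value in values]


# Hypothetical sparse parameter: only two of four entries are defined.
sparse = [1.5, math.nan, math.nan, 4.0]
filled = default_if_empty(sparse, default=0.0)
```

After filling, no NaN remains to leak into the optimisation problem, but the placeholder now takes part in the math, so the default should be chosen with care.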

!!! note
    Our internally defined parameters, listed in the `Parameters` section of our [pre-defined base math documentation][base-math], all have default values which propagate to the math.
    You only need to use `default_if_empty` for decision variables and global expressions, and for user-defined parameters.

## where

[Where strings](syntax.md#where-strings) only allow you to apply conditions across the whole expression.
Sometimes, it's necessary to apply specific conditions to different components _within_ the expression.
Using the `where(<math_component>, <condition>)` helper function enables this,
where `<math_component>` is a reference to a parameter, variable, or global expression and `<condition>` is a reference to an array in your model inputs that contains only `True`/`1` and `False`/`0`/`NaN` values.
`<condition>` will then be applied to `<math_component>`, keeping only the values in `<math_component>` where `<condition>` is `True`/`1`.

This helper function can also be used to _extend_ the dimensions of a `<math_component>`.
If the `<condition>` has any dimensions not present in `<math_component>`, `<math_component>` will be [broadcast](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html#broadcasting-adjusting-arrays-to-the-same-shape) to include those dimensions.
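
Both effects, masking and dimension extension, can be sketched in plain Python. The example is an analogy under stated assumptions: `flow_cap`, `condition` and the dict-based `where` function are hypothetical, with dict keys standing in for array dimensions.

```python
import math

# Hypothetical math component indexed over `techs` only.
flow_cap = {"tech1": 10.0, "tech2": 20.0}

# Hypothetical boolean input indexed over (`nodes`, `techs`).
condition = {
    ("node1", "tech1"): True,
    ("node1", "tech2"): False,
    ("node2", "tech1"): True,
    ("node2", "tech2"): True,
}


def where(component: dict, cond: dict) -> dict:
    """Broadcast `component` over the index of `cond`, keeping values only where `cond` is True."""
    return {
        (node, tech): component[tech] if flag else math.nan
        for (node, tech), flag in cond.items()
    }


masked = where(flow_cap, condition)
```

The result is indexed over both `nodes` and `techs`, even though `flow_cap` only had a `techs` dimension, and the entry for `("node1", "tech2")` is NaN because the condition is false there.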

!!! note
    `Where` gets referred to a lot in Calliope math.
    It always means the same thing: applying [xarray.DataArray.where][].
129 changes: 4 additions & 125 deletions docs/user_defined_math/syntax.md
@@ -37,7 +37,7 @@ When checking the existence of an input parameter it is possible to first sum it
- If you want to apply a constraint across all `nodes` and `techs`, but only for node+tech combinations where the `flow_out_eff` parameter has been defined, you would include `flow_out_eff`.
- If you want to apply a constraint over `techs` and `timesteps`, but only for combinations where the `source_use_max` parameter has at least one `node` with a value defined, you would include `any(resource, over=nodes)`. (1)

1. `any` is a [helper function](#helper-functions); read more below!
1. `any` is a [helper function](helper_functions.md#any); read more below!

1. Checking the value of a configuration option or an input parameter.
Checks can use any of the operators: `>`, `<`, `=`, `<=`, `>=`.
@@ -50,15 +50,15 @@ Configuration options are any that are defined in `config.build`, where you can
- If you want to apply a constraint only for the first timestep in your timeseries, you would include `timesteps=get_val_at_index(dim=timesteps, idx=0)`. (1)
- If you want to apply a constraint only for the last timestep in your timeseries, you would include `timesteps=get_val_at_index(dim=timesteps, idx=-1)`.

1. `get_val_at_index` is a [helper function](#helper-functions); read more below!
1. `get_val_at_index` is a [helper function](helper_functions.md#get_val_at_index); read more below!

1. Checking the `base_tech` of a technology (`storage`, `supply`, etc.) or its inheritance chain (if using `templates` and the `template` parameter).

??? example "Examples"

- If you want to create a decision variable across only `storage` technologies, you would include `base_tech=storage`.
- If you want to apply a constraint across only your own `rooftop_supply` technologies (e.g., you have defined `rooftop_supply` in `templates` and your technologies `pv` and `solar_thermal` define `#!yaml template: rooftop_supply`), you would include `inheritance(rooftop_supply)`.
Note that `base_tech=...` is a simple check for the given value of `base_tech`, while `inheritance()` is a helper function ([see below](#helper-functions)) which can deal with finding techs/nodes using the same template, e.g. `pv` might inherit the `rooftop_supply` template which in turn might inherit the template `electricity_supply`.
Note that `base_tech=...` is a simple check for the given value of `base_tech`, while `inheritance()` is a helper function ([see below](helper_functions.md)) which can deal with finding techs/nodes using the same template, e.g. `pv` might inherit the `rooftop_supply` template which in turn might inherit the template `electricity_supply`.

1. Subsetting a set.
The sets available to subset are always [`nodes`, `techs`, `carriers`] + any additional sets defined by you in [`foreach`](#foreach-lists).
@@ -67,7 +67,7 @@ The sets available to subset are always [`nodes`, `techs`, `carriers`] + any add

- If you want to filter `nodes` where any of a set of `techs` are defined: `defined(techs=[tech1, tech2], within=nodes, how=any)` (1).

1. `defined` is a [helper function](#helper-functions); read more below!
1. `defined` is a [helper function](helper_functions.md#defined); read more below!

To combine statements you can use the operators `and`/`or`.
You can also use the `not` operator to negate any of the statements.
@@ -109,127 +109,6 @@ Behind the scenes, we will make sure that every relevant element of the defined
Slicing math components involves appending the component with square brackets that contain the slices, e.g. `flow_out[carriers=electricity, nodes=[A, B]]` will slice the `flow_out` decision variable to focus on `electricity` in its `carriers` dimension and only has two nodes (`A` and `B`) on its `nodes` dimension.
To find out what dimensions you can slice a component on, see your input data (`model.inputs`) for parameters and the definition for decision variables in your math dictionary.

## Helper functions

For [`where` strings](#where-strings) and [`expression` strings](#where-strings), there are many helper functions available to use, to allow for more complex operations to be undertaken.
Their functionality is detailed in the [helper function API page](../reference/api/helper_functions.md).
Here, we give a brief summary.
Some of these helper functions require a good understanding of their functionality to apply, so make sure you are comfortable with them before using them.

### inheritance

using `inheritance(...)` in a `where` string allows you to grab a subset of technologies / nodes that all share the same [`template`](../creating/templates.md) in the technology's / node's `template` key.
If a `template` also inherits from another `template` (chained inheritance), you will get all `techs`/`nodes` that are children along that inheritance chain.

So, for the definition:

```yaml
templates:
  techgroup1:
    template: techgroup2
    flow_cap_max: 10
  techgroup2:
    base_tech: supply
techs:
  tech1:
    template: techgroup1
  tech2:
    template: techgroup2
```
`inheritance(techgroup1)` will give the `[tech1]` subset and `inheritance(techgroup2)` will give the `[tech1, tech2]` subset.

### any

Parameters are indexed over multiple dimensions.
Using `any(..., over=...)` in a `where` string allows you to check if there is at least one non-NaN value in a given dimension (akin to [xarray.DataArray.any][]).
So, `any(cost, over=[nodes, techs])` will check if there is at least one non-NaN tech+node value in the `costs` dimension (the other dimension that the `cost` decision variable is indexed over).

### defined

Similar to [any](#any), using `defined(..., within=...)` in a `where` string allows you to check for non-NaN values along dimensions.
In the case of `defined`, you can check if e.g., certain technologies have been defined within the nodes or certain carriers are defined within a group of techs or nodes.

So, for the definition:

```yaml
techs:
  tech1:
    base_tech: conversion
    carrier_in: electricity
    carrier_out: heat
  tech2:
    base_tech: conversion
    carrier_in: [coal, biofuel]
    carrier_out: electricity
nodes:
  node1:
    techs: {tech1}
  node2:
    techs: {tech1, tech2}
```

`defined(carriers=electricity, within=techs)` would yield a list of `[True, True]` as both technologies define electricity.

`defined(techs=[tech1, tech2], within=nodes)` would yield a list of `[True, True]` as both nodes define _at least one_ of `tech1` or `tech2`.

`defined(techs=[tech1, tech2], within=nodes, how=all)` would yield a list of `[False, True]` as only `node2` defines _both_ `tech1` and `tech2`.

### sum

Using `sum(..., over=)` in an expression allows you to sum over one or more dimension of your component array (be it a parameter, decision variable, or global expression).

### select_from_lookup_arrays

Some of our arrays in [`model.inputs`][calliope.Model.inputs] are not data arrays, but "lookup" arrays.
These arrays are used to map the array's index items to other index items.
For instance when using [time clustering](../advanced/time.md#time-clustering), the `lookup_cluster_last_timestep` array is used to get the timestep resolution and the stored energy for the last timestep in each cluster.
Using `select_from_lookup_arrays(..., dim_name=lookup_array)` allows you to apply this lookup array to your data array.

### get_val_at_index

If you want to access an integer index in your dimension, use `get_val_at_index(dim_name=integer_index)`.
For example, `get_val_at_index(timesteps=0)` will get the first timestep in your timeseries, `get_val_at_index(timesteps=-1)` will get the final timestep.
This is mostly used when conditionally applying a different expression in the first / final timestep of the timeseries.

It can be used in the `where` string (e.g., `timesteps=get_val_at_index(timesteps=0)` to mask all other timesteps) and the `expression string` (via [slices](#slices) - `storage[timesteps=$first_timestep]` and `first_timestep` expression being `get_val_at_index(timesteps=0)`).

### roll

We do not use for-loops in our math.
This can be difficult to get your head around initially, but it means that to define expressions of the form `var[t] == var[t-1] + param[t]` requires shifting all the data in your component array by N places.
Using `roll(..., dimension_name=N)` allows you to do this.
For example, `roll(storage, timesteps=1)` will shift all the storage decision variable objects by one timestep in the array.
Then, `storage == roll(storage, timesteps=1) + 1` is equivalent to applying `storage[t] == storage[t - 1] + 1` in a for-loop.

### default_if_empty

We work with quite sparse arrays in our models.
So, although your arrays are indexed over e.g., `nodes`, `techs` and `carriers`, a decision variable or parameter might only have one or two values in the array, with the rest being NaN.
This can play havoc with defining math, with `nan` values making their way into your optimisation problem and then killing the solver or the solver interface.
Using `default_if_empty(..., default=...)` in your `expression` string allows you to put a placeholder value in, which will be used if the math expression unavoidably _needs_ a value.
Usually you shouldn't need to use this, as your `where` string will mask those NaN values.
But if you're having trouble setting up your math, it is a useful function to getting it over the line.

!!! note
Our internally defined parameters, listed in the `Parameters` section of our [pre-defined base math documentation][base-math] all have default values which propagate to the math.
You only need to use `default_if_empty` for decision variables and global expressions, and for user-defined parameters.

### where

[Where strings](#where-strings) only allow you to apply conditions across the whole expression equations.
Sometimes, it's necessary to apply specific conditions to different components _within_ the expression.
Using `where(<math_component>, <boolean_array>)` helper function enables this,
where `<math_component>` is a reference to a parameter, variable, or global expression and `<boolean_array>` is a reference to an array in your model inputs that contains only `True`/`1` and `False`/`0`/`NaN` values.
`<boolean_array>` will then be applied to `<math_component>`, keeping only the values in `<math_component>` where `<boolean_array>` is `True`/`1`.

This helper function can also be used to _extend_ the dimensions of a `<math_component>`.
If the ``<boolean_array>`` has any dimensions not present in `<math_component>`, `<math_component>` will be [broadcast](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html#broadcasting-adjusting-arrays-to-the-same-shape) to include those dimensions.

!!! note
`Where` gets referred to a lot in Calliope math.
It always means the same thing: applying [xarray.DataArray.where][].

## equations

Equations are combinations of [expression strings](#expression-strings) and [where strings](#where-strings).
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -117,6 +117,7 @@ nav:
- user_defined_math/index.md
- user_defined_math/components.md
- user_defined_math/syntax.md
- user_defined_math/helper_functions.md
- user_defined_math/customise.md
- Example additional math gallery:
- user_defined_math/examples/index.md