From f706ebf93f8b477d5cad01bfcef56a17028a502f Mon Sep 17 00:00:00 2001 From: Bryn Pickering <17178478+brynpickering@users.noreply.github.com> Date: Thu, 24 Oct 2024 12:28:14 +0100 Subject: [PATCH] Updates following review --- CHANGELOG.md | 4 +- docs/reference/api/helper_functions.md | 3 + docs/user_defined_math/helper_functions.md | 121 +++++++++++++++++++ docs/user_defined_math/syntax.md | 129 +-------------------- mkdocs.yml | 1 + src/calliope/backend/helper_functions.py | 54 ++++----- 6 files changed, 155 insertions(+), 157 deletions(-) create mode 100644 docs/user_defined_math/helper_functions.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 2af92511..6ef5c98c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,7 +2,9 @@ ### User-facing changes -|new| `where(array, where_array)` math helper function to apply a where array _inside_ an expression, to enable extending component dimensions on-the-fly, and applying filtering to different components within the expression (#604, #679). +|changed| Helper functions are now documented on their own page within the "Defining your own math" section of the documentation (#698). + +|new| `where(array, condition)` math helper function to apply a where array _inside_ an expression, to enable extending component dimensions on-the-fly, and applying filtering to different components within the expression (#604, #679). |new| Data tables can inherit options from `templates`, like `techs` and `nodes` (#676). 
diff --git a/docs/reference/api/helper_functions.md b/docs/reference/api/helper_functions.md index 7ecfc5b0..3c9e575e 100644 --- a/docs/reference/api/helper_functions.md +++ b/docs/reference/api/helper_functions.md @@ -4,3 +4,6 @@ search: --- ::: calliope.backend.helper_functions + options: + docstring_options: + ignore_init_summary: true diff --git a/docs/user_defined_math/helper_functions.md b/docs/user_defined_math/helper_functions.md new file mode 100644 index 00000000..9bd7e4d2 --- /dev/null +++ b/docs/user_defined_math/helper_functions.md @@ -0,0 +1,121 @@ + +# Helper functions + +For [`where` strings](syntax.md#where-strings) and [`expression` strings](syntax.md#expression-strings), many helper functions are available to enable more complex operations within the string. +Their functionality is detailed in the [helper function API page](../reference/api/helper_functions.md). +Here, we give a brief summary. +Some of these helper functions require a good understanding of their functionality to apply, so make sure you are comfortable with them before using them. + +## inheritance + +Using `inheritance(...)` in a `where` string allows you to grab a subset of technologies / nodes that all share the same [`template`](../creating/templates.md) in the technology's / node's `template` key. +If a `template` also inherits from another `template` (chained inheritance), you will get all `techs`/`nodes` that are children along that inheritance chain. + +So, for the definition: + +```yaml +templates: + techgroup1: + template: techgroup2 + flow_cap_max: 10 + techgroup2: + base_tech: supply +techs: + tech1: + template: techgroup1 + tech2: + template: techgroup2 +``` + +`inheritance(techgroup1)` will give the `[tech1]` subset and `inheritance(techgroup2)` will give the `[tech1, tech2]` subset. + +## any + +Parameters are indexed over multiple dimensions. 
+Using `any(..., over=...)` in a `where` string allows you to check if there is at least one non-NaN value in a given dimension (akin to [xarray.DataArray.any][]). +So, `any(cost, over=[nodes, techs])` will check if there is at least one non-NaN tech+node value in the `costs` dimension (the other dimension that the `cost` decision variable is indexed over). + +## defined + +Similar to [any](#any), using `defined(..., within=...)` in a `where` string allows you to check for non-NaN values along dimensions. +In the case of `defined`, you can check if e.g., certain technologies have been defined within the nodes or certain carriers are defined within a group of techs or nodes. + +So, for the definition: + +```yaml +techs: + tech1: + base_tech: conversion + carrier_in: electricity + carrier_out: heat + tech2: + base_tech: conversion + carrier_in: [coal, biofuel] + carrier_out: electricity +nodes: + node1: + techs: {tech1} + node2: + techs: {tech1, tech2} +``` + +`defined(carriers=electricity, within=techs)` would yield a list of `[True, True]` as both technologies define electricity. + +`defined(techs=[tech1, tech2], within=nodes)` would yield a list of `[True, True]` as both nodes define _at least one_ of `tech1` or `tech2`. + +`defined(techs=[tech1, tech2], within=nodes, how=all)` would yield a list of `[False, True]` as only `node2` defines _both_ `tech1` and `tech2`. + +## sum + +Using `sum(..., over=)` in an expression allows you to sum over one or more dimensions of your component array (be it a parameter, decision variable, or global expression). + +## select_from_lookup_arrays + +Some of our arrays in [`model.inputs`][calliope.Model.inputs] are not data arrays, but "lookup" arrays. +These arrays are used to map the array's index items to other index items. 
+For instance, when using [time clustering](../advanced/time.md#time-clustering), the `lookup_cluster_last_timestep` array is used to get the timestep resolution and the stored energy for the last timestep in each cluster. +Using `select_from_lookup_arrays(..., dim_name=lookup_array)` allows you to apply this lookup array to your data array. + +## get_val_at_index + +If you want to access an integer index in your dimension, use `get_val_at_index(dim_name=integer_index)`. +For example, `get_val_at_index(timesteps=0)` will get the first timestep in your timeseries and `get_val_at_index(timesteps=-1)` will get the final timestep. +This is mostly used when conditionally applying a different expression in the first / final timestep of the timeseries. + +It can be used in the `where` string (e.g., `timesteps=get_val_at_index(timesteps=0)` to mask all other timesteps) and the `expression` string (via [slices](syntax.md#slices), e.g., `storage[timesteps=$first_timestep]` with the `first_timestep` expression being `get_val_at_index(timesteps=0)`). + +## roll + +We do not use for-loops in our math. +This can be difficult to get your head around initially, but it means that defining expressions of the form `var[t] == var[t-1] + param[t]` requires shifting all the data in your component array by N places. +Using `roll(..., dimension_name=N)` allows you to do this. +For example, `roll(storage, timesteps=1)` will shift all the storage decision variable objects by one timestep in the array. +Then, `storage == roll(storage, timesteps=1) + 1` is equivalent to applying `storage[t] == storage[t - 1] + 1` in a for-loop. + +## default_if_empty + +We work with quite sparse arrays in our models. +So, although your arrays are indexed over e.g., `nodes`, `techs` and `carriers`, a decision variable or parameter might only have one or two values in the array, with the rest being NaN. 
+This can play havoc with defining math, with `nan` values making their way into your optimisation problem and then killing the solver or the solver interface. +Using `default_if_empty(..., default=...)` in your `expression` string allows you to put a placeholder value in, which will be used if the math expression unavoidably _needs_ a value. +Usually you shouldn't need to use this, as your `where` string will mask those NaN values. +But if you're having trouble setting up your math, it is a useful function for getting it over the line. + +!!! note + Our internally defined parameters, listed in the `Parameters` section of our [pre-defined base math documentation][base-math], all have default values which propagate to the math. + You only need to use `default_if_empty` for decision variables and global expressions, and for user-defined parameters. + +## where + +[Where strings](syntax.md#where-strings) only allow you to apply conditions across the whole expression. +Sometimes, it's necessary to apply specific conditions to different components _within_ the expression. +Using the `where(<array>, <condition>)` helper function enables this, +where `<array>` is a reference to a parameter, variable, or global expression and `<condition>` is a reference to an array in your model inputs that contains only `True`/`1` and `False`/`0`/`NaN` values. +`<condition>` will then be applied to `<array>`, keeping only the values in `<array>` where `<condition>` is `True`/`1`. + +This helper function can also be used to _extend_ the dimensions of an `<array>`. +If the `<condition>` has any dimensions not present in `<array>`, `<array>` will be [broadcast](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html#broadcasting-adjusting-arrays-to-the-same-shape) to include those dimensions. + +!!! note + `Where` gets referred to a lot in Calliope math. + It always means the same thing: applying [xarray.DataArray.where][]. 
diff --git a/docs/user_defined_math/syntax.md b/docs/user_defined_math/syntax.md index ef4a2751..cdb1fad6 100644 --- a/docs/user_defined_math/syntax.md +++ b/docs/user_defined_math/syntax.md @@ -37,7 +37,7 @@ When checking the existence of an input parameter it is possible to first sum it - If you want to apply a constraint across all `nodes` and `techs`, but only for node+tech combinations where the `flow_out_eff` parameter has been defined, you would include `flow_out_eff`. - If you want to apply a constraint over `techs` and `timesteps`, but only for combinations where the `source_use_max` parameter has at least one `node` with a value defined, you would include `any(resource, over=nodes)`. (1) - 1. `any` is a [helper function](#helper-functions); read more below! + 1. `any` is a [helper function](helper_functions.md#any); read more on the linked page! 1. Checking the value of a configuration option or an input parameter. Checks can use any of the operators: `>`, `<`, `=`, `<=`, `>=`. @@ -50,7 +50,7 @@ Configuration options are any that are defined in `config.build`, where you can - If you want to apply a constraint only for the first timestep in your timeseries, you would include `timesteps=get_val_at_index(dim=timesteps, idx=0)`. (1) - If you want to apply a constraint only for the last timestep in your timeseries, you would include `timesteps=get_val_at_index(dim=timesteps, idx=-1)`. - 1. `get_val_at_index` is a [helper function](#helper-functions); read more below! + 1. `get_val_at_index` is a [helper function](helper_functions.md#get_val_at_index); read more on the linked page! 1. Checking the `base_tech` of a technology (`storage`, `supply`, etc.) or its inheritance chain (if using `templates` and the `template` parameter). @@ -58,7 +58,7 @@ Configuration options are any that are defined in `config.build`, where you can - If you want to create a decision variable across only `storage` technologies, you would include `base_tech=storage`. 
- If you want to apply a constraint across only your own `rooftop_supply` technologies (e.g., you have defined `rooftop_supply` in `templates` and your technologies `pv` and `solar_thermal` define `#!yaml template: rooftop_supply`), you would include `inheritance(rooftop_supply)`. - Note that `base_tech=...` is a simple check for the given value of `base_tech`, while `inheritance()` is a helper function ([see below](#helper-functions)) which can deal with finding techs/nodes using the same template, e.g. `pv` might inherit the `rooftop_supply` template which in turn might inherit the template `electricity_supply`. + Note that `base_tech=...` is a simple check for the given value of `base_tech`, while `inheritance()` is a helper function ([see the helper functions page](helper_functions.md#inheritance)) which can deal with finding techs/nodes using the same template, e.g. `pv` might inherit the `rooftop_supply` template which in turn might inherit the template `electricity_supply`. 1. Subsetting a set. The sets available to subset are always [`nodes`, `techs`, `carriers`] + any additional sets defined by you in [`foreach`](#foreach-lists). @@ -67,7 +67,7 @@ The sets available to subset are always [`nodes`, `techs`, `carriers`] + any add - If you want to filter `nodes` where any of a set of `techs` are defined: `defined(techs=[tech1, tech2], within=nodes, how=any)` (1). - 1. `defined` is a [helper function](#helper-functions); read more below! + 1. `defined` is a [helper function](helper_functions.md#defined); read more on the linked page! To combine statements you can use the operators `and`/`or`. You can also use the `not` operator to negate any of the statements. @@ -109,127 +109,6 @@ Behind the scenes, we will make sure that every relevant element of the defined Slicing math components involves appending the component with square brackets that contain the slices, e.g. 
`flow_out[carriers=electricity, nodes=[A, B]]` will slice the `flow_out` decision variable to focus on `electricity` in its `carriers` dimension and only has two nodes (`A` and `B`) on its `nodes` dimension. To find out what dimensions you can slice a component on, see your input data (`model.inputs`) for parameters and the definition for decision variables in your math dictionary. -## Helper functions - -For [`where` strings](#where-strings) and [`expression` strings](#where-strings), there are many helper functions available to use, to allow for more complex operations to be undertaken. -Their functionality is detailed in the [helper function API page](../reference/api/helper_functions.md). -Here, we give a brief summary. -Some of these helper functions require a good understanding of their functionality to apply, so make sure you are comfortable with them before using them. - -### inheritance - -using `inheritance(...)` in a `where` string allows you to grab a subset of technologies / nodes that all share the same [`template`](../creating/templates.md) in the technology's / node's `template` key. -If a `template` also inherits from another `template` (chained inheritance), you will get all `techs`/`nodes` that are children along that inheritance chain. - -So, for the definition: - -```yaml -templates: - techgroup1: - template: techgroup2 - flow_cap_max: 10 - techgroup2: - base_tech: supply -techs: - tech1: - template: techgroup1 - tech2: - template: techgroup2 -``` - -`inheritance(techgroup1)` will give the `[tech1]` subset and `inheritance(techgroup2)` will give the `[tech1, tech2]` subset. - -### any - -Parameters are indexed over multiple dimensions. -Using `any(..., over=...)` in a `where` string allows you to check if there is at least one non-NaN value in a given dimension (akin to [xarray.DataArray.any][]). 
-So, `any(cost, over=[nodes, techs])` will check if there is at least one non-NaN tech+node value in the `costs` dimension (the other dimension that the `cost` decision variable is indexed over). - -### defined - -Similar to [any](#any), using `defined(..., within=...)` in a `where` string allows you to check for non-NaN values along dimensions. -In the case of `defined`, you can check if e.g., certain technologies have been defined within the nodes or certain carriers are defined within a group of techs or nodes. - -So, for the definition: - -```yaml -techs: - tech1: - base_tech: conversion - carrier_in: electricity - carrier_out: heat - tech2: - base_tech: conversion - carrier_in: [coal, biofuel] - carrier_out: electricity -nodes: - node1: - techs: {tech1} - node2: - techs: {tech1, tech2} -``` - -`defined(carriers=electricity, within=techs)` would yield a list of `[True, True]` as both technologies define electricity. - -`defined(techs=[tech1, tech2], within=nodes)` would yield a list of `[True, True]` as both nodes define _at least one_ of `tech1` or `tech2`. - -`defined(techs=[tech1, tech2], within=nodes, how=all)` would yield a list of `[False, True]` as only `node2` defines _both_ `tech1` and `tech2`. - -### sum - -Using `sum(..., over=)` in an expression allows you to sum over one or more dimension of your component array (be it a parameter, decision variable, or global expression). - -### select_from_lookup_arrays - -Some of our arrays in [`model.inputs`][calliope.Model.inputs] are not data arrays, but "lookup" arrays. -These arrays are used to map the array's index items to other index items. -For instance when using [time clustering](../advanced/time.md#time-clustering), the `lookup_cluster_last_timestep` array is used to get the timestep resolution and the stored energy for the last timestep in each cluster. -Using `select_from_lookup_arrays(..., dim_name=lookup_array)` allows you to apply this lookup array to your data array. 
- -### get_val_at_index - -If you want to access an integer index in your dimension, use `get_val_at_index(dim_name=integer_index)`. -For example, `get_val_at_index(timesteps=0)` will get the first timestep in your timeseries, `get_val_at_index(timesteps=-1)` will get the final timestep. -This is mostly used when conditionally applying a different expression in the first / final timestep of the timeseries. - -It can be used in the `where` string (e.g., `timesteps=get_val_at_index(timesteps=0)` to mask all other timesteps) and the `expression string` (via [slices](#slices) - `storage[timesteps=$first_timestep]` and `first_timestep` expression being `get_val_at_index(timesteps=0)`). - -### roll - -We do not use for-loops in our math. -This can be difficult to get your head around initially, but it means that to define expressions of the form `var[t] == var[t-1] + param[t]` requires shifting all the data in your component array by N places. -Using `roll(..., dimension_name=N)` allows you to do this. -For example, `roll(storage, timesteps=1)` will shift all the storage decision variable objects by one timestep in the array. -Then, `storage == roll(storage, timesteps=1) + 1` is equivalent to applying `storage[t] == storage[t - 1] + 1` in a for-loop. - -### default_if_empty - -We work with quite sparse arrays in our models. -So, although your arrays are indexed over e.g., `nodes`, `techs` and `carriers`, a decision variable or parameter might only have one or two values in the array, with the rest being NaN. -This can play havoc with defining math, with `nan` values making their way into your optimisation problem and then killing the solver or the solver interface. -Using `default_if_empty(..., default=...)` in your `expression` string allows you to put a placeholder value in, which will be used if the math expression unavoidably _needs_ a value. -Usually you shouldn't need to use this, as your `where` string will mask those NaN values. 
-But if you're having trouble setting up your math, it is a useful function to getting it over the line. - -!!! note - Our internally defined parameters, listed in the `Parameters` section of our [pre-defined base math documentation][base-math] all have default values which propagate to the math. - You only need to use `default_if_empty` for decision variables and global expressions, and for user-defined parameters. - -### where - -[Where strings](#where-strings) only allow you to apply conditions across the whole expression equations. -Sometimes, it's necessary to apply specific conditions to different components _within_ the expression. -Using `where(, )` helper function enables this, -where `` is a reference to a parameter, variable, or global expression and `` is a reference to an array in your model inputs that contains only `True`/`1` and `False`/`0`/`NaN` values. -`` will then be applied to ``, keeping only the values in `` where `` is `True`/`1`. - -This helper function can also be used to _extend_ the dimensions of a ``. -If the ```` has any dimensions not present in ``, `` will be [broadcast](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html#broadcasting-adjusting-arrays-to-the-same-shape) to include those dimensions. - -!!! note - `Where` gets referred to a lot in Calliope math. - It always means the same thing: applying [xarray.DataArray.where][]. - ## equations Equations are combinations of [expression strings](#expression-strings) and [where strings](#where-strings). 
diff --git a/mkdocs.yml b/mkdocs.yml index fa677f27..6b41e180 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -117,6 +117,7 @@ nav: - user_defined_math/index.md - user_defined_math/components.md - user_defined_math/syntax.md + - user_defined_math/helper_functions.md - user_defined_math/customise.md - Example additional math gallery: - user_defined_math/examples/index.md diff --git a/src/calliope/backend/helper_functions.py b/src/calliope/backend/helper_functions.py index dc024796..f8cef607 100644 --- a/src/calliope/backend/helper_functions.py +++ b/src/calliope/backend/helper_functions.py @@ -758,58 +758,50 @@ class Where(ParsingHelperFunction): #: ALLOWED_IN = ["expression"] - def as_math_string(self, array: str, where_array: str) -> str: # noqa: D102, override - return rf"({array} \text{{if }} {where_array} == True)" + def as_math_string(self, array: str, condition: str) -> str: # noqa: D102, override + return rf"({array} \text{{if }} {condition} == True)" - def as_array(self, array: xr.DataArray, where_array: xr.DataArray) -> xr.DataArray: - """Apply a `where` array to a math array within an expression string. + def as_array(self, array: xr.DataArray, condition: xr.DataArray) -> xr.DataArray: + """Apply a `where` condition to a math array within an expression string. Args: array (xr.DataArray): Math component array. - where_array (xr.DataArray): + condition (xr.DataArray): Boolean where array. If not `bool` type, NaNs and 0 will be assumed as False and all other values will be assumed as True. Returns: xr.DataArray: - Returns the input array with the where array applied. + Returns the input array with the condition applied, + including having been broadcast across any new dimensions provided by the condition. Examples: - input: + One common use-case is to introduce a new dimension to the variable which represents subsets of one of the main model dimensions. 
+ In this case, each member of `cap_node_groups` is a subset of `nodes` and we want to sum `flow_cap` over each of those subsets and set a maximum value. + input: ```yaml parameters: node_grouping: data: True index: [[group_1, region1], [group_1, region1_1], [group_2, region1_2], [group_2, region1_3], [group_3, region2]] dims: [cap_node_groups, nodes] + node_group_max: + data: [1, 2, 3] + index: [group_1, group_2, group_3] + dims: cap_node_groups ``` - ``` - >>> flow_cap_max - [out] Size: 320B - array([[ nan, 30000., nan, nan, nan, nan, nan, 10000.], - [ nan, nan, 10000., nan, nan, nan, nan, nan], - [ nan, nan, 10000., nan, nan, nan, nan, nan], - [ nan, nan, 10000., nan, nan, nan, nan, nan], - [ 1000., nan, nan, nan, nan, nan, nan, 10000.]]) - Coordinates: - * nodes (nodes) object 40B 'region1' 'region1_1' ... 'region1_3' 'region2' - * techs (techs) object 64B 'battery' 'ccgt' ... 'region1_to_region2' - >>> where(flow_cap_max, node_grouping) - [out] - array([[[ nan, nan, nan], - [30000., nan, nan], - ... - [ nan, nan, nan], - [ nan, nan, 10000.]]]) - Coordinates: - * nodes (nodes) object 40B 'region1' 'region1_1' ... 'region2' - * techs (techs) object 64B 'battery' ... 'region1_to_region2' - * cap_node_groups (cap_node_groups) object 24B 'group_1' 'group_2' 'group_3' + math: + ```yaml + constraints: + my_new_constraint: + foreach: [techs, cap_node_groups] + equations: + - expression: sum(where(flow_cap, node_grouping), over=nodes) <= node_group_max ``` """ if self._backend_interface is not None: - where_array = self._input_data[where_array.name] + condition = self._input_data[condition.name] - return array.where(where_array.fillna(False).astype(bool)) + return array.where(condition.fillna(False).astype(bool))
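
Reviewer note: the final line of `as_array` relies on xarray broadcasting to add the condition's extra dimension to the array. The snippet below is a minimal standalone sketch of that behaviour using plain xarray outside of Calliope. The arrays `flow_cap` and `node_grouping` are hypothetical stand-ins with made-up values, not model data, and the backend-interface branch of `as_array` is not reproduced.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a math component indexed over nodes and techs.
flow_cap = xr.DataArray(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    coords={
        "nodes": ["region1", "region1_1", "region2"],
        "techs": ["battery", "ccgt"],
    },
    dims=["nodes", "techs"],
)

# Hypothetical grouping array: 1 where a node belongs to a group, NaN elsewhere.
node_grouping = xr.DataArray(
    [[1.0, 1.0, np.nan], [np.nan, np.nan, 1.0]],
    coords={
        "cap_node_groups": ["group_1", "group_2"],
        "nodes": ["region1", "region1_1", "region2"],
    },
    dims=["cap_node_groups", "nodes"],
)

# Same operation as the last line of `as_array`: NaN and 0 become False,
# everything else becomes True, then the mask is applied with broadcasting.
condition = node_grouping.fillna(False).astype(bool)
masked = flow_cap.where(condition)

# `masked` now carries the extra `cap_node_groups` dimension, so summing over
# `nodes` yields one total per tech and node group.
group_totals = masked.sum("nodes")
print(group_totals)
```

This mirrors the `sum(where(flow_cap, node_grouping), over=nodes)` expression in the new docstring example: the mask both filters the nodes and introduces the group dimension that the constraint iterates over.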