Merge pull request #783 from PowerGridModel/docs/columnar-data-termin…
…ology

Columnar data documentation (Terminology)
mgovers authored Oct 18, 2024
2 parents 33c184f + 211b219 commit eed5887
Showing 2 changed files with 101 additions and 17 deletions.
4 changes: 3 additions & 1 deletion docs/conf.py
@@ -61,7 +61,9 @@
# label references for depth of headers: label name in anchor slug structure
myst_heading_anchors = 4
# execute jupyter notebooks output before building webpage
jupyter_execute_notebooks = "off"
nb_execution_mode = "off"
nb_execution_excludepatterns = ["*/_build/*"]

# Extensions in myst
myst_enable_extensions = [
"dollarmath",
114 changes: 98 additions & 16 deletions docs/user_manual/dataset-terminology.md
@@ -10,23 +10,105 @@ Some terms regarding the data structures are explained here, including the defin

## Data structures

- **Dataset:** Either a single or a batch dataset.
- **SingleDataset:** A data type storing input data (i.e. all elements of all components) for a single scenario.
- **BatchDataset:** A data type storing update and or output data for one or more scenarios. A batch dataset can contain sparse or dense data, depending on the component.
- **DataArray** A data array can be a single or a batch array. It is a numpy structured array.
- **SingleArray** A dictionary where the keys are the component types and the values are one-dimensional structured numpy arrays.
- **BatchArray:** An array of dictionaries where the keys are the component types and the values are two-dimensional structured numpy arrays.
- **DenseBatchArray:** A two-dimensional structured numpy array containing a list of components of the same type for each scenario.
- **SparseBatchArray:** A dictionary with a one-dimensional numpy int64 array and a one-dimensional structured numpy arrays.

### Type of Dataset

The types of `Dataset` include the following: `input`, `update`, `sym_output`, `asym_output`, and `sc_output`:
Exemplery datasets attributes are given in a dataset containing a `line` component.

- **input:** Contains attributes relevant to configuration of grid.
```{mermaid}
graph TD
subgraph Other numpy arrays
IndexPointer
SingleColumn
BatchColumn
end
subgraph Datasets
Dataset --> SingleDataset
Dataset --> BatchDataset
end
click Dataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.Dataset"
click SingleDataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleDataset"
click BatchDataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchDataset"
click IndexPointer href "../api_reference/python-api-reference.html#power_grid_model.data_types.IndexPointer"
click SingleColumn href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleColumn"
click BatchColumn href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchColumn"
```

```{mermaid}
graph TD
subgraph Dataset values
ComponentData --> DataArray
ComponentData --> ColumnarData
DataArray --> SingleArray
DataArray --> BatchArray
BatchArray --> DenseBatchArray
BatchArray --> SparseBatchArray
ColumnarData --> SingleColumnarData
ColumnarData --> BatchColumnarData
BatchColumnarData --> DenseBatchColumnarData
BatchColumnarData --> SparseBatchColumnarData
end
click ComponentData href "../api_reference/python-api-reference.html#power_grid_model.data_types.ComponentData"
click DataArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.DataArray"
click ColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.ColumnarData"
click SingleArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleArray"
click BatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchArray"
click DenseBatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.DenseBatchArray"
click SparseBatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.SparseBatchArray"
click SingleColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleColumnarData"
click BatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchColumnarData"
click DenseBatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.DenseBatchColumnarData"
click SparseBatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.SparseBatchColumnarData"
```

- **{py:class}`Dataset <power_grid_model.data_types.Dataset>`:** Either a single or a batch dataset. It is a dictionary with the component types (e.g. `line`, `node`) as keys and **ComponentData** as values.
- **{py:class}`SingleDataset <power_grid_model.data_types.SingleDataset>`:** A data type storing input data (i.e. all elements of all components) for a single scenario.
- **{py:class}`BatchDataset <power_grid_model.data_types.BatchDataset>`:** A data type storing update and/or output data for one or more scenarios. A batch dataset can contain sparse or dense data, depending on the component.

- **{py:class}`ComponentData <power_grid_model.data_types.ComponentData>`:** The data corresponding to the component.
- **{py:class}`DataArray <power_grid_model.data_types.DataArray>`:** A data array can be a single or a batch array. It is a numpy structured array.
- **{py:class}`SingleArray <power_grid_model.data_types.SingleArray>`:** A 1D numpy structured array corresponding to a single dataset.
- **{py:class}`BatchArray <power_grid_model.data_types.BatchArray>`:** Multiple batches of data can be represented in sparse or dense forms.
- **{py:class}`DenseBatchArray <power_grid_model.data_types.DenseBatchArray>`:** A 2D structured numpy array containing a list of components of the same type for each scenario.
- **{py:class}`SparseBatchArray <power_grid_model.data_types.SparseBatchArray>`:** A typed dictionary with a 1D numpy array of `IndexPointer` type under the `indptr` key and a `SingleArray` under the `data` key, which contains all components flattened over all batches.
- **{py:class}`ColumnarData <power_grid_model.data_types.ColumnarData>`:** A dictionary of attributes as keys and individual numpy arrays as values.
- **{py:class}`SingleColumnarData <power_grid_model.data_types.SingleColumnarData>`:** A dictionary of attributes as keys and `SingleColumn` as values in a single dataset.
- **{py:class}`BatchColumnarData <power_grid_model.data_types.BatchColumnarData>`:** Multiple batches of data can be represented in sparse or dense forms.
- **{py:class}`DenseBatchColumnarData <power_grid_model.data_types.DenseBatchColumnarData>`:** A dictionary of attributes as keys and 2D/3D numpy array of `BatchColumn` type as values in a single dataset.
- **{py:class}`SparseBatchColumnarData <power_grid_model.data_types.SparseBatchColumnarData>`:** A typed dictionary with a 1D numpy array of `IndexPointer` type under the `indptr` key and `SingleColumn` values under the `data` key, which contain all components flattened over all batches.

- **{py:class}`IndexPointer <power_grid_model.data_types.IndexPointer>`:** A 1D numpy array of int64 type used to specify sparse batches. It indicates the range of components within each scenario: scenario `i` covers the flattened elements from `indptr[i]` up to, but not including, `indptr[i + 1]`. For example, an index pointer of [0, 1, 3, 3] indicates 3 batches, with the element at index 0 in the 1st batch, the elements at indices [1, 2] in the 2nd batch and no elements in the 3rd batch.
- **{py:class}`SingleColumn <power_grid_model.data_types.SingleColumn>`:** A 1D/2D numpy array of values corresponding to a specific attribute.
- **{py:class}`BatchColumn <power_grid_model.data_types.BatchColumn>`:** A 2D/3D numpy array of values corresponding to a specific attribute.
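
The sparse layout described above can be illustrated with plain numpy. This is a minimal sketch that does not call the `power_grid_model` API: the attribute values and the helper `split_scenarios` are hypothetical and only show how an index pointer partitions flattened data into scenarios.

```python
import numpy as np

# Hypothetical columnar data for three components, flattened over all scenarios.
data = {
    "id": np.array([5, 6, 7], dtype=np.int32),
    "from_status": np.array([0, 1, 0], dtype=np.int8),
}

# Index pointer: scenario i owns the flattened elements indptr[i]:indptr[i + 1].
indptr = np.array([0, 1, 3, 3], dtype=np.int64)

# Dictionary shaped like the sparse batch forms: `indptr` and `data` keys.
sparse_batch = {"indptr": indptr, "data": data}


def split_scenarios(indptr, column):
    """Hypothetical helper: slice one flattened column into per-scenario views."""
    return [column[start:stop] for start, stop in zip(indptr[:-1], indptr[1:])]


# The number of scenarios is one less than the length of the index pointer.
n_scenarios = len(sparse_batch["indptr"]) - 1
ids_per_scenario = split_scenarios(sparse_batch["indptr"], sparse_batch["data"]["id"])

for scenario, ids in enumerate(ids_per_scenario):
    print(scenario, ids.tolist())
```

An index pointer always starts at 0 and ends at the total number of flattened elements, so a scenario without any components shows up as two equal consecutive entries.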

### Dimensions of numpy arrays

The dimensions of the numpy arrays and the interpretation of each dimension are as follows.

| **Data Type** | **1D** |**2D** | **3D** |
|--------------------------|-----------------------------------|-------------------------------------------------------|-------------------------------------------------------------------------------|
| **SingleArray** | Corresponds to a single dataset. | &#10060; | &#10060; |
| **DenseBatchArray** | &#10060; | Batch number $\times$ Component within that batch | &#10060; |
| **SingleColumn** | Component within that batch. | Component within that batch $\times$ Phases &#10024; | &#10060; |
| **BatchColumn** | &#10060; | Batch number $\times$ Component within that batch | Batch number $\times$ Component within that batch $\times$ Phases &#10024; |

```{note}
&#10024; The "Phases" dimension is optional and is available only when the attributes are asymmetric.
```
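
The shapes in the table can be reproduced with plain numpy. This is only a sketch under stated assumptions: `line_dtype` is a hypothetical, heavily reduced structured dtype (not the real `line` dtype of the library), the scenario and component counts are arbitrary, and the phases dimension is assumed to have size 3.

```python
import numpy as np

n_scenarios, n_lines, n_phases = 4, 3, 3  # arbitrary sizes; 3 phases is an assumption

# Hypothetical reduced structured dtype with two attributes.
line_dtype = np.dtype([("id", "i4"), ("from_status", "i1")])

# Row-based data: one structured record per component.
single_array = np.zeros(n_lines, dtype=line_dtype)                      # 1D
dense_batch_array = np.zeros((n_scenarios, n_lines), dtype=line_dtype)  # 2D

# Columnar data: one plain array per attribute.
single_column = np.zeros(n_lines, dtype=np.int8)                        # 1D
single_column_asym = np.zeros((n_lines, n_phases))                      # 2D, with phases
batch_column = np.zeros((n_scenarios, n_lines), dtype=np.int8)          # 2D
batch_column_asym = np.zeros((n_scenarios, n_lines, n_phases))          # 3D, with phases

# A dictionary of attribute name to column, built from the row-based array.
single_columnar = {name: single_array[name] for name in line_dtype.names}

print(single_array.shape, dense_batch_array.shape)
print(single_column_asym.shape, batch_column_asym.shape)
```

In both representations the batch dimension comes first and the optional phases dimension comes last, so the component dimension is axis 0 for single data and axis 1 for batch data.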

### Type of Dataset

The types of `Dataset` are `input`, `update`, `sym_output`, `asym_output`, and `sc_output`. They are members of the enum {py:class}`DatasetType <power_grid_model.typing.DatasetType>`.
Example attributes are given below for a dataset containing a `line` component.

- **input:** Contains attributes relevant to the configuration of the grid.
- Example: `id`, `from_node`, `from_status`
- **update:** Contains attributes relevant to multiple scenarios.
- Example: `from_status`,`to_status`
- **sym_output:** Contains attributes relevant to symmetrical steady state output of power flow or state estimation calculation.
- Example: `p_from`, `p_to`
