[Feat] Enable dynamic filter (#879)
Co-authored-by: Antony Milne <[email protected]>
petar-qb and antonymilne authored Dec 2, 2024
1 parent 3db95b9 commit dfafd88
Showing 29 changed files with 1,199 additions and 331 deletions.
2 changes: 1 addition & 1 deletion vizro-ai/changelog.d/new_fragment.md.j2
@@ -8,7 +8,7 @@ Uncomment the section that is right (remove the HTML comment wrapper).
<!--
### {{ cat }}
- A bullet item for the {{ cat }} category with a link to the relevant PR at the end of your entry, e.g. Enable feature XXX ([#1](https://github.com/mckinsey/vizro/pull/1))
- A bullet item for the {{ cat }} category with a link to the relevant PR at the end of your entry, e.g. Enable feature XXX. ([#1](https://github.com/mckinsey/vizro/pull/1))
-->
{% endfor -%}
@@ -0,0 +1,46 @@
<!--
A new scriv changelog fragment.
Uncomment the section that is right (remove the HTML comment wrapper).
-->

### Highlights ✨

- Filters update automatically when underlying dynamic data changes. See the [user guide on dynamic filters](https://vizro.readthedocs.io/en/stable/pages/user-guides/data/#filters) for more information. ([#879](https://github.com/mckinsey/vizro/pull/879))

<!--
### Removed
- A bullet item for the Removed category with a link to the relevant PR at the end of your entry, e.g. Enable feature XXX. ([#1](https://github.com/mckinsey/vizro/pull/1))
-->
<!--
### Added
- A bullet item for the Added category with a link to the relevant PR at the end of your entry, e.g. Enable feature XXX. ([#1](https://github.com/mckinsey/vizro/pull/1))
-->
<!--
### Changed
- A bullet item for the Changed category with a link to the relevant PR at the end of your entry, e.g. Enable feature XXX. ([#1](https://github.com/mckinsey/vizro/pull/1))
-->
<!--
### Deprecated
- A bullet item for the Deprecated category with a link to the relevant PR at the end of your entry, e.g. Enable feature XXX. ([#1](https://github.com/mckinsey/vizro/pull/1))
-->
<!--
### Fixed
- A bullet item for the Fixed category with a link to the relevant PR at the end of your entry, e.g. Enable feature XXX. ([#1](https://github.com/mckinsey/vizro/pull/1))
-->
<!--
### Security
- A bullet item for the Security category with a link to the relevant PR at the end of your entry, e.g. Enable feature XXX. ([#1](https://github.com/mckinsey/vizro/pull/1))
-->
94 changes: 83 additions & 11 deletions vizro-core/docs/pages/user-guides/data.md
@@ -179,7 +179,7 @@ Since dynamic data sources must always be added to the data manager and referenc

### Configure cache

By default, each time the dashboard is refreshed a dynamic data function executes again. In fact, if there are multiple graphs on the same page using the same dynamic data source then the loading function executes _multiple_ times, once for each graph on the page. Hence, if loading your data is a slow operation, your dashboard performance may suffer.
By default, a dynamic data function executes every time the dashboard is refreshed. Data loading is batched so that a dynamic data function that supplies multiple graphs on the same page only executes _once_ per page refresh. Even with this batching, if loading your data is a slow operation, your dashboard performance may suffer.

The Vizro data manager has a server-side caching mechanism to help solve this. Vizro's cache uses [Flask-Caching](https://flask-caching.readthedocs.io/en/latest/), which supports a number of possible cache backends and [configuration options](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching). By default, the cache is turned off.

Expand Down Expand Up @@ -220,7 +220,7 @@ By default, when caching is turned on, dynamic data is cached in the data manage

If you would like to alter some options, such as the default cache timeout, then you can specify a different cache configuration:

```py title="Simple cache with timeout set to 10 minutes"
```python title="Simple cache with timeout set to 10 minutes"
data_manager.cache = Cache(config={"CACHE_TYPE": "SimpleCache", "CACHE_DEFAULT_TIMEOUT": 600})
```

@@ -268,16 +268,20 @@ data_manager["no_expire_data"].timeout = 0

### Parametrize data loading

You can supply arguments to your dynamic data loading function that can be modified from the dashboard.
For example, if you are handling big data then you can use an argument to specify the number of entries or size of chunk of data.
You can give arguments to your dynamic data loading function that can be modified from the dashboard. For example:

- To load different versions of the same data.
- To handle large datasets you can use an argument that controls the amount of data that is loaded. This effectively pre-filters data before it reaches the Vizro dashboard.

In general, a parametrized dynamic data source should always return a pandas DataFrame with a fixed schema (column names and types). This ensures that page components and controls continue to work as expected when the parameter is changed on screen.

To add a parameter to control a dynamic data source, do the following:

1. add the appropriate argument to your dynamic data function and specify a default value for the argument.
2. give an `id` to all components that have the data source you wish to alter through a parameter.
3. [add a parameter](parameters.md) with `targets` of the form `<target_component_id>.data_frame.<dynamic_data_argument>` and a suitable [selector](selectors.md).
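
The first step, together with the fixed-schema advice above, can be sketched in plain pandas. This is only an illustration under stated assumptions, not Vizro's implementation: the in-memory `IRIS` frame stands in for `iris.csv`, the component id `"graph"` is made up, and `target_string` is a hypothetical helper that merely formats a `targets` entry in the form described in step 3.

```python
import pandas as pd

# Assumption: a tiny in-memory frame used in place of iris.csv.
IRIS = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    }
)


def load_iris_data(number_of_points=3):
    """Return `number_of_points` sampled rows; the argument has a default, as step 1 asks."""
    return IRIS.sample(number_of_points)


def target_string(component_id, argument):
    """Hypothetical helper: format a parameter target as <id>.data_frame.<argument>."""
    return f"{component_id}.data_frame.{argument}"


small = load_iris_data()
large = load_iris_data(number_of_points=5)

# The schema (column names and dtypes) does not depend on the argument value.
same_schema = list(small.columns) == list(large.columns) and bool((small.dtypes == large.dtypes).all())
print(same_schema)
print(target_string("graph", "number_of_points"))
```

Only the number of rows changes with the argument; the columns and their types stay fixed, which is what lets components and controls keep working when the parameter changes.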

For example, let us extend the [dynamic data example](#dynamic-data) above to show how the `load_iris_data` can take an argument `number_of_points` controlled from the dashboard with a [`Slider`][vizro.models.Slider].
For example, let us extend the [dynamic data example](#dynamic-data) above into an example of how parametrized dynamic data works. The `load_iris_data` function can take an argument `number_of_points` controlled from the dashboard with a [`Slider`][vizro.models.Slider].

!!! example "Parametrized dynamic data"
=== "app.py"
@@ -333,14 +337,82 @@ Parametrized data loading is compatible with [caching](#configure-cache). The ca

You cannot pass [nested parameters](parameters.md#nested-parameters) to dynamic data. You can only target the top-level arguments of the data loading function, not the nested keys in a dictionary.

### Filter update limitation
### Filters

When a [filter](filters.md) depends on dynamic data and no `selector` is explicitly defined in the `vm.Filter` model, the available selector values update on page refresh to reflect the latest dynamic data. This is called a _dynamic filter_.

The mechanism behind updating dynamic filters works exactly like other non-control components such as `vm.Graph`. However, unlike such components, a filter can depend on multiple data sources. If at least one data source of the components in the filter's `targets` is dynamic then the filter is dynamic. Remember that when `targets` is not explicitly specified, a filter applies to all the components on a page that use a DataFrame including `column`.

When the page is refreshed, the behavior of a dynamic filter is as follows:

- The filter's selector updates its available values:
- For [categorical selectors](selectors.md#categorical-selectors), `options` updates to give all unique values found in `column` across all the data sources of components in `targets`.
- For [numerical selectors](selectors.md#numerical-selectors), `min` and `max` update to give the overall minimum and maximum values found in `column` across all the data sources of components in `targets`.
- The value selected on screen by a dashboard user _does not_ change. If the selected value is not already present in the new set of available values then the `options` or `min` and `max` are modified to include it. In this case, the filtering operation might result in an empty DataFrame.
- Even though the values present in a data source can change, the schema should not: `column` should remain present and of the same type in the data sources. The `targets` of the filter and selector type cannot change while the dashboard is running. For example, a `vm.Dropdown` selector cannot turn into `vm.RadioItems`.
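
The refresh behavior listed above can be pictured with a small pandas sketch. It is an illustration only and not Vizro's internal code: `categorical_options` and `numerical_bounds` are hypothetical helper names, and the two tiny DataFrames are made-up stand-ins for a data source before and after a refresh.

```python
import pandas as pd


def categorical_options(frames, column, selected=()):
    """Unique values of `column` across all frames, plus any currently selected values."""
    values = set()
    for frame in frames:
        values.update(frame[column].unique())
    values.update(selected)
    return sorted(values)


def numerical_bounds(frames, column, selected=()):
    """Overall (min, max) of `column` across all frames, widened to include selected values."""
    candidates = [frame[column].min() for frame in frames]
    candidates += [frame[column].max() for frame in frames]
    candidates.extend(selected)
    return min(candidates), max(candidates)


# Made-up data: the source as it looked before and after a page refresh.
before = pd.DataFrame({"species": ["setosa", "virginica"], "sepal_length": [5.1, 6.3]})
after = pd.DataFrame({"species": ["versicolor"], "sepal_length": [7.0]})

# The user had "setosa" and the range 5.1 to 6.3 selected before the refresh.
options = categorical_options([after], "species", selected=["setosa"])
bounds = numerical_bounds([after], "sepal_length", selected=[5.1, 6.3])

# The retained selection is still available, but filtering the new data by it
# can leave nothing to show.
filtered = after[after["species"].isin(["setosa"])]
print(options, bounds, filtered.empty)
```

Passing several frames to either helper mirrors a filter whose `targets` use more than one data source.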

For example, let us add two filters to the [dynamic data example](#dynamic-data) above:

!!! example "Dynamic filters"

```py hl_lines="10 20 21"
from vizro import Vizro
import pandas as pd
import vizro.plotly.express as px
import vizro.models as vm

If your dashboard includes a [filter](filters.md) then the values shown on a filter's [selector](selectors.md) _do not_ update while the dashboard is running. This is a known limitation that will be lifted in future releases, but if is problematic for you already then [raise an issue on our GitHub repo](https://github.com/mckinsey/vizro/issues/).
from vizro.managers import data_manager

This limitation is why all arguments of your dynamic data loading function must have a default value. Regardless of the value of the `vm.Parameter` selected in the dashboard, these default parameter values are used when the `vm.Filter` is built. This determines the type of selector used in a filter and the options shown, which cannot currently be changed while the dashboard is running.
def load_iris_data():
    iris = pd.read_csv("iris.csv")
    return iris.sample(5) # (1)!

Although a selector is automatically chosen for you in a filter when your dashboard is built, remember that [you can change this choice](filters.md#changing-selectors). For example, we could ensure that a dropdown always contains the options "setosa", "versicolor" and "virginica" by explicitly specifying your filter as follows.
data_manager["iris"] = load_iris_data

```py
vm.Filter(column="species", selector=vm.Dropdown(options=["setosa", "versicolor", "virginica"])
page = vm.Page(
    title="Update the chart and filters on page refresh",
    components=[
        vm.Graph(figure=px.box("iris", x="species", y="petal_width", color="species"))
    ],
    controls=[
        vm.Filter(column="species"), # (2)!
        vm.Filter(column="sepal_length"), # (3)!
    ],
)

dashboard = vm.Dashboard(pages=[page])

Vizro().build(dashboard).run()
```

1. We sample only 5 rather than 50 points so that changes to the available values in the filtered columns are more apparent when the page is refreshed.
2. This filter implicitly controls the dynamic data source `"iris"`, which supplies the `data_frame` to the targeted `vm.Graph`. On page refresh, Vizro reloads this data, finds all the unique values in the `"species"` column and sets the categorical selector's `options` accordingly.
3. Similarly, on page refresh, Vizro finds the minimum and maximum values of the `"sepal_length"` column in the reloaded data and sets new `min` and `max` values for the numerical selector accordingly.

Consider a filter that depends on dynamic data, where you do **not** want the available values to change when the dynamic data changes. You should manually specify the `selector`'s `options` field (categorical selector) or `min` and `max` fields (numerical selector). In the above example, this could be achieved as follows:

```python title="Override selector options to make a dynamic filter static"
controls = [
    vm.Filter(column="species", selector=vm.Dropdown(options=["setosa", "versicolor", "virginica"])),
    vm.Filter(column="sepal_length", selector=vm.RangeSlider(min=4.3, max=7.9)),
]
```

If you [use a specific selector](filters.md#change-selector) for a dynamic filter without manually specifying `options` (categorical selector) or `min` and `max` (numerical selector) then the selector remains dynamic. For example:

```python title="Dynamic filter with specific selector is still dynamic"
controls = [
    vm.Filter(column="species", selector=vm.Checklist()),
    vm.Filter(column="sepal_length", selector=vm.Slider()),
]
```

When Vizro initially builds a filter that depends on parametrized dynamic data loading, data is loaded using the default argument values. This data is used to:

* perform initial validation
* check which data sources contain the specified `column` (unless `targets` is explicitly specified) and
* determine the type of selector to use (unless `selector` is explicitly specified).
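
One way to picture "loaded using the default argument values" is the standard-library sketch below. It is not how Vizro is implemented: `load_data` is a hypothetical loader that just echoes its arguments, and `default_arguments` is an illustrative helper that collects the defaults and complains if an argument has none.

```python
import inspect


def load_data(number_of_points=10, version="latest"):
    """Hypothetical parametrized loader; returns its arguments so the call is visible."""
    return {"number_of_points": number_of_points, "version": version}


def default_arguments(function):
    """Collect each argument's default value; raise if an argument has no default."""
    defaults = {}
    for name, parameter in inspect.signature(function).parameters.items():
        if parameter.default is inspect.Parameter.empty:
            raise TypeError(f"Argument {name!r} of {function.__name__} needs a default value")
        defaults[name] = parameter.default
    return defaults


# The initial load uses only the defaults, whatever the dashboard parameter is set to later.
initial_call = load_data(**default_arguments(load_data))
print(initial_call)
```

A loader whose arguments all have defaults can always be called this way, before any dashboard user has touched a parameter.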

!!! note

When the value of a dynamic data parameter is changed by a dashboard user, the data underlying a dynamic filter can change. Currently this change affects page components such as `vm.Graph` but does not affect the available values shown in a dynamic filter, which only update on page refresh. This functionality will be coming soon!
90 changes: 81 additions & 9 deletions vizro-core/docs/pages/user-guides/filters.md
@@ -3,8 +3,9 @@
This guide shows you how to add filters to your dashboard. One main way to interact with the charts/components on your page is by filtering the underlying data. A filter selects a subset of rows of a component's underlying DataFrame which alters the appearance of that component on the page.

The [`Page`][vizro.models.Page] model accepts the `controls` argument, where you can enter a [`Filter`][vizro.models.Filter] model.
This model enables the automatic creation of [selectors](../user-guides/selectors.md) (such as Dropdown, RadioItems, Slider, ...) that operate upon the charts/components on the screen.
This model enables the automatic creation of [selectors](selectors.md) (for example, `Dropdown` or `RangeSlider`) that operate on the charts/components on the screen.

By default, filters that control components with [dynamic data](data.md#dynamic-data) are [dynamically updated](data.md#filters) when the underlying data changes while the dashboard is running.

## Basic filters

@@ -13,8 +14,7 @@ To add a filter to your page, do the following:
1. add the [`Filter`][vizro.models.Filter] model into the `controls` argument of the [`Page`][vizro.models.Page] model
2. configure the `column` argument, which denotes the target column to be filtered

By default, all components on a page with such a `column` present will be filtered. The selector type will be chosen
automatically based on the target column, for example, a dropdown for categorical data, a range slider for numerical data, or a date picker for temporal data.
You can also set `targets` to specify which components on the page should be affected by the filter. If this is not explicitly set then `targets` defaults to all components on the page whose data source includes `column`.

!!! example "Basic Filter"
=== "app.py"
@@ -63,12 +63,83 @@ automatically based on the target column, for example, a dropdown for categorica

[Filter]: ../../assets/user_guides/control/control1.png

## Changing selectors
The selector is configured automatically based on the data type of the target column as follows:

- Categorical data uses [`vm.Dropdown(multi=True)`][vizro.models.Dropdown] where `options` is the set of unique values found in `column` across all the data sources of components in `targets`.
- [Numerical data](https://pandas.pydata.org/docs/reference/api/pandas.api.types.is_numeric_dtype.html) uses [`vm.RangeSlider`][vizro.models.RangeSlider] where `min` and `max` are the overall minimum and maximum values found in `column` across all the data sources of components in `targets`.
- [Temporal data](https://pandas.pydata.org/docs/reference/api/pandas.api.types.is_datetime64_any_dtype.html) uses [`vm.DatePicker(range=True)`][vizro.models.DatePicker] where `min` and `max` are the overall minimum and maximum values found in `column` across all the data sources of components in `targets`. A column can be converted to this type with [pandas.to_datetime](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html).
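
The mapping above can be sketched with the two pandas type checks linked in the list. This is a rough illustration rather than Vizro's actual logic: `default_selector_name` is a hypothetical function, it returns plain strings instead of selector models, and the order of the checks is an assumption.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def default_selector_name(series):
    """Return the name of the default selector for a column, following the list above."""
    if is_datetime64_any_dtype(series):
        return "DatePicker"  # temporal data
    if is_numeric_dtype(series):
        return "RangeSlider"  # numerical data
    return "Dropdown"  # everything else is treated as categorical


# Made-up data with one column of each kind.
df = pd.DataFrame(
    {
        "stocks": ["GOOG", "AAPL"],
        "value": [1.0, 1.2],
        "date": pd.to_datetime(["2018-01-01", "2018-01-08"]),
    }
)

for column in df.columns:
    print(column, default_selector_name(df[column]))
```

A column of date strings would be treated as categorical here until it is converted with `pandas.to_datetime`.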

The following example demonstrates these default selector types.

!!! example "Default Filter selectors"
=== "app.py"
```{.python pycafe-link}
import pandas as pd
from vizro import Vizro
import vizro.plotly.express as px
import vizro.models as vm

df_stocks = px.data.stocks(datetimes=True)

df_stocks_long = pd.melt(
    df_stocks,
    id_vars='date',
    value_vars=['GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'],
    var_name='stocks',
    value_name='value'
)

df_stocks_long['value'] = df_stocks_long['value'].round(3)

page = vm.Page(
    title="My first page",
    components=[
        vm.Graph(figure=px.line(df_stocks_long, x="date", y="value", color="stocks")),
    ],
    controls=[
        vm.Filter(column="stocks"),
        vm.Filter(column="value"),
        vm.Filter(column="date"),
    ],
)

dashboard = vm.Dashboard(pages=[page])

Vizro().build(dashboard).run()
```
=== "app.yaml"
```yaml
# Still requires a .py to add data to the data manager and parse YAML configuration
# See yaml_version example
pages:
  - components:
      - figure:
          _target_: line
          data_frame: df_stocks_long
          x: date
          y: value
          color: stocks
        type: graph
    controls:
      - column: stocks
        type: filter
      - column: value
        type: filter
      - column: date
        type: filter
    title: My first page
```
=== "Result"
[![Filter]][Filter]

[Filter]: ../../assets/user_guides/selectors/default_filter_selectors.png

## Change selector

If you want to have a different selector for your filter, you can give the `selector` argument of the [`Filter`][vizro.models.Filter] a different selector model.
Currently available selectors are [`Checklist`][vizro.models.Checklist], [`Dropdown`][vizro.models.Dropdown], [`RadioItems`][vizro.models.RadioItems], [`RangeSlider`][vizro.models.RangeSlider], [`Slider`][vizro.models.Slider], and [`DatePicker`][vizro.models.DatePicker].

!!! example "Filter with custom Selector"
!!! example "Filter with different selector"
=== "app.py"
```{.python pycafe-link}
from vizro import Vizro
@@ -118,11 +189,10 @@ Currently available selectors are [`Checklist`][vizro.models.Checklist], [`Dropd

## Further customization

For further customizations, you can always refer to the [`Filter`][vizro.models.Filter] reference. Some popular choices are:
For further customizations, you can always refer to the [`Filter` model][vizro.models.Filter] reference and the [guide to selectors](selectors.md). Some popular choices are:

- select which component the filter will apply to by using `targets`
- select what the target column type is, hence choosing the default selector by using `column_type`
- choose options of lower level components, such as the `selector` models
- specify configuration of the `selector`, for example `multi` to switch between a multi-option and single-option selector, `options` for a categorical filter or `min` and `max` for a numerical filter

Below is an advanced example where we only target one page component, and where we further customize the chosen `selector`.

@@ -142,7 +212,7 @@ Below is an advanced example where we only target one page component, and where
vm.Graph(figure=px.scatter(iris, x="petal_length", y="sepal_width", color="species")),
],
controls=[
vm.Filter(column="petal_length",targets=["scatter_chart"],selector=vm.RangeSlider(step=1)),
vm.Filter(column="petal_length", targets=["scatter_chart"], selector=vm.RangeSlider(step=1)),
],
)

@@ -186,3 +256,5 @@ Below is an advanced example where we only target one page component, and where
[![Advanced]][Advanced]

[Advanced]: ../../assets/user_guides/control/control3.png

To further customize selectors, see our [how-to guide on creating custom components](custom-components.md).