fix a problem where linked selections were resulting in repeated columns #6336

Merged 3 commits on Oct 16, 2024

Contributor

@grapesmoker grapesmoker commented Jul 22, 2024

While working with link_selections I ran into a problem where trying to link two plots that came from the same dataset would result in an error because at some point during the creation of the selection expression the columns would get duplicated. Here's an MRE that illustrates this problem:

import holoviews as hv
from holoviews.selection import link_selections
import numpy as np
import pandas as pd
import panel as pn

hv.extension('bokeh')
pn.extension()

# some fake data
data = np.random.default_rng(seed=42).normal(size=(100, 3))
idx = np.arange(100)
cols = ['x', 'y', 'z']
df = pd.DataFrame(data, columns=cols)
df['id'] = idx

# want to link two plots across the `id` column
ls = link_selections.instance(index_cols=['id'])

img1 = hv.Points(df, kdims=['x', 'y'], vdims=['id']).opts(tools=['box_select'])
img2 = hv.Points(df, kdims=['x', 'z'], vdims=['id']).opts(tools=['box_select'])

pn.Row(ls(img1), ls(img2)).servable()

This results in the following slightly truncated error:

File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/element/selection.py", line 334, in _get_selection_expr_for_stream_value
    expr, _, _ = self._get_index_selection(kwargs['index'], index_cols)
  File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/element/selection.py", line 39, in _get_index_selection
    vals = dim(index_dim).apply(ds.iloc[index], expanded=False)
  File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/core/data/interface.py", line 33, in __getitem__
    res = self._perform_getitem(self.dataset, index)
  File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/core/data/interface.py", line 98, in _perform_getitem
    return dataset.clone(data, kdims=kdims, vdims=vdims, datatype=datatype)
  File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/core/data/__init__.py", line 1203, in clone
    return super().clone(data, shared_data, new_type, *args, **overrides)
  File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/core/dimension.py", line 561, in clone
    return clone_type(data, *args, **{k:v for k,v in settings.items()
  File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/core/data/__init__.py", line 329, in __init__
    initialized = Interface.initialize(type(self), data, kdims, vdims,
  File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/core/data/interface.py", line 253, in initialize
    (data, dims, extra_kws) = interface.init(eltype, data, kdims, vdims)
  File "/Users/jerry/Development/project/venv/lib/python3.10/site-packages/holoviews/core/data/pandas.py", line 83, in init
    raise DataError('Dimensions may not reference duplicated DataFrame '
holoviews.core.data.interface.DataError: Dimensions may not reference duplicated DataFrame columns (found duplicate 'id' columns). If you want to plot a column against itself simply declare two dimensions with the same name.

I traced this problem to the _get_index_selection function, where the dataset is cloned on line 35:

ds = self.clone(kdims=index_cols, new_type=Dataset)

If the vdims parameter is left unspecified, cloning carries over all of the original vdims. When the kdims and vdims overlap, as they do here with id, the resulting dataframe contains duplicated columns and the selection on line 38 throws an error:

vals = dim(index_dim).apply(ds.iloc[index], expanded=False)
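
To make the failure mode concrete, here is a minimal sketch in plain Python (no HoloViews involved) of how combining overlapping key and value dimension names produces a repeated column. The helper name find_duplicate_columns is hypothetical and only illustrates the idea; it is not part of HoloViews, and it uses plain strings where HoloViews uses Dimension objects.

```python
from collections import Counter


def find_duplicate_columns(kdims, vdims):
    """Return the names that occur more than once when kdims and vdims are combined.

    Hypothetical helper for illustration only; dimensions are modelled as strings.
    """
    counts = Counter(list(kdims) + list(vdims))
    return [name for name, n in counts.items() if n > 1]


# In the MRE, 'id' is the index column (passed as kdims to the clone)
# and is also declared as a vdim on the original Points element.
duplicates = find_duplicate_columns(kdims=['id'], vdims=['id'])
print(duplicates)
```

With the MRE's dimensions, id is reported as a duplicate, which matches the column named in the DataError above.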

This PR makes the slight change that vdims are explicitly set to be cloned only if they do not overlap with the kdims. Then the selection is made on both rows and columns to ensure that each column appears only once.
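
The idea behind the change can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the function name non_overlapping_vdims is hypothetical, dimensions are modelled as plain strings rather than HoloViews Dimension objects, and the extra z vdim is made up to show a dimension that survives the filter.

```python
def non_overlapping_vdims(kdims, vdims):
    """Keep only the vdims whose names are not already used as kdims.

    Hypothetical helper for illustration only; dimensions are modelled as strings.
    """
    kdim_names = set(kdims)
    return [vd for vd in vdims if vd not in kdim_names]


index_cols = ['id']            # columns used to link the selections
element_vdims = ['id', 'z']    # vdims on the element; 'id' overlaps with index_cols

clone_vdims = non_overlapping_vdims(index_cols, element_vdims)

# Each column now appears exactly once in the combined column list.
selected_columns = index_cols + clone_vdims
print(selected_columns)
```

Passing a filtered list like clone_vdims explicitly when cloning, instead of leaving vdims unspecified, is what keeps id from being referenced twice.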

This fixes the problem I observed in the MRE and doesn't seem to break anything else. It's my first attempt to contribute to HoloViews so I would appreciate any pointers on how to properly test this or anything else it might affect.

@grapesmoker
Contributor Author

Just curious if anyone has any thoughts on this PR, even a hint about whether this is a sensible thing to attempt or not.

@ahuang11 ahuang11 added the type: bug Something isn't correct or isn't working label Jul 26, 2024
@ahuang11
Collaborator

Thanks for fixing this! I'm not terribly familiar with this part of the code base so I can't comment on whether it's the right way or not, but I think other maintainers would appreciate it if you could add a UI test (something like https://github.com/holoviz/holoviews/blob/main/holoviews/tests/ui/bokeh/test_callback.py#L351-L378 basically copying your minimal example, and adding a few assert statements) and perhaps a short inline comment about what you're doing.

Test failures seem unrelated.

codecov bot commented Jul 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.50%. Comparing base (a28189d) to head (fd09e2f).
Report is 32 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6336      +/-   ##
==========================================
+ Coverage   88.48%   88.50%   +0.01%     
==========================================
  Files         323      323              
  Lines       68162    68598     +436     
==========================================
+ Hits        60313    60712     +399     
- Misses       7849     7886      +37     


@philippjfr
Member

This definitely does not need a UI test; a simple unit test would suffice. UI tests are for functionality that lives on the frontend or for integration tests ensuring that all the pieces work together.

@philippjfr
Member

I haven't been able to reproduce the actual issue here.

@philippjfr
Member

Sorry, was able to reproduce and added a test.

@philippjfr philippjfr merged commit ee7a485 into holoviz:main Oct 16, 2024
14 checks passed