Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] KPI Infinite scroll AgGrid #519

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

petar-qb
Copy link
Contributor

@petar-qb petar-qb commented Jun 7, 2024

Description

FYI:

  • Dash AgGrid pagination stores all data on the client side. Therefore, the recommended implementation for big data is to use infinite row model. The downsides of this approach are that the sorting and filtering interface options must be set to False (disabled) (because only a small part of the source data is visible on the client side).

Additionally:

  • For the further optimisation of the initial page loading, we should:
    • or reimplement the Page.pre_build() so it doesn't trigger the on_page_load mechanism if there are only tables and grids and without any controls on the page,
    • or we should reimplement the AgGrid/Table to initially return an empty pd.DataFrame.
  • The new custom ag-grid (infinite_scroll_ag_grid) implementation is introduced since the predefined one (dash_ag_grid) doesn't allow to skip the following code execution which significantly slows down the page loading time:
...
"rowData": data_frame.apply(
            lambda x: (
                x.dt.strftime("%Y-%m-%d")  # set date columns to `dateString` for AG Grid filtering to function
                if pd.api.types.is_datetime64_any_dtype(x)
                else x
            )
        ).to_dict("records"),
...

Open Question:

  • Should we make the infinite_scroll_dash_ag_grid public?

Screenshot

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

    • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
    • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorized to submit this contribution on behalf of the original creator(s) or their licensees.
    • I certify that the use of this contribution as authorized by the Apache 2.0 license does not violate the intellectual property rights of anyone else.
    • I have not referenced individuals, products or companies in any commits, directly or indirectly.
    • I have not added data or restricted code in any commits, directly or indirectly.

@petar-qb petar-qb temporarily deployed to circleci_secrets June 7, 2024 08:58 — with GitHub Actions Inactive
@petar-qb petar-qb temporarily deployed to circleci_secrets June 7, 2024 09:00 — with GitHub Actions Inactive
Copy link
Contributor

@huong-li-nguyen huong-li-nguyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @petar-qb - thanks for the quick investigation! 🚀

I would like to keep this PR open as a draft for discussion when @antonymilne is back, but I wouldn't like to merge it into the kpi dashboard if this can only be implemented when we turn off filtering and sorting.

The page with the ag-grid doesn't have any controls on purpose, as it looks nicer if the table takes up all space and people don't have to switch between the side panel and main panel. But that's why we do need the filtering and sorting functionality on that table. I would prefer the slower speed over not having filtering/sorting enabled in this case.

@petar-qb petar-qb marked this pull request as draft June 7, 2024 09:13
@petar-qb petar-qb self-assigned this Jun 7, 2024
@maxschulz-COL
Copy link
Contributor

Hey @petar-qb - thanks for the quick investigation! 🚀

I would like to keep this PR open as a draft for discussion when @antonymilne is back, but I wouldn't like to merge it into the kpi dashboard if this can only be implemented when we turn off filtering and sorting.

The page with the ag-grid doesn't have any controls on purpose, as it looks nicer if the table takes up all space and people don't have to switch between the side panel and main panel. But that's why we do need the filtering and sorting functionality on that table. I would prefer the slower speed over not having filtering/sorting enabled in this case.

I am sure you have looked at this @petar-qb, but just in case you had not seen: https://dash.plotly.com/dash-ag-grid/infinite-scroll#infinite-scroll-with-sort-and-filter

Would this work?

@petar-qb
Copy link
Contributor Author

petar-qb commented Jun 7, 2024

I am sure you have looked at this @petar-qb, but just in case you had not seen: https://dash.plotly.com/dash-ag-grid/infinite-scroll#infinite-scroll-with-sort-and-filter

Would this work?

@maxschulz-COL Thanks for pointing this out (I forgot even though I saw it). I don't know how good idea it is to push this code for the demo example (since its complexity could confuse our users and repeal them to use Vizro).

However, let me investigate how generic this code is on Monday. If it is, maybe we can implicitly enable this code if the infinite scroll is turned on.

@petar-qb petar-qb temporarily deployed to circleci_secrets June 11, 2024 12:29 — with GitHub Actions Inactive
@pre-commit-ci pre-commit-ci bot temporarily deployed to circleci_secrets June 11, 2024 12:30 Inactive
@petar-qb
Copy link
Contributor Author

@maxschulz-COL @huong-li-nguyen @antonymilne

The code from the link works 🎉.

What do you think, do we want to predefined the infinite_scroll_dash_ag_grid?

@maxschulz-COL
Copy link
Contributor

@maxschulz-COL @huong-li-nguyen @antonymilne

The code from the link works 🎉.

What do you think, do we want to predefined the infinite_scroll_dash_ag_grid?

I am not opposed to having another grid callable, it would be in line with what we are doing for KPI cards.

But I think we need to abstract the entire callback logic away. Ideally we also have only a minimum set of defaults.

Another route would be to enhance the existing callable for the case that someone chooses infinite scroll: e.g. we could make this an argument of the dash_ag_grid and when chosen True, we would take care of the rest. Not sure I like it, but it would reduce the number of CapturedCallable we produce, but could lead to other maintenance issue.

Overall I think I am in favour if we can prove that this setting improves the performance noticeably. If we can enable it in an easy way, then I think this is exactly what Vizro stands for

@antonymilne
Copy link
Contributor

antonymilne commented Jun 25, 2024

Sorry for the very delayed response to this but hopefully better late than never... 🤞 Just to check we're all on the same page here, I think this is where things stand:

  • default AgGrid uses client-side row model that downloads all data to browser at start. No callbacks involved
  • infinite row model without sorting or filtering can be achieved with one simple callback
  • infinite row model with sorting or filtering can be achieved with a much more complicated callback

I think @maxschulz-COL was describing above two possible APIs:

  1. keep our dash_ag_grid with client-side row model, no callbacks + new dash_ag_grid_infinite with infinite row model and callback
  2. keep our dash_ag_grid with client-side row model, no callbacks. Intercept the argument rowModelType="infinite" and use that to make it infinite row model and callback

In theory I agree with either API, but I'm not sure how easily we can actually implement either one? The main problem I see here is where do you define the callback?

  1. if it's outside dash_ag_grid then you don't have correct access to data_frame and so can't get the correct data (after filters/parameters on control panel). If you don't have any controls then it's possible to do a hack here to just do data_manager["iris"].load() directly in the function but not a good general solution
  2. if it's inside the dash_ag_grid then will this actually work? We would be redefining a callback every time the function is run, which might work ok but doesn't feel right1. Possibly there's a way to avoid this and look to see if the callback is already defined though to avoid duplication? 🤔

Am I missing something here? If not then next steps here would be to investigate 2, which would be useful anyway due to #109.

Footnotes

  1. we already do this every Dash page navigation callback since this rebuilds the page and hence e.g. the rangeslider callbacks. This already feels quite wrong but seems to work ok. See also https://github.com/mckinsey/vizro/issues/109#issuecomment-1768715486)

@antonymilne
Copy link
Contributor

The other thing I'm curious about and maybe @AnnMarieW can answer... Where does all the code in this example come from? How well tested and generic is this? i.e. could we safely apply it to any dataframe, or is it really just suitable for a few representative examples?

@AnnMarieW
Copy link
Contributor

The other thing I'm curious about and maybe @AnnMarieW can answer... Where does all the code in this example come from? How well tested and generic is this? i.e. could we safely apply it to any dataframe, or is it really just suitable for a few representative examples?

The example is adapted from the example in the AG Grid docs. It's designed to be more generic than that example and should work in most cases.

This example was provided by @BSd3v who is the lead developer of dash-ag-grid. Hey Bryan, do you have any other comments?

@BSd3v
Copy link

BSd3v commented Jul 1, 2024

Hello, All,

While this code is robust, and I believe it will work for most cases, it hasnt been tried on every single dataset, or even a broad spectrum of datasets for that matter. I fear that this could run into some of the enterprise features, such as filter sets and advanced filters.

I think if you want to go this route, you have it be a function that can be passed to the component if needed to get more advanced than this, where this first function is set as a default.


On a side note, the first post mentions pagination and being clientside. While this is the case in a default setting, it is possible to utilize pagination with an infinite row model as well. Just throwing that out there. 😄

@antonymilne
Copy link
Contributor

Thanks so much for chipping in @AnnMarieW and @BSd3v, very helpful! We are indeed considering doing pagination with the infinite row model and just wondering whether it would be a good idea to build the code in https://dash.plotly.com/dash-ag-grid/infinite-scroll#infinite-scroll-with-sort-and-filter into Vizro.

There are other technical problems on the Vizro side here so not sure whether we'll actually end up doing it but would be good to know in principle whether you would mind us doing so @BSd3v? 🙂

I fear that this could run into some of the enterprise features, such as filter sets and advanced filters.

This is a very good point and something that had occurred to me too. We'll need to make sure we don't clash with them.

I think if you want to go this route, you have it be a function that can be passed to the component if needed to get more advanced than this, where this first function is set as a default.

This should indeed be possible through a user defining their own custom AgGrid function, although I imagine in practice not many people would.

@BSd3v
Copy link

BSd3v commented Jul 2, 2024

There are other technical problems on the Vizro side here so not sure whether we'll actually end up doing it but would be good to know in principle whether you would mind us doing so @BSd3v? 🙂

This shouldnt be an issue, the dash-ag-grid library is just a wrapper for AG Grid. Just put tests in place for updates from the library. 😄

This should indeed be possible through a user defining their own custom AgGrid function, although I imagine in practice not many people would.

You can always give them the base code in your documentation, and demonstrate how to override it. And yes, for the vast majority I dont think there will be an issue.

@huong-li-nguyen huong-li-nguyen changed the title [Tidy] KPI Infinite scroll AgGrid [POC] KPI Infinite scroll AgGrid Jul 9, 2024
@huong-li-nguyen
Copy link
Contributor

For my own tracking - blocked until this is solved: https://github.com/McK-Internal/vizro-internal/issues/1100

@huong-li-nguyen
Copy link
Contributor

Is this still relevant based on the new speed optimisations for the Tables and Graph? :)

@petar-qb
Copy link
Contributor Author

petar-qb commented Sep 3, 2024

Is this still relevant based on the new speed optimisations for the Tables and Graph? :)

Only a half of the "ideally fast ag_grid" problem is solved with the #644.

The other half involves enabling server-side pagination e.g. by enabling infinite_scroll_ag_grid figure function, and this is blocked until we investigate already mentioned -> https://github.com/McK-Internal/vizro-internal/issues/1100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants