
JupyterViz: Simulation step is very slow for huge grid size (e.g. 80x80) #1806

Closed
rht opened this issue Sep 16, 2023 · 8 comments

Comments

@rht
Contributor

rht commented Sep 16, 2023

This is a continuation of the discussion started in #1772 (comment).
@rlskoeser suggested Altair, which might be faster than Solara's Matplotlib backend.

I did a manual benchmark. I found that on my laptop (i5-1345U), portrayal generation and ax.scatter took 80 ms, but the slowest part is actually Solara's savefig: https://github.com/widgetti/solara/blob/a747a680478653ab73c3f9323aeb5fee45147b60/solara/components/matplotlib.py#L54, which took 1.2 s.
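
The split measured above (building the scatter vs. saving the figure) can be reproduced with a small timing sketch. It is not the benchmark from this thread: time_draw is a hypothetical helper, random integer positions stand in for agents, and the figure is saved into an in-memory buffer, roughly what a web frontend receives.

```python
import io
import time

import numpy as np
from matplotlib.figure import Figure


def time_draw(width=80, height=80, fmt="png", seed=0):
    """Hypothetical helper: return (scatter_seconds, savefig_seconds, image_bytes)."""
    rng = np.random.default_rng(seed)
    n = width * height
    # Random cell coordinates stand in for agent positions (assumption)
    x = rng.integers(0, width, n)
    y = rng.integers(0, height, n)

    # Phase 1: create the figure and the scatter artist
    t0 = time.perf_counter()
    fig = Figure()
    ax = fig.subplots()
    ax.scatter(x, y, s=10)
    t1 = time.perf_counter()

    # Phase 2: serialize the figure, which is where the time went in the thread
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(buf.getvalue())
```

For example, time_draw(fmt="png") and time_draw(fmt="svg") can be compared on the same machine; absolute numbers will differ from the ones reported here.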

I changed the output format from svg to png, and the savefig time went down from 1.2 s to 180 ms. I additionally experimented with updating the existing scatter plot in place with set_offsets (for x, y) and set_sizes (for size): portrayal generation plus the scatter update went down from 80 ms to 6 ms. The branch I used for these experiments can be found at https://github.com/rht/mesa/tree/solara_perf.
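
Updating in place means creating the scatter artist once and then only swapping its data on each step. A minimal sketch, assuming a hypothetical ScatterSpace helper fed with plain arrays of positions and marker sizes (this is not the code on the solara_perf branch):

```python
import numpy as np
from matplotlib.figure import Figure


class ScatterSpace:
    """Hypothetical helper: create the scatter artist once, then update it in place."""

    def __init__(self, width, height):
        self.fig = Figure()
        self.ax = self.fig.subplots()
        # Fix the axis limits so they do not depend on the data of any one step
        self.ax.set_xlim(-1, width)
        self.ax.set_ylim(-1, height)
        # Start with an empty collection; data arrives through update()
        self.scatter = self.ax.scatter([], [])

    def update(self, x, y, sizes):
        # set_offsets takes an (N, 2) array of x, y positions
        self.scatter.set_offsets(np.column_stack([x, y]))
        # set_sizes takes the marker areas, one per point
        self.scatter.set_sizes(np.asarray(sizes, dtype=float))
```

Because the Axes and the collection are reused, each step avoids re-creating artists; the figure still has to be rendered (e.g. via savefig) for Solara to display it.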

This is very promising.

@rht
Contributor Author

rht commented Sep 16, 2023

According to https://discuss.streamlit.io/t/plot-library-speed-trial/4688, Altair is orders of magnitude faster than Matplotlib.

@Corvince
Contributor

Altair is also suitable for interactive exploration, whereas Matplotlib only displays a static image in Solara (https://solara.dev/examples/libraries/altair).

Also, we don't have to worry about thread safety, since the actual plotting in Altair happens in JavaScript (Vega).

@Corvince
Contributor

Corvince commented Sep 20, 2023

Here is some code for a very simple Altair grid chart:

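import altair as alt
import solara


def altair_space(model, test):
    # The second argument (named `test` here) is accepted but unused
    def get_data(agent, pos):
        # Occupied cells become one row; empty cells fall through and return None
        if agent:
            return {"x": pos[0], "y": pos[1], "type": agent.type}

    # Build one row per occupied cell, dropping the None entries for empty cells
    data = list(
        filter(None, (get_data(agent, pos) for agent, pos in model.grid.coord_iter()))
    )
    # One rectangle per agent, colored by agent type (a Schelling-specific attribute)
    chart = (
        alt.Chart(alt.Data(values=data))
        .mark_rect()
        .encode(x="x:O", y="y:O", color="type:N")
    )
    return solara.FigureAltair(chart)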

But it does appear to be slower than Matplotlib :( Could you test this out, @rht?

Edit: this is meant to be used only for the Schelling example.
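
The data-preparation step in altair_space can be exercised without Mesa. The sketch below is a hypothetical stand-alone version: grid_rows takes (content, (x, y)) pairs, the shape the snippet unpacks from model.grid.coord_iter(), and stand-in agents only need a type attribute. It makes explicit that only occupied cells become chart rows, so the row count tracks the number of agents rather than the grid size.

```python
from types import SimpleNamespace


def grid_rows(cells):
    """Hypothetical helper: one row per occupied cell from (content, (x, y)) pairs."""
    rows = []
    for agent, (x, y) in cells:
        if agent is None:  # empty cell: nothing to draw
            continue
        rows.append({"x": x, "y": y, "type": agent.type})
    return rows


if __name__ == "__main__":
    # Stand-in agents; a real model would pass model.grid.coord_iter()
    demo = [(SimpleNamespace(type=0), (0, 0)), (None, (0, 1))]
    print(grid_rows(demo))
```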

@ankitk50
Contributor

According to https://discuss.streamlit.io/t/plot-library-speed-trial/4688, Altair is orders of magnitude faster than Matplotlib.

It might have some issues with Jupyter. But yes, the graphics are very rich; in particular, you can display a lot of information via tooltips.

@rht
Contributor Author

rht commented Jan 28, 2024

Just tested. Altair is a no-go for now. At first, I got this error:

File /venv/lib/python3.11/site-packages/altair/utils/data.py:81, in limit_rows.<locals>.raise_max_rows_error()
     80 def raise_max_rows_error():
---> 81     raise MaxRowsError(
     82         "The number of rows in your dataset is greater "
     83         f"than the maximum allowed ({max_rows}).\n\n"
     84         "Try enabling the VegaFusion data transformer which "
     85         "raises this limit by pre-evaluating data\n"
     86         "transformations in Python.\n"
     87         "    >> import altair as alt\n"
     88         '    >> alt.data_transformers.enable("vegafusion")\n\n'
     89         "Or, see https://altair-viz.github.io/user_guide/large_datasets.html "
     90         "for additional information\n"
     91         "on how to plot large datasets."
     92     )

MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000).

Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
transformations in Python.
    >> import altair as alt
    >> alt.data_transformers.enable("vegafusion")

Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
on how to plot large datasets.
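
A quick row count shows why an 80x80 grid can hit this limit: the chart gets one row per occupied cell, and 80 * 80 = 6400 cells, so any occupancy above 5000 / 6400 ≈ 78% exceeds the default maximum of 5000 rows. The helper names and the density parameter below are hypothetical.

```python
DEFAULT_MAX_ROWS = 5000  # the limit reported in the MaxRowsError above

# Workarounds on the Altair side (not exercised here):
#   alt.data_transformers.enable("vegafusion")  (suggested by the error)
#   alt.data_transformers.disable_max_rows()    (removes the check entirely)


def expected_rows(width, height, density):
    """Approximate chart rows: one per occupied cell (density in [0, 1])."""
    return round(width * height * density)


def exceeds_limit(width, height, density, max_rows=DEFAULT_MAX_ROWS):
    """True if a grid at this occupancy would trip Altair's default row limit."""
    return expected_rows(width, height, density) > max_rows
```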

Then I ran pip install -U "vegafusion[embed]" vl-convert-python and got this Solara error:

Traceback (most recent call last):
  File "/code/venv/lib/python3.11/site-packages/reacton/core.py", line 1661, in _render
    root_element = el.component.f(*el.args, **el.kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/venv/lib/python3.11/site-packages/solara/components/figure_altair.py", line 31, in FigureAltair
    raise KeyError(f"{key4} and {key5} not in mimebundle:\n\n{bundle}")
KeyError: 'application/vnd.vegalite.v4+json and application/vnd.vegalite.v5+json not in mimebundle:\n\n{\'application/vnd.vega.v5+json\': {\'$schema\': \'https://vega.github.io/schema/vega/v5.json\', \'data\': [{\'name\': \'source_0\', \'values\': [{\'x\': 0, \'y\': 0}, {\'x\': 0, \'y\': 3}, {\'x\': 0, \'y\': 4}, {\'x\': 0, \'y\': 5}, ..., 73}, {\'x\': 79, \'y\': 74}, {\'x\': 79, \'y\': 75}, {\'x\': 79, \'y\': 76}, {\'x\': 79, \'y\': 77}, {\'x\':79, \'y\': 78}, {\'x\': 79, \'y\': 79}]}, {\'name\': \'source_0_x_domain_x\', \'values\': [{\'min\': 0, \'max\': 79}]}, {\'name\': \'source_0_y_domain_y\', \'values\': [{\'min\': 0, \'max\': 79}]}], \'marks\': [{\'type\': \'symbol\', \'name\': \'marks\', \'from\': {\'data\': \'source_0\'}, \'encode\': {\'update\': {\'y\': {\'field\': \'y\', \'scale\': \'y\'}, \'ariaRoleDescription\': {\'value\': \'point\'}, \'x\': {\'field\': \'x\', \'scale\': \'x\'}, \'opacity\': {\'value\': 0.7}, \'fill\': {\'value\': \'#4c78a8\'}, \'description\': {\'signal\': \'"x: " + (format(datum["x"], "")) + "; y: " + (format(datum["y"], ""))\'}}}, \'style\': [\'point\']}], \'scales\': [{\'name\': \'x\', \'type\': \'linear\', \'domain\': [{\'signal\': \'(data("source_0_x_domain_x")[0] || {}).min\'}, {\'signal\': \'(data("source_0_x_domain_x")[0] || {}).max\'}], \'range\': [0, {\'signal\': \'width\'}], \'zero\': True, \'nice\': True}, {\'name\': \'y\', \'type\': \'linear\', \'domain\': [{\'signal\': \'(data("source_0_y_domain_y")[0] || {}).min\'}, {\'signal\': \'(data("source_0_y_domain_y")[0] || {}).max\'}], \'range\': [{\'signal\':\'height\'}, 0], \'zero\': True, \'nice\': True}], \'style\': \'cell\', \'padding\': 5, \'width\': 300, \'height\': 300, \'background\': \'white\'}, \'text/plain\': \'<VegaLite 5 object>\\n\\nIf you see this message, it means the renderer has not been properly enabled\\nfor the frontend that you are using. For more information, see\\nhttps://altair-viz.github.io/user_guide/display_frontends.html#troubleshooting\\n\'}'
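
This second error is a format mismatch: the mimebundle in the traceback only contains a compiled Vega spec (application/vnd.vega.v5+json), while FigureAltair looks for a Vega-Lite v4 or v5 entry. The function below is a simplified, hypothetical stand-in for that check, reconstructed only from the error message, not from Solara's actual source.

```python
VEGALITE_KEYS = (
    "application/vnd.vegalite.v4+json",
    "application/vnd.vegalite.v5+json",
)


def find_vegalite_spec(bundle):
    """Hypothetical helper: return the Vega-Lite spec from a mimebundle.

    Raises KeyError when only other formats (e.g. a compiled Vega spec)
    are present, which is the situation shown in the traceback.
    """
    for key in VEGALITE_KEYS:
        if key in bundle:
            return bundle[key]
    raise KeyError(f"none of {VEGALITE_KEYS} in mimebundle: {sorted(bundle)}")
```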

@Corvince
Contributor

You got me curious. This page talks about this: https://altair-viz.github.io/user_guide/large_datasets.html

However, I tried it in mesa-interactive and it handled it just fine, so it seems to be solvable. For performance, see the screencast:

e6497a6d-aa3d-42cf-b626-e824c290ca11.webm

@rht
Contributor Author

rht commented Jan 29, 2024

There is not much difference between my code and mesa-interactive's, other than that:

  • mesa-interactive uses mark_rect()
  • I haven't implemented on_click
  • I don't specify a scale for x and y
  • I specify the type as type="ordinal" instead of "y:N", for clarity

It could be that my own install is problematic (I am using pip instead of conda). But at least it seems that for a huge grid, the answer is to use Altair.

@rht
Contributor Author

rht commented Mar 5, 2024

It seems fixing #1741 made it faster for both Matplotlib and Altair. Currently, the Altair version on main works even without the vegafusion acceleration, but the Matplotlib version of the space drawer is faster than Altair's, so I am leaving Matplotlib as the default, and am closing this issue.


3 participants