VegaFusion treats pandas categoricals differently than Altair's default transformer #402

Open
Tracked by #1365
joelostblom opened this issue Sep 29, 2023 · 3 comments

Comments

@joelostblom

The chart from the spec below renders differently with VegaFusion enabled. VegaFusion handles it better than the default transformer and creates a more reasonable x-scale (the same result can be obtained in Altair by casting the column to a string instead of a category, as in vega/altair#3140 (reply in thread)). If Altair could render the chart the same way without VegaFusion, that would be great, but depending on what VegaFusion actually does here, that might not be possible.

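import altair as alt
from vega_datasets import data

source = data.wheat()
# Store the integer year column as a pandas categorical
source['year'] = source['year'].astype('category')

# To render with VegaFusion instead of the default transformer:
# alt.data_transformers.enable("vegafusion")

chart = alt.Chart(source, height=200).mark_point().encode(
    x='year:T',
    y='wheat',
)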

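A minimal pandas-only sketch of the string-cast workaround mentioned above, which reportedly gives the same x-scale as VegaFusion. The small frame and its values are hypothetical stand-ins for `data.wheat()`; the resulting frame would then be passed to `alt.Chart` with `x='year:T'` as in the example above.

```python
import pandas as pd

# Hypothetical stand-in for data.wheat(): a few integer years with values
source = pd.DataFrame({"year": [1565, 1570, 1575], "wheat": [41.0, 45.0, 42.0]})

# Cast the years to strings instead of a categorical (the workaround
# from vega/altair#3140); the chart code itself stays unchanged.
source["year"] = source["year"].astype(str)

print(source["year"].tolist())
```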

@joelostblom
Author

joelostblom commented Sep 29, 2023

Oh, I noticed that VegaFusion displays the desirable behavior with integers as well! That's great and resolves the issue I had when opening vega/altair#3140 in the first place. Now I would really like us to bring that behavior into Altair's base transformer too, if possible...

@jonmmease
Collaborator

VegaFusion doesn't actually support categoricals internally, and "expands" them during the conversion to Arrow, so it makes sense that you see the same behavior as with integers in this case.

https://github.com/hex-inc/vegafusion/blob/d94fee469524879d2d29cb9320a31ecacf7a25dc/python/vegafusion/vegafusion/transformer.py#L52-L55
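A rough sketch of what "expanding" a categorical before conversion could look like in pandas. This is not VegaFusion's code (the linked `transformer.py` lines are the real implementation), and the helper name `expand_categoricals` is hypothetical. Each categorical column is replaced by a plain column with the categories' dtype, so an integer categorical comes back as ordinary integers, which matches the observation that categoricals and integers then behave the same.

```python
import pandas as pd


def expand_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every categorical column replaced by its values.

    Hypothetical helper for illustration; not VegaFusion's implementation.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            if col.isna().any():
                # Missing entries can't be stored in e.g. an int64 column,
                # so fall back to a plain object column.
                out[name] = col.astype(object)
            else:
                # Use the dtype of the categories (e.g. int64 for integer years).
                out[name] = col.astype(col.cat.categories.dtype)
    return out


source = pd.DataFrame({"year": [1565, 1570, 1565], "wheat": [41.0, 45.0, 42.0]})
source["year"] = source["year"].astype("category")
expanded = expand_categoricals(source)
print(expanded.dtypes)
```

The input frame is left untouched; only the returned copy loses its categorical dtype.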

For integers parsed as temporal columns, VegaFusion currently interprets them as years:

https://github.com/hex-inc/vegafusion/blob/d94fee469524879d2d29cb9320a31ecacf7a25dc/vegafusion-runtime/src/data/tasks.rs#L337-L354
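In Python terms, the "integers as years" interpretation amounts to something like the sketch below. It only illustrates the idea and is not a port of the linked Rust code; anchoring each year at midnight UTC on January 1 is an assumption, and the helper names `int_as_year` and `to_epoch_ms` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int_as_year(value: int) -> datetime:
    """Interpret an integer such as 1565 as the start of that year (UTC assumed)."""
    return datetime(value, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Milliseconds since the Unix epoch, the unit of JavaScript Date timestamps."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


for year in (1565, 1970, 1971):
    print(year, int_as_year(year).isoformat(), to_epoch_ms(int_as_year(year)))
```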

I honestly don't recall why I put that logic in there. I thought it was to match an Altair or Vega-Lite example, but maybe not. I think this could be a change to propose for Vega's date parsing. I'm really not sure what Vega is currently doing; I would have guessed that the alternative to treating integers as years would be to treat them as UTC milliseconds, but that doesn't appear to be happening either, given the .800 x-axis tick label.
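For comparison, the "UTC milliseconds" reading would map the same integers to instants just after the Unix epoch. A minimal sketch of that arithmetic with a hypothetical helper name and arbitrary example years; it makes no claim about what Vega actually does:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int_as_epoch_ms(value: int) -> datetime:
    """Interpret an integer as milliseconds since 1970-01-01T00:00:00Z."""
    return EPOCH + timedelta(milliseconds=value)


first, last = int_as_epoch_ms(1600), int_as_epoch_ms(1800)
print(first.isoformat())  # 1970-01-01T00:00:01.600000+00:00
print(last - first)       # 0:00:00.200000
```

Under this reading, two centuries of "years" would collapse into a 200 ms window, so the axis would span only a fraction of a second.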

As an alternative, Altair could reinterpret integer columns as years by adding a custom calculate expression, though that might be more of a departure from Vega/Vega-Lite than we want. Vega-Lite doesn't have access to column type info, so the special treatment couldn't happen there.
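A sketch of how the Altair-side alternative might look. Because pandas dtypes are available in Python (unlike in Vega-Lite), integer columns encoded as temporal could be rewritten with a Vega-Lite `calculate` transform that builds a date from the year. The helper name is hypothetical, and the use of Vega's `datetime(year, month, day)` expression function (months are zero-based) is an assumption for illustration; this is not existing Altair behavior.

```python
import json

import pandas as pd


def year_calculate_transforms(df: pd.DataFrame, temporal_fields: list[str]) -> list[dict]:
    """Build Vega-Lite calculate transforms that turn integer columns into dates.

    Hypothetical helper: only fields with an integer dtype are rewritten.
    """
    transforms = []
    for field in temporal_fields:
        if pd.api.types.is_integer_dtype(df[field].dtype):
            # json.dumps quotes the field name as a valid expression string literal.
            accessor = f"datum[{json.dumps(field)}]"
            transforms.append(
                {"calculate": f"datetime({accessor}, 0, 1)", "as": field}
            )
    return transforms


source = pd.DataFrame({"year": [1565, 1570], "wheat": [41.0, 45.0]})
print(year_calculate_transforms(source, ["year", "wheat"]))
# [{'calculate': 'datetime(datum["year"], 0, 1)', 'as': 'year'}]
```

In Altair, the same expression could presumably be attached with `chart.transform_calculate(year="datetime(datum.year, 0, 1)")`. Vega's `datetime` uses local time; its `utc(...)` function would be the UTC counterpart.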

@jonmmease jonmmease changed the title Vegafusion treats pandas categoricals differently than Altair's default transformer VegaFusion treats pandas categoricals differently than Altair's default transformer Sep 29, 2023
@joelostblom
Author

joelostblom commented Sep 29, 2023

I see. I do find that the VegaFusion behavior adds a lot of convenience here, so I would be in favor of reimplementing that solution in Altair as well. Still, I would prefer it to be added in Vega so that we don't have to depart from VL (although I think this is a smaller departure). I found a Vega issue where this was initially discussed and suggested the change there again: vega/vega#1681. There is also an Altair issue from the same time: vega/altair#1365.
