Allow util_corr_fit() to group by a variable #58
Comments
Hey @emcfalls! Thank you for the very helpful write-up and for sharing your ideas (and also the picture :) ). I have a few thoughts:
One alternative to the two options presented here would be to make any functions that get a
One other note, as a total aside: Markdown is supported on GitHub issues, so you can get
with one and three tick marks, respectively.
@Deckart2 I agree with not changing the format of the output and keeping it consistent with the other functions that use groupby! I also like your idea of using a group parameter, so that instead of returning the results for all groups we return them for just one. I think that would be the easiest way to keep the same function output, but it may be tedious for users who want to look at multiple groups. The best of both worlds would be to have both parameters (groupby and group), but that may be overkill.
Sounds right to me, and I think it could be totally okay to have a groupby and group argument. I also agree it could be tedious, but if we go this route, we could add some documentation that could show how to do it in a few lines of code. It may be harder than this, but at its core, we would need to do something like:
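A minimal sketch of that kind of loop, written in Python with pandas purely for illustration. `corr_fit_for_group` and `corr_fit_all_groups` are hypothetical names, not the package's API, and the root-mean-square summary is an assumed stand-in for whatever util_corr_fit() actually reports:

```python
import numpy as np
import pandas as pd


def corr_fit_for_group(synth, conf, group_var, group):
    """Hypothetical helper: correlation fit for one level of `group_var`."""
    # Keep only the rows in the requested group, drop the grouping column,
    # and keep the remaining numeric columns.
    s = synth.loc[synth[group_var] == group].drop(columns=group_var)
    c = conf.loc[conf[group_var] == group].drop(columns=group_var)
    s = s.select_dtypes("number")
    c = c.select_dtypes("number")
    # Difference between the synthetic and confidential correlation matrices.
    diff = s.corr() - c.corr()
    # Summarise the below-diagonal differences as a root mean square
    # (an assumed metric, chosen only to give a single number per group).
    lower = diff.to_numpy()[np.tril_indices(len(diff), k=-1)]
    return float(np.sqrt(np.mean(lower ** 2)))


def corr_fit_all_groups(synth, conf, group_var):
    """Call the single-group helper once per level found in `conf`."""
    return {
        g: corr_fit_for_group(synth, conf, group_var, g)
        for g in conf[group_var].unique()
    }
```

With a single-group function like this, covering every group is a one-line comprehension on the user's side, which is the kind of thing the documentation could show.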
Anyway, this is really thoughtful, and I'm excited to discuss it synchronously with you, @emcfalls, and to hear @awunderground's thoughts :)!
Brief Notes from Convo with Aaron:
This extension will allow users to return correlation data for the numerical columns in their synthesis, grouped by a certain variable (e.g., gender, age, race). Users can then assess the performance of their synthesis based on how well it preserves multivariate relationships for different subgroups in the population. I don't know if we would want to group by multiple variables, due to the complexity.
Right now, the util_corr_fit() function returns a list
I have two ideas for this extension
Example for variable sex:
Example correlation matrix for a dataframe with sex, income (numeric), and height (numeric)
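As a rough illustration of what a per-group correlation matrix could look like for such a dataframe, here is a small sketch in Python with pandas. The data values are made up, and this shows only one possible output shape, not the format util_corr_fit() returns:

```python
import pandas as pd

# Made-up toy data, for illustration only.
df = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M", "M"],
    "income": [30.0, 45.0, 60.0, 35.0, 50.0, 40.0],
    "height": [160.0, 165.0, 172.0, 180.0, 175.0, 178.0],
})

# One correlation matrix per level of `sex`, stacked in a single frame
# indexed by (sex, variable).
corr_by_sex = df.groupby("sex")[["income", "height"]].corr()

# Pull out the 2x2 matrix for a single group.
corr_f = corr_by_sex.loc["F"]

print(corr_by_sex)
```

Stacking the matrices keeps everything in one object, while selecting a single level recovers the ordinary square matrix for that group.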