-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clarify missingness of point estimates; add examples for determining output_type_id format #197
Conversation
I would like to get another modeler perspective on this if it sounds clear: @trobacker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anna made the technical comments, but from a readability angle, it looks good to me. It even made me realize I should uncapitalize a "parquet" which I had capitalized previously.
I've updated the text according to Anna's suggestions and what I found in #198 |
I have not read all of this carefully. But I noticed Anna's comment above, "I believe also exists in python via pd.NA, is the closest thing to R's NA and I think that's actually what @MKupperman used in his successful exploration (but I might be wrong)." Just noting that missing values in polars are best/most often/most generally represented using |
It might indeed be relevant @elray1. What we are interested here is what missing values will serialise in an interoperable way that will be correctly interepreted as a missing value by our software when read in. We will definitely have Having said all that, it's of course best not to make any assumptions and actually test to ensure our suggestions don't cause any problems to desired downstream functionality. So @elray1, would you be able to add what a successful test of polars functionality requirements would look like here 🙏? #198 |
This PR is intended to clarify that
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks very good, as far as I can tell. I made some small capitalization suggestions to be consistent with the rest of hubDocs.
Co-authored-by: Alvaro J. Castro Rivadeneira <[email protected]>
@annakrystalli, did I address your concerns? |
Co-authored-by: Anna Krystalli <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting there from my perspective with some really useful and detailed additions!
Made a few corrections and a suggestion for moving a section and cleaning up some possible misconceptions.
Co-authored-by: Anna Krystalli <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the missing data question. I reviewed the text and tried running all of the code, so made some notes about those things.
As for the requested "Python perspective"... after reading these changes, I'm honestly not sure what the desired outputs are. If I'm understanding the write-up correct, there's a distinction between "NA" for something not required and NA/None for something that is optional and missing?
I might need to hop on a zoom to better understand what we're going for
Co-authored-by: Becky Sweger <[email protected]> Co-authored-by: Anna Krystalli <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to make sure I'm understanding the problem space correctly! Based on that, I made a few more wording suggestions for your consideration (feel free to disregard)
Co-authored-by: Becky Sweger <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update, it looks all clear to me, I just made some minor comments.
Co-authored-by: Lucie Contamin <[email protected]>
Co-authored-by: Becky Sweger <[email protected]>
Based on feedback in https://github.com/hubverse-org/hubDocs/pull/197/files#r1822764075, It made more sense to put these directly in the examples.
Thank you @annakrystalli, @bsweger, @LucieContamin, and @micokoch for your comments! As per @annakrystalli and @LucieContamin's suggestions, I have moved the documentation about the template generation and arrow schema closer to the examples. Specifically, I added the section about the arrow schema inside the example so that users do not have to look far to know what the arrow schema would look like. As per @bsweger's suggestions (and #198 (comment)), I have attempted to focus the discussion for the modelers on the column type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a couple of minor last minute corrections and added a suggested note about another useful function, hubData::coerce_to_hub_schema()
.
Other than that, all my comments have been resolved. Thanks @zkamvar !
FYI as a response to functions being relevant to modelers appearing across too many packages, I've made a proposal to re-export them to Input welcome |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much, it looks all good to me. I just made 2 small comment, feel free to ignore
Co-authored-by: Anna Krystalli <[email protected]> Co-authored-by: Lucie Contamin <[email protected]>
It made more sense for this function to be what you would use before writing out the schema to parquet in R. It doesn't really do much for Python folks because they aren't likely to go through R to submit their models.
Thank you all for sticking with me through this! |
This modifies the description of
output_type_id
for point estimates to clearly state that the entry should be a missing value, eitherNA
in R orNone
in Python.This also states that this applies to schemas prior to v4.0.0 (see hubverse-org/schemas#109) and adds the python one-liner as suggested by @MKupperman in #198.
The preview for this PR is at https://hubdocs--197.org.readthedocs.build/en/197/user-guide/model-output.html