clarify missingness of point estimates; add examples for determining output_type_id format #197

zkamvar · 2024-10-21T22:24:54Z

This modifies the description of output_type_id for point estimates to clearly state that the entry should be a missing value, either NA in R or None in Python.
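A minimal sketch of what this means for a Python modeler, using plain dictionaries rather than any hubverse tooling. The column names `output_type` and `value`, the helper `point_estimates_are_missing`, and the choice of `mean` and `median` as the point-estimate output types are assumptions made for illustration; only `output_type_id` and the use of `None` come from the description above.

```python
# Hypothetical model-output rows (illustrative values only).
rows = [
    {"output_type": "quantile", "output_type_id": 0.5, "value": 12.0},
    # Point estimate: output_type_id is a missing value (None), not the string "NA".
    {"output_type": "mean", "output_type_id": None, "value": 11.4},
]


def point_estimates_are_missing(rows, point_types=("mean", "median")):
    """Return True if every point-estimate row has a missing output_type_id."""
    return all(
        row["output_type_id"] is None
        for row in rows
        if row["output_type"] in point_types
    )


print(point_estimates_are_missing(rows))  # prints True
```

A row that stores the two-character string `"NA"` instead of `None` would fail this check, which is the distinction the description is drawing.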

This also states that this applies to schemas prior to v4.0.0 (see hubverse-org/schemas#109) and adds the python one-liner as suggested by @MKupperman in #198.

The preview for this PR is at https://hubdocs--197.org.readthedocs.build/en/197/user-guide/model-output.html

…for the schema

zkamvar · 2024-10-21T22:26:56Z

I would like another modeler's perspective on whether this sounds clear: @trobacker

docs/source/user-guide/model-output.md

micokoch

Anna made the technical comments, but from a readability angle, it looks good to me. It even made me realize I should uncapitalize a "parquet" which I had capitalized previously.

zkamvar · 2024-10-22T22:18:49Z

I've updated the text according to Anna's suggestions and what I found in #198

elray1 · 2024-10-22T22:26:02Z

I have not read all of this carefully. But I noticed Anna's comment above, "I believe also exists in python via pd.NA, is the closest thing to R's NA and I think that's actually what @MKupperman used in his successful exploration (but I might be wrong)."

Just noting that missing values in polars are best/most often/most generally represented using null (which is in practice set by specifying None). Again, haven't read this but just wanted to quickly mention it in case it's relevant to anything here.

annakrystalli · 2024-10-23T09:15:35Z

It might indeed be relevant @elray1.

What we are interested in here is which missing values will serialise in an interoperable way and be correctly interpreted as missing by our software when read in. We will definitely have NAs in our files, as we will always have R users writing to CSV files, so either way polars will need to deal with that. I believe it can, though, as you would expect the R version of the polars package to understand NAs. There are also arguments when reading in files that allow us to specify null values on read (in both Python and R).
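The idea of specifying null values on read can be sketched with the Python standard library alone. This is not the polars or pandas API; the helper `read_csv_with_nulls`, the `NULL_TOKENS` set, and the sample CSV text are all hypothetical and only illustrate treating an R-style `NA` token as missing when a file is read in.

```python
import csv
import io

# Assumption: tokens that should be treated as missing when reading a CSV.
NULL_TOKENS = {"", "NA"}


def read_csv_with_nulls(text, null_tokens=NULL_TOKENS):
    """Parse CSV text into dicts, mapping any null token to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value in null_tokens else value) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical file contents, as an R user might write them.
csv_text = "output_type,output_type_id,value\nmean,NA,11.4\nquantile,0.5,12.0\n"
rows = read_csv_with_nulls(csv_text)
print(rows[0]["output_type_id"])  # prints None
```

Data-frame libraries expose the same idea through reader arguments, so the exact token set would be configured there rather than in a hand-written helper like this one.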

Having said all that, it's of course best not to make any assumptions and actually test to ensure our suggestions don't cause any problems to desired downstream functionality. So @elray1, would you be able to add what a successful test of polars functionality requirements would look like here 🙏? #198

zkamvar · 2024-10-23T18:37:58Z

This PR is intended to clarify that "NA" in the schema actually means missing data and not the character "NA". I've tested these assumptions in #198 (comment) and I think it satisfies this comment:

"Having said all that, it's of course best not to make any assumptions and actually test to ensure our suggestions don't cause any problems to desired downstream functionality."

micokoch

It looks very good, as far as I can tell. I made some small capitalization suggestions to be consistent with the rest of hubDocs.

docs/source/user-guide/model-output.md

Co-authored-by: Alvaro J. Castro Rivadeneira <[email protected]>

zkamvar · 2024-10-25T16:05:29Z

@annakrystalli, did I address your concerns?

docs/source/user-guide/model-output.md

Co-authored-by: Anna Krystalli <[email protected]>

annakrystalli

Getting there from my perspective with some really useful and detailed additions!

Made a few corrections and a suggestion for moving a section and cleaning up some possible misconceptions.

docs/source/user-guide/model-output.md

Co-authored-by: Anna Krystalli <[email protected]>

bsweger

Thanks for addressing the missing data question. I reviewed the text and tried running all of the code, so made some notes about those things.

As for the requested "Python perspective"... after reading these changes, I'm honestly not sure what the desired outputs are. If I'm understanding the write-up correctly, there's a distinction between "NA" for something not required and NA/None for something that is optional and missing?

I might need to hop on a Zoom call to better understand what we're going for.

docs/source/user-guide/model-output.md

Co-authored-by: Becky Sweger <[email protected]> Co-authored-by: Anna Krystalli <[email protected]>

bsweger

Thanks for taking the time to make sure I'm understanding the problem space correctly! Based on that, I made a few more wording suggestions for your consideration (feel free to disregard)

docs/source/user-guide/model-output.md

Co-authored-by: Becky Sweger <[email protected]>

LucieContamin

Thank you for the update, it all looks clear to me. I just made some minor comments.

docs/source/user-guide/model-output.md

Co-authored-by: Lucie Contamin <[email protected]>

Co-authored-by: Becky Sweger <[email protected]>

Based on feedback in https://github.com/hubverse-org/hubDocs/pull/197/files#r1822764075, it made more sense to put these directly in the examples.

zkamvar · 2024-10-30T22:34:52Z

Thank you @annakrystalli, @bsweger, @LucieContamin, and @micokoch for your comments!

As per @annakrystalli and @LucieContamin's suggestions, I have moved the documentation about the template generation and arrow schema closer to the examples. Specifically, I added the section about the arrow schema inside the example so that users do not have to look far to know what the arrow schema would look like.

As per @bsweger's suggestions (and #198 (comment)), I have attempted to focus the discussion for the modelers on the column type.

docs/source/user-guide/model-output.md

annakrystalli

I made a couple of minor last minute corrections and added a suggested note about another useful function, hubData::coerce_to_hub_schema().

Other than that, all my comments have been resolved. Thanks @zkamvar !

annakrystalli · 2024-10-31T08:57:30Z

FYI, in response to the functions relevant to modelers being spread across too many packages, I've made a proposal to re-export them to hubValidations here: hubverse-org/hubValidations#149

Input welcome

LucieContamin

Thank you very much, it all looks good to me. I just made 2 small comments, feel free to ignore them.

docs/source/user-guide/model-output.md

Co-authored-by: Anna Krystalli <[email protected]> Co-authored-by: Lucie Contamin <[email protected]>

It made more sense for this function to be what you would use before writing out the schema to parquet in R. It doesn't really do much for Python folks because they aren't likely to go through R to submit their models.

zkamvar · 2024-10-31T13:48:58Z

Thank you all for sticking with me through this!

zkamvar added 2 commits October 21, 2024 15:14

modify model output chapter to indicate that missingness is required …

904d017

…for the schema

add NA explanation in quickstart

a83275e

zkamvar requested review from annakrystalli, micokoch, mmkerr and bsweger October 21, 2024 22:25

annakrystalli requested changes Oct 22, 2024

annakrystalli linked an issue Oct 22, 2024 that may be closed by this pull request

Test and document how to produce typed NA/missing values in Python. #198

Open

annakrystalli removed a link to an issue Oct 22, 2024

Test and document how to produce typed NA/missing values in Python. #198

Open

micokoch approved these changes Oct 22, 2024

zkamvar added 3 commits October 22, 2024 15:02

rework output_type_id discussion

60b2ee7

update language

5308cd5

add more details and fix links

030de5b

zkamvar requested review from annakrystalli and micokoch October 22, 2024 22:17

annakrystalli mentioned this pull request Oct 23, 2024

Test and document how to produce typed NA/missing values in Python. #198

Open

zkamvar mentioned this pull request Oct 23, 2024

redefine model output representation to be two columns #202

Merged

micokoch reviewed Oct 24, 2024

zkamvar and others added 2 commits October 24, 2024 09:57

Apply suggestions from code review

f26c129

Co-authored-by: Alvaro J. Castro Rivadeneira <[email protected]>

Merge branch 'main' into znk/clarify-missingness

943bfe6

annakrystalli requested changes Oct 25, 2024

zkamvar commented Oct 25, 2024

zkamvar commented Oct 25, 2024

Apply suggestions from code review

f4eb77c

Co-authored-by: Anna Krystalli <[email protected]>

zkamvar requested a review from annakrystalli October 30, 2024 12:58

annakrystalli requested changes Oct 30, 2024

Apply suggestions from code review

aaeec18

Co-authored-by: Anna Krystalli <[email protected]>

bsweger reviewed Oct 30, 2024

Apply suggestions from code review

2716974

Co-authored-by: Becky Sweger <[email protected]> Co-authored-by: Anna Krystalli <[email protected]>

bsweger reviewed Oct 30, 2024

Apply suggestions from code review

fd39624

Co-authored-by: Becky Sweger <[email protected]>

LucieContamin reviewed Oct 30, 2024

zkamvar and others added 6 commits October 30, 2024 12:40

Apply suggestions from code review

56b1326

Co-authored-by: Lucie Contamin <[email protected]>

change smart quotes to ASCII

26104d3

make model_id consistent

9692c5c

Apply suggestions from code review

3556918

Co-authored-by: Becky Sweger <[email protected]>

move demo of template and schema to examples

5a117d1

Based on feedback in https://github.com/hubverse-org/hubDocs/pull/197/files#r1822764075, it made more sense to put these directly in the examples.

add more emph

1941004

zkamvar requested review from bsweger, micokoch, annakrystalli and LucieContamin October 30, 2024 22:34

annakrystalli reviewed Oct 31, 2024

annakrystalli approved these changes Oct 31, 2024

annakrystalli mentioned this pull request Oct 31, 2024

Re-export functions useful to modelers from hubUtils and hubData hubverse-org/hubValidations#149

Closed

LucieContamin approved these changes Oct 31, 2024

Apply suggestions from code review

7870e9e

Co-authored-by: Anna Krystalli <[email protected]> Co-authored-by: Lucie Contamin <[email protected]>

zkamvar mentioned this pull request Oct 31, 2024

Explore potential for creating in-browser schema information hubverse-org/hub-dash-site-builder#1

Open

move coerce_to_hub_schema() to the R section; add 🐍 context

128fdcd

It made more sense for this function to be what you would use before writing out the schema to parquet in R. It doesn't really do much for Python folks because they aren't likely to go through R to submit their models.

zkamvar merged commit 8c1fa4e into main Oct 31, 2024
1 check passed

zkamvar deleted the znk/clarify-missingness branch October 31, 2024 13:49
