Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding/persisting galaxy objIDs to qp ancillary data #39

Open
alexandergagliano opened this issue Apr 20, 2022 · 10 comments
Open

Adding/persisting galaxy objIDs to qp ancillary data #39

alexandergagliano opened this issue Apr 20, 2022 · 10 comments
Assignees
Labels
question Further information is requested

Comments

@alexandergagliano
Copy link

I have qp quantiles stored for 1000 galaxies and am trying to link these properties back to the original galaxies. Would it be possible to revise the metadata of the files to include the original object ID of the galaxy? Currently the metadata only includes:

{'pdf_name': array([b'quant_piecewise'], dtype='|S15'),
'pdf_version': array([0]),
'quants': array([[0. , 0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 ,
0.9 , 0.999, 1. ]])}

@aimalz
Copy link
Collaborator

aimalz commented Apr 20, 2022

Alternatively, it would be acceptable to add them to Ensemble.ancil. However, if this attribute doesn't already exist, using Ensemble.add_to_ancil() throws a NoneType error rather than creating an ancil dict (nor does it get added to metadata from Ensemble.add_to_metadata()).

in_pdfs = qp.Ensemble(qp.interp, data=dict(xvals=xvals, yvals=yvals, check_input=True))
in_pdfs.ancil = {'GALID': in_df['GALID']}

@eacharles
Copy link
Collaborator

You should be able to do

in_pdfs = qp.Ensemble(qp.interp, data=dict(xvals=xvals, yvals=yvals, check_input=True))
in_pdfs.set_ancil({'GALID': in_df['GALID']})

@eacharles
Copy link
Collaborator

Could you check if Ensemble.set_ancil() works, and if it does, close this issue, thanks?

@eacharles eacharles added the question Further information is requested label Apr 20, 2022
@aimalz
Copy link
Collaborator

aimalz commented Apr 21, 2022

It doesn't throw an error, but it doesn't actually get added to the metadata and Ensemble.ancil remains None. Including it as a keyword upon instantiation (in_pdfs = qp.Ensemble(qp.interp, data=dict(xvals=xvals, yvals=yvals, check_input=True), ancil=any_dictionary)) causes a KeyError. (Apologies if it's not even supposed to be provided as a dictionary argument -- it would help if there were docstrings to specify the types of anticipated inputs.)

@eacharles
Copy link
Collaborator

eacharles commented Apr 21, 2022 via email

@eacharles
Copy link
Collaborator

eacharles commented Apr 21, 2022 via email

@eacharles
Copy link
Collaborator

eacharles commented Apr 21, 2022

So, the setting, reading and writing of ancillary data is explicitly being testing in the unit tests.
If you look in https://github.com/LSSTDESC/qp/blob/8f6c2c4bd22f9e3061ae5329625b35ef8fa34280/qp/scipy_pdfs.py#L27 ancil is being set for a particular test case.
Then set_ancil is being called when building test pdfs
https://github.com/LSSTDESC/qp/blob/8f6c2c4bd22f9e3061ae5329625b35ef8fa34280/qp/test_funcs.py#L34
And it the persistence loopback test it is checking that the ancillary data matches in the original version and in the written and re-read version.
https://github.com/LSSTDESC/qp/blob/8f6c2c4bd22f9e3061ae5329625b35ef8fa34280/qp/test_funcs.py#L152
I just confirmed that this line of code is being executed when I run the unit tests. Most likely the issue you are seeing is that you have an older version tables_io that wasn't writing the ancillary data.

@aimalz
Copy link
Collaborator

aimalz commented Apr 21, 2022

@alexandergagliano and I got the same behavior with presumably different versions of qp, but the first way of doing it from the test setup worked for me after updating tables_io, thanks! Should I make a separate issue for including the ancil functionality to one of the demos, or is it too specific to the RAIL use case?

A closely related issue is that ancillary data doesn't get carried over when converting parameterizations, e.g.

in_pdfs = qp.Ensemble(qp.interp, data=dict(xvals=zgrid, yvals=flow_z, check_input=True), ancil=dict(GALID=in_df['GALID']))
out_pdfs = in_pdfs.convert_to(qp.quant_piecewise_gen, quants=quants, check_input=False)
print(out_pdfs.ancil)
> > None

It's not a bug per se, but it is definitely an instance of the code's behavior not matching standard user expectations.

@eacharles
Copy link
Collaborator

Well, go ahead and add a couple of lines in ensemble.convert_to to copy over the ancillary data if it exists.

@aimalz aimalz self-assigned this Apr 21, 2022
@aimalz aimalz changed the title Adding galaxy objIDs to qp metadata Adding/persisting galaxy objIDs to qp ancillary data Apr 28, 2022
@aimalz aimalz transferred this issue from LSSTDESC/qp Aug 4, 2023
@eacharles
Copy link
Collaborator

eacharles commented Aug 6, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants