Distinguish non-detections and non-observations #5

Open
jfcrenshaw opened this issue Nov 4, 2021 · 8 comments
Labels
question Further information is requested

Comments

@jfcrenshaw
Contributor

Non-detections (i.e. you looked and didn't see anything) and non-observations (i.e. you didn't look) are fundamentally different. The former tells you the galaxy was too faint to be detected, while the latter tells you nothing.

LSST should have observations in all bands for every galaxy it observes, but this will be relevant if we combine e.g. Roman and Euclid photometry, which will cover only part of the LSST catalog.

Thus, RAIL should handle these differently.

Creation:
For now, the Creation Module will only produce non-detections. Including non-observations with a different flag from non-detections would require upgrading pzflow so that it can handle multiple flags for marginalization. This should be pretty easy, but I will not work on it until someone actually needs a data set with non-observations.

When this does get included, I expect to record non-detections as np.inf and non-observations as np.nan.
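A minimal sketch of how that convention could be read back out of a photometry array. The magnitudes below are invented for illustration and are not from any RAIL data set.

```python
import numpy as np

# Hypothetical magnitudes for four galaxies in a single band.
# np.inf marks a non-detection, np.nan marks a non-observation.
mags = np.array([24.3, np.inf, np.nan, 25.1])

non_detected = np.isinf(mags)   # looked, but the galaxy was too faint
non_observed = np.isnan(mags)   # never looked
measured = np.isfinite(mags)    # ordinary measurements

print(non_detected, non_observed, measured)
```

Because both flags are floats, the column keeps a plain float dtype, and the three masks are mutually exclusive.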

Estimation:
In the meantime, Estimation codes should be developed as if both might appear in the data sets they ingest.
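As a rough sketch of what handling both cases might look like inside an estimator: the helper below, `split_missing`, is hypothetical and not part of RAIL. It assumes one possible treatment, replacing non-detections with a limiting magnitude supplied by the caller and leaving non-observations as NaN with a mask so they can be marginalized over or skipped.

```python
import numpy as np

def split_missing(mags, mag_limit):
    """Hypothetical helper: separate the two kinds of missing photometry.

    Non-detections (np.inf) are replaced by `mag_limit`, an assumed
    limiting magnitude. Non-observations (np.nan) are left as NaN and
    reported through the returned boolean mask.
    """
    mags = np.asarray(mags, dtype=float)
    observed = ~np.isnan(mags)
    values = np.where(np.isinf(mags), mag_limit, mags)
    return values, observed

values, observed = split_missing([24.3, np.inf, np.nan], mag_limit=27.0)
```

Other treatments of non-detections (for example, using an upper limit in the likelihood) are equally compatible with the flag convention. The point is only that the two cases take different code paths.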

@jfcrenshaw jfcrenshaw added the enhancement New feature or request label Nov 4, 2021
@eacharles
Collaborator

Just noting that this is a general issue across RAIL and we should agree on a convention.

@aimalz
Collaborator

aimalz commented Jul 6, 2022

A need for consistent flagging has also come up in LSSTDESC/RAIL#197, so I'm going to remove the enhancement and backburner labels.

@aimalz aimalz removed the enhancement New feature or request label Jul 6, 2022
@eacharles
Collaborator

Let's agree that we will try to come up with a convention at the collaboration meeting?

@aimalz
Collaborator

aimalz commented Jul 11, 2022

To add to this, I'd propose "flag" values not being the same type as normal values in whatever column, so it will throw errors if they aren't handled properly, as opposed to running silently but giving nonsensical results.

@sschmidt23
Collaborator

As discussed at the tag-up this morning, having entries of a different type is a problem, as you can't stick different types in numpy arrays easily. I like John Franklin's suggestion on the pz-rail Slack channel of using np.inf for non-detection and np.nan for non-observation.
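To illustrate the trade-off, here is a small sketch with invented values. Using `None` as a stand-in for a flag of a different type is only an assumption for the example.

```python
import numpy as np

# A flag of a different type (None here) forces an object array.
with_none = np.array([24.3, None, 25.1])
print(with_none.dtype)   # object

# Float flags keep the array in a plain float dtype.
floats = np.array([24.3, np.inf, np.nan])
print(floats.dtype)      # float64

# Arithmetic on the object array fails loudly ...
try:
    with_none + 1.0
    raised = False
except TypeError:
    raised = True

# ... while the float flags propagate silently.
shifted = floats + 1.0
```

The object array does raise on unhandled flags, but it gives up vectorized float arithmetic. The float flags keep the array numeric, so code has to check for them explicitly with `np.isinf` and `np.isnan`.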

@egawiser

egawiser commented Jul 18, 2022

This is also a key issue for spectroscopic follow-up used to generate catalogs for photo-z training and calibration, where we can expect both non-observations (many) and non-detections (which I would interpret as not yielding a spectroscopic redshift, whether or not flux is detected in the spectrum). Spectroscopic surveys usually record a quality flag ranging from e.g., 1 to 4 where 4 is a definitely correct spectroscopic redshift and 1 is a pretty wild guess (or weak cross-correlation due to a lack of clear features). I bring this up because it is common to note non-detections and non-observations in those catalogs' columns with other integers like -1 for non-observation and 0 for non-detection. Those could easily be translated to np.nan and np.inf respectively as we ingest such catalogs. But it does bring up the issue of whether we need a more flexible flag array that uses a flag value to note non-observation or non-detection and can propagate a quality flag rather than simply setting redshifts to np.nan or np.inf.
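A sketch of the translation step described above, assuming a catalog whose quality column uses -1 for non-observation, 0 for non-detection, and 1 to 4 for redshift quality. The column names, the values, and the -99.0 placeholder are all invented for illustration.

```python
import numpy as np

# Hypothetical spectroscopic catalog columns.
specz = np.array([0.52, -99.0, -99.0, 1.31])   # -99.0 is an assumed placeholder
quality = np.array([4, -1, 0, 2])              # -1: not observed, 0: not detected

z = specz.astype(float)          # astype returns a copy, so specz is untouched
z[quality == -1] = np.nan        # non-observation
z[quality == 0] = np.inf         # non-detection

print(z)
```

The quality column is kept alongside the translated redshifts, so the original 1 to 4 grades can still be propagated.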

@jfcrenshaw
Contributor Author

jfcrenshaw commented Jul 19, 2022

@egawiser I think the redshift flags can follow their own convention. For galaxies with spec-z's that we use in training/calibration sets, we will need a redshift column, and a separate redshift quality column. I don't think any methods currently use quality flags, but we can perform different cuts on redshift flags to assemble different training sets.

So this is a little different than photometry. For photometry, we need flags when the photometry is missing, while for redshifts, we need flags when the redshift is present.

Since the redshift flags don't need to sit in the redshift arrays alongside actual redshift values, we have more flexibility in how we format them. But since the photometric flags have to sit in arrays alongside actual photometry values, we have less flexibility.

Does this match what you are thinking? Or is there a bigger point I am missing?
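For concreteness, a minimal sketch of a redshift column with a separate quality column, and of cutting on the quality flag to assemble different training sets. The function name `select_training` and the values are hypothetical.

```python
import numpy as np

# Hypothetical spec-z sample: redshifts plus a separate 1-4 quality column.
redshift = np.array([0.52, 0.87, 1.31, 0.24])
quality = np.array([4, 1, 3, 2])

def select_training(redshift, quality, min_quality):
    """Keep only redshifts whose quality flag is at least `min_quality`."""
    keep = quality >= min_quality
    return redshift[keep]

secure = select_training(redshift, quality, min_quality=3)
everything = select_training(redshift, quality, min_quality=1)
```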

@egawiser

I think that's fine, but I was trying to point out that you may encounter flags in the spec-z column of a spec-z survey that correspond to "not detected". And if you create a spec-z column for an LSST galaxy catalog that tracks the best known spec-z for every galaxy, you would then need values for both "not observed" and "not detected".
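One way such a "best known spec-z" column could be assembled, sketched under the np.inf / np.nan convention proposed earlier in this thread. The function `best_specz` is hypothetical, and taking the first finite redshift across surveys is an arbitrary choice made only for the example.

```python
import numpy as np

def best_specz(z_by_survey):
    """Hypothetical merge of per-survey spec-z columns.

    Rows are surveys, columns are galaxies. np.nan means the survey did
    not observe the galaxy; np.inf means it observed but got no redshift.
    """
    z = np.asarray(z_by_survey, dtype=float)
    best = np.full(z.shape[1], np.nan)          # default: never observed
    best[np.isinf(z).any(axis=0)] = np.inf      # observed, but no redshift
    finite = np.isfinite(z)
    has_z = finite.any(axis=0)
    first = np.argmax(finite, axis=0)           # first survey with a redshift
    best[has_z] = z[first[has_z], np.flatnonzero(has_z)]
    return best

merged = best_specz([
    [0.5, np.inf, np.nan, np.nan],
    [np.nan, np.nan, np.nan, np.inf],
    [0.6, 1.2, np.nan, np.nan],
])
```

Here the third galaxy stays np.nan (never observed) and the fourth becomes np.inf (observed without a redshift), so both cases survive in the merged column.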

@aimalz aimalz added the question Further information is requested label Jan 23, 2023
@eacharles eacharles transferred this issue from LSSTDESC/rail_attic Jun 13, 2023

5 participants