Distinguish non-detections and non-observations #5

jfcrenshaw · 2021-11-04T00:59:13Z

Non-detections (i.e. you looked and didn't see anything) and non-observations (i.e. you didn't look) are fundamentally different. The former tells you the galaxy was too faint to be detected, while the latter tells you nothing.

LSST should have observations in all bands for every galaxy it observes, but this will be relevant if we add, e.g., Roman and Euclid photometry, which will have only partial coverage of the LSST catalog.

Thus, RAIL should handle these differently.

Creation:
For now, the Creation Module will only produce non-detections. Including non-observations with a different flag from non-detections would require upgrading pzflow so that it can handle multiple flags for marginalization. This should be pretty easy, but I will not work on it until someone actually needs a data set with non-observations.

When this does get included, I expect to record non-detections as np.inf and non-observations as np.nan
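
A minimal sketch of how the proposed convention could be read back out of a photometry column, assuming plain NumPy float arrays. The array `mag_i` and its values are made up for illustration and are not from any RAIL or pzflow data set.

```python
import numpy as np

# Hypothetical i-band magnitudes for five galaxies (illustrative values only).
# np.inf marks a non-detection, np.nan marks a non-observation.
mag_i = np.array([24.1, np.inf, 25.3, np.nan, np.inf])

non_detected = np.isposinf(mag_i)  # looked, but the galaxy was too faint
non_observed = np.isnan(mag_i)     # never looked
measured = np.isfinite(mag_i)      # an actual magnitude was recorded

print(non_detected.sum(), non_observed.sum(), measured.sum())
```

Because `np.isfinite` is false for both flags, the three masks are mutually exclusive and together cover every entry.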

Estimation:
In the meantime, Estimation codes should be developed as if both might appear in the data sets they ingest.

eacharles · 2022-06-13T16:25:11Z

Just noting that this is a general issue across RAIL and we should agree on a convention.

aimalz · 2022-07-06T21:35:53Z

A need for consistent flagging has also come up in LSSTDESC/RAIL#197, so I'm going to remove the enhancement and backburner labels.

eacharles · 2022-07-11T16:26:31Z

Let's agree that we will try to come up with a convention at the collab meeting?

aimalz · 2022-07-11T16:36:44Z

To add to this, I'd propose "flag" values not being the same type as normal values in whatever column, so it will throw errors if they aren't handled properly, as opposed to running silently but giving nonsensical results.

sschmidt23 · 2022-07-18T19:45:40Z

As discussed at the tag-up this morning, having entries of a different type is a problem, as you can't stick different types in numpy arrays easily. I like John Franklin's suggestion on the pz-rail Slack channel of using np.inf for non-detection and np.nan for non-observation.
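
To illustrate the trade-off, here is a rough sketch of what NumPy does with differently typed flags. The flag choices (`"non-detection"` as a string, `None`) are only examples of "a different type", not anything RAIL uses.

```python
import numpy as np

# A string flag silently coerces the whole array to strings.
coerced = np.array([24.1, "non-detection", 25.3])
print(coerced.dtype.kind)  # 'U' (unicode string)

# A None flag forces dtype=object; arithmetic then fails loudly.
as_object = np.array([24.1, None, 25.3])
try:
    as_object + 1.0
    raised = False
except TypeError:
    raised = True

# Float flags keep a float64 array, and arithmetic propagates them.
flagged = np.array([24.1, np.inf, np.nan])
shifted = flagged + 1.0  # the inf and nan entries stay inf and nan
print(flagged.dtype, shifted)
```

So an object array does give the "throw an error" behaviour, but at the cost of losing a numeric dtype, while `np.inf` and `np.nan` stay in a float array and have to be masked explicitly.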

egawiser · 2022-07-18T20:00:29Z

This is also a key issue for spectroscopic follow-up used to generate catalogs for photo-z training and calibration, where we can expect both non-observations (many) and non-detections (which I would interpret as not yielding a spectroscopic redshift, whether or not flux is detected in the spectrum). Spectroscopic surveys usually record a quality flag ranging from e.g., 1 to 4 where 4 is a definitely correct spectroscopic redshift and 1 is a pretty wild guess (or weak cross-correlation due to a lack of clear features). I bring this up because it is common to note non-detections and non-observations in those catalogs' columns with other integers like -1 for non-observation and 0 for non-detection. Those could easily be translated to np.nan and np.inf respectively as we ingest such catalogs. But it does bring up the issue of whether we need a more flexible flag array that uses a flag value to note non-observation or non-detection and can propagate a quality flag rather than simply setting redshifts to np.nan or np.inf.
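
A sketch of the translation step at ingestion, assuming a hypothetical spec-z column in which -1 means "not observed" and 0 means "not detected". Real catalogs differ in their sentinel values, so the codes would need to be checked per survey.

```python
import numpy as np

# Hypothetical raw spec-z column (illustrative values only):
# -1 marks a non-observation, 0 marks a non-detection.
raw_z = np.array([0.52, -1.0, 0.0, 1.31, -1.0])

z = raw_z.copy()
z[raw_z == -1] = np.nan  # non-observation
z[raw_z == 0] = np.inf   # non-detection

print(z)
```

This assumes no genuine redshift in the column equals a sentinel value; a catalog containing real z = 0 sources would need a different mapping.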

jfcrenshaw · 2022-07-19T03:46:46Z

@egawiser I think the redshift flags can follow their own convention. For galaxies with spec-z's that we use in training/calibration sets, we will need a redshift column, and a separate redshift quality column. I don't think any methods currently use quality flags, but we can perform different cuts on redshift flags to assemble different training sets.

So this is a little different than photometry. For photometry, we need flags when the photometry is missing, while for redshifts, we need flags when the redshift is present.

Since the redshift flags don't need to sit in the redshift arrays alongside actual redshift values, we have more flexibility in how we format them. But since the photometric flags have to sit in arrays alongside actual photometry values, we have less flexibility.

Does this match what you are thinking? Or is there a bigger point I am missing?
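
As a rough sketch of the separate-column idea, assuming the 1 to 4 quality scale mentioned above (4 = secure, 1 = wild guess); the arrays and the helper `select_training_set` are hypothetical, not part of RAIL.

```python
import numpy as np

# Hypothetical spec-z training catalog (illustrative values only).
specz = np.array([0.52, 1.31, 0.87, 2.10])
quality = np.array([4, 2, 3, 1])  # separate quality column, 4 = secure

def select_training_set(z, q, min_quality):
    """Return the redshifts whose quality flag is at least min_quality."""
    keep = q >= min_quality
    return z[keep]

secure = select_training_set(specz, quality, min_quality=4)
looser = select_training_set(specz, quality, min_quality=3)
print(secure, looser)
```

Different cuts on the quality column give different training sets without touching the redshift values themselves.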

egawiser · 2022-07-20T01:14:03Z

I think that's fine, but I was trying to point out that you may encounter flags in the spec-z column of a spec-z survey that correspond to "not detected". And if you create a spec-z column for an LSST galaxy catalog that tracks the best known spec-z for every galaxy, you would then need values for both "not observed" and "not detected".

jfcrenshaw added the enhancement ("New feature or request") label Nov 4, 2021

aimalz removed the enhancement ("New feature or request") label Jul 6, 2022

eacharles mentioned this issue Jun 13, 2023

Make release 1.0 LSSTDESC/rail#17

Closed

16 tasks

aimalz added the question ("Further information is requested") label Jan 23, 2023

aimalz mentioned this issue Jun 13, 2023

Evaluator use cases to guide comprehensive flagging system #8

Closed

eacharles transferred this issue from LSSTDESC/rail_attic Jun 13, 2023
