Distinguish non-detections and non-observations #5
Comments
Just noting that this is a general issue across RAIL and we should agree on a convention. |
A need for consistent flagging has also come up in LSSTDESC/RAIL#197, so I'm going to remove the enhancement and backburner labels. |
Let's agree to try to come up with a convention at the collaboration meeting? |
To add to this, I'd propose "flag" values not being the same type as normal values in whatever column, so it will throw errors if they aren't handled properly, as opposed to running silently but giving nonsensical results. |
As discussed at the tag-up this morning, having entries of a different type is a problem, as you can't stick different types in numpy arrays easily. I like John Franklin's suggestion on the pz-rail Slack channel of using np.inf for non-detection and np.nan for non-observation. |
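To make the suggestion concrete, here is a minimal sketch of what the np.inf / np.nan convention could look like in a photometry column. The array name and magnitude values are made up for illustration; nothing here is existing RAIL code.

```python
import numpy as np

# Hypothetical i-band magnitudes for five galaxies (values are illustrative).
# np.inf marks a non-detection, np.nan marks a non-observation.
mag_i = np.array([24.1, np.inf, 25.3, np.nan, 23.8])

not_detected = np.isinf(mag_i)   # looked, but saw nothing
not_observed = np.isnan(mag_i)   # never looked
measured = np.isfinite(mag_i)    # real photometry (neither inf nor nan)

# Both flags are ordinary floats, so the column keeps a numeric dtype.
flag_dtype = mag_i.dtype

# By contrast, a flag of a different type (here None) forces an object array,
# which is the problem raised at the tag-up.
mixed_dtype = np.array([24.1, None]).dtype
```

Because both flags are floats, the column stays a plain float array, and the three boolean masks are mutually exclusive and cover every entry.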
This is also a key issue for spectroscopic follow-up used to generate catalogs for photo-z training and calibration, where we can expect both non-observations (many) and non-detections (which I would interpret as not yielding a spectroscopic redshift, whether or not flux is detected in the spectrum). Spectroscopic surveys usually record a quality flag ranging from e.g., 1 to 4 where 4 is a definitely correct spectroscopic redshift and 1 is a pretty wild guess (or weak cross-correlation due to a lack of clear features). I bring this up because it is common to note non-detections and non-observations in those catalogs' columns with other integers like -1 for non-observation and 0 for non-detection. Those could easily be translated to |
@egawiser I think the redshift flags can follow their own convention. For galaxies with spec-z's that we use in training/calibration sets, we will need a redshift column, and a separate redshift quality column. I don't think any methods currently use quality flags, but we can perform different cuts on redshift flags to assemble different training sets. So this is a little different than photometry. For photometry, we need flags when the photometry is missing, while for redshifts, we need flags when the redshift is present. Since the redshift flags don't need to sit in the redshift arrays alongside actual redshift values, we have more flexibility in how we format them. But since the photometric flags have to sit in arrays alongside actual photometry values, we have less flexibility. Does this match what you are thinking? Or is there a bigger point I am missing? |
I think that's fine, but I was trying to point out that you may encounter flags in the spec-z column of a spec-z survey that correspond to "not detected". And if you create a spec-z column for an LSST galaxy catalog that tracks the best known spec-z for every galaxy, you would then need values for both "not observed" and "not detected". |
Non-detections (i.e. you looked and didn't see anything) and non-observations (i.e. you didn't look) are fundamentally different. The former tells you the galaxy was too faint to be detected, while the latter tells you nothing.
LSST should have observations from all bands for every galaxy it observes, but this will be relevant if we combine e.g. Roman and Euclid photometry, which will have only partial coverage of the LSST catalog.
Thus, RAIL should handle these differently.
Creation:
For now, the Creation Module will only produce non-detections. Including non-observations with a different flag from non-detections would require upgrading pzflow so that it can handle multiple flags for marginalization. This should be pretty easy, but I will not work on it until someone actually needs a data set with non-observations.
When this does get included, I expect to record non-detections as np.inf and non-observations as np.nan.
Estimation:
In the meantime, Estimation codes should be developed as if both might appear in the data sets they ingest.
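As a rough sketch of what handling both cases might look like on the Estimation side, assuming the np.inf / np.nan convention above is adopted: the helper name prepare_band and the limiting_mag argument are hypothetical, and replacing non-detections with a limiting magnitude is only one possible choice, not something this issue has settled.

```python
import numpy as np

def prepare_band(mags, limiting_mag):
    """Split one photometry column into usable values plus two masks.

    Hypothetical helper: assumes np.inf = non-detection, np.nan = non-observation.
    """
    mags = np.asarray(mags, dtype=float)
    observed = ~np.isnan(mags)      # False only where nobody looked
    detected = np.isfinite(mags)    # False for non-detections and non-observations
    # One possible treatment: stand in the limiting magnitude for non-detections.
    # Non-observations stay as np.nan so they cannot be mistaken for data.
    filled = np.where(np.isinf(mags), limiting_mag, mags)
    return filled, observed, detected

# Illustrative values only.
filled, observed, detected = prepare_band([24.1, np.inf, np.nan], limiting_mag=26.0)
```

An estimator could then use the non-detection as an upper limit on flux and drop or marginalize over the bands where observed is False.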