-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parser for TESS DVT files #164
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #164 +/- ##
==========================================
- Coverage 93.92% 90.12% -3.81%
==========================================
Files 37 41 +4
Lines 1598 2187 +589
==========================================
+ Hits 1501 1971 +470
- Misses 97 216 +119 ☔ View full report in Codecov by Sentry. |
I think this is ready for testing/opinions on how it's implemented, I imagine I'll need to iterate a little. |
flux_err=data['LC_INIT_ERR']) | ||
lc.meta = hdulist[0].header | ||
lc.meta['MISSION'] = 'TESS DVT' | ||
lc.meta['FLUX_ORIGIN'] = "LC_INIT" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the (near?) future, we should either add flux column validation or work to ensure that non-standard flux columns are selectable.
By default, all of the table columns will be options in the Flux Column plugin. So far, we haven't written any intentional support for plotting models, but there are pre-computed transit light curve models in the DVT files, stored under the "MODEL_INIT" and "MODEL_WHITE" columns. This causes a hiccup if you try to set "MODEL_WHITE" as the flux column, since we look for a corresponding _err
column, and none exists for the model:
lcviz/lcviz/components/components.py
Lines 224 to 225 in 0f18bec
self.app._jdaviz_helper._set_data_component(dc_item, 'flux', dc_item[self.selected]) | |
self.app._jdaviz_helper._set_data_component(dc_item, 'flux_err', dc_item[self.selected+"_err"]) # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using .get
instead might fix L225, but we should check that's the only issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Down the road, it'll be great to change the plot options defaults for "MODEL_*" columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I fixed this in components.py
but it needs more testing to make sure something downstream isn't broken by an all-nan uncertainty array.
""" | ||
# Determine if we're loading a DVT file, which has a separate parser | ||
if isinstance(data, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be just as easy to support HDUList too, which would be helpful for writing tests and for files we make/edit on the fly. Can we add that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but handling HDULists in general is probably a separate ticket/PR.
lcviz/parsers.py
Outdated
lc = lightkurve.LightCurve(data=data, | ||
flux=data['LC_INIT'], | ||
flux_err=data['LC_INIT_ERR']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some work to do on the units here. We're currently expecting the time unit to be in JD (I think?), but these files have the units:
TTYPE1 = 'TIME ' / column title: data time stamps
TFORM1 = 'D ' / column format: 64-bit floating point
TUNIT1 = 'BJD - 2457000, days' / column units: Barycenter corrected TESS Julian
but the offset of 2457000
isn't being reapplied in this parser yet, so the time axis is prehistoric:
We might also need to intentionally not load the DVT's phase column, or mark its caveats. The phase units are not the (0, 1) orbital phase that we usually use:
TTYPE4 = 'PHASE ' / column title: Phase using period and epoch
TFORM4 = 'E ' / column format: 32-bit floating point
TUNIT4 = 'days ' / column units: [-0.25*period, 0.75*period]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, we probably shouldn't load the phase column but could load the ephemeris and let lcviz automatically phase-fold (if that's easy - otherwise can be a follow-up ticket)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, we probably shouldn't load the phase column but could load the ephemeris and let lcviz automatically phase-fold (if that's easy - otherwise can be a follow-up ticket)
I already made a pass at doing that at L53-54.
@bmorris3 I fixed the time handling, but I'm surprised that using the epoch and period from the DVT header doesn't seem to result in a nice light curve in the time vs phase viewer: |
lcviz/parsers.py
Outdated
# add ephemeris information from the DVT extension | ||
if ephem_plugin is not None and show_ext_in_viewer: | ||
ephem_plugin.period = header['TPERIOD'] | ||
ephem_plugin.t0 = header['TEPOCH'] + time_offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The midtransit time doesn't align with this epoch because of the time conventions – when we get it right, the midtransit time should be phase zero.
You're correctly applying the BJD offset to TEPOCH
, but then you need to subtract the LC's coordinate reference time like this:
dvt_file = "https://archive.stsci.edu/missions/tess/tid/s0002/0000/0001/0010/0827/tess2018235142541-s0002-s0002-0000000100100827-00109_dvt.fits"
with fits.open(dvt_file, mode='readonly') as hdulist:
print(
"t0 for ephemeris plugin:",
(hdulist[1].header['TEPOCH'] + 2457000) -
lcviz.app.data_collection[0].coords.reference_time.jd
)
That puts the transit at phase zero.
lcviz/components/components.py
Outdated
nan_errs = np.empty(dc_item['flux'].shape) | ||
nan_errs[:] = np.nan | ||
self.app._jdaviz_helper._set_data_component(dc_item, 'flux_err', nan_errs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explicitly add support for light curves without flux uncertainties, rather than adding nan uncertainties? If/when we add fitting features, we'll have to undo the nans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What currently requires uncertainties? I can't think of anywhere that we currently use them (although that really should be bumped up near the top of our priority list) 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we can't change the flux_err
component to be None
, because you can't change the array shape of a component. We could remove the flux_err
component entirely in this case and add it back when needed, does that sound better than setting the array to NaNs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, so the problem is if some flux entries in the same data-entry have uncertainties but others do not (if none have uncertainties, then we could just not call _set_data_component
ever and that would be ok, right?)?
I think maybe just a comment here to explain why we have to do this (or removing/re-adding) so that when we get to using the uncertainties we can adjust as needed... its hard for me to say which is the better/worse choice at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, there's LC_INIT
with LC_INIT_ERR
, but for example LC_WHITE
doesn't have an LC_WHITE_ERR
column. We'll need a check either way when fitting is added, either for whether the error exists or if it's all NaN, just a matter of which to check. Sounds like @kecnry and I don't have a strong preference so I can defer to @bmorris3 and remove the component instead of doing all NaN.
Co-authored-by: Brett M. Morris <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one small comment, otherwise looks good to me. Are there any necessary follow-ups or does this at least achieve everything requested by MAST?
lcviz/parsers.py
Outdated
|
||
# Loop through the TCEs in the file. If we only want one (specified by | ||
# `extname` keyword) then only load that one into the viewers and ephemeris. | ||
for i in range(1, len(hdulist)-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just iterate over hdulist directly (for hdulitem in hdulist[1:]
) or is the iterator i
needed anywhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at some point I was going to accept an integer rather than string name for extension, I can change this to what you suggested.
I think that this covers everything MAST needs, if they decide they want to provide an integer instead of |
dev failures seem unrelated - will follow-up on that separately. Thanks! |
By default, this will load each TCE extension into the data collection as a separate object and display them all. Specifying
extname
will skip adding the deselected extensions into viewers. Opening as draft since I need to fix the auto-generated data label. I probably also need to think a little harder about what to do with the ephemeris in the case of multiple TCE extensions that are all added to the viewer.