-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify carton_program_search to accept initial query #23
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me so far. How does this timing compare to when doing a straight program/carton query? I assume it's faster now.
python/valis/db/queries.py
Outdated
if query is None: | ||
query = vizdb.SDSSidFlat.select(peewee.fn.DISTINCT(vizdb.SDSSidFlat.sdss_id)) | ||
|
||
query = (query.join( | ||
vizdb.SDSSidFlat, | ||
on=(vizdb.SDSSidFlat.sdss_id == vizdb.SDSSidStacked.sdss_id)) | ||
.join(targetdb.Target, | ||
on=(targetdb.Target.catalogid == vizdb.SDSSidFlat.catalogid)) | ||
.join(targetdb.CartonToTarget) | ||
.join(targetdb.Carton) | ||
.where(getattr(targetdb.Carton, name_type) == name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll likely need to do this for many of the query functions we have, as we add them to the main search. Is this how you'd recommend modifying them? Did you try the select_extend
method? If so, how did that compare?
Relatedly, should we be writing our queries differently to make this kind of single-use or dynamic extension easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The select_extend()
will only help if we want to return more columns than in the initial query select
. Do you want this function to also add returning the program
and carton
columns? As it is written right now that function can be called with any query that initiates with a SDSSidStacked
model and it will restrict it to that carton or program.
The problem with the original query was that it would do a subquery to return all the unique sdss_ids and then subset to only those in the program or carton. That's a very expensive query and I think not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually I'd like to add the ability in the main search to specify additional columns to return. These columns in principle could come from any table, which might make things more complicated, but that can be addressed later.
RIght now only a single carton or program can be selected. It probably doesn't make since to return the carton or program name in that case. I'd like to eventually move to a multi-select option for program/carton, in which case it would be nice to have those values returned. I think the query would have to be re-written anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for multiple cartons this query would need to be rewritten. It's fairly easy to change the .where(getattr(targetdb.Carton, name_type) == name))
to .where(getattr(targetdb.Carton, name_type).in_(name)))
but I'd wait until that functionality is implemented in the API/Zora since the IN
statement is less efficient than ==
.
But I did test adding .select_extend(targetdb.Carton.carton)
for the carton and program after carton_program_search
has been called and that works fine.
And yes, the data release is only needed right now for queries to the pipelines tables and getting the spectra. It isn't used for sdss_id queries or anything into catalogdb/targetdb. It's still a required parameter by valis though. |
For the catalogid issues, can you check if the values returned by |
I'll look into that. I thought about the overflow, int64 issue, but the thing is that the catalogids returned by Valis do exist and are in the same FoV. For example catalogid=63050395029379380 associated (in the JSON) with sdss_id=68025052 exists and has RA/Dec 314.9192239012484/45.2846467763595 so I don't think it's that. I'll dig a bit deeper. |
OK, it took a bit but I figured out the issue. You are correct this is a bigint issue, but is not a FastAPI/valis problem. In fact if you do the query with The issue happens when trying to parse the response in JavaScript (and maybe also in Python; I'm not sure what language the Swagger UI is written). Most parsers will convert the big integer to float and then back to integer, but in the process some precision is lost. For example, The solution, in JavaScript, is to use json-bigint to parse the request response. Here is a more detailed description. I have opened a PR to fix this in the main query. |
Yeah I encountered the same issue when pulling pipeline info for a given target for the Target page. I left some comments about that on the other PR. |
I saw those and made some comments there. I'll merge this one in the meantime. |
Fixes #20
Modifies
carton_program_search
to appenth the carton/program filter to an input query. Thequery/main
route backend now sends the initial query (cone search or sdss_id-based) tocarton_program_search
.I think this is ready for review, but while testing this against the
pipelines
DB I found some inconsistencies that I don't understand. I'll add them here but this may be a different issue/bug.If I do a request to
/query/main
with parametersI get the following response
This corresponds to the query
which I print just before the route returns (after the
append_to_pipes()
call). This looks fine, however the same query when run insdss5db
in thepipelines
machine returnsNote that the
sdss_id
andcatalogid21
seem to match the API response, but thecatalogid25
andcatalogid31
are different (although not by much, which maybe indicates these are duplicates somehow).The same query in
Zora
returns a table with the same results of the request JSON dump.One thing I notice is that although I specified release
IPL3
that does not appear in the query, although maybe that's expected for this query.