Modify carton_program_search to accept initial query #23

Merged
merged 4 commits into from
Jun 26, 2024

@albireox albireox commented Jun 12, 2024

Fixes #20

Modifies carton_program_search to append the carton/program filter to an input query. The query/main route backend now sends the initial query (cone search or sdss_id-based) to carton_program_search.

I think this is ready for review, but while testing this against the pipelines DB I found some inconsistencies that I don't understand. I'll add them here but this may be a different issue/bug.

If I do a request to /query/main with parameters

{
  "ra": 315.01417,
  "dec": 45.299,
  "radius": 0.1,
  "units": "degree",
  "id": 23326,
  "program": "mwm_gg"
}

I get the following response

{
  "status": "success",
  "msg": "data successfully retrieved",
  "data": [
    {
      "sdss_id": 68024942,
      "ra_sdss_id": 315.05084411121544,
      "dec_sdss_id": 45.20588033521532,
      "catalogid21": 4220994466,
      "catalogid25": 27021597780505884,
      "catalogid31": 63050395029375420,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68024943,
      "ra_sdss_id": 315.062827092735,
      "dec_sdss_id": 45.21487254418657,
      "catalogid21": 4220994932,
      "catalogid25": 27021597780506348,
      "catalogid31": 63050395029375460,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68024961,
      "ra_sdss_id": 315.09597354158774,
      "dec_sdss_id": 45.27862222819132,
      "catalogid21": 4220997447,
      "catalogid25": 27021597780508864,
      "catalogid31": 63050395029376104,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68024987,
      "ra_sdss_id": 314.95347840004644,
      "dec_sdss_id": 45.216076160585764,
      "catalogid21": 4220994710,
      "catalogid25": 27021597780506130,
      "catalogid31": 63050395029376960,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025009,
      "ra_sdss_id": 314.94202956896544,
      "dec_sdss_id": 45.23171132283618,
      "catalogid21": 4220995714,
      "catalogid25": 27021597780507132,
      "catalogid31": 63050395029377660,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025016,
      "ra_sdss_id": 314.89990763754577,
      "dec_sdss_id": 45.270289766329235,
      "catalogid21": 4220996144,
      "catalogid25": 27021597780507560,
      "catalogid31": 63050395029377930,
      "in_boss": false,
      "in_apogee": true,
      "in_astra": true
    },
    {
      "sdss_id": 68025018,
      "ra_sdss_id": 314.89693109761066,
      "dec_sdss_id": 45.27824207067139,
      "catalogid21": 4221000580,
      "catalogid25": 27021597780511990,
      "catalogid31": 63050395029377960,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025019,
      "ra_sdss_id": 315.00491792931393,
      "dec_sdss_id": 45.202994234609825,
      "catalogid21": 4220994499,
      "catalogid25": 27021597780505916,
      "catalogid31": 63050395029377980,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025024,
      "ra_sdss_id": 315.0335240455252,
      "dec_sdss_id": 45.243782636228346,
      "catalogid21": 4220995289,
      "catalogid25": 27021597780506704,
      "catalogid31": 63050395029378100,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025031,
      "ra_sdss_id": 315.0008232677731,
      "dec_sdss_id": 45.27332492307533,
      "catalogid21": 4220996357,
      "catalogid25": 27021597780507776,
      "catalogid31": 63050395029378420,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025035,
      "ra_sdss_id": 315.04050094391545,
      "dec_sdss_id": 45.265637026700745,
      "catalogid21": 4220996374,
      "catalogid25": 27021597780507790,
      "catalogid31": 63050395029378510,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025038,
      "ra_sdss_id": 315.06185546679467,
      "dec_sdss_id": 45.30014261435378,
      "catalogid21": 4220998720,
      "catalogid25": 27021597780510136,
      "catalogid31": 63050395029378660,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025046,
      "ra_sdss_id": 314.9601099202946,
      "dec_sdss_id": 45.26546724961669,
      "catalogid21": 4220996549,
      "catalogid25": 27021597780507970,
      "catalogid31": 63050395029379060,
      "in_boss": false,
      "in_apogee": true,
      "in_astra": true
    },
    {
      "sdss_id": 68025050,
      "ra_sdss_id": 314.9717849168135,
      "dec_sdss_id": 45.29245137188428,
      "catalogid21": 4220996722,
      "catalogid25": 27021597780508136,
      "catalogid31": 63050395029379240,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025052,
      "ra_sdss_id": 314.91147854099677,
      "dec_sdss_id": 45.287862738192686,
      "catalogid21": 4221001109,
      "catalogid25": 27021597780512520,
      "catalogid31": 63050395029379380,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025055,
      "ra_sdss_id": 314.961038303085,
      "dec_sdss_id": 45.323149633427676,
      "catalogid21": 4221001406,
      "catalogid25": 27021597780512816,
      "catalogid31": 63050395029379460,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025060,
      "ra_sdss_id": 315.03812617377923,
      "dec_sdss_id": 45.346884016876224,
      "catalogid21": 4221003535,
      "catalogid25": 27021597780514944,
      "catalogid31": 63050395029379760,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025063,
      "ra_sdss_id": 315.0109459811423,
      "dec_sdss_id": 45.36783499184685,
      "catalogid21": 4221003973,
      "catalogid25": 27021597780515380,
      "catalogid31": 63050395029380030,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025193,
      "ra_sdss_id": 315.1012441652457,
      "dec_sdss_id": 45.32288809841765,
      "catalogid21": 4220998956,
      "catalogid25": 27021597780510370,
      "catalogid31": 63050395029385690,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025197,
      "ra_sdss_id": 315.11669853225345,
      "dec_sdss_id": 45.3534231499345,
      "catalogid21": 4220999976,
      "catalogid25": 27021597780511388,
      "catalogid31": 63050395029385910,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025198,
      "ra_sdss_id": 315.1219170037156,
      "dec_sdss_id": 45.359593784411366,
      "catalogid21": 4220999981,
      "catalogid25": 27021597780511390,
      "catalogid31": 63050395029385910,
      "in_boss": false,
      "in_apogee": true,
      "in_astra": true
    },
    {
      "sdss_id": 68025383,
      "ra_sdss_id": 314.92181922071006,
      "dec_sdss_id": 45.36748657355299,
      "catalogid21": 4221005047,
      "catalogid25": 27021597780516456,
      "catalogid31": 63050395029393650,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    }
  ]
}

This corresponds to the query

SELECT DISTINCT ON ("t1"."sdss_id")
    "t2"."sdss_id", "t2"."catalogid21", "t2"."catalogid25", "t2"."catalogid31",
    "t2"."ra_sdss_id", "t2"."dec_sdss_id",
    "t1"."in_boss", "t1"."in_apogee", "t1"."in_astra"
FROM "vizdb"."sdss_id_stacked" AS "t2"
INNER JOIN "vizdb"."sdss_id_flat" AS "t3" ON ("t3"."sdss_id" = "t2"."sdss_id")
INNER JOIN "targetdb"."target" AS "t4" ON ("t4"."catalogid" = "t3"."catalogid")
INNER JOIN "targetdb"."carton_to_target" AS "t5" ON ("t5"."target_pk" = "t4"."pk")
INNER JOIN "targetdb"."carton" AS "t6" ON ("t5"."carton_pk" = "t6"."pk")
INNER JOIN "vizdb"."sdssid_to_pipes" AS "t1" ON ("t2"."sdss_id" = "t1"."sdss_id")
WHERE (q3c_radial_query("t2"."ra_sdss_id", "t2"."dec_sdss_id", 315.01417, 45.299, 0.1)
       AND ("t6"."program" = 'mwm_gg'))

which I print just before the route returns (after the append_to_pipes() call). This looks fine. However, the same query, when run in sdss5db on the pipelines machine, returns

 sdss_id  | catalogid21 |    catalogid25    |    catalogid31    |     ra_sdss_id     |    dec_sdss_id     | in_boss | in_apogee | in_astra
----------+-------------+-------------------+-------------------+--------------------+--------------------+---------+-----------+----------
 68024942 |  4220994466 | 27021597780505885 | 63050395029375424 | 315.05084411121544 |  45.20588033521532 | f       | f         | f
 68024943 |  4220994932 | 27021597780506349 | 63050395029375456 |   315.062827092735 |  45.21487254418657 | f       | f         | f
 68024961 |  4220997447 | 27021597780508863 | 63050395029376102 | 315.09597354158774 |  45.27862222819132 | f       | f         | f
 68024987 |  4220994710 | 27021597780506128 | 63050395029376964 | 314.95347840004644 | 45.216076160585764 | f       | f         | f
 68025009 |  4220995714 | 27021597780507131 | 63050395029377668 | 314.94202956896544 |  45.23171132283618 | f       | f         | f
 68025016 |  4220996144 | 27021597780507561 | 63050395029377931 | 314.89990763754577 | 45.270289766329235 | f       | t         | t
 68025018 |  4221000580 | 27021597780511993 | 63050395029377962 | 314.89693109761066 |  45.27824207067139 | f       | f         | f
 68025019 |  4220994499 | 27021597780505917 | 63050395029377984 | 315.00491792931393 | 45.202994234609825 | f       | f         | f
 68025024 |  4220995289 | 27021597780506706 | 63050395029378098 |  315.0335240455252 | 45.243782636228346 | f       | f         | f
 68025031 |  4220996357 | 27021597780507774 | 63050395029378419 |  315.0008232677731 |  45.27332492307533 | f       | f         | f
 68025035 |  4220996374 | 27021597780507791 | 63050395029378513 | 315.04050094391545 | 45.265637026700745 | f       | f         | f
 68025038 |  4220998720 | 27021597780510134 | 63050395029378656 | 315.06185546679467 |  45.30014261435378 | f       | f         | f
 68025046 |  4220996549 | 27021597780507966 | 63050395029379057 |  314.9601099202946 |  45.26546724961669 | f       | t         | t
 68025050 |  4220996722 | 27021597780508138 | 63050395029379243 |  314.9717849168135 |  45.29245137188428 | f       | f         | f
 68025052 |  4221001109 | 27021597780512521 | 63050395029379376 | 314.91147854099677 | 45.287862738192686 | f       | f         | f
 68025055 |  4221001406 | 27021597780512818 | 63050395029379452 |   314.961038303085 | 45.323149633427676 | f       | f         | f
 68025060 |  4221003535 | 27021597780514943 | 63050395029379757 | 315.03812617377923 | 45.346884016876224 | f       | f         | f
 68025063 |  4221003973 | 27021597780515381 | 63050395029380030 |  315.0109459811423 |  45.36783499184685 | f       | f         | f
 68025193 |  4220998956 | 27021597780510370 | 63050395029385690 |  315.1012441652457 |  45.32288809841765 | f       | f         | f
 68025197 |  4220999976 | 27021597780511389 | 63050395029385913 | 315.11669853225345 |   45.3534231499345 | f       | f         | f
 68025198 |  4220999981 | 27021597780511394 | 63050395029385914 |  315.1219170037156 | 45.359593784411366 | f       | t         | t
 68025383 |  4221005047 | 27021597780516455 | 63050395029393648 | 314.92181922071006 |  45.36748657355299 | f       | f         | f

Note that the sdss_id and catalogid21 seem to match the API response, but the catalogid25 and catalogid31 are different (although not by much, which maybe indicates these are duplicates somehow).

The same query in Zora returns a table with the same results as the JSON dump of the request.

One thing I notice is that, although I specified release IPL3, it does not appear in the query. Maybe that's expected for this query.

@albireox albireox marked this pull request as ready for review June 13, 2024 16:55
@albireox albireox requested a review from havok2063 as a code owner June 13, 2024 16:55
Contributor

@havok2063 havok2063 left a comment

This seems reasonable to me so far. How does this timing compare to when doing a straight program/carton query? I assume it's faster now.

Comment on lines 267 to 277
    if query is None:
        query = vizdb.SDSSidFlat.select(peewee.fn.DISTINCT(vizdb.SDSSidFlat.sdss_id))

    query = (query.join(
                 vizdb.SDSSidFlat,
                 on=(vizdb.SDSSidFlat.sdss_id == vizdb.SDSSidStacked.sdss_id))
             .join(targetdb.Target,
                   on=(targetdb.Target.catalogid == vizdb.SDSSidFlat.catalogid))
             .join(targetdb.CartonToTarget)
             .join(targetdb.Carton)
             .where(getattr(targetdb.Carton, name_type) == name))
Contributor

We'll likely need to do this for many of the query functions we have, as we add them to the main search. Is this how you'd recommend modifying them? Did you try the select_extend method? If so, how did that compare?

Relatedly, should we be writing our queries differently to make this kind of single-use or dynamic extension easier?

Member Author

select_extend() will only help if we want to return more columns than the initial query selects. Do you want this function to also return the program and carton columns? As written, the function can be called with any query that starts from the SDSSidStacked model, and it will restrict the results to that carton or program.

The problem with the original query was that it would do a subquery to return all the unique sdss_ids and then subset to only those in the program or carton. That's a very expensive query and I think not necessary.
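
The difference between the two strategies can be sketched on a toy schema. This is only an illustration under loud assumptions: it uses Python's built-in sqlite3 instead of peewee and sdss5db, the tables are heavily simplified, the carton names and rows are made up, and a simple box cut stands in for the q3c cone search.

```python
import sqlite3

# Toy, in-memory stand-in for the real tables (hypothetical rows and columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sdss_id_stacked (sdss_id INTEGER PRIMARY KEY, ra REAL, dec REAL);
    CREATE TABLE sdss_id_flat (sdss_id INTEGER, catalogid INTEGER);
    CREATE TABLE target (pk INTEGER PRIMARY KEY, catalogid INTEGER);
    CREATE TABLE carton (pk INTEGER PRIMARY KEY, carton TEXT, program TEXT);
    CREATE TABLE carton_to_target (target_pk INTEGER, carton_pk INTEGER);

    INSERT INTO sdss_id_stacked VALUES (1, 315.0, 45.3), (2, 315.1, 45.2), (3, 10.0, -5.0);
    INSERT INTO sdss_id_flat VALUES (1, 101), (2, 102), (3, 103);
    INSERT INTO target VALUES (11, 101), (12, 102), (13, 103);
    INSERT INTO carton VALUES (1, 'carton_a', 'mwm_gg'), (2, 'carton_b', 'other');
    INSERT INTO carton_to_target VALUES (11, 1), (12, 2), (13, 1);
""")

# Initial query: a box cut standing in for the cone search.
BASE = "SELECT DISTINCT s.sdss_id FROM sdss_id_stacked AS s"
CONE = "(s.ra BETWEEN 314.9 AND 315.2 AND s.dec BETWEEN 45.1 AND 45.4)"

# Joins that lead from a flat sdss_id row (alias f) to its cartons.
TARGET_JOINS = """
    JOIN target AS t ON t.catalogid = f.catalogid
    JOIN carton_to_target AS c2t ON c2t.target_pk = t.pk
    JOIN carton AS c ON c.pk = c2t.carton_pk
"""


def subquery_search(program):
    """Collect every sdss_id in the program first, then subset the cone search."""
    sql = f"""
        {BASE}
        WHERE {CONE} AND s.sdss_id IN (
            SELECT DISTINCT f.sdss_id FROM sdss_id_flat AS f
            {TARGET_JOINS}
            WHERE c.program = ?
        )
        ORDER BY s.sdss_id
    """
    return [row[0] for row in conn.execute(sql, (program,))]


def appended_search(program):
    """Append the joins and the program filter directly to the initial query."""
    sql = f"""
        {BASE}
        JOIN sdss_id_flat AS f ON f.sdss_id = s.sdss_id
        {TARGET_JOINS}
        WHERE {CONE} AND c.program = ?
        ORDER BY s.sdss_id
    """
    return [row[0] for row in conn.execute(sql, (program,))]


print(subquery_search("mwm_gg"), appended_search("mwm_gg"))
```

Both functions return the same rows here. The point of the appended form is that the program filter is evaluated only alongside the rows of the initial query, instead of first materialising every sdss_id in the program.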

Contributor

Eventually I'd like to add the ability in the main search to specify additional columns to return. These columns in principle could come from any table, which might make things more complicated, but that can be addressed later.

Right now only a single carton or program can be selected. It probably doesn't make sense to return the carton or program name in that case. I'd like to eventually move to a multi-select option for program/carton, in which case it would be nice to have those values returned. I think the query would have to be rewritten anyway.

Member Author

Yes, for multiple cartons this query would need to be rewritten. It's fairly easy to change .where(getattr(targetdb.Carton, name_type) == name) to .where(getattr(targetdb.Carton, name_type).in_(name)), but I'd wait until that functionality is implemented in the API/Zora, since the IN statement is less efficient than ==.

But I did test adding .select_extend(targetdb.Carton.carton) for the carton and program after carton_program_search has been called and that works fine.
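
As a rough picture of what a multi-select version might look like at the SQL level, here is a minimal sketch using Python's built-in sqlite3. It is not the valis or peewee code: the flattened `target` table with an `sdss_id` column, the `search` helper, and the carton and program names other than mwm_gg are all hypothetical. It filters with IN and also returns the carton and program columns, which is roughly what the `.in_()` change plus `select_extend()` would produce.

```python
import sqlite3

# Toy schema; the sdss_id column on target is a simplification for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (pk INTEGER PRIMARY KEY, sdss_id INTEGER);
    CREATE TABLE carton (pk INTEGER PRIMARY KEY, carton TEXT, program TEXT);
    CREATE TABLE carton_to_target (target_pk INTEGER, carton_pk INTEGER);

    INSERT INTO target VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO carton VALUES (1, 'carton_a', 'mwm_gg'),
                              (2, 'carton_b', 'prog_b'),
                              (3, 'carton_c', 'prog_c');
    INSERT INTO carton_to_target VALUES (1, 1), (2, 2), (3, 3);
""")


def search(names, name_type="program"):
    """Return (sdss_id, carton, program) rows matching any of the given names."""
    # The column name is interpolated into the SQL, so restrict it to known values.
    if name_type not in ("program", "carton"):
        raise ValueError(f"invalid name_type: {name_type!r}")
    # One bound parameter per requested name.
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT DISTINCT t.sdss_id, c.carton, c.program
        FROM target AS t
        JOIN carton_to_target AS c2t ON c2t.target_pk = t.pk
        JOIN carton AS c ON c.pk = c2t.carton_pk
        WHERE c.{name_type} IN ({placeholders})
        ORDER BY t.sdss_id, c.carton
    """
    return conn.execute(sql, list(names)).fetchall()


print(search(["mwm_gg", "prog_b"]))
print(search(["carton_c"], name_type="carton"))
```

Returning the carton and program with each row is what makes the multi-select result interpretable, since a target can then be matched back to the selection that produced it.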

@havok2063
Contributor

And yes, the data release is only needed right now for queries to the pipelines tables and getting the spectra. It isn't used for sdss_id queries or anything into catalogdb/targetdb. It's still a required parameter by valis though.

@havok2063
Contributor

For the catalogid issues, can you check if the values returned by sdssdb match what is returned in psql or in the JSON response? I wonder if it's some kind of bigint / numerical conversion issue.

@albireox
Member Author

I'll look into that. I thought about the overflow, int64 issue, but the thing is that the catalogids returned by Valis do exist and are in the same FoV. For example catalogid=63050395029379380 associated (in the JSON) with sdss_id=68025052 exists and has RA/Dec 314.9192239012484/45.2846467763595 so I don't think it's that. I'll dig a bit deeper.

@albireox
Member Author

albireox commented Jun 26, 2024

OK, it took a bit but I figured out the issue. You are correct that this is a bigint issue, but it is not a FastAPI/valis problem. In fact, if you do the query with curl or with httpx, the JSON response has the correct values.

The issue happens when parsing the response in JavaScript (and maybe also in Python; I'm not sure what language the Swagger UI is written in). Most parsers will convert the big integer to a float and then back to an integer, and some precision is lost in the process. For example, catalogid=27021597780505885 will be converted to 2.7021597780505884e+16 and then back to the integer 27021597780505884, so the original value is lost.
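
The round trip can be reproduced in a few lines of Python. This is a minimal sketch, not the Zora or Swagger UI code: Python's own json module keeps integers exact, so `parse_int=float` is used here only to imitate a parser that stores every number as a 64-bit float, and the payload is a trimmed-down, hypothetical version of the response above.

```python
import json

# Same value as JavaScript's Number.MAX_SAFE_INTEGER: above this, not every
# integer can be represented exactly as a 64-bit float.
MAX_SAFE_INTEGER = 2**53 - 1

# Trimmed-down payload using the values from the psql output above.
payload = '{"sdss_id": 68024942, "catalogid25": 27021597780505885}'

# Python's default JSON parser keeps arbitrary-precision integers.
exact = json.loads(payload)

# Imitate a double-based parser by turning every JSON integer into a float.
lossy = json.loads(payload, parse_int=float)

for key in exact:
    original = exact[key]
    round_tripped = int(lossy[key])
    status = "safe" if original <= MAX_SAFE_INTEGER else "NOT safe"
    print(f"{key}: {original} -> {round_tripped} ({status})")
```

The small sdss_id survives the round trip unchanged, while catalogid25 is above the safe-integer limit and comes back altered, which matches the mismatch between the psql output and the values seen in the browser.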

The solution, in JavaScript, is to use json-bigint to parse the response. Here is a more detailed description.

I have opened a PR to fix this in the main query.

@havok2063
Contributor

Yeah I encountered the same issue when pulling pipeline info for a given target for the Target page. I left some comments about that on the other PR.

@havok2063 havok2063 self-requested a review June 26, 2024 15:57
@albireox
Member Author

I saw those and made some comments there. I'll merge this one in the meantime.

@albireox albireox merged commit 580a24c into main Jun 26, 2024
2 checks passed
@albireox albireox deleted the albireox/issue20 branch June 26, 2024 16:34
Development

Successfully merging this pull request may close these issues.

Program/Carton Constraints not working
2 participants