Modify carton_program_search to accept initial query #23

albireox · 2024-06-12T22:48:18Z

Fixes #20

Modifies carton_program_search to append the carton/program filter to an input query. The query/main route backend now sends the initial query (cone search or sdss_id-based) to carton_program_search.

I think this is ready for review, but while testing this against the pipelines DB I found some inconsistencies that I don't understand. I'll add them here but this may be a different issue/bug.

If I do a request to /query/main with parameters

{
  "ra": 315.01417,
  "dec": 45.299,
  "radius": 0.1,
  "units": "degree",
  "id": 23326,
  "program": "mwm_gg"
}

I get the following response

{
  "status": "success",
  "msg": "data successfully retrieved",
  "data": [
    {
      "sdss_id": 68024942,
      "ra_sdss_id": 315.05084411121544,
      "dec_sdss_id": 45.20588033521532,
      "catalogid21": 4220994466,
      "catalogid25": 27021597780505884,
      "catalogid31": 63050395029375420,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68024943,
      "ra_sdss_id": 315.062827092735,
      "dec_sdss_id": 45.21487254418657,
      "catalogid21": 4220994932,
      "catalogid25": 27021597780506348,
      "catalogid31": 63050395029375460,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68024961,
      "ra_sdss_id": 315.09597354158774,
      "dec_sdss_id": 45.27862222819132,
      "catalogid21": 4220997447,
      "catalogid25": 27021597780508864,
      "catalogid31": 63050395029376104,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68024987,
      "ra_sdss_id": 314.95347840004644,
      "dec_sdss_id": 45.216076160585764,
      "catalogid21": 4220994710,
      "catalogid25": 27021597780506130,
      "catalogid31": 63050395029376960,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025009,
      "ra_sdss_id": 314.94202956896544,
      "dec_sdss_id": 45.23171132283618,
      "catalogid21": 4220995714,
      "catalogid25": 27021597780507132,
      "catalogid31": 63050395029377660,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025016,
      "ra_sdss_id": 314.89990763754577,
      "dec_sdss_id": 45.270289766329235,
      "catalogid21": 4220996144,
      "catalogid25": 27021597780507560,
      "catalogid31": 63050395029377930,
      "in_boss": false,
      "in_apogee": true,
      "in_astra": true
    },
    {
      "sdss_id": 68025018,
      "ra_sdss_id": 314.89693109761066,
      "dec_sdss_id": 45.27824207067139,
      "catalogid21": 4221000580,
      "catalogid25": 27021597780511990,
      "catalogid31": 63050395029377960,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025019,
      "ra_sdss_id": 315.00491792931393,
      "dec_sdss_id": 45.202994234609825,
      "catalogid21": 4220994499,
      "catalogid25": 27021597780505916,
      "catalogid31": 63050395029377980,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025024,
      "ra_sdss_id": 315.0335240455252,
      "dec_sdss_id": 45.243782636228346,
      "catalogid21": 4220995289,
      "catalogid25": 27021597780506704,
      "catalogid31": 63050395029378100,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025031,
      "ra_sdss_id": 315.0008232677731,
      "dec_sdss_id": 45.27332492307533,
      "catalogid21": 4220996357,
      "catalogid25": 27021597780507776,
      "catalogid31": 63050395029378420,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025035,
      "ra_sdss_id": 315.04050094391545,
      "dec_sdss_id": 45.265637026700745,
      "catalogid21": 4220996374,
      "catalogid25": 27021597780507790,
      "catalogid31": 63050395029378510,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025038,
      "ra_sdss_id": 315.06185546679467,
      "dec_sdss_id": 45.30014261435378,
      "catalogid21": 4220998720,
      "catalogid25": 27021597780510136,
      "catalogid31": 63050395029378660,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025046,
      "ra_sdss_id": 314.9601099202946,
      "dec_sdss_id": 45.26546724961669,
      "catalogid21": 4220996549,
      "catalogid25": 27021597780507970,
      "catalogid31": 63050395029379060,
      "in_boss": false,
      "in_apogee": true,
      "in_astra": true
    },
    {
      "sdss_id": 68025050,
      "ra_sdss_id": 314.9717849168135,
      "dec_sdss_id": 45.29245137188428,
      "catalogid21": 4220996722,
      "catalogid25": 27021597780508136,
      "catalogid31": 63050395029379240,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025052,
      "ra_sdss_id": 314.91147854099677,
      "dec_sdss_id": 45.287862738192686,
      "catalogid21": 4221001109,
      "catalogid25": 27021597780512520,
      "catalogid31": 63050395029379380,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025055,
      "ra_sdss_id": 314.961038303085,
      "dec_sdss_id": 45.323149633427676,
      "catalogid21": 4221001406,
      "catalogid25": 27021597780512816,
      "catalogid31": 63050395029379460,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025060,
      "ra_sdss_id": 315.03812617377923,
      "dec_sdss_id": 45.346884016876224,
      "catalogid21": 4221003535,
      "catalogid25": 27021597780514944,
      "catalogid31": 63050395029379760,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025063,
      "ra_sdss_id": 315.0109459811423,
      "dec_sdss_id": 45.36783499184685,
      "catalogid21": 4221003973,
      "catalogid25": 27021597780515380,
      "catalogid31": 63050395029380030,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025193,
      "ra_sdss_id": 315.1012441652457,
      "dec_sdss_id": 45.32288809841765,
      "catalogid21": 4220998956,
      "catalogid25": 27021597780510370,
      "catalogid31": 63050395029385690,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025197,
      "ra_sdss_id": 315.11669853225345,
      "dec_sdss_id": 45.3534231499345,
      "catalogid21": 4220999976,
      "catalogid25": 27021597780511388,
      "catalogid31": 63050395029385910,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    },
    {
      "sdss_id": 68025198,
      "ra_sdss_id": 315.1219170037156,
      "dec_sdss_id": 45.359593784411366,
      "catalogid21": 4220999981,
      "catalogid25": 27021597780511390,
      "catalogid31": 63050395029385910,
      "in_boss": false,
      "in_apogee": true,
      "in_astra": true
    },
    {
      "sdss_id": 68025383,
      "ra_sdss_id": 314.92181922071006,
      "dec_sdss_id": 45.36748657355299,
      "catalogid21": 4221005047,
      "catalogid25": 27021597780516456,
      "catalogid31": 63050395029393650,
      "in_boss": false,
      "in_apogee": false,
      "in_astra": false
    }
  ]
}

This corresponds to the query

SELECT DISTINCT ON ("t1"."sdss_id") "t2"."sdss_id", "t2"."catalogid21", "t2"."catalogid25", "t2"."catalogid31", "t2"."ra_sdss_id", "t2"."dec_sdss_id", "t1"."in_boss", "t1"."in_apogee", "t1"."in_astra" FROM "vizdb"."sdss_id_stacked" AS "t2" INNER JOIN "vizdb"."sdss_id_flat" AS "t3" ON ("t3"."sdss_id" = "t2"."sdss_id") INNER JOIN "targetdb"."target" AS "t4" ON ("t4"."catalogid" = "t3"."catalogid") INNER JOIN "targetdb"."carton_to_target" AS "t5" ON ("t5"."target_pk" = "t4"."pk") INNER JOIN "targetdb"."carton" AS "t6" ON ("t5"."carton_pk" = "t6"."pk") INNER JOIN "vizdb"."sdssid_to_pipes" AS "t1" ON ("t2"."sdss_id" = "t1"."sdss_id") WHERE (q3c_radial_query("t2"."ra_sdss_id", "t2"."dec_sdss_id", 315.01417, 45.299, 0.1) AND ("t6"."program" = 'mwm_gg'))

which I print just before the route returns (after the append_to_pipes() call). This looks fine; however, the same query run against sdss5db on the pipelines machine returns

 sdss_id  | catalogid21 |    catalogid25    |    catalogid31    |     ra_sdss_id     |    dec_sdss_id     | in_boss | in_apogee | in_astra
----------+-------------+-------------------+-------------------+--------------------+--------------------+---------+-----------+----------
 68024942 |  4220994466 | 27021597780505885 | 63050395029375424 | 315.05084411121544 |  45.20588033521532 | f       | f         | f
 68024943 |  4220994932 | 27021597780506349 | 63050395029375456 |   315.062827092735 |  45.21487254418657 | f       | f         | f
 68024961 |  4220997447 | 27021597780508863 | 63050395029376102 | 315.09597354158774 |  45.27862222819132 | f       | f         | f
 68024987 |  4220994710 | 27021597780506128 | 63050395029376964 | 314.95347840004644 | 45.216076160585764 | f       | f         | f
 68025009 |  4220995714 | 27021597780507131 | 63050395029377668 | 314.94202956896544 |  45.23171132283618 | f       | f         | f
 68025016 |  4220996144 | 27021597780507561 | 63050395029377931 | 314.89990763754577 | 45.270289766329235 | f       | t         | t
 68025018 |  4221000580 | 27021597780511993 | 63050395029377962 | 314.89693109761066 |  45.27824207067139 | f       | f         | f
 68025019 |  4220994499 | 27021597780505917 | 63050395029377984 | 315.00491792931393 | 45.202994234609825 | f       | f         | f
 68025024 |  4220995289 | 27021597780506706 | 63050395029378098 |  315.0335240455252 | 45.243782636228346 | f       | f         | f
 68025031 |  4220996357 | 27021597780507774 | 63050395029378419 |  315.0008232677731 |  45.27332492307533 | f       | f         | f
 68025035 |  4220996374 | 27021597780507791 | 63050395029378513 | 315.04050094391545 | 45.265637026700745 | f       | f         | f
 68025038 |  4220998720 | 27021597780510134 | 63050395029378656 | 315.06185546679467 |  45.30014261435378 | f       | f         | f
 68025046 |  4220996549 | 27021597780507966 | 63050395029379057 |  314.9601099202946 |  45.26546724961669 | f       | t         | t
 68025050 |  4220996722 | 27021597780508138 | 63050395029379243 |  314.9717849168135 |  45.29245137188428 | f       | f         | f
 68025052 |  4221001109 | 27021597780512521 | 63050395029379376 | 314.91147854099677 | 45.287862738192686 | f       | f         | f
 68025055 |  4221001406 | 27021597780512818 | 63050395029379452 |   314.961038303085 | 45.323149633427676 | f       | f         | f
 68025060 |  4221003535 | 27021597780514943 | 63050395029379757 | 315.03812617377923 | 45.346884016876224 | f       | f         | f
 68025063 |  4221003973 | 27021597780515381 | 63050395029380030 |  315.0109459811423 |  45.36783499184685 | f       | f         | f
 68025193 |  4220998956 | 27021597780510370 | 63050395029385690 |  315.1012441652457 |  45.32288809841765 | f       | f         | f
 68025197 |  4220999976 | 27021597780511389 | 63050395029385913 | 315.11669853225345 |   45.3534231499345 | f       | f         | f
 68025198 |  4220999981 | 27021597780511394 | 63050395029385914 |  315.1219170037156 | 45.359593784411366 | f       | t         | t
 68025383 |  4221005047 | 27021597780516455 | 63050395029393648 | 314.92181922071006 |  45.36748657355299 | f       | f         | f

Note that the sdss_id and catalogid21 seem to match the API response, but the catalogid25 and catalogid31 are different (although not by much, which maybe indicates these are duplicates somehow).

The same query in Zora returns a table with the same results as the request JSON dump.

One thing I notice is that, although I specified release IPL3, it does not appear in the query. Maybe that's expected for this query.

havok2063

This seems reasonable to me so far. How does this timing compare to when doing a straight program/carton query? I assume it's faster now.

havok2063 · 2024-06-17T19:47:05Z

python/valis/db/queries.py

+    if query is None:
+        query = vizdb.SDSSidFlat.select(peewee.fn.DISTINCT(vizdb.SDSSidFlat.sdss_id))
+
+    query = (query.join(
+                vizdb.SDSSidFlat,
+                on=(vizdb.SDSSidFlat.sdss_id == vizdb.SDSSidStacked.sdss_id))
+             .join(targetdb.Target,
+                   on=(targetdb.Target.catalogid == vizdb.SDSSidFlat.catalogid))
+             .join(targetdb.CartonToTarget)
+             .join(targetdb.Carton)
+             .where(getattr(targetdb.Carton, name_type) == name))


We'll likely need to do this for many of the query functions we have, as we add them to the main search. Is this how you'd recommend modifying them? Did you try the select_extend method? If so, how did that compare?

Relatedly, should we be writing our queries differently to make this kind of single-use or dynamic extension easier?

The select_extend() method will only help if we want to return more columns than the initial query selects. Do you want this function to also return the program and carton columns? As written right now, the function can be called with any query that starts from the SDSSidStacked model, and it will restrict that query to the given carton or program.

The problem with the original query was that it would do a subquery to return all the unique sdss_ids and then subset to only those in the program or carton. That's a very expensive query and I think not necessary.

Eventually I'd like to add the ability in the main search to specify additional columns to return. These columns in principle could come from any table, which might make things more complicated, but that can be addressed later.

Right now only a single carton or program can be selected. It probably doesn't make sense to return the carton or program name in that case. I'd like to eventually move to a multi-select option for program/carton, in which case it would be nice to have those values returned. I think the query would have to be rewritten anyway.

Yes, for multiple cartons this query would need to be rewritten. It's fairly easy to change .where(getattr(targetdb.Carton, name_type) == name) to .where(getattr(targetdb.Carton, name_type).in_(name)), but I'd wait until that functionality is implemented in the API/Zora since the IN statement is less efficient than ==.

But I did test adding .select_extend(targetdb.Carton.carton) for the carton and program after carton_program_search has been called and that works fine.
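
As a rough illustration of the == versus IN switch discussed above, the sketch below uses the standard-library sqlite3 module rather than peewee and the real targetdb schema. The carton_filter and search helpers, the single-table layout, and the carton names are all hypothetical; only the program name mwm_gg comes from the request above.

```python
import sqlite3


def carton_filter(name_type, name):
    """Build a WHERE fragment: '=' for one name, 'IN' for several (hypothetical helper)."""
    if name_type not in ("carton", "program"):
        raise ValueError(f"unsupported name_type: {name_type}")
    if isinstance(name, str):
        # Single selection: keep the plain equality comparison.
        return f"{name_type} = ?", [name]
    names = list(name)
    placeholders = ", ".join("?" for _ in names)
    # Multiple selection: fall back to an IN list.
    return f"{name_type} IN ({placeholders})", names


# Toy stand-in for the carton table; the carton names are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carton (pk INTEGER PRIMARY KEY, carton TEXT, program TEXT)")
conn.executemany(
    "INSERT INTO carton (carton, program) VALUES (?, ?)",
    [("carton_a", "mwm_gg"), ("carton_b", "mwm_gg"), ("carton_c", "other_program")],
)


def search(name_type, name):
    """Return the carton names matching a single name or a list of names."""
    clause, params = carton_filter(name_type, name)
    rows = conn.execute(
        f"SELECT carton FROM carton WHERE {clause} ORDER BY carton", params
    )
    return [row[0] for row in rows]


print(search("program", "mwm_gg"))                 # ['carton_a', 'carton_b']
print(search("carton", ["carton_a", "carton_c"]))  # ['carton_a', 'carton_c']
```

Keeping the equality branch for a single name means the current single-select behaviour would not pay for the IN list until multi-select is actually needed.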

havok2063 · 2024-06-17T19:53:12Z

And yes, the data release is only needed right now for queries to the pipelines tables and getting the spectra. It isn't used for sdss_id queries or anything into catalogdb/targetdb. It's still a required parameter by valis though.

havok2063 · 2024-06-17T19:56:44Z

For the catalogid issues, can you check if the values returned by sdssdb match what is returned in psql or in the JSON response? I wonder if it's some kind of bigint / numerical conversion issue.

albireox · 2024-06-17T21:44:40Z

I'll look into that. I thought about the overflow, int64 issue, but the thing is that the catalogids returned by Valis do exist and are in the same FoV. For example catalogid=63050395029379380 associated (in the JSON) with sdss_id=68025052 exists and has RA/Dec 314.9192239012484/45.2846467763595 so I don't think it's that. I'll dig a bit deeper.

albireox · 2024-06-26T02:25:32Z

OK, it took a bit but I figured out the issue. You are correct that this is a bigint issue, but it is not a FastAPI/valis problem. In fact, if you do the query with curl or with httpx, the JSON response has the correct values.

The issue happens when trying to parse the response in JavaScript (and maybe also in Python; I'm not sure what language the Swagger UI is written in). Most parsers will convert the big integer to a float and then back to an integer, and some precision is lost in the process. For example, catalogid=27021597780505885 will be converted to 2.7021597780505884e+16 and then back to the integer 27021597780505884, so the original value is lost.

The solution, in JavaScript, is to use json-bigint to parse the response. Here is a more detailed description.
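
To illustrate the round trip described above, here is a minimal Python sketch. It is not part of valis or Zora: the helper names roundtrip_through_float and is_float_safe are made up for this example, and parse_int=float is used only to emulate a parser that stores every number as a double. The catalogid25 and sdss_id values are taken from the first row of the tables above.

```python
import json

# Largest integer below which every integer is exactly representable
# as an IEEE 754 double (the type behind a JavaScript Number): 2**53 - 1.
MAX_SAFE_INTEGER = 2**53 - 1


def roundtrip_through_float(value: int) -> int:
    """Convert an integer to a double and back, as a float-based parser would."""
    return int(float(value))


def is_float_safe(value: int) -> bool:
    """Return True if the integer survives a round trip through a double."""
    return roundtrip_through_float(value) == value


payload = '{"sdss_id": 68024942, "catalogid25": 27021597780505885}'

# Python's json module keeps integers exact by default.
exact = json.loads(payload)

# parse_int=float emulates a parser that stores every number as a double.
lossy = json.loads(payload, parse_int=float)

print(exact["catalogid25"])       # 27021597780505885
print(int(lossy["catalogid25"]))  # 27021597780505884
print(is_float_safe(exact["sdss_id"]), is_float_safe(exact["catalogid25"]))  # True False
```

The sdss_id and catalogid21 values are far below 2**53, which is consistent with them matching in both outputs, while catalogid25 and catalogid31 lie above it.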

I have opened a PR to fix this in the main query.

havok2063 · 2024-06-26T15:55:38Z

Yeah I encountered the same issue when pulling pipeline info for a given target for the Target page. I left some comments about that on the other PR.

albireox · 2024-06-26T16:34:08Z

I saw those and made some comments there. I'll merge this one in the meantime.

albireox added 3 commits June 12, 2024 16:47

Modify carton_program_search to accept initial query

7a4cf1e

Merge branch 'main' into albireox/issue20

efbe74b

Remove comment

11ed193

albireox marked this pull request as ready for review June 13, 2024 16:55

albireox requested a review from havok2063 as a code owner June 13, 2024 16:55

havok2063 reviewed Jun 17, 2024

albireox mentioned this pull request Jun 26, 2024

Use json-big to parse main query response sdss/zora#24

Merged

Initial query should be SDSSidStacked instead of SDSSidFlat

03e0ec0

havok2063 self-requested a review June 26, 2024 15:57

havok2063 approved these changes Jun 26, 2024

albireox merged commit 580a24c into main Jun 26, 2024
2 checks passed

albireox deleted the albireox/issue20 branch June 26, 2024 16:34
