Add support for cumulative counts at different levels #82

fedorov · 2023-06-26T21:00:48Z

Whenever a given level of the data hierarchy is queried, it would be helpful to know the characteristics of the item at that level, such as:

Collection: how many patients are contained
Patient: how many studies
Study: how many series, what modalities are included
Series: how many slices, what is the series size (probably in MB, for brevity)

Would this be possible?

bcli4d · 2023-06-27T20:58:08Z

Here's a query, with its results, that can be used to get the number of studies in a cohort. The query would normally return a list of the distinct StudyInstanceUIDs in the cohort. I set page_size=0, so 0 rows are returned, but the totalFound field shows that there are 152 studies in the cohort. Instead of StudyInstanceUID, one can specify Collection_ID, Patient_ID, SeriesInstanceUID or SOPInstanceUID to get the corresponding counts:

bcliffor@etl-dev-whc:~$ curl -s -X POST "https://api.imaging.datacommons.cancer.gov/v1/cohorts/manifest/preview?StudyInstanceUID=True&page_size=0" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"name\": \"mycohort\", \"description\": \"Example description\", \"filters\": { \"collection_id\": [ \"tcga_read\" ], \"age_at_diagnosis_btw\": [50,90] }}"|jq
{
  "code": 200,
  "cohort": {
    "description": "Example description",
    "filterSet": {
      "filters": {
        "age_at_diagnosis_btw": [
          50,
          90
        ],
        "collection_id": [
          "tcga_read"
        ]
      },
      "idc_data_version": "14.0"
    },
    "name": "mycohort",
    "sql": ""
  },
  "manifest": {
    "json_manifest": [],
    "rowsReturned": 0,
    "totalFound": 152
  },
  "next_page": ""
}
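
The same request can be repeated once per hierarchy level to collect all of the counts. A minimal sketch, assuming the field names are spelled exactly as listed above (not verified against the API) and that every response has the shape shown in the output; `count_url` and `total_found` are hypothetical helper names:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.imaging.datacommons.cancer.gov/v1/cohorts/manifest/preview"

# ID fields as named in the comment above; the exact spelling the API
# accepts is an assumption.
LEVEL_FIELDS = [
    "Collection_ID",
    "Patient_ID",
    "StudyInstanceUID",
    "SeriesInstanceUID",
    "SOPInstanceUID",
]


def count_url(field):
    """Build a preview URL that requests one ID field and zero rows."""
    return BASE_URL + "?" + urlencode({field: "True", "page_size": 0})


def total_found(response):
    """Read the distinct-value count from a parsed preview response."""
    return response["manifest"]["totalFound"]
```

Each URL from `count_url` would be POSTed with the same cohort JSON body as in the curl command, and `total_found` applied to the parsed reply. This still costs one request per level, so it is a workaround rather than the cumulative counts requested in this issue.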

bcli4d · 2023-06-30T21:29:22Z

re: Series: how many slices, what is the series size (probably in MB, for brevity)
The issue is: what should happen if the query fields are, e.g., just patient_id and series_size? Each row would be a patient ID and the size of some anonymous series (or, actually, of all series of that patient which have that same size).
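
To illustrate the ambiguity, here is a small sketch with made-up data. It assumes the manifest returns distinct rows over the requested fields; `project_distinct` and the sample values are hypothetical:

```python
# Three series for one patient; two of them happen to have the same size.
rows = [
    {"patient_id": "P1", "series_uid": "S1", "series_size": 10},
    {"patient_id": "P1", "series_uid": "S2", "series_size": 10},
    {"patient_id": "P1", "series_uid": "S3", "series_size": 25},
]


def project_distinct(rows, fields):
    """Keep only the requested fields and drop duplicate rows, preserving order."""
    seen = []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.append(key)
    return seen


result = project_distinct(rows, ["patient_id", "series_size"])
```

Here `result` has two rows for three series: S1 and S2 collapse into a single ("P1", 10) row, and nothing in the output says which series a size belongs to.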

I guess the API could reject such a query or any query that doesn't have series or instance granularity.

Or the field ID could be 'size', and the API would return a size at the granularity of the row. So if the manifest is instance level, then size is the instance size; if the granularity is series level, then size is the series size, etc. If we were to do this, the column ID should probably indicate the granularity: instance_size, series_size, etc.

Or like the above, but the API would return the size at the row granularity and at all higher granularities. E.g., if the manifest has series granularity, then there would be series_size, study_size, patient_size and collection_size columns.
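
A rough sketch of that last option, to make the proposed columns concrete. It assumes instance-level input rows that carry an ID for every level plus a size in bytes; `add_rollup_sizes`, the key names and the sample data are all hypothetical and not part of the API:

```python
from collections import defaultdict

LEVELS = ["collection", "patient", "study", "series"]  # coarse to fine


def add_rollup_sizes(instances, row_level="series"):
    """Return one row per item at row_level, with a <level>_size column
    for that level and every higher level."""
    keep = LEVELS[: LEVELS.index(row_level) + 1]

    def path(inst, depth):
        # Full ancestor path, so IDs need not be globally unique.
        return tuple(inst[level] for level in keep[: depth + 1])

    # Sum instance sizes under each item at each kept level.
    totals = defaultdict(int)
    for inst in instances:
        for depth in range(len(keep)):
            totals[path(inst, depth)] += inst["instance_size"]

    # Emit one row per distinct item at the requested granularity.
    rows = {}
    for inst in instances:
        key = path(inst, len(keep) - 1)
        if key not in rows:
            row = {level: inst[level] for level in keep}
            for depth, level in enumerate(keep):
                row[f"{level}_size"] = totals[path(inst, depth)]
            rows[key] = row
    return list(rows.values())


instances = [
    {"collection": "c1", "patient": "p1", "study": "st1", "series": "se1", "instance_size": 100},
    {"collection": "c1", "patient": "p1", "study": "st1", "series": "se1", "instance_size": 50},
    {"collection": "c1", "patient": "p1", "study": "st1", "series": "se2", "instance_size": 30},
    {"collection": "c1", "patient": "p2", "study": "st2", "series": "se3", "instance_size": 20},
]
series_rows = add_rollup_sizes(instances, row_level="series")
```

With this sample, the row for se1 carries series_size 150, study_size 180, patient_size 180 and collection_size 200. Calling it with row_level="patient" drops the study and series columns and leaves only patient_size and collection_size.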

bcli4d self-assigned this Jul 27, 2023
