Add support for cumulative counts at different levels #82

Open
fedorov opened this issue Jun 26, 2023 · 2 comments

fedorov commented Jun 26, 2023

Whenever a given level of the data hierarchy is queried, it would be helpful to know the characteristics of the item at that level, such as:

  • Collection: how many patients are contained
  • Patient: how many studies
  • Study: how many series, what modalities are included
  • Series: how many slices, what is the series size (probably in MB, for brevity)

Would this be possible?


bcli4d commented Jun 27, 2023

Here's a query, with its results, that could be used to get the number of studies in a cohort. The query would normally return a list of the distinct StudyInstanceUIDs in the cohort. I set page_size=0, so 0 rows are returned, but the totalFound field shows that there are 152 studies in the cohort. Instead of StudyInstanceUID, one can specify Collection_ID, Patient_ID, SeriesInstanceUID or SOPInstanceUID to get the corresponding counts:

bcliffor@etl-dev-whc:~$ curl -s -X POST "https://api.imaging.datacommons.cancer.gov/v1/cohorts/manifest/preview?StudyInstanceUID=True&page_size=0" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"name\": \"mycohort\", \"description\": \"Example description\", \"filters\": { \"collection_id\": [ \"tcga_read\" ], \"age_at_diagnosis_btw\": [50,90] }}"|jq
{
  "code": 200,
  "cohort": {
    "description": "Example description",
    "filterSet": {
      "filters": {
        "age_at_diagnosis_btw": [
          50,
          90
        ],
        "collection_id": [
          "tcga_read"
        ]
      },
      "idc_data_version": "14.0"
    },
    "name": "mycohort",
    "sql": ""
  },
  "manifest": {
    "json_manifest": [],
    "rowsReturned": 0,
    "totalFound": 152
  },
  "next_page": ""
}
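The same request can be sketched in Python for any level of the hierarchy. This is only an illustration of the curl call above: the endpoint, the query fields and the `totalFound` key come from that example, while the helper names (`build_count_request`, `extract_total`, `count_at_level`) and the `LEVEL_FIELDS` mapping are hypothetical and not part of the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://api.imaging.datacommons.cancer.gov/v1/cohorts/manifest/preview"

# Hypothetical mapping from hierarchy level to the query field named above.
LEVEL_FIELDS = {
    "collection": "Collection_ID",
    "patient": "Patient_ID",
    "study": "StudyInstanceUID",
    "series": "SeriesInstanceUID",
    "instance": "SOPInstanceUID",
}


def build_count_request(level, filters, name="mycohort",
                        description="Example description"):
    """Return (url, body) for a count-only manifest preview at `level`."""
    field = LEVEL_FIELDS[level]
    # page_size=0 asks for no rows; only totalFound is of interest.
    query = urllib.parse.urlencode({field: "True", "page_size": 0})
    body = {"name": name, "description": description, "filters": filters}
    return f"{BASE_URL}?{query}", body


def extract_total(response):
    """Pull the distinct-item count out of a decoded JSON response."""
    return response["manifest"]["totalFound"]


def count_at_level(level, filters):
    """POST the request and return totalFound (requires network access)."""
    url, body = build_count_request(level, filters)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_total(json.load(resp))
```

For example, `count_at_level("patient", {"collection_id": ["tcga_read"]})` would ask for the number of distinct patients in that collection, assuming the API behaves as in the response shown above.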


bcli4d commented Jun 30, 2023

Re: "Series: how many slices, what is the series size (probably in MB, for brevity)"
The issue is what this would do if the query fields are, e.g., just patient_id and series_size. Each row would then be a patient ID and the size of some anonymous series (or, actually, of all series which have that same size).

I guess the API could reject such a query or any query that doesn't have series or instance granularity.

Or the field ID could be 'size' and the API would return a size at the granularity of the row. So if the manifest is instance level, then size is the instance size; if the granularity is series level, then size is the series size, etc. If we were to do this, the column ID should probably indicate the granularity: instance_size, series_size, etc.

Or like above, but the API would return the size at the row granularity and at all higher granularities. E.g., if the manifest has series granularity, then there would be series_size, study_size, patient_size and collection_size columns.
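To make the last option concrete, here is a rough sketch of rolling instance sizes up to the row granularity and every level above it. It is not how the API is implemented: the `add_sizes` function, the input record layout (`collection`, `patient`, `study`, `series`, `instance`, `size` keys) and the sample values used with it are all hypothetical; only the `*_size` column names follow the suggestion above.

```python
from collections import defaultdict

# Hierarchy levels from coarsest to finest.
LEVELS = ["collection", "patient", "study", "series", "instance"]


def add_sizes(instances, granularity):
    """Build manifest rows at `granularity` with a <level>_size column for
    that level and every coarser level.

    `instances` is a list of dicts holding one ID per level plus a
    per-instance "size" (hypothetical layout, for illustration only).
    """
    depth = LEVELS.index(granularity) + 1

    # Sum instance sizes under every ancestor key down to the row level.
    totals = defaultdict(int)
    for inst in instances:
        for d in range(1, depth + 1):
            key = tuple(inst[level] for level in LEVELS[:d])
            totals[key] += inst["size"]

    # Emit one row per distinct item at the requested granularity.
    rows = {}
    for inst in instances:
        key = tuple(inst[level] for level in LEVELS[:depth])
        if key not in rows:
            row = {level: inst[level] for level in LEVELS[:depth]}
            for d in range(1, depth + 1):
                row[f"{LEVELS[d - 1]}_size"] = totals[key[:d]]
            rows[key] = row
    return list(rows.values())
```

Calling `add_sizes(instances, "series")` yields one row per series with series_size, study_size, patient_size and collection_size columns, while `add_sizes(instances, "patient")` yields only patient_size and collection_size, so a query can never ask for a size finer than its rows.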

@bcli4d bcli4d self-assigned this Jul 27, 2023