diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 00000000..5106ab32
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,122 @@
+# Contributing to idc-index
+
+There are many ways to contribute to idc-index, with varying levels of effort.
+Do try to look through the [documentation][idc-index-docs] first if something
+is unclear, and let us know how we can do better.
+
+- Ask a question on the [IDC forum][idc-forum]
+- Use [idc-index issues][idc-index-issues] to submit a feature request or bug,
+  or add to the discussion on an existing issue
+- Submit a [Pull Request](https://github.com/ImagingDataCommons/idc-index/pulls)
+  to improve idc-index or its documentation
+
+We encourage a range of Pull Requests, from patches that include passing tests
+and documentation, all the way down to half-baked ideas that launch
+discussions.
+
+## The PR Process, GitHub Actions, and Related Gotchas
+
+### How to submit a PR?
+
+If you are new to idc-index development and you don't have push access to the
+repository, here are the steps:
+
+1. [Fork and clone](https://docs.github.com/get-started/quickstart/fork-a-repo)
+   the repository.
+2. Create a branch dedicated to the feature/bugfix you plan to implement (do
+   not use the `main` branch - this will complicate further development and
+   collaboration).
+3. [Push](https://docs.github.com/get-started/using-git/pushing-commits-to-a-remote-repository)
+   the branch to your GitHub fork.
+4. Create a
+   [Pull Request](https://github.com/ImagingDataCommons/idc-index/pulls).
+
+This corresponds to the `Fork & Pull Model` described in the
+[GitHub collaborative development](https://docs.github.com/pull-requests/collaborating-with-pull-requests/getting-started/about-collaborative-development-models)
+documentation. An example shell session covering these steps is included at
+the end of this guide.
+
+When submitting a PR, the developers following the project will be notified.
+That said, to engage specific developers, you can add a `Cc: @<username>`
+comment to notify them of your awesome contributions. Based on the comments
+posted by the reviewers, you may have to revisit your patches.
+
+### How to efficiently contribute?
+
+We encourage all developers to:
+
+- set up pre-commit hooks so that you can remedy various formatting and other
+  issues early, without waiting for the continuous integration (CI) checks to
+  complete: `pre-commit install`
+
+- add or update tests. You can see current tests
+  [here](https://github.com/ImagingDataCommons/idc-index/tree/main/tests). If
+  you contribute new functionality, adding test(s) covering it is mandatory!
+
+- run individual tests from the repository root using the following command:
+  `python -m unittest -vv tests.idcindex.TestIDCClient.<test_name>`
+
+### How to write commit messages?
+
+Write your commit messages using the standard prefixes for commit messages:
+
+- `BUG:` Fix for runtime crash or incorrect result
+- `COMP:` Compiler error or warning fix
+- `DOC:` Documentation change
+- `ENH:` New functionality
+- `PERF:` Performance improvement
+- `STYLE:` No logic impact (indentation, comments)
+- `WIP:` Work In Progress not ready for merge
+
+The body of the message should clearly describe the motivation of the commit
+(**what**, **why**, and **how**). To ease the task of reviewing commits, the
+message body should follow these guidelines:
+
+1. Leave a blank line between the subject and the body. This helps `git log`
+   and `git rebase` work nicely, and allows for smooth generation of release
+   notes.
+2. Try to keep the subject line below 72 characters, ideally 50.
+3. Capitalize the subject line.
+4. Do not end the subject line with a period.
+5. Use the imperative mood in the subject line (e.g.
+   `BUG: Fix spacing not being considered`).
+6. Wrap the body at 80 characters.
+7. Use semantic line feeds to separate different ideas, which improves
+   readability.
+8. Be concise, but honor the change: if significant alternative solutions were
+   available, explain why they were discarded.
+9. If the commit refers to a topic discussed on the [IDC forum][idc-forum], or
+   fixes a regression test, provide the link. If it fixes a compiler error,
+   provide a minimal verbatim message of the compiler error. If the commit
+   closes an issue, use the
+   [GitHub issue closing keywords](https://docs.github.com/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue).
+
+Keep in mind that significant time is invested in reviewing commits and
+_pull requests_, so following these guidelines will greatly help the people
+doing reviews.
+
+These guidelines are largely inspired by Chris Beams's
+[How to Write a Commit Message](https://chris.beams.io/posts/git-commit/)
+post.
+
+### How to integrate a PR?
+
+Getting your contributions integrated is relatively straightforward; here is
+the checklist:
+
+- All tests pass.
+- Consensus is reached. This usually means that at least two reviewers
+  approved the changes (or added a `LGTM` comment) and at least one business
+  day passed without anyone objecting. `LGTM` is an acronym for
+  _Looks Good to Me_.
+- To accommodate developers explicitly asking for more time to test the
+  proposed changes, integration time can be delayed by a few more days.
+- If you do NOT have push access, a core developer will integrate your PR. If
+  you would like to speed up the integration, do not hesitate to add a
+  reminder comment to the PR.
+
+### Automatic testing of pull requests
+
+Every pull request is tested automatically using GitHub Actions each time you
+push a commit to it. The GitHub UI will restrict users from merging pull
+requests until the CI build has returned a successful result indicating that
+all tests have passed.
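+
+### Example: a typical contribution session
+
+A minimal sketch of the workflow described above, from fork to push. The fork
+URL, branch name, test name, and commit message are placeholders - substitute
+your own:
+
+```bash
+# Clone your fork and create a topic branch (do not work on main)
+git clone https://github.com/<username>/idc-index.git
+cd idc-index
+git checkout -b my-bugfix-branch
+
+# Install the pre-commit hooks so formatting issues are caught locally
+pip install pre-commit
+pre-commit install
+
+# Make your changes, then run a single test from the repository root
+python -m unittest -vv tests.idcindex.TestIDCClient.test_get_collections
+
+# Commit using the standard prefixes, then push the branch to your fork
+git commit -am "BUG: Fix spacing not being considered"
+git push origin my-bugfix-branch
+```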
+
+[idc-forum]: https://discourse.canceridc.dev
+[idc-index-issues]: https://github.com/ImagingDataCommons/idc-index/issues
+[idc-index-docs]: https://idc-index.readthedocs.io/
diff --git a/idc_index/index.py b/idc_index/index.py
index 8a5e33f7..4089342c 100644
--- a/idc_index/index.py
+++ b/idc_index/index.py
@@ -21,6 +21,7 @@
 aws_endpoint_url = "https://s3.amazonaws.com"
 gcp_endpoint_url = "https://storage.googleapis.com"
+# GitHub serves release assets from /releases/download/<tag>/<asset>,
+# not from the API /releases/tags/<tag> URL (which returns JSON metadata)
+asset_endpoint_url = f"https://github.com/ImagingDataCommons/idc-index-data/releases/download/{idc_index_data.__version__}"
 
 logging.basicConfig(format="%(asctime)s - %(message)s", level=logging.INFO)
 logger = logging.getLogger(__name__)
@@ -67,7 +68,24 @@ def __init__(self):
         self.collection_summary = self.index.groupby("collection_id").agg(
             {"Modality": pd.Series.unique, "series_size_MB": "sum"}
         )
-        self.indices_overview = self.list_indices()
+
+        # Columns are index names, rows are attributes. The main index ships
+        # with idc-index-data; the others are fetched on demand (see fetch_index)
+        self.indices_overview = pd.DataFrame(
+            {
+                "index": {"description": None, "installed": True, "url": None},
+                "sm_index": {
+                    "description": None,
+                    "installed": False,
+                    "url": f"{asset_endpoint_url}/sm_index.parquet",
+                },
+                "sm_instance_index": {
+                    "description": None,
+                    "installed": False,
+                    "url": f"{asset_endpoint_url}/sm_instance_index.parquet",
+                },
+            }
+        )
 
         # Lookup s5cmd
         self.s5cmdPath = shutil.which("s5cmd")
@@ -172,33 +190,6 @@ def get_idc_version():
         idc_version = Version(idc_index_data.__version__).major
         return f"v{idc_version}"
 
-    @staticmethod
-    def _get_latest_idc_index_data_release_assets():
-        """
-        Retrieves a list of the latest idc-index-data release assets.
-
-        Returns:
-            release_assets (list): List of tuples (asset_name, asset_url).
-        """
-        release_assets = []
-        url = f"https://api.github.com/repos/ImagingDataCommons/idc-index-data/releases/tags/{idc_index_data.__version__}"
-        try:
-            response = requests.get(url, timeout=30)
-            if response.status_code == 200:
-                release_data = response.json()
-                assets = release_data.get("assets", [])
-                for asset in assets:
-                    release_assets.append(
-                        (asset["name"], asset["browser_download_url"])
-                    )
-            else:
-                logger.error(f"Failed to fetch releases: {response.status_code}")
-
-        except FileNotFoundError:
-            logger.error(f"Failed to fetch releases: {response.status_code}")
-
-        return release_assets
-
     def list_indices(self):
         """
         Lists all available indices including their installation status.
 
         Returns:
             indices_overview (pd.DataFrame): DataFrame containing information per index.
""" - if "indices_overview" not in locals(): - indices_overview = {} - # Find installed indices - for file in distribution("idc-index-data").files: - if str(file).endswith("index.parquet"): - index_name = os.path.splitext( - str(file).rsplit("/", maxsplit=1)[-1] - )[0] - - indices_overview[index_name] = { - "description": None, - "installed": True, - "local_path": os.path.join( - idc_index_data.IDC_INDEX_PARQUET_FILEPATH.parents[0], - f"{index_name}.parquet", - ), - } - - # Find available indices from idc-index-data - release_assets = self._get_latest_idc_index_data_release_assets() - for asset_name, asset_url in release_assets: - if asset_name.endswith(".parquet"): - asset_name = os.path.splitext(asset_name)[0] - if asset_name not in indices_overview: - indices_overview[asset_name] = { - "description": None, - "installed": False, - "url": asset_url, - } - - self.indices_overview = pd.DataFrame.from_dict( - indices_overview, orient="index" - ) - return self.indices_overview def fetch_index(self, index) -> None: @@ -251,14 +208,14 @@ def fetch_index(self, index) -> None: index (str): Name of the index to be downloaded. """ - if index not in self.indices_overview.index.tolist(): + if index not in self.indices_overview.keys(): logger.error(f"Index {index} is not available and can not be fetched.") - elif self.indices_overview.loc[index, "installed"]: + elif self.indices_overview[index]["installed"]: logger.warning( f"Index {index} already installed and will not be fetched again." ) else: - response = requests.get(self.indices_overview.loc[index, "url"], timeout=30) + response = requests.get(self.indices_overview[index]["url"], timeout=30) if response.status_code == 200: filepath = os.path.join( idc_index_data.IDC_INDEX_PARQUET_FILEPATH.parents[0], @@ -266,8 +223,7 @@ def fetch_index(self, index) -> None: ) with open(filepath, mode="wb") as file: file.write(response.content) - self.indices_overview.loc[index, "installed"] = True - self.indices_overview.loc[index, "local_path"] = filepath + self.indices_overview[index]["installed"] = True else: logger.error(f"Failed to fetch index: {response.status_code}") @@ -668,8 +624,8 @@ def _validate_update_manifest_and_get_download_size( # create a copy of the index index_df_copy = self.index - # Extract s3 url and crdc_instance_uuid from the manifest copy commands - # Next, extract crdc_instance_uuid from aws_series_url in the index and + # Extract s3 url and crdc_series_uuid from the manifest copy commands + # Next, extract crdc_series_uuid from aws_series_url in the index and # try to verify if every series in the manifest is present in the index # TODO: need to remove the assumption that manifest commands will have 'cp' @@ -697,8 +653,9 @@ def _validate_update_manifest_and_get_download_size( seriesInstanceuid, s3_url, series_size_MB, - index_crdc_series_uuid==manifest_crdc_series_uuid AS crdc_series_uuid_match, + index_crdc_series_uuid is not NULL as crdc_series_uuid_match, s3_url==series_aws_url AS s3_url_match, + manifest_temp.manifest_cp_cmd, CASE WHEN s3_url==series_aws_url THEN 'aws' ELSE @@ -717,19 +674,23 @@ def _validate_update_manifest_and_get_download_size( endpoint_to_use = None - if validate_manifest: - # Check if crdc_instance_uuid is found in the index - if not all(merged_df["crdc_series_uuid_match"]): - missing_manifest_cp_cmds = merged_df.loc[ - ~merged_df["crdc_series_uuid_match"], "manifest_cp_cmd" - ] - missing_manifest_cp_cmds_str = f"The following manifest copy commands do not have any associated series in the index: 
{missing_manifest_cp_cmds.tolist()}" - raise ValueError(missing_manifest_cp_cmds_str) + # Check if any crdc_series_uuid are not found in the index + if not all(merged_df["crdc_series_uuid_match"]): + missing_manifest_cp_cmds = merged_df.loc[ + ~merged_df["crdc_series_uuid_match"], "manifest_cp_cmd" + ] + logger.error( + "The following manifest copy commands are not recognized as referencing any associated series in the index.\n" + "This means either these commands are invalid, or they may correspond to files available in a release of IDC\n" + f"different from {self.get_idc_version()} used in this version of idc-index. The corresponding files will not be downloaded.\n" + ) + logger.error("\n" + "\n".join(missing_manifest_cp_cmds.tolist())) - # Check if there are more than one endpoints + if validate_manifest: + # Check if there is more than one endpoint if len(merged_df["endpoint"].unique()) > 1: raise ValueError( - "Either GCS bucket path is invalid or manifest has a mix of GCS and AWS urls. If so, please use urls from one provider only" + "Either GCS bucket path is invalid or manifest has a mix of GCS and AWS urls. " ) if ( diff --git a/pyproject.toml b/pyproject.toml index a4d8c825..92304920 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -125,6 +125,7 @@ disallow_incomplete_defs = true [tool.ruff] src = ["idc_index"] +extend-exclude = ["./CONTRIBUTING.md"] [tool.ruff.lint] extend-select = [ diff --git a/tests/idcindex.py b/tests/idcindex.py deleted file mode 100644 index c0806134..00000000 --- a/tests/idcindex.py +++ /dev/null @@ -1,476 +0,0 @@ -from __future__ import annotations - -import logging -import os -import tempfile -import unittest -from itertools import product -from pathlib import Path - -import pandas as pd -import pytest -from click.testing import CliRunner -from idc_index import IDCClient, cli - -# Run tests using the following command from the root of the repository: -# python -m unittest -vv tests/idcindex.py - -logging.basicConfig(level=logging.DEBUG) - - -@pytest.fixture(autouse=True) -def _change_test_dir(request, monkeypatch): - monkeypatch.chdir(request.fspath.dirname) - - -class TestIDCClient(unittest.TestCase): - def setUp(self): - self.client = IDCClient() - self.download_from_manifest = cli.download_from_manifest - self.download_from_selection = cli.download_from_selection - self.download = cli.download - - logger = logging.getLogger("idc_index") - logger.setLevel(logging.DEBUG) - - def test_get_collections(self): - collections = self.client.get_collections() - self.assertIsNotNone(collections) - - def test_get_idc_version(self): - idc_version = self.client.get_idc_version() - self.assertIsNotNone(idc_version) - self.assertTrue(idc_version.startswith("v")) - - def test_get_patients(self): - # Define the values for each optional parameter - output_format_values = ["list", "dict", "df"] - collection_id_values = [ - "htan_ohsu", - ["ct_phantom4radiomics", "cmb_gec"], - ] - - # Test each combination - for collection_id in collection_id_values: - for output_format in output_format_values: - patients = self.client.get_patients( - collection_id=collection_id, outputFormat=output_format - ) - - # Check if the output format matches the expected type - if output_format == "list": - self.assertIsInstance(patients, list) - self.assertTrue(bool(patients)) # Check that the list is not empty - elif output_format == "dict": - self.assertTrue( - isinstance(patients, dict) - or ( - isinstance(patients, list) - and all(isinstance(i, dict) for i in patients) - ) - ) # Check 
-                    self.assertTrue(
-                        bool(patients)
-                    )  # Check that the output is not empty
-                elif output_format == "df":
-                    self.assertIsInstance(patients, pd.DataFrame)
-                    self.assertFalse(
-                        patients.empty
-                    )  # Check that the DataFrame is not empty
-
-    def test_get_studies(self):
-        # Define the values for each optional parameter
-        output_format_values = ["list", "dict", "df"]
-        patient_id_values = ["PCAMPMRI-00001", ["PCAMPMRI-00001", "NoduleLayout_1"]]
-
-        # Test each combination
-        for patient_id in patient_id_values:
-            for output_format in output_format_values:
-                studies = self.client.get_dicom_studies(
-                    patientId=patient_id, outputFormat=output_format
-                )
-
-                # Check if the output format matches the expected type
-                if output_format == "list":
-                    self.assertIsInstance(studies, list)
-                    self.assertTrue(bool(studies))  # Check that the list is not empty
-                elif output_format == "dict":
-                    self.assertTrue(
-                        isinstance(studies, dict)
-                        or (
-                            isinstance(studies, list)
-                            and all(isinstance(i, dict) for i in studies)
-                        )
-                    )  # Check that the output is either a dictionary or a list of dictionaries
-                    self.assertTrue(bool(studies))  # Check that the output is not empty
-                elif output_format == "df":
-                    self.assertIsInstance(studies, pd.DataFrame)
-                    self.assertFalse(
-                        studies.empty
-                    )  # Check that the DataFrame is not empty
-
-    def test_get_series(self):
-        """
-        Query used for selecting the smallest series/studies:
-
-        SELECT
-            StudyInstanceUID,
-            ARRAY_AGG(DISTINCT(collection_id)) AS collection,
-            ARRAY_AGG(DISTINCT(series_aws_url)) AS aws_url,
-            ARRAY_AGG(DISTINCT(series_gcs_url)) AS gcs_url,
-            COUNT(DISTINCT(SOPInstanceUID)) AS num_instances,
-            SUM(instance_size) AS series_size
-        FROM
-            `bigquery-public-data.idc_current.dicom_all`
-        GROUP BY
-            StudyInstanceUID
-        HAVING
-            num_instances > 2
-        ORDER BY
-            series_size asc
-        LIMIT
-            10
-        """
-        # Define the values for each optional parameter
-        output_format_values = ["list", "dict", "df"]
-        study_instance_uid_values = [
-            "1.3.6.1.4.1.14519.5.2.1.6279.6001.175012972118199124641098335511",
-            [
-                "1.3.6.1.4.1.14519.5.2.1.1239.1759.691327824408089993476361149761",
-                "1.3.6.1.4.1.14519.5.2.1.1239.1759.272272273744698671736205545239",
-            ],
-        ]
-
-        # Test each combination
-        for study_instance_uid in study_instance_uid_values:
-            for output_format in output_format_values:
-                series = self.client.get_dicom_series(
-                    studyInstanceUID=study_instance_uid, outputFormat=output_format
-                )
-
-                # Check if the output format matches the expected type
-                if output_format == "list":
-                    self.assertIsInstance(series, list)
-                    self.assertTrue(bool(series))  # Check that the list is not empty
-                elif output_format == "dict":
-                    self.assertTrue(
-                        isinstance(series, dict)
-                        or (
-                            isinstance(series, list)
-                            and all(isinstance(i, dict) for i in series)
-                        )
-                    )  # Check that the output is either a dictionary or a list of dictionaries
-                elif output_format == "df":
-                    self.assertIsInstance(series, pd.DataFrame)
-                    self.assertFalse(
-                        series.empty
-                    )  # Check that the DataFrame is not empty
-
-    def test_download_dicom_series(self):
-        with tempfile.TemporaryDirectory() as temp_dir:
-            self.client.download_dicom_series(
-                seriesInstanceUID="1.3.6.1.4.1.14519.5.2.1.7695.1700.153974929648969296590126728101",
-                downloadDir=temp_dir,
-            )
-            self.assertEqual(sum([len(files) for r, d, files in os.walk(temp_dir)]), 3)
-
-    def test_download_with_template(self):
-        dirTemplateValues = [
-            None,
-            "%collection_id_%PatientID/%Modality-%StudyInstanceUID%SeriesInstanceUID",
-            "%collection_id%PatientID-%Modality_%StudyInstanceUID/%SeriesInstanceUID",
-            "%collection_id-%PatientID_%Modality/%StudyInstanceUID-%SeriesInstanceUID",
-            "%collection_id_%PatientID/%Modality/%StudyInstanceUID_%SeriesInstanceUID",
-        ]
-        for template in dirTemplateValues:
-            with tempfile.TemporaryDirectory() as temp_dir:
-                self.client.download_from_selection(
-                    downloadDir=temp_dir,
-                    studyInstanceUID="1.3.6.1.4.1.14519.5.2.1.7695.1700.114861588187429958687900856462",
-                    dirTemplate=template,
-                )
-                self.assertEqual(
-                    sum([len(files) for r, d, files in os.walk(temp_dir)]), 3
-                )
-
-    def test_download_from_selection(self):
-        # Define the values for each optional parameter
-        dry_run_values = [True, False]
-        quiet_values = [True, False]
-        show_progress_bar_values = [True, False]
-        use_s5cmd_sync_values = [True, False]
-
-        # Generate all combinations of optional parameters
-        combinations = product(
-            dry_run_values,
-            quiet_values,
-            show_progress_bar_values,
-            use_s5cmd_sync_values,
-        )
-
-        # Test each combination
-        for (
-            dry_run,
-            quiet,
-            show_progress_bar,
-            use_s5cmd_sync,
-        ) in combinations:
-            with tempfile.TemporaryDirectory() as temp_dir:
-                self.client.download_from_selection(
-                    downloadDir=temp_dir,
-                    dry_run=dry_run,
-                    patientId=None,
-                    studyInstanceUID="1.3.6.1.4.1.14519.5.2.1.7695.1700.114861588187429958687900856462",
-                    seriesInstanceUID=None,
-                    quiet=quiet,
-                    show_progress_bar=show_progress_bar,
-                    use_s5cmd_sync=use_s5cmd_sync,
-                )
-
-                if not dry_run:
-                    self.assertNotEqual(len(os.listdir(temp_dir)), 0)
-
-    def test_sql_queries(self):
-        df = self.client.sql_query("SELECT DISTINCT(collection_id) FROM index")
-
-        self.assertIsNotNone(df)
-
-    def test_download_from_aws_manifest(self):
-        # Define the values for each optional parameter
-        quiet_values = [True, False]
-        validate_manifest_values = [True, False]
-        show_progress_bar_values = [True, False]
-        use_s5cmd_sync_values = [True, False]
-        dirTemplateValues = [
-            None,
-            "%collection_id/%PatientID/%Modality/%StudyInstanceUID/%SeriesInstanceUID",
-            "%collection_id%PatientID%Modality%StudyInstanceUID%SeriesInstanceUID",
-        ]
-        # Generate all combinations of optional parameters
-        combinations = product(
-            quiet_values,
-            validate_manifest_values,
-            show_progress_bar_values,
-            use_s5cmd_sync_values,
-            dirTemplateValues,
-        )
-        # Test each combination
-        for (
-            quiet,
-            validate_manifest,
-            show_progress_bar,
-            use_s5cmd_sync,
-            dirTemplate,
-        ) in combinations:
-            with tempfile.TemporaryDirectory() as temp_dir:
-                self.client.download_from_manifest(
-                    manifestFile="./study_manifest_aws.s5cmd",
-                    downloadDir=temp_dir,
-                    quiet=quiet,
-                    validate_manifest=validate_manifest,
-                    show_progress_bar=show_progress_bar,
-                    use_s5cmd_sync=use_s5cmd_sync,
-                    dirTemplate=dirTemplate,
-                )
-
-                if sum([len(files) for _, _, files in os.walk(temp_dir)]) != 9:
-                    print(
-                        f"Failed for {quiet} {validate_manifest} {show_progress_bar} {use_s5cmd_sync} {dirTemplate}"
-                    )
-                    self.assertFalse(True)
-
-    def test_download_from_gcp_manifest(self):
-        # Define the values for each optional parameter
-        quiet_values = [True, False]
-        validate_manifest_values = [True, False]
-        show_progress_bar_values = [True, False]
-        use_s5cmd_sync_values = [True, False]
-        dirTemplateValues = [
-            None,
-            "%collection_id/%PatientID/%Modality/%StudyInstanceUID/%SeriesInstanceUID",
-            "%collection_id_%PatientID_%Modality_%StudyInstanceUID_%SeriesInstanceUID",
-        ]
-        # Generate all combinations of optional parameters
-        combinations = product(
-            quiet_values,
-            validate_manifest_values,
-            show_progress_bar_values,
-            use_s5cmd_sync_values,
-            dirTemplateValues,
-        )
-
-        # Test each combination
-        for (
-            quiet,
-            validate_manifest,
-            show_progress_bar,
-            use_s5cmd_sync,
-            dirTemplate,
-        ) in combinations:
-            with tempfile.TemporaryDirectory() as temp_dir:
-                self.client.download_from_manifest(
-                    manifestFile="./study_manifest_gcs.s5cmd",
-                    downloadDir=temp_dir,
-                    quiet=quiet,
-                    validate_manifest=validate_manifest,
-                    show_progress_bar=show_progress_bar,
-                    use_s5cmd_sync=use_s5cmd_sync,
-                    dirTemplate=dirTemplate,
-                )
-
-                self.assertEqual(
-                    sum([len(files) for r, d, files in os.walk(temp_dir)]), 9
-                )
-
-    def test_download_from_bogus_manifest(self):
-        # Define the values for each optional parameter
-        quiet_values = [True, False]
-        validate_manifest_values = [True, False]
-        show_progress_bar_values = [True, False]
-        use_s5cmd_sync_values = [True, False]
-
-        # Generate all combinations of optional parameters
-        combinations = product(
-            quiet_values,
-            validate_manifest_values,
-            show_progress_bar_values,
-            use_s5cmd_sync_values,
-        )
-
-        # Test each combination
-        for (
-            quiet,
-            validate_manifest,
-            show_progress_bar,
-            use_s5cmd_sync,
-        ) in combinations:
-            with tempfile.TemporaryDirectory() as temp_dir:
-                self.client.download_from_manifest(
-                    manifestFile="./study_manifest_bogus.s5cmd",
-                    downloadDir=temp_dir,
-                    quiet=quiet,
-                    validate_manifest=validate_manifest,
-                    show_progress_bar=show_progress_bar,
-                    use_s5cmd_sync=use_s5cmd_sync,
-                )
-
-                self.assertEqual(len(os.listdir(temp_dir)), 0)
-
-    """
-    disabling these tests due to a consistent server timeout issue
-    def test_citations(self):
-        citations = self.client.citations_from_selection(
-            collection_id="tcga_gbm",
-            citation_format=index.IDCClient.CITATION_FORMAT_APA,
-        )
-        self.assertIsNotNone(citations)
-
-        citations = self.client.citations_from_selection(
-            seriesInstanceUID="1.3.6.1.4.1.14519.5.2.1.7695.4164.588007658875211151397302775781",
-            citation_format=index.IDCClient.CITATION_FORMAT_BIBTEX,
-        )
-        self.assertIsNotNone(citations)
-
-        citations = self.client.citations_from_selection(
-            studyInstanceUID="1.2.840.113654.2.55.174144834924218414213677353968537663991",
-            citation_format=index.IDCClient.CITATION_FORMAT_BIBTEX,
-        )
-        self.assertIsNotNone(citations)
-
-        citations = self.client.citations_from_manifest("./study_manifest_aws.s5cmd")
-        self.assertIsNotNone(citations)
-    """
-
-    def test_cli_download_from_selection(self):
-        runner = CliRunner()
-        with tempfile.TemporaryDirectory() as temp_dir:
-            result = runner.invoke(
-                self.download_from_selection,
-                [
-                    "--download-dir",
-                    temp_dir,
-                    "--dry-run",
-                    False,
-                    "--quiet",
-                    True,
-                    "--show-progress-bar",
-                    True,
-                    "--use-s5cmd-sync",
-                    False,
-                    "--study-instance-uid",
-                    "1.3.6.1.4.1.14519.5.2.1.7695.1700.114861588187429958687900856462",
-                ],
-            )
-            assert len(os.listdir(temp_dir)) != 0
-
-    def test_cli_download_from_manifest(self):
-        runner = CliRunner()
-        with tempfile.TemporaryDirectory() as temp_dir:
-            result = runner.invoke(
-                self.download_from_manifest,
-                [
-                    "--manifest-file",
-                    "./study_manifest_aws.s5cmd",
-                    "--download-dir",
-                    temp_dir,
-                    "--quiet",
-                    True,
-                    "--show-progress-bar",
-                    True,
-                    "--use-s5cmd-sync",
-                    False,
-                ],
-            )
-            assert len(os.listdir(temp_dir)) != 0
-
-    def test_singleton_attribute(self):
-        # singleton, initialized on first use
-        i1 = IDCClient.client()
-        i2 = IDCClient.client()
-
-        # new instances created via constructor (through init)
-        i3 = IDCClient()
-        i4 = self.client
-
-        # all must be not none
-        assert i1 is not None
-        assert i2 is not None
-        assert i3 is not None
-        assert i4 is not None
-
-        # singletons must return the same instance
-        assert i1 == i2
-
-        # new instances must be different
-        assert i1 != i3
-        assert i1 != i4
-        assert i3 != i4
-
-        # all must be instances of IDCClient
-        assert isinstance(i1, IDCClient)
-        assert isinstance(i2, IDCClient)
-        assert isinstance(i3, IDCClient)
-        assert isinstance(i4, IDCClient)
-
-    def test_cli_download(self):
-        runner = CliRunner()
-        with runner.isolated_filesystem():
-            result = runner.invoke(
-                self.download,
-                ["1.3.6.1.4.1.14519.5.2.1.7695.1700.114861588187429958687900856462"],
-            )
-            assert len(os.listdir(Path.cwd())) != 0
-
-    def test_list_indices(self):
-        i = IDCClient()
-        assert not i.indices_overview.empty  # assert that df was created
-
-    def test_fetch_index(self):
-        i = IDCClient()
-        assert i.indices_overview["sm_index", "installed"] is False
-        i.fetch_index("sm_index")
-        assert i.indices_overview["sm_index", "installed"] is True
-
-
-if __name__ == "__main__":
-    unittest.main()
diff --git a/tests/prior_version_manifest.s5cmd b/tests/prior_version_manifest.s5cmd
new file mode 100644
index 00000000..1c91a450
--- /dev/null
+++ b/tests/prior_version_manifest.s5cmd
@@ -0,0 +1,5 @@
+cp s3://idc-open-data/040fd3e1-0088-4bfd-8439-55e3c5d80a56/* .
+cp s3://idc-open-data/04553d0f-1af9-414d-b631-cc31624aced5/* .
+cp s3://idc-open-data/068346bf-16ef-4e45-87bf-87feb576a21c/* .
+cp s3://idc-open-data/07908d47-5e85-45f3-9649-79c15f606f52/* .
+cp s3://idc-open-data/099d180f-1d79-402d-abad-bfd8e2736b04/* .
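
Each line of the new `tests/prior_version_manifest.s5cmd` fixture above is an
s5cmd `cp` command against the public `idc-open-data` bucket. A minimal sketch
of exercising it by hand (assumes `s5cmd` is installed; `--no-sign-request`
works because the bucket is open access):

```bash
# Run every cp command in the manifest against the AWS endpoint,
# downloading the referenced files into the current directory.
s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com \
  run tests/prior_version_manifest.s5cmd
```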