Polishing scripts #14

Open
4 tasks
mariacuria opened this issue Nov 12, 2024 · 3 comments · May be fixed by #23
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@mariacuria
Contributor

  • In scripts that read config.json, use the utils module to locate it
  • Replace hardcoded paths with config.json paths
  • Move processing scripts to convert_step2/cbioportal
  • Document the order in which the scripts run in README.md
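
A minimal sketch of what a config.json lookup helper for the first task could look like, assuming config.json sits in the script's directory or in one of its parent directories. The names `find_config` and `load_config` are hypothetical and are not taken from the repo's actual utils module.

```python
import json
from pathlib import Path


def find_config(start, name="config.json"):
    """Walk upward from `start` until a file called `name` is found."""
    start = Path(start).resolve()
    # If `start` is a file (e.g. __file__), begin the search in its directory.
    first = start if start.is_dir() else start.parent
    for directory in (first, *first.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"{name} not found at or above {first}")


def load_config(start, name="config.json"):
    """Locate the config file and return its parsed contents as a dict."""
    with open(find_config(start, name), encoding="utf-8") as handle:
        return json.load(handle)
```

A script could then call `load_config(__file__)` instead of hardcoding a relative path, so it keeps working if it is moved deeper into the file tree.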
@mariacuria mariacuria added documentation Improvements or additions to documentation enhancement New feature or request labels Nov 12, 2024
@mariacuria mariacuria self-assigned this Nov 12, 2024
@mariacuria
Contributor Author

mariacuria commented Nov 27, 2024

@Reeya123
Scripts that need to use utils (some of them are bash scripts, so a separate bash utils file would need to be created):

  1. Everything in /data/shared/repos/biomuta-old/pipeline/download_step1/cbioportal
  2. Everything in /data/shared/repos/biomuta-old/pipeline/convert_step2/cbioportal
  3. In /data/shared/repos/biomuta-old/pipeline/convert_step2/liftover:
    1. 1_chr_pos_to_bed.py
    2. 2_liftover.sh

You might encounter some incorrect paths because I changed the repo file tree. DM me if you find any.

@mariacuria
Contributor Author

mariacuria commented Nov 29, 2024

How I store data:

  1. Source data is supposed to be downloaded every month. It goes to /data/shared/biomuta/downloads/cbioportal/<yyyy_mm_dd> where <yyyy_mm_dd> is the download date.
  2. Data generated by scripts goes to /data/shared/biomuta/generated/datasets/<yyyy_mm_dd> where the date is not necessarily the same as in (1) but I expect it to be the same in most cases.
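
The directory naming above can be sketched as follows. This is an illustration only: `dated_dir` is a hypothetical helper, and the base directory would normally come from config.json rather than being passed in literally.

```python
from datetime import date
from pathlib import Path


def dated_dir(base, day=None):
    """Return base/<yyyy_mm_dd> for `day` (defaults to today's date)."""
    day = day or date.today()
    # Zero-padded year_month_day, matching the <yyyy_mm_dd> convention.
    return Path(base) / day.strftime("%Y_%m_%d")


# Example: the download directory for a run on 29 Nov 2024.
example = dated_dir("/data/shared/biomuta/downloads/cbioportal", date(2024, 11, 29))
```

Here `example.name` is `2024_11_29`; the function only builds the path and does not create the directory.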

To navigate to these paths, I pull the main directories (downloads and datasets) from config.json and then find the latest subdirectory. Scripts that follow this logic:

  • bash: /data/shared/repos/biomuta-old/pipeline/download_step1/cbioportal/cancer_types.sh
  • python: /data/shared/repos/biomuta-old/pipeline/convert_step2/cbioportal/1_generate_cancer_do_json.py
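
A rough Python sketch of the "find the latest subdirectory" step, assuming "latest" means the greatest <yyyy_mm_dd> directory name. The two scripts listed above may do this differently (for example by modification time), and `latest_dated_subdir` is a hypothetical name.

```python
import re
from pathlib import Path

# Matches directory names of the form yyyy_mm_dd, e.g. 2024_11_29.
DATE_DIR = re.compile(r"\d{4}_\d{2}_\d{2}")


def latest_dated_subdir(base):
    """Return the newest <yyyy_mm_dd> subdirectory of `base`."""
    candidates = [
        path for path in Path(base).iterdir()
        if path.is_dir() and DATE_DIR.fullmatch(path.name)
    ]
    if not candidates:
        raise FileNotFoundError(f"no <yyyy_mm_dd> subdirectory in {base}")
    # Zero-padded yyyy_mm_dd names sort chronologically as plain strings.
    return max(candidates, key=lambda path: path.name)
```

Filtering on the name pattern keeps stray files or scratch directories from being picked up as the latest dataset.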

@Reeya123 Reeya123 linked a pull request Dec 3, 2024 that will close this issue
@Reeya123

Reeya123 commented Dec 3, 2024

Hello Maria. Please approve my pull request and merge it into the main branch.

2 participants