Prod release v0.2.0 #85

Merged
merged 147 commits into from
Dec 22, 2023

Conversation

priyanka-surana
Contributor

@priyanka-surana priyanka-surana commented Dec 18, 2023

This is a bit long and messy. I am running a full test here: /lustre/scratch123/tol/teams/tolit/users/ps22/pipelines/blobtoolkit. Will update once it completes, but please start reviewing if you can. Would be great to get it merged this week. Thanks :)

@priyanka-surana
Contributor Author

@gq1 Can you please take a look at the tests for this pipeline? Thanks.

@gq1
Member

gq1 commented Dec 20, 2023

> @gq1 Can you please take a look at the tests for this pipeline? Thanks.

Can you run the test locally?

The log is not much help. Did the runner run out of disk space?

2023-12-20T14:34:17.7624742Z   touch: cannot touch '.command.trace': Permission denied
2023-12-20T14:34:17.7625306Z 
2023-12-20T14:34:17.7625598Z Work dir:
2023-12-20T14:34:17.7626480Z   /home/runner/work/blobtoolkit/blobtoolkit/work/d9/ce32b60e9dba5d624bc89e3bcf10c8
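
The two explanations raised in this comment (a permissions problem versus a full disk) can be told apart by probing the work directory directly. The following is a minimal Python sketch, not part of the pipeline: `diagnose_work_dir` is a hypothetical helper name, and it only assumes that the directory exists on the machine where it is run.

```python
import shutil
import tempfile
from pathlib import Path


def diagnose_work_dir(work_dir):
    """Report free space and whether a file can be created in work_dir."""
    path = Path(work_dir)
    usage = shutil.disk_usage(str(path))
    report = {
        "free_gib": round(usage.free / 2**30, 2),
        "writable": False,
        "error": None,
    }
    try:
        # Mimic the failing `touch`: create a small file inside the directory.
        with tempfile.NamedTemporaryFile(dir=str(path), prefix=".probe-") as handle:
            handle.write(b"x")
            handle.flush()
            report["writable"] = True
    except OSError as exc:
        # PermissionError and "No space left on device" are both OSError.
        report["error"] = f"{type(exc).__name__}: {exc.strerror}"
    return report
```

Calling `diagnose_work_dir("work")` from the launch directory returns a small dictionary. A `PermissionError` reported alongside plenty of free space would point away from the disk-space explanation.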

@muffato
Member

muffato commented Dec 20, 2023

I looked at /lustre/scratch123/tol/teams/tolit/users/ps22/pipelines/blobtoolkit/results/GCA_927399515.1 and browsed the data on the BTK viewer.

These seem fine:

  • GC%
  • Read coverage
  • Lengths
  • Gaps (Ns)
  • "position" and "proportion" files
  • Summary files GCA*.summary.json (incl. the BUSCO score), identifiers.json, meta.json

These seem weird:

  • ${lineage}_odb10_count*.json only have 0s for the related eukaryote lineages (fungi and below) and non-0s for bacteria/archaea! Maybe there's some filtering happening.
  • I also don't understand what's supposed to be in the buscogenes* and buscoregions* JSON files, so I can't really tell whether the counts are correct.
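
The all-zero observation above can be checked across a whole results directory rather than by eye. The following is a minimal Python sketch under stated assumptions: `all_zero_files` and `iter_numbers` are hypothetical helper names, the files are assumed to be plain (uncompressed) JSON, and the numeric payload is assumed to sit under a top-level `values` key when one is present. If the real layout differs, the helper falls back to scanning the whole document.

```python
import json
from pathlib import Path


def iter_numbers(node):
    """Yield every int/float found anywhere inside a decoded JSON value."""
    if isinstance(node, bool):
        return  # bool is a subclass of int in Python; ignore it
    if isinstance(node, (int, float)):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_numbers(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_numbers(item)


def all_zero_files(blobdir, pattern="*_odb10_count*.json", key="values"):
    """Return names of matching JSON files whose numeric content is all zero."""
    flagged = []
    for path in sorted(Path(blobdir).glob(pattern)):
        with path.open() as handle:
            data = json.load(handle)
        # Assumption: counts live under `key`; otherwise inspect everything.
        if isinstance(data, dict) and key in data:
            data = data[key]
        numbers = list(iter_numbers(data))
        if numbers and not any(numbers):
            flagged.append(path.name)
    return flagged
```

Running `all_zero_files(...)` on the results directory lists the count files that contain only zeros, which makes it easy to see whether they cluster in the eukaryote lineages as described.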

Member

@muffato muffato left a comment

Approving this PR in principle, pending sorting out the BUSCO JSON files and the if statement.

Comment on lines +57 to +68
BLASTN_TAXON.out.txt
| map { meta, txt -> txt.isEmpty() }
| set { is_txt_empty }

// repeat the blastn search without excluding taxon_id
if ( is_txt_empty ) {
    BLAST_BLASTN ( BLOBTOOLKIT_CHUNK.out.chunks, blastn, [] )
    ch_blastn_txt = BLAST_BLASTN.out.txt
}
else {
    ch_blastn_txt = BLASTN_TAXON.out.txt
}

Member

I know you're on it, but for the record: please check that the if is working as expected

Contributor Author

I tried branching and filtering, but with no better results, so I am leaving it as is for now. This might be fixed in the next version.
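
For reference, the intent of the `if` block discussed in this thread appears to be a per-input fallback: use the taxon-excluded BLAST output unless it is empty, and otherwise repeat the search without the exclusion. In the snippet, `is_txt_empty` is set from a channel operation, so the `if` may be testing the channel object itself and not the emptiness of each file, which is presumably why the reviewer asks for a check. The following plain-Python sketch only illustrates the intended decision outside Nextflow. `pick_blast_output` and the `rerun_without_exclusion` callable are hypothetical names, and this is not the pipeline's implementation.

```python
from pathlib import Path


def pick_blast_output(taxon_excluded_txt, rerun_without_exclusion):
    """Return the BLAST table to use downstream.

    taxon_excluded_txt: path to the output of the taxon-excluded search.
    rerun_without_exclusion: hypothetical zero-argument callable that repeats
        the search without excluding the taxon and returns the new output path.
    """
    first = Path(taxon_excluded_txt)
    if first.stat().st_size > 0:
        # The first search found hits, so the second search is never run.
        return first
    # Fall back only when the first search produced an empty file.
    return Path(rerun_without_exclusion())
```

The point of the sketch is that the choice is made once per input file, after that file exists. Unlike `txt.isEmpty()` in the snippet, `Path.stat()` raises an error when the file is missing, so the sketch assumes the first search always writes an output file.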

Python linting (black) is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks.
To fix this CI test, please run:

  • Install black: pip install black
  • Fix formatting errors in your pipeline: black .

Once you push these changes the test should pass, and you can hide this comment 👍

We highly recommend setting up Black in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!

@priyanka-surana
Copy link
Contributor Author

> I looked at /lustre/scratch123/tol/teams/tolit/users/ps22/pipelines/blobtoolkit/results/GCA_927399515.1 and browsed the data on the BTK viewer.
>
> These seem fine:
>
>   • GC%
>   • Read coverage
>   • Lengths
>   • Gaps (Ns)
>   • "position" and "proportion" files
>   • Summary files GCA*.summary.json (incl. the BUSCO score), identifiers.json, meta.json
>
> These seem weird:
>
>   • ${lineage}_odb10_count*.json only have 0s for the related eukaryote lineages (fungi and below) and non-0s for bacteria/archaea! Maybe there's some filtering happening.
>   • I also don't understand what's supposed to be in the buscogenes* and buscoregions* JSON files, so I can't really tell whether the counts are correct.

This is fixed.

@priyanka-surana priyanka-surana merged commit bc9fa62 into dev Dec 22, 2023
6 checks passed
@priyanka-surana priyanka-surana deleted the blast branch December 22, 2023 09:41

4 participants