9. Changelog

Verena Kutschera edited this page Nov 27, 2024 · 39 revisions

2024-11-24, version 0.6.2

Bug fixes

  • The slurm profile configuration file for the PDC/KTH cluster Dardel (config/slurm/profile/config_plugin_dardel.yaml) has been fixed so that containers bind the correct directory
  • Documentation on how to run GenErode on Dardel has been updated (config/slurm/README.md)
  • fastp did not merge reads shorter than 30 bp with its default settings. The parameters --overlap_len_require 15 --overlap_diff_limit 1 have been added so that shorter reads are merged properly
  • gerp_derived_alleles did not process the last position of each chromosome/scaffold/contig, which is now fixed
  • bam2fasta coded the first and last base of a mapped read as "N" when producing the fasta files for the outgroups used in GERP++, which is now fixed
  • Add the flag -quick to RepeatModeler (which uses the sample sizes from before version 2.0.4, with the same sensitivity as versions >= 2.0.4 but faster) and lower the number of threads, while keeping memory the same, in the Dardel slurm profile. This fixes errors when running different versions of RepeatModeler2 with the white rhino and Sumatran rhino test data (round-2/tmptmpSample.fa and */sampleDB-round4.fa missing)
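Why short reads were not merged with the default settings: an overlap between the two reads of a pair can never be longer than the shorter read, so a trimmed pair shorter than 30 bp can never satisfy a 30 bp minimum overlap. The snippet below is a simplified model of overlap-based merging, not fastp's actual algorithm; min_overlap and max_diff are assumed to play the roles of --overlap_len_require and --overlap_diff_limit, and the default values in the code are assumed fastp defaults.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_length(read1: str, read2: str,
                   min_overlap: int = 30, max_diff: int = 5) -> Optional[int]:
    """Length of the longest overlap between read1 and the reverse complement
    of read2 with at most max_diff mismatches, or None if no overlap of at
    least min_overlap bases qualifies (simplified model, not fastp's code)."""
    read2_rc = reverse_complement(read2)
    longest = min(len(read1), len(read2_rc))
    # Try the longest candidate first: suffix of read1 vs. prefix of read2_rc.
    # If longest < min_overlap, this range is empty and no merge is possible.
    for olen in range(longest, min_overlap - 1, -1):
        suffix = read1[len(read1) - olen:]
        prefix = read2_rc[:olen]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_diff:
            return olen
    return None


# A 25 bp fragment sequenced from both ends (adapters already trimmed).
fragment = "ACGTACGTTGCAACGTAGCTAGCTA"
mate = reverse_complement(fragment)
print(overlap_length(fragment, mate))                              # None
print(overlap_length(fragment, mate, min_overlap=15, max_diff=1))  # 25
```

In this model, lowering the minimum overlap is what lets the 25 bp pair merge; tightening the mismatch limit at the same time plausibly guards against spurious merges at the shorter overlaps that are now accepted.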

Software updates

  • Update bedtools to version 2.31.1 and htslib to version 1.20 (container from Seqera)
  • Update samtools to version 1.20 (container from Seqera; except for the bam2pro and mlRho rules)
  • Update RepeatModeler to version 2.0.5

Software versions

  • python 3.12.3
  • snakemake 8.14.0
  • biopython 1.83
  • matplotlib 3.8.4
  • pandas 2.2.2
  • numpy 1.26.4
  • snakemake-executor-plugin-slurm 0.6.0
  • bwa 0.7.17
  • samtools 1.20 (mlRho rules are run in a container with samtools 1.9 and mlRho)
  • picard 2.26.6
  • repeatmodeler 2.0.5
  • repeatmasker 4.1.5
  • bedtools 2.31.1
  • fastqc 0.12.1
  • multiqc 1.9
  • fastp 0.22.0
  • qualimap 2.3
  • gatk 3.7
  • mapdamage 2.0.9
  • bcftools 1.20
  • mlrho 2.9
  • plink 1.9
  • vcftools 0.1.16
  • snpeff 4.3.1
  • seqtk 1.4
  • gerp 2.1

2024-06-19, version 0.6.1

New features

  • A new Snakemake pipeline under utilities/mutational_load_snpeff processes snpEff results to calculate mutational load

Software updates

  • Snakemake has been upgraded to version 8, which required some larger changes to the source code. Most importantly for GenErode, execution on slurm clusters is now implemented in Snakemake itself.
  • Update QualiMap to version 2.3 (container from Seqera)
  • Switch the Plink container to the one from GalaxyProject
  • Update BCFtools to version 1.20 (container from GalaxyProject)
  • Update seqtk to version 1.4 (container from Seqera)
  • Switch the GERP++ container to the one from GalaxyProject
  • Calculate memory inside rules based on mem_mb provided under resources instead of on the number of threads
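To illustrate deriving a tool's memory setting from mem_mb rather than from the thread count, here is a minimal sketch for Java-based tools such as Picard, GATK or QualiMap, which take their heap size via -Xmx. The helper name java_heap_flag and the 80% heap share are hypothetical choices for this example, not GenErode's code or values.

```python
def java_heap_flag(mem_mb: int, heap_percent: int = 80) -> str:
    """Hypothetical helper: return an -Xmx flag that uses heap_percent of
    mem_mb (in MB), leaving the remainder for JVM overhead outside the heap."""
    if mem_mb <= 0:
        raise ValueError("mem_mb must be positive")
    if not 0 < heap_percent <= 100:
        raise ValueError("heap_percent must be in (0, 100]")
    # Integer arithmetic avoids floating-point rounding surprises.
    heap_mb = mem_mb * heap_percent // 100
    if heap_mb < 1:
        raise ValueError("mem_mb is too small for a Java heap")
    return f"-Xmx{heap_mb}m"


# e.g. a rule that requests 16000 MB under resources
print(java_heap_flag(16000))  # -Xmx12800m
```

Because the value depends only on the memory requested for the job, changing the number of threads no longer changes the memory passed to the tool.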

Software versions

  • python 3.12.3
  • snakemake 8.14.0
  • biopython 1.83
  • matplotlib 3.8.4
  • pandas 2.2.2
  • numpy 1.26.4
  • snakemake-executor-plugin-slurm 0.6.0
  • bwa 0.7.17
  • samtools 1.9
  • picard 2.26.6
  • repeatmodeler 2.0.4
  • repeatmasker 4.1.5
  • bedtools 2.29.2
  • fastqc 0.12.1
  • multiqc 1.9
  • fastp 0.22.0
  • qualimap 2.3
  • gatk 3.7
  • mapdamage 2.0.9
  • bcftools 1.20
  • mlrho 2.9
  • plink 1.9
  • vcftools 0.1.16
  • snpeff 4.3.1
  • seqtk 1.4
  • gerp 2.1

2024-01-29, version 0.6.0

New features

  • Option to remove scaffolds/contigs linked to sex chromosomes from the final BCF files and all downstream analyses. The same option can be used to remove any other scaffolds/contigs.
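One way to express "keep everything except these scaffolds" is a BED file that covers every retained contig in full. The sketch below builds such BED lines from a samtools faidx index (.fai); the function name, the use of a .fai file and the example contig names are assumptions for illustration, and GenErode's own implementation may differ.

```python
from typing import Iterable, List, Set


def contigs_to_keep_bed(fai_lines: Iterable[str], exclude: Set[str]) -> List[str]:
    """Hypothetical helper: turn .fai lines (name, length, offset, ...) into
    BED lines covering every contig whose name is not in `exclude`."""
    bed = []
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        name, length = line.rstrip("\n").split("\t")[:2]
        if name not in exclude:
            # BED is 0-based and half-open, so a whole contig is 0..length.
            bed.append(f"{name}\t0\t{int(length)}")
    return bed


fai = [
    "chr1\t1000\t6\t60\t61\n",
    "chrX\t500\t1029\t60\t61\n",
    "scaffold_3\t200\t1544\t60\t61\n",
]
for record in contigs_to_keep_bed(fai, exclude={"chrX"}):
    print(record)
```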

Minor bug fixes and upgrades

  • Update RepeatModeler to version 2.0.4 to be able to handle large genomes. With the new version, the rules to copy RepeatModeler libraries and to run RepeatClassifier are no longer required and have been removed.
  • Update RepeatMasker version to 4.1.5 (from the new RepeatModeler container)
  • Fix the input for rule missingness_filtered_vcf_multiqc so that it also works when GenErode is run with only modern or only historical samples
  • Remove *.bai files from the mlRho rule input to avoid triggering re-runs of the mapping step
  • Update FastQC to version 0.12.1 with a larger default memory allocation
  • Replace the rescale_gerp rule with the gerpcol parameter -s 0.001 in the compute_gerp rule. This keeps the same functionality with fewer intermediate files, and users can change the scaling parameter themselves if their project requires it.
  • Fix file path for temporary fastp output file
  • Fix the documentation on the scaling of the input tree for GERP, which should be in millions of years, as (correctly) provided by timetree.org
  • Add MultiQC reports for the merged VCF file to the pipeline report
  • Multiple changes to avoid triggering re-runs or duplicating files: keep the merged VCF file for testing missingness filters, and do not copy the repeatmask-BED file from the reference location to the GenErode results directory
  • Automatically determine the memory allocation (-Xmx) for GATK RealignerTargetCreator and IndelRealigner for more efficient memory use
  • Remove the flag -a from the RepeatMasker command so that the *.fasta.align file, which no downstream analysis needs, is no longer created

See https://github.com/NBISweden/GenErode/pull/58 for the code changes
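As a rough illustration of the tree scaling entries above, the sketch below multiplies every branch length in a Newick tree given in millions of years by 0.001. It assumes that a tree scaling factor such as gerpcol's -s simply multiplies branch lengths; it is not GERP++ code, and the example tree is made up.

```python
import re

# Matches a branch length such as ":12.5" or ":1e-3" after a node in Newick.
_BRANCH_LENGTH = re.compile(r":([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def scale_newick(newick: str, factor: float) -> str:
    """Multiply every branch length in a Newick string by `factor`."""
    def _scale(match):
        # ".10g" drops trailing zeros and hides tiny floating-point noise.
        return ":" + format(float(match.group(1)) * factor, ".10g")
    return _BRANCH_LENGTH.sub(_scale, newick)


# Made-up tree with divergence times in millions of years.
tree_my = "((rhinoA:1.5,rhinoB:1.5):3,outgroup:4.5);"
print(scale_newick(tree_my, 0.001))
# ((rhinoA:0.0015,rhinoB:0.0015):0.003,outgroup:0.0045);
```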

2023-03-07, version 0.5.1

Bug fixes

  • Fix the filter for missing data in the merged VCF file across all samples for f_missing: 0.0 (no missing data allowed) and f_missing: 1.0 (any level of missing data allowed)
  • Correct input file name for rule index_realigned_bams

See https://github.com/NBISweden/GenErode/pull/42 for the code changes
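The intended behaviour of the f_missing fix above can be sketched as follows: a site is kept if the fraction of samples with a fully missing genotype is at most f_missing. The function names are hypothetical and this is not GenErode's code; the point is that an inclusive comparison makes both endpoints behave as described.

```python
import re
from typing import Sequence


def fraction_missing(genotypes: Sequence[str]) -> float:
    """Fraction of samples whose genotype (e.g. '0/1', './.', '.|.') is fully missing."""
    if not genotypes:
        raise ValueError("no genotypes given")
    missing = sum(
        all(allele == "." for allele in re.split(r"[/|]", gt)) for gt in genotypes
    )
    return missing / len(genotypes)


def passes_missingness(genotypes: Sequence[str], f_missing: float) -> bool:
    """Keep a site if its missing fraction is at most f_missing.

    With <=, f_missing = 0.0 keeps only complete sites and
    f_missing = 1.0 keeps every site.
    """
    if not 0.0 <= f_missing <= 1.0:
        raise ValueError("f_missing must be between 0.0 and 1.0")
    return fraction_missing(genotypes) <= f_missing


print(passes_missingness(["0/0", "0/1"], 0.0))  # True
print(passes_missingness(["0/0", "./."], 0.0))  # False
print(passes_missingness(["./.", "./."], 1.0))  # True
```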

2023-02-15, version 0.5.0

Bug fixes and upgrades

  • Rename output files and update the corresponding code in the mitogenome mapping step to resolve conflicts
  • Upgrade conda environment file to install Snakemake version 7.20.0
  • Add a slurm profile configuration file, compatible with current slurm profile and Snakemake version 7
  • Update the GenErode pipeline report code to be compatible with Snakemake version 7.20.0
  • Fix rule names in cluster.yaml file
  • Add a rule in the mitogenome mapping step to localrules

See https://github.com/NBISweden/GenErode/pull/35 for the code changes

2022-09-05, version 0.4.2

Updates related to large genome sizes and/or large sample sizes

  • Run snpEff with an option to specify -Xmx for large genomes, and add the snpEff rules to cluster.yaml
  • Fix y-axis labels for mutational load plot so that there is no overlap for large sample sizes
  • Create new Docker images with bedtools and htslib (bgzip) so that VCF files filtered with bedtools can be compressed in a pipe to reduce intermediate file sizes

Minor bug fixes

  • Update conda in GitHub Actions to reduce run time
  • Shorten run time and lower number of cores for mutational load calculations in cluster.yaml
  • Remove temp flag from bam index file of rescaled bam files
  • Embed the pipeline logo into the GenErode pipeline report via a link to the file in the repository, so that the pipeline report can be moved to a different location
  • Fix "rerun incomplete" warning for rule make_reference_bed by separating it from the group job reference_prep_group

See https://github.com/NBISweden/GenErode/pull/23 for the code changes

2022-03-03, version 0.4.1

  • Release of public version of GenErode

Changes since version 0.4.0 (unpublished):

  • Bug fix in the Python code that creates output file lists for the different CpG filtering methods
  • Updated documentation
  • Removed legacy code