9. Changelog

2024-11-24, version 0.6.2

Bug fixes

The slurm profile configuration file for the PDC/KTH cluster Dardel (config/slurm/profile/config_plugin_dardel.yaml) has been fixed so that containers bind the correct directory
Documentation on how to run GenErode on Dardel has been updated (config/slurm/README.md)
FastP did not merge reads shorter than 30 bp with default settings. The parameters --overlap_len_require 15 --overlap_diff_limit 1 have been implemented to ensure proper merging of shorter reads
gerp_derived_alleles did not process the last position of each chromosome/scaffold/contig, which is now fixed
bam2fasta coded the first and last base of a mapped read as "N" when producing the fasta files for the outgroups used in GERP++, which is now fixed
Add the flag -quick to RepeatModeler (which uses the sample sizes from before version 2.0.4, with the same sensitivity as versions >= 2.0.4 but faster) and lower the number of threads while keeping the memory the same in the Dardel slurm profile. This fixes errors when running different versions of RepeatModeler2 with the white rhino and Sumatran rhino test data (round-2/tmptmpSample.fa and */sampleDB-round4.fa missing)
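
To illustrate the fastp fix listed above: fastp's documented default for --overlap_len_require is 30, so read pairs overlapping by fewer bases are left unmerged. The sketch below assembles a fastp command line with the two parameters named in the changelog. The helper name `fastp_merge_command` and the file names are hypothetical, and the actual GenErode rule is written in Snakemake and may pass additional options.

```python
import shlex


def fastp_merge_command(r1, r2, merged_out, min_overlap=15, max_mismatches=1):
    """Build a fastp command line that merges overlapping read pairs.

    Hypothetical helper for illustration only; not GenErode's own code.
    """
    return [
        "fastp",
        "--in1", r1,
        "--in2", r2,
        "--merge",
        "--merged_out", merged_out,
        # Minimum overlap length required to merge a pair
        "--overlap_len_require", str(min_overlap),
        # Maximum number of mismatches tolerated in the overlap
        "--overlap_diff_limit", str(max_mismatches),
    ]


# Placeholder file names, not files shipped with the pipeline
cmd = fastp_merge_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_merged.fastq.gz"
)
print(shlex.join(cmd))
```

Lowering the required overlap makes merging of short fragments possible, while the stricter mismatch limit compensates for the higher chance of spurious short overlaps.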

Software updates

Update bedtools version to 2.31.1 and htslib to 1.20 (container from Seqera)
Update samtools version to 1.20 (container from Seqera; except for bam2pro and mlRho rules)
Update RepeatModeler to version 2.0.5

Software versions

python 3.12.3
snakemake 8.14.0
biopython 1.83
matplotlib 3.8.4
pandas 2.2.2
numpy 1.26.4
snakemake-executor-plugin-slurm 0.6.0
bwa 0.7.17
samtools 1.20 (mlRho rules are run in a container with samtools 1.9 and mlRho)
picard 2.26.6
repeatmodeler 2.0.5
repeatmasker 4.1.5
bedtools 2.31.1
fastqc 0.12.1
multiqc 1.9
fastp 0.22.0
qualimap 2.3
gatk 3.7
mapdamage 2.0.9
bcftools 1.20
mlrho 2.9
plink 1.9
vcftools v0.1.16
snpeff 4.3.1
seqtk 1.4
gerp 2.1

2024-06-19, version 0.6.1

New features

Under utilities/mutational_load_snpeff, a new Snakemake pipeline has been added to process snpEff results for the purpose of calculating mutational load

Software updates

Snakemake has been upgraded to version 8 with some larger changes in the source code. Most importantly for GenErode, the execution on slurm clusters has been implemented in Snakemake itself.
Update QualiMap version to 2.3 (container from Seqera)
Switch Plink container to the container from GalaxyProject
Update BCFtools version to 1.20 (container from GalaxyProject)
Update seqtk version to 1.4 (container from Seqera)
Switch GERP++ container to the container from GalaxyProject
Calculate memory inside of rules based on mem_mb provided under resources instead of based on threads
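
As a rough illustration of the last item above, memory for a Java tool can be derived from the mem_mb value of a rule's resources instead of from its thread count. This is a minimal sketch under stated assumptions: the function name `java_heap_flag` and the 10% overhead reserve are hypothetical and not taken from the GenErode source.

```python
def java_heap_flag(mem_mb, overhead_percent=10):
    """Return a JVM -Xmx flag sized from a memory allocation in MB.

    Hypothetical helper: a share of the allocation (overhead_percent)
    is left free for non-heap memory so the job stays within its limit.
    """
    if mem_mb <= 0:
        raise ValueError("mem_mb must be positive")
    if not 0 <= overhead_percent < 100:
        raise ValueError("overhead_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding surprises
    heap_mb = max(1, mem_mb * (100 - overhead_percent) // 100)
    return f"-Xmx{heap_mb}m"


print(java_heap_flag(8000))
```

In a Snakemake rule, the input to such a function would come from `resources.mem_mb`, so that changing the memory request in the profile automatically changes the heap size passed to the tool.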

Software versions

python 3.12.3
snakemake 8.14.0
biopython 1.83
matplotlib 3.8.4
pandas 2.2.2
numpy 1.26.4
snakemake-executor-plugin-slurm 0.6.0
bwa 0.7.17
samtools 1.9
picard 2.26.6
repeatmodeler 2.0.4
repeatmasker 4.1.5
bedtools 2.29.2
fastqc 0.12.1
multiqc 1.9
fastp 0.22.0
qualimap 2.3
gatk 3.7
mapdamage 2.0.9
bcftools 1.20
mlrho 2.9
plink 1.9
vcftools v0.1.16
snpeff 4.3.1
seqtk 1.4
gerp 2.1

2024-01-29, version 0.6.0

New features

Option to remove sex chromosome-linked scaffolds/contigs from the final BCF files and downstream analyses. Can also be used to remove any other scaffolds/contigs.

Minor bug fixes and upgrades

Update RepeatModeler version to 2.0.4 to be able to handle large genomes. With the new version, the rules to copy RepeatModeler libraries and to run RepeatClassifier are no longer required and have been removed.
Update RepeatMasker version to 4.1.5 (from the new RepeatModeler container)
Fix the input for rule missingness_filtered_vcf_multiqc so that it also works when GenErode is only run with modern or only with historical samples
Remove *.bai files from mlRho rule input to avoid triggering re-runs of mapping
Update FastQC version to version 0.12.1 with larger default memory allocation
Replace the rescale_gerp rule with the gerpcol parameter -s 0.001 in the compute_gerp rule. The same functionality is ensured, there are fewer intermediate files, and users can change the scaling parameter themselves if necessary for their project.
Fix file path for temporary fastp output file
Fix the documentation regarding the input tree scaling for GERP, which should be in millions of years, as (correctly) provided by timetree.org
Add MultiQC reports for merged VCF file to the pipeline report
Multiple changes to avoid triggering re-runs or duplication of files: keep merged VCF file for testing of missingness filters, do not copy the repeatmask-BED file from the reference location to the GenErode results directory
Automatically determine memory allocation to -Xmx in GATK RealignerTargetCreator and IndelRealigner for more efficient memory use
Remove flag -a from RepeatMasker command so that *.fasta.align file is not created since it is not needed by downstream analyses

See https://github.com/NBISweden/GenErode/pull/58 for the code changes

2023-03-07, version 0.5.1

Bug fixes

Fix filter for missing data in merged VCF across all samples for f_missing: 0.0 (no missing data allowed) and f_missing: 1.0 (any level of missing data allowed)
Correct input file name for rule index_realigned_bams

See https://github.com/NBISweden/GenErode/pull/42 for the code changes
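
The two boundary values of f_missing mentioned in this release can be illustrated with a small sketch. It is not the pipeline's code (see the pull request above for that); the function `keep_site` and the representation of a missing genotype as `None` are assumptions made for illustration.

```python
def keep_site(genotypes, f_missing):
    """Return True if the fraction of missing genotypes is <= f_missing.

    Hypothetical helper: `genotypes` holds one entry per sample and a
    missing call is represented as None.
    """
    if not 0.0 <= f_missing <= 1.0:
        raise ValueError("f_missing must be between 0.0 and 1.0")
    if not genotypes:
        raise ValueError("at least one genotype is required")
    n_missing = sum(1 for gt in genotypes if gt is None)
    # An inclusive comparison makes both boundary values behave as intended:
    # 0.0 keeps only complete sites, 1.0 keeps every site.
    return n_missing / len(genotypes) <= f_missing


site = ["0/1", None, "1/1", "0/0"]   # one of four genotypes missing
print(keep_site(site, 0.0))          # no missing data allowed
print(keep_site(site, 1.0))          # any level of missing data allowed
```

With an inclusive threshold, f_missing: 0.0 rejects the example site and f_missing: 1.0 accepts it, including sites where every genotype is missing.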

2023-02-15, version 0.5.0

Bug fixes and upgrades

Update file names of output files and corresponding code in the mitogenome mapping step to solve conflicts
Upgrade conda environment file to install Snakemake version 7.20.0
Add a slurm profile configuration file, compatible with current slurm profile and Snakemake version 7
Update the GenErode pipeline report code to be compatible with Snakemake version 7.20.0
Fix rule names in cluster.yaml file
Add a rule to the localrules in the mitogenome mapping step

See https://github.com/NBISweden/GenErode/pull/35 for the code changes

2022-09-05, version 0.4.2

Updates related to large genome sizes and/or large sample sizes

Run snpEff with option to specify -Xmx for large genomes and add the rules to cluster.yaml
Fix y-axis labels for mutational load plot so that there is no overlap for large sample sizes
Create new Docker images with bedtools and htslib (bgzip) so that VCF files filtered with bedtools can be compressed in a pipe to reduce intermediate file sizes

Minor bug fixes

Update conda in GitHub actions to reduce run time
Shorten run time and lower number of cores for mutational load calculations in cluster.yaml
Remove temp flag from bam index file of rescaled bam files
Embed pipeline logo into GenErode pipeline report via link to file on repository so that the pipeline report can be moved to a different location
Fix "rerun incomplete" warning for rule make_reference_bed by separating it from the group job reference_prep_group

See https://github.com/NBISweden/GenErode/pull/23 for the code changes

2022-03-03, version 0.4.1

Release of public version of GenErode

Changes since version 0.4.0 (unpublished):

Bug fix of python code to create output file lists for different CpG filtering methods
Updated documentation
Removed legacy code
