Our preferred communication channel is the BioinfoUgrp team on Microsoft Teams. Do not hesitate to use the ping function (typing @ followed by the name, as in other chat systems), because discussions in the Teams app are otherwise easy to miss.
Please search the web for your issue before contacting us. Very often, the main issues have already been reported and a solution is available on the program's reference page: in the Issues tab on GitHub for some programs, in Google Groups for others (e.g. IQ-TREE). Other great platforms are Stack Overflow and Biostars.
To search the available modules by keyword, run for instance: ml key clustal
Execute ml bioinfo-ugrp-modules to make available the modules installed by the OIST Bioinfo user group. This line can be appended to your ~/.bashrc to make them available by default.
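A minimal sketch of appending that line to ~/.bashrc only once, so that running it again does not create duplicates. The function name add_line_once is hypothetical (it is not part of the module system) and relies only on standard grep and shell redirection.

```shell
# Hypothetical helper (not provided by the user group): append a line to a
# file only if that exact line is not already present.
# $1 = line to add, $2 = target file
add_line_once() {
    touch "$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Usage (uncomment to modify your own ~/.bashrc):
# add_line_once 'ml bioinfo-ugrp-modules' "$HOME/.bashrc"
```

The whole-line match means a commented-out copy of the line does not count as already present.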
We autogenerate many modules from software packaged in the Debian distribution. To see them, execute ml bioinfo-ugrp-modules DebianMed. More information is available on the DebianMedModules page.
To load a module in DebianMed (an example for loading bcftools):
# load DebianMed module first
ml bioinfo-ugrp-modules DebianMed
# now you can see the list of modules installed in DebianMed
ml avail
# load module
ml bcftools
# check the installation
bcftools --version
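In a job script it can help to stop early when a module did not load as expected. The sketch below checks that a command is on the PATH; the function name require_tool is hypothetical, and the check uses only the standard shell builtin command -v.

```shell
# Hypothetical helper: report an error if a command that a module should
# provide is not on the PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "ERROR: '$1' not found; was its module loaded?" >&2
        return 1
    fi
}

# Usage in a job script, after the ml commands above:
# require_tool bcftools || exit 1
```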
We provide some modules for Unix tools useful to everybody including bioinformaticians.
ml bioinfo-ugrp-modules UnixGoodies
ml av
Check oist/BioinfoUgrp_UnixGoodies_Images for details.
We have prepared a Nextflow module (ml bioinfo-ugrp-modules Nextflow2) and registered OIST's profile with the nf-core community so that you can run their pipelines with the -profile oist option on Deigo. An nf-core module is also available (ml bioinfo-ugrp-modules nf-core).
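As a sketch, launching an nf-core pipeline with the OIST profile could look like the following. The pipeline name (nf-core/rnaseq), the samplesheet path, the output directory, and the --input/--outdir parameters are placeholders chosen for illustration; check the documentation of the pipeline you actually run. The hypothetical helper only prints the command unless DRY_RUN=0 is set, so it can be inspected before anything is submitted.

```shell
# Placeholders for illustration; replace with your own pipeline and paths.
PIPELINE="nf-core/rnaseq"
INPUT="samplesheet.csv"
OUTDIR="results"

# Hypothetical helper: print the Nextflow command (default) or execute it
# when DRY_RUN=0.
launch_pipeline() {
    # Build the command as positional parameters so that arguments
    # containing spaces stay intact when executed.
    set -- nextflow run "$PIPELINE" -profile oist --input "$INPUT" --outdir "$OUTDIR"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"    # dry run: only show what would be executed
    else
        "$@"
    fi
}

# On Deigo, load Nextflow first: ml bioinfo-ugrp-modules Nextflow2
launch_pipeline
```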
Under the Other/ namespace, we also provide some general bioinformatics tools such as:
- DIAMOND (ml Other/DIAMOND/2.0.4.142)
- InterProScan and its database (ml Other/interproscan/5.48-83.0)
- … and more!
See this page for the full list of modules and for more information.
Widely used databases are installed locally. Upon request by users, we plan to upgrade databases, but not more than once a year. After a database is upgraded, users will be asked whether the older version should remain available (for example, to complete ongoing projects); it will be deleted after 30 days unless it is still required. At most two versions of the same database will be available at any one time.
The following databases were constructed using ncbi-blast v2.10.0+. The module ncbi-blast/2.10.0+ has to be loaded in order to use these databases.
- NCBI NT and NR databases (release 238): ml DB/blastDB/ncbi/238. To be used with the argument nt or nr supplied to -db in the commands of your scripts. Example script to get a taxified BLAST report:
module load ncbi-blast/2.10.0+
module load DB/blastDB/ncbi/238
WORKDIR="$PWD"
FASTA=FULL/PATH/TO/YOUR/FASTA/FILE
blastn -task megablast -db nt -query $FASTA -num_threads ${SLURM_CPUS_PER_TASK} -out ${WORKDIR}/megablastn.out \
-outfmt '6 qseqid bitscore evalue length qlen qcovs pident sseqid sgi sacc staxid ssciname scomname stitle sseq' \
-max_target_seqs 1
- Swiss-Prot (version 2020_06): ml DB/blastDB/sprot/2020_06
- UniRef90 (version 2020_06): ml DB/blastDB/uniref90/2020_06
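The tabular report written by the megablast example above (megablastn.out) can be summarised with standard tools. The sketch below counts report lines per scientific name; it assumes the tab-separated column order given in that example's -outfmt string, where ssciname is the 12th column. The function name is hypothetical.

```shell
# Hypothetical helper: count report lines per scientific name in a BLAST
# report produced with the -outfmt string of the example above
# (ssciname is column 12). Output: count<TAB>name, most frequent first.
count_lines_per_species() {
    awk -F '\t' '{ n[$12]++ } END { for (s in n) printf "%d\t%s\n", n[s], s }' "$1" |
        sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Usage:
# count_lines_per_species megablastn.out
```

Because a single subject sequence can contribute several alignments, these are counts of report lines rather than of distinct query sequences.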
The following databases were constructed using DIAMOND v2.0.4.142. The module Other/DIAMOND/2.0.4.142 has to be loaded in order to use them.
- NCBI NR database (release 238): ml DB/diamondDB/ncbi/238
- Swiss-Prot (version 2020_06): ml DB/diamondDB/sprot/2020_06
- UniRef90 (version 2020_06): ml DB/diamondDB/uniref90/2020_06
Unlike ncbi-blast, DIAMOND requires the full path of the database. The database module automatically creates an environment variable, DIAMONDDB, which holds the full path to the directory containing the DIAMOND databases. You therefore need to prepend ${DIAMONDDB}/ to the name of the database.
Example script to run diamond with the database module:
# load ncbi database for DIAMOND (proper version of DIAMOND is automatically loaded)
module load DB/diamondDB/ncbi/238
# check the loaded DIAMOND version and ${DIAMONDDB} variable
diamond --version
echo ${DIAMONDDB}
# run diamond search
WORKDIR="$PWD"
FASTA=FULL/PATH/TO/YOUR/FASTA/FILE
diamond blastp --db ${DIAMONDDB}/nr -q $FASTA -p ${SLURM_CPUS_PER_TASK} --out ${WORKDIR}/diamond.blastp.out --outfmt 6
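Two details of the example above can go wrong quietly: ${DIAMONDDB} is empty if the database module was not loaded, and Slurm only sets ${SLURM_CPUS_PER_TASK} when --cpus-per-task is requested. A defensive sketch using standard shell parameter expansion follows; the helper name diamond_db and the THREADS variable are hypothetical.

```shell
# Hypothetical helper: print the full path to a DIAMOND database, or fail
# if the database module has not set DIAMONDDB.
diamond_db() {
    if [ -z "${DIAMONDDB:-}" ]; then
        echo "ERROR: DIAMONDDB is not set; load a DB/diamondDB module first" >&2
        return 1
    fi
    printf '%s/%s\n' "$DIAMONDDB" "$1"
}

# Fall back to a single thread when Slurm did not set SLURM_CPUS_PER_TASK.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Usage inside a job script:
# DB="$(diamond_db nr)" || exit 1
# diamond blastp --db "$DB" -q "$FASTA" -p "$THREADS" --out diamond.blastp.out --outfmt 6
```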
Pfam version 34.0: use ml DB/Pfam/34.0 to invoke it in your scripts.
Dfam version 3.6, downloaded from https://www.dfam.org/releases/Dfam_3.6/families/Dfam.h5.gz.
The command ml DB/Dfam/3.6 will expose an environment variable $BioinfoUgrp_Dfam containing the path to the directory that holds the database files, which can be passed to RepeatMasker through its -libdir argument.
The command ml DB/Dfam_RepeatMasker/3.6__4.1.3 will set an environment variable that changes the behaviour of the repeatmodeler module, so that it will use the full Dfam database provided by us instead of the “curated only” version provided by default.
The RepeatMasker program does not follow symbolic links and the Dfam database is large (160 GB), so I had to use hard links to the files of the Dfam module instead. Also, the modulefile contains:
setenv("BioinfoUgrp_Dfam_Rmsk_4_1_3", apphome.."/RepeatMasker_4.1.3/Libraries")
setenv("SINGULARITY_BINDPATH", apphome.."/Libraries:/opt/RepeatMasker/Libraries")
Here is how you can run RStudio on a compute node.
We have some modules on Saion for GPU-accelerated computations that cannot be run on Deigo. Please remember that the modules system on Saion is older, so the ml shortcuts will not work. To list the available modules, do:
module load bioinfo-ugrp-modules
module available
We have a very basic implementation of AlphaFold 2.1.1 within the user group modules. More detailed documentation will be available here in time. For basic usage, you can adapt the example script at /apps/unit/BioinfoUgrp/alphafold/2.1.1/bin/alphafold_example_script.sh
We have modules for basecalling Nanopore data, in particular for Guppy and Rerio.