Add parameter to provide multiple sequence alignment directory to Chai-1 #14

Merged
merged 23 commits into from
Dec 2, 2024
1739592
feat: Added msa parameter.
FloWuenne Nov 29, 2024
489727f
Fix: Remove local path from config.
FloWuenne Nov 29, 2024
4723a3d
Fix: Added CHAI_DOWNLOADS_DIR back into module.
FloWuenne Nov 29, 2024
6f6a137
fix: Removed resource label from CHAI_1 and added groundswell optimiz…
FloWuenne Nov 29, 2024
75e9ed9
fix: Added apptainer gpu config runOption.
FloWuenne Nov 29, 2024
0022e02
fix: set output-dir param to meta.id again
FloWuenne Nov 29, 2024
bd2aa5e
fix: changed torch device definition back, to work on cpu only machines.
FloWuenne Nov 29, 2024
0056f46
fix: Fixed indentation in config.
FloWuenne Nov 29, 2024
3744488
fix: Added Path to msa_dir in run_chai_1.py
FloWuenne Nov 29, 2024
f8208e4
fix: Fixed indentation and quotes for defs in CHAI_1
FloWuenne Nov 29, 2024
9f21620
fix: Add exception in run_chai_1.py for case that msa is not provided.
FloWuenne Nov 29, 2024
67b5b99
fix: Updated nextflow_schema.json.
FloWuenne Nov 30, 2024
14db704
chore: Updated changelog.
FloWuenne Nov 30, 2024
33c0d85
fix: Fixed left padding in nextflow.config
FloWuenne Nov 30, 2024
915fe50
fix: Fixed msa_dir param input for run_chai_1.py
FloWuenne Nov 30, 2024
80b206b
chore: update CHANGELOG
drpatelh Dec 2, 2024
48665f5
chore: rename underscores to dashes in Python script for consistency
drpatelh Dec 2, 2024
816b2b6
chore: move --msa_dir param up in schema and add appropriate fields
drpatelh Dec 2, 2024
7ebaddb
chore: move --msa_dir param up in parameter priority as input
drpatelh Dec 2, 2024
a0559bc
fix: bug in fasta file name
drpatelh Dec 2, 2024
9b40231
chore: change some variable names in main module
drpatelh Dec 2, 2024
9c5bc4a
fix: revert removal of process_high label
drpatelh Dec 2, 2024
3b26c20
docs: add sentence about --msa_dir to main README
drpatelh Dec 2, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@ Special thanks to the following for their contributions to the release:
  - [PR #11](https://github.com/seqeralabs/nf-chai/pull/11) - Expose additional Chai-1 parameters in the pipeline
  - [PR #12](https://github.com/seqeralabs/nf-chai/pull/12) - Add log for GPU/CPU
  - [PR #13](https://github.com/seqeralabs/nf-chai/pull/13) - Bump `chai_lab` version to 0.4.2
+ - [PR #14](https://github.com/seqeralabs/nf-chai/pull/14) - Add parameter to provide multiple sequence alignment directory to Chai-1

  ## 0.1.0

2 changes: 2 additions & 0 deletions README.md
@@ -56,6 +56,8 @@ nextflow run seqeralabs/nf-chai \

  Set the `--weights_dir` parameter to a location with the pre-downloaded weights required by Chai-1 to avoid having to download them every time you run the pipeline.

+ To further improve prediction performance using pre-built multiple sequence alignments (MSA) with evolutionary information, set the `--msa_dir` parameter to a location with [`*.aligned.pqt`](https://github.com/chaidiscovery/chai-lab/tree/main/examples/msas#adding-msa-evolutionary-information) format as required by Chai-1.
+
  ## Credits

  nf-chai was originally written by the Seqera Team.
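As a rough illustration of the README sentence about `--msa_dir`, the following sketch lists the `*.aligned.pqt` files found in a directory before a run. The helper name `find_msa_files` and the non-recursive search are assumptions made for this example; neither is part of nf-chai or chai_lab.

```python
from pathlib import Path


def find_msa_files(msa_dir):
    """Return the sorted *.aligned.pqt files directly inside msa_dir.

    Hypothetical helper for illustration; not part of nf-chai or chai_lab.
    """
    directory = Path(msa_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"MSA directory not found: {directory}")
    # Assumption: the files sit at the top level of the directory (no recursion).
    return sorted(directory.glob("*.aligned.pqt"))
```

An empty result would suggest that the directory holds no files with the suffix the README names, which is worth checking before passing it to `--msa_dir`.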
8 changes: 7 additions & 1 deletion bin/run_chai_1.py
@@ -48,6 +48,12 @@ def main():
  default=True,
  help="Use ESM embeddings (enabled by default)"
  )
+ parser.add_argument(
+ "--msa-dir",
+ type=str,
+ default=None,
+ help="Directory containing precomputed multiple sequence alignments (MSA)."
+ )

  # Parse arguments
  args = parser.parse_args()
@@ -67,7 +73,6 @@ def main():
  logging.info("No GPU found, using CPU")
  device = "cpu"

-
  # Run structure prediction
  run_inference(
  fasta_file=args.fasta_file,
@@ -77,6 +82,7 @@ def main():
  seed=args.seed,
  device=device,
  use_esm_embeddings=args.use_esm_embeddings,
+ msa_directory=Path(args.msa_dir) if args.msa_dir else None,
  )

  if __name__ == "__main__":
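The optional-argument handling in this file can be exercised in isolation. The sketch below reproduces only the `--msa-dir` argument and the `Path(...) if ... else None` conversion from the diff; the wrapper function `parse_msa_dir` is a hypothetical name, and the call to `run_inference` is left out so the snippet runs without chai_lab installed.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_msa_dir(argv: Sequence[str]) -> Optional[Path]:
    """Parse --msa-dir and convert it to a Path, or None when it is absent.

    Hypothetical wrapper for illustration; the real script passes the
    result to run_inference as msa_directory.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--msa-dir",
        type=str,
        default=None,
        help="Directory containing precomputed multiple sequence alignments (MSA).",
    )
    args = parser.parse_args(argv)
    # Same conversion as in the diff: a missing value becomes None.
    return Path(args.msa_dir) if args.msa_dir else None
```

Calling `parse_msa_dir([])` yields `None`, while `parse_msa_dir(["--msa-dir", "msas"])` yields `Path("msas")`.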
1 change: 1 addition & 0 deletions main.nf
@@ -44,6 +44,7 @@ workflow {
  NF_CHAI (
  params.input,
  params.weights_dir,
+ params.msa_dir,
  params.num_trunk_recycles,
  params.num_diffusion_timesteps,
  params.seed,
12 changes: 8 additions & 4 deletions modules/local/chai_1/main.nf
@@ -7,6 +7,7 @@ process CHAI_1 {
  input:
  tuple val(meta), path(fasta)
  path weights_dir
+ path msa_dir
  val num_trunk_recycles
  val num_diffusion_timesteps
  val seed
@@ -18,16 +19,19 @@ process CHAI_1 {
  path "versions.yml" , emit: versions

  script:
- def esm_flag = use_esm_embeddings ? '--use-esm-embeddings' : ''
+ def downloads_dir = weights_dir ?: './downloads'
+ def msa_path = msa_dir ? "--msa-dir=$msa_dir" : ''
+ def use_esm = use_esm_embeddings ? '--use-esm-embeddings' : ''
  """
+ CHAI_DOWNLOADS_DIR=$downloads_dir \\
  run_chai_1.py \\
  --fasta-file ${fasta} \\
- --output-dir . \\
+ --output-dir ${meta.id} \\
  --num-trunk-recycles ${num_trunk_recycles} \\
  --num-diffn-timesteps ${num_diffusion_timesteps} \\
  --seed ${seed} \\
- ${esm_flag} \\
- $args
+ ${use_esm} \\
+ ${msa_path}

  cat <<-END_VERSIONS > versions.yml
  "${task.process}":
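The `msa_dir ? "--msa-dir=$msa_dir" : ''` pattern in the script block means the flag is only emitted when a directory was supplied; the workflow passes `[]` otherwise. A Python analogue, given purely as an illustration, might look like the sketch below. The function `build_chai_command` is hypothetical and covers only a subset of the options shown in the diff.

```python
def build_chai_command(fasta, output_dir, msa_dir=None, use_esm_embeddings=True):
    """Assemble a run_chai_1.py argument list with optional flags.

    Hypothetical helper mirroring the module's script block; it is not
    part of nf-chai and omits the recycle, timestep and seed options.
    """
    cmd = [
        "run_chai_1.py",
        "--fasta-file", str(fasta),
        "--output-dir", str(output_dir),
    ]
    if use_esm_embeddings:
        cmd.append("--use-esm-embeddings")
    # None, '' and [] are all falsy here, so no flag is added for them.
    if msa_dir:
        cmd.append(f"--msa-dir={msa_dir}")
    return cmd
```

Treating every falsy value as "not provided" keeps the command free of an empty `--msa-dir=` argument, which is the same outcome the ternary in the module aims for.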
2 changes: 2 additions & 0 deletions nextflow.config
@@ -12,6 +12,7 @@ params {
  // Input options
  input = null
  weights_dir = null
+ msa_dir = null
  use_gpus = false
  num_trunk_recycles = 3
  num_diffusion_timesteps = 200
@@ -132,6 +133,7 @@ profiles {
  apptainer {
  apptainer.enabled = true
  apptainer.autoMounts = true
+ apptainer.runOptions = params.use_gpus ? '--nv' : ''
  conda.enabled = false
  docker.enabled = false
  singularity.enabled = false
7 changes: 7 additions & 0 deletions nextflow_schema.json
@@ -35,6 +35,13 @@
  "description": "Directory containing model weights and other artifacts required by Chai-1.",
  "fa_icon": "fas fa-folder-open"
  },
+ "msa_dir": {
+ "type": "string",
+ "format": "directory-path",
+ "exists": true,
+ "description": "Directory containing precomputed multiple-sequence alignments",
+ "fa_icon": "fas fa-align-justify"
+ },
  "use_gpus": {
  "type": "boolean",
  "description": "Run compatible tasks on GPUs rather than CPUs (default).",
4 changes: 3 additions & 1 deletion workflows/nf_chai/main.nf
@@ -17,7 +17,8 @@ workflow NF_CHAI {

  take:
  fasta_file // string: path to fasta file read provided via --input parameter
- weights_dir // string: path to model directory read provided via --weights_directory parameter
+ weights_dir // string: path to model directory read provided via --weights_dir parameter
+ msa_dir // string: path to the directory containing multiple sequence alignments (msa)
  num_trunk_recycles // integer: Number of trunk recycles
  num_diffusion_timesteps // integer: Number of diffusion steps to use
  seed // integer: Random seed to be used for Chai-1 calculations
@@ -39,6 +40,7 @@ workflow NF_CHAI {
  CHAI_1 (
  ch_fasta,
  weights_dir ? Channel.fromPath(weights_dir) : [],
+ msa_dir ? Channel.fromPath(msa_dir) : [],
  num_trunk_recycles,
  num_diffusion_timesteps,
  seed,