Running MPNN with PSSM bias
I modified one MPNN file to get this to run, because the out-of-the-box workflow requires the PSSM as a .npz file. Rather than convert the PSSM to .npz, I build the pssm.jsonl separately with make_pssm_dict_ARedit.py. All of these files are in projects/andrew.
1. Make .fasta of the sequences to be included in the PSSM
2. Make blast db:
- install BLAST+ (which provides makeblastdb and psiblast) with conda install -c bioconda blast
- makeblastdb -in /ifs/scratch/home/arr2230/data/mpnn/ar23_pssm/nh160to168_ar23first.fasta -dbtype prot -parse_seqids -out /ifs/scratch/home/arr2230/data/mpnn/ar23_pssm/db/ar23
- this creates a handful of files that together make up the db
3. Make pssm:
- cd into the folder with the makeblastdb output files, and refer to the db by the prefix given to the -out option in the previous step (here, ar23):
- psiblast -db ar23 -query /ifs/scratch/home/arr2230/data/mpnn/ar23_pssm/ar23.fasta -num_iterations 3 -out_ascii_pssm output.pssm
4. Parse original pdb into jsonl format
- using submit_example_parse_chains.sh in helper_scripts, parse the starting pdb into parsed_pdbs.jsonl
5. Make pssm.jsonl
- Run helper_scripts/other_tools/make_pssm_dict_ARedit.py
- inputs:
- unconverted pssm (maybe still called output.pssm)
- the parsed pdb in jsonl format from the previous step (maybe still called parsed_pdbs.jsonl)
- output:
- the converted pssm.jsonl
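To make the conversion step less of a black box, here is a minimal sketch of reading a psiblast ASCII PSSM in Python. It is not make_pssm_dict_ARedit.py. It assumes the usual psiblast layout (position, residue, 20 scores, 20 percentages, 2 trailing numbers per row). The function names, the record field names, and the 21-letter target alphabet are illustrative assumptions, so check them against the script you actually run.

```python
import json

# Column order used in the header of a psiblast ASCII PSSM.
PSIBLAST_ORDER = "ARNDCQEGHILKMFPSTWYV"
# ASSUMPTION: 21-letter alphabet (20 amino acids + X) expected downstream.
TARGET_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"

# Tiny hand-made sample in the psiblast ASCII layout (not real output).
SAMPLE = """
Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
      A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V
    1 M -1 -2 -2 -3 -2 -1 -2 -3 -2 1 2 -2 6 0 -3 -2 -1 -2 -1 1 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 2.04 0.35
    2 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -2 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 1.95 0.35

                      K         Lambda
Standard Ungapped    0.1334     0.3157
"""


def parse_ascii_pssm(text):
    """Return one dict per residue row; header and footer lines are skipped."""
    rows = []
    for line in text.splitlines():
        tok = line.split()
        # Data rows have exactly 44 fields and start with the position number.
        if len(tok) != 44 or not tok[0].isdigit():
            continue
        rows.append({
            "position": int(tok[0]),
            "residue": tok[1],
            "log_odds": [int(x) for x in tok[2:22]],
            "percent": [int(x) for x in tok[22:42]],
            "information": float(tok[42]),
            "pseudocount_weight": float(tok[43]),
        })
    return rows


def reorder(values, target_alphabet=TARGET_ALPHABET, fill=0):
    """Map 20 values from psiblast column order to the target alphabet."""
    by_letter = dict(zip(PSIBLAST_ORDER, values))
    return [by_letter.get(aa, fill) for aa in target_alphabet]


rows = parse_ascii_pssm(SAMPLE)
# One JSON object per line is what makes a file "jsonl".
print(json.dumps({"residue": rows[0]["residue"],
                  "percent": reorder(rows[0]["percent"])}))
```

The reordering step matters because psiblast writes its columns in ARNDCQ... order, which is not alphabetical. Feeding the columns through unchanged would silently bias the wrong amino acids.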
6. Run MPNN
- Run helper_scripts/submit_pssm_ar23.sh
- inputs:
- the converted pssm.jsonl
- folder with reference pdb
- To run multiple bias values (called pssm_multi), use helper_scripts/run_submit_pssm.sh
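For intuition about what the bias value does, here is a toy sketch. It assumes that pssm_multi linearly interpolates, per position, between the model's amino acid probabilities and the PSSM-derived probabilities, optionally scaled by a per-position coefficient. That interpolation is our reading of the MPNN sampling code, not something confirmed here, so check the exact formula in the MPNN source. The function name and the 3-letter toy alphabet are made up for illustration.

```python
def mix_with_pssm(model_probs, pssm_probs, pssm_multi, pssm_coef=1.0):
    """Blend two probability vectors.

    ASSUMPTION: weight = pssm_multi * pssm_coef, where 0.0 means the PSSM
    is ignored and 1.0 means only the PSSM is used.
    """
    w = pssm_multi * pssm_coef
    mixed = [(1.0 - w) * p + w * q for p, q in zip(model_probs, pssm_probs)]
    total = sum(mixed)
    # Renormalize to guard against rounding in the inputs.
    return [m / total for m in mixed]


# Toy 3-letter alphabet: the model prefers letter 0, the PSSM prefers letter 2.
model = [0.7, 0.2, 0.1]
pssm = [0.1, 0.1, 0.8]

for multi in (0.0, 0.25, 0.5, 0.75, 1.0):
    blended = mix_with_pssm(model, pssm, multi)
    print(multi, [round(x, 3) for x in blended])
```

Under this assumption, scanning several bias values shows where the preferred residue flips from the model's choice to the PSSM's choice, which is the reason to run more than one value.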
Side note: How to interpret the PSSM
Each row of output.pssm starts with the position number and the query residue. The columns after that are:
- Columns 1-20: PSSM scores (log-odds) for the 20 standard amino acids.
- Columns 21-40: Weighted observed percentages for each amino acid at that position, rounded down.
- Column 41: Information content. This measures the information at each position in the matrix, which is a reflection of how conserved that position is. A higher value indicates that the position is more conserved (less variable) across the sequences in the alignment.
- Column 42: Relative weight of gapless real matches to pseudocounts, i.e. how much the observed sequences (as opposed to the background pseudocounts) contributed at that position.
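To see why a higher information content means a more conserved position, here is a small worked sketch. It computes information content as the relative entropy between the observed frequencies and a background distribution. The uniform background is a simplifying assumption. psiblast uses its own background frequencies and pseudocounts, so these numbers will not match the values in output.pssm exactly.

```python
import math


def information_content(freqs, background=None):
    """Relative entropy (in bits) of observed frequencies vs. a background.

    ASSUMPTION: uniform background when none is given.
    """
    n = len(freqs)
    if background is None:
        background = [1.0 / n] * n
    total = sum(freqs)
    if total == 0:
        return 0.0  # nothing observed at this position
    ic = 0.0
    for f, b in zip(freqs, background):
        p = f / total
        if p > 0:  # 0 * log(0) is taken as 0
            ic += p * math.log2(p / b)
    return ic


conserved = [100] + [0] * 19   # one amino acid seen every time
uniform = [5] * 20             # all 20 amino acids equally common
mixed = [50, 50] + [0] * 18    # two amino acids, half and half

print(round(information_content(conserved), 2))
print(round(information_content(mixed), 2))
print(round(information_content(uniform), 2))
```

A fully conserved position gives the maximum for 20 letters, log2(20) bits, a position split between two residues gives one bit less, and a position with no preference gives zero.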