Deep MSA and Statistical Coupling Analysis
Belen Sundberg edited this page Aug 23, 2023 · 6 revisions
- All of the scripts I used are available in /ifs/scratch/home/mm6732/. Update the file paths in each script to match your own before running it.
- Download HMMER, which includes phmmer
- phmmer compares a query sequence against a database of protein sequences. The database used here is stored on the server at /ifs/data/glab/uniref90/uniref90.fasta
- Run the command phmmer -o output.txt query_protein.fasta /path/to/database, which takes a FASTA file containing the query amino acid sequence and writes a HMMER text report of ranked homologs to output.txt
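For downstream scripting, a table of hits is easier to parse than the text report. The sketch below assumes phmmer was also run with its `--tblout` option, which the command above does not use, and reads the resulting whitespace-delimited table. The function name `parse_tblout` and the default E-value cutoff of 1e-5 are illustrative assumptions, not part of the original scripts.

```python
from typing import Iterable, List, Tuple


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Tuple[str, float]]:
    """Return (target_name, full_sequence_evalue) for hits at or below max_evalue.

    Assumes HMMER3 --tblout layout: column 1 is the target name and
    column 5 is the full-sequence E-value. The cutoff is an arbitrary example.
    """
    hits = []
    for line in lines:
        # Comment lines start with '#'; blank lines carry no hit data.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target_name = fields[0]    # e.g. a UniRef90 cluster ID
        evalue = float(fields[4])  # full-sequence E-value
        if evalue <= max_evalue:
            hits.append((target_name, evalue))
    return hits


# Usage (the file name is a placeholder):
# with open("hits.tbl") as handle:
#     hits = parse_tblout(handle)
```

Taking an iterable of lines rather than a path keeps the parser easy to test on a few pasted rows before pointing it at a full search result.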
B. Filter homologs by Enzyme Commission (EC) number. This ensures that all of your homologs catalyze the same reaction.
- The phmmer output contains only accession numbers, not EC numbers or sequences, so you will need to map each accession number to its EC number.
- Start by making a copy of the phmmer output text file and adding a column of EC numbers, obtained by mapping accession numbers to the uniref database. [accession_to_ec.py]
- Then filter the original phmmer output text file to keep only the accession numbers that map to the EC number of your protein. [filter_phmmer_ec.py]
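The map-then-filter logic of step B can be sketched as follows. This is not the code in accession_to_ec.py or filter_phmmer_ec.py; it assumes a hypothetical tab-separated mapping file with one accession per line followed by semicolon-separated EC numbers, and the function names are made up for illustration.

```python
from typing import Dict, Iterable, List, Set


def load_ec_map(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse 'accession<TAB>EC1;EC2' lines (a hypothetical format) into a dict."""
    ec_map: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        accession, _, ec_field = line.partition("\t")
        # An accession with no annotation maps to an empty set.
        ec_map[accession] = {ec.strip() for ec in ec_field.split(";") if ec.strip()}
    return ec_map


def filter_by_ec(accessions: Iterable[str], ec_map: Dict[str, Set[str]], target_ec: str) -> List[str]:
    """Keep accessions annotated with exactly target_ec, preserving input order."""
    return [acc for acc in accessions if target_ec in ec_map.get(acc, set())]


# Usage with placeholder accessions; 2.7.1.11 is the EC number of
# ATP-dependent 6-phosphofructokinase.
# kept = filter_by_ec(["ACC1", "ACC2"], ec_map, "2.7.1.11")
```

The match is exact, so a partial annotation such as 2.7.1.- is dropped; whether to keep partially annotated homologs is a choice to make for your own protein family.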
- phmmer_to_fasta.py
- checkuniquespecies.py
- Install MAFFT
- Command-line instructions for running MAFFT, including the options for its different algorithms, are available on the MAFFT website. L-INS-i tends to be faster than E-INS-i.
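- Example using L-INS-i and all 128 server threads: mafft --thread 128 --localpair pfk.fasta > pfk.aln (the MAFFT manual gives the full L-INS-i command as --localpair --maxiterate 1000; without --maxiterate, no iterative refinement is run)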
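Before passing the alignment to the next step, a quick sanity check can be useful. The sketch below reads an aligned FASTA file, which is MAFFT's default output format, confirms that every sequence has the same aligned length, and reports the fraction of gaps in each column. The function names are illustrative and are not part of MAFFT or pySCA.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {name: sequence}; a repeated name overwrites the earlier one."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the ID, drop the description
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def column_gap_fractions(alignment: Dict[str, str]) -> List[float]:
    """Return the fraction of '-' characters in each alignment column."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences differ in length")
    return [sum(s[i] == "-" for s in seqs) / len(seqs) for i in range(length)]


# Usage (pfk.aln is the MAFFT output from the example above):
# with open("pfk.aln") as handle:
#     gaps = column_gap_fractions(read_fasta(handle))
```

Columns that are almost entirely gaps usually come from a few divergent sequences, so this is a cheap way to spot them before any downstream trimming.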
- Follow the installation, processing, and calculation steps on the pySCA website: https://ranganathanlab.gitlab.io/pySCA/install/
- My code for visualizing and bootstrapping SCA data is on the server at share/PFK_Project/melody/pySCA/data