Deep MSA and Statistical Coupling Analysis

Belen Sundberg edited this page Aug 23, 2023 · 6 revisions

1. Compile a database of homologous sequences

  • All of the scripts I used are available in /ifs/scratch/home/mm6732/. Edit the file paths in each script to match your own setup before running it.

A. Run phmmer

  • Install HMMER, the package that provides phmmer
  • phmmer searches a protein sequence database with a single query sequence. The database we search against is stored on the server at /ifs/data/glab/uniref90/uniref90.fasta
  • Run phmmer -o output.txt query_protein.fasta /path/to/database. It takes a FASTA file containing the query amino acid sequence and writes a HMMER text report (output.txt) listing the homologs in ranked order
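
The phmmer call above can also be assembled from Python, which helps when the same search is repeated for several queries. This is only a sketch: the helper name build_phmmer_command is made up for illustration and is not one of the scripts in /ifs/scratch/home/mm6732/. The optional --cpu flag is phmmer's own option for the number of worker threads.

```python
import shlex


def build_phmmer_command(query_fasta, database_fasta, output_txt, cpus=None):
    """Return the phmmer argument list for one query against one database.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["phmmer", "-o", output_txt]
    if cpus is not None:
        # --cpu sets the number of parallel worker threads phmmer uses
        cmd += ["--cpu", str(cpus)]
    # phmmer expects the query file first, then the target database
    cmd += [query_fasta, database_fasta]
    return cmd


cmd = build_phmmer_command(
    "query_protein.fasta",
    "/ifs/data/glab/uniref90/uniref90.fasta",
    "output.txt",
)
print(shlex.join(cmd))

# To actually run the search (requires phmmer on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list rather than one string avoids shell quoting problems if a path contains spaces.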

B. Filter homologs by Enzyme Commission (EC) number. This makes sure all of your homologs catalyze the same reaction.

  • The phmmer output contains only accession numbers, not EC numbers or sequences, so you will need to map each accession number to its EC number(s).
  • Start by making a copy of the phmmer output text file and adding a column for EC numbers, filled in by looking up each accession number in the uniref database. [accession_to_ec.py]
  • Then filter the original phmmer output text file so that it only includes accession numbers that map to the EC number of your protein. [filter_phmmer_ec.py]
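
The two-step idea above (annotate each hit with its EC numbers, then keep only the matching hits) can be sketched as follows. This is not the code in accession_to_ec.py or filter_phmmer_ec.py. It assumes you already have a tab-separated mapping with one accession per line and its EC numbers separated by ";", which is a hypothetical format chosen for the example. The accessions are made up, and the EC number is just an example value.

```python
def load_accession_to_ec(lines):
    """Parse 'accession<TAB>EC[; EC...]' lines into {accession: set of ECs}.

    The tab-separated layout is an assumption made for this sketch.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        accession, _, ec_field = line.partition("\t")
        # One protein can carry several EC numbers
        mapping[accession] = {ec.strip() for ec in ec_field.split(";") if ec.strip()}
    return mapping


def filter_by_ec(accessions, accession_to_ec, target_ec):
    """Keep accessions annotated with target_ec, preserving the input order."""
    return [a for a in accessions if target_ec in accession_to_ec.get(a, set())]


# Made-up accessions, in the order they might appear in the phmmer ranking
mapping = load_accession_to_ec([
    "ACC_A\t2.7.1.11",
    "ACC_B\t2.7.1.11; 2.7.1.56",
    "ACC_C\t1.1.1.1",
    "ACC_D\t",
])
ranked_hits = ["ACC_A", "ACC_B", "ACC_C", "ACC_D", "ACC_E"]
kept = filter_by_ec(ranked_hits, mapping, "2.7.1.11")
print(kept)
```

Hits with no EC annotation (ACC_D) or no entry in the mapping (ACC_E) are dropped here. Whether to keep unannotated hits instead is a judgment call for your protein family.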

C. Convert phmmer text file to fasta file with sequences

  • phmmer_to_fasta.py
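
As a rough illustration of this step (not the contents of phmmer_to_fasta.py), the sketch below pulls the records for a set of hit IDs out of a FASTA database, streaming it line by line. It assumes the record ID is the first whitespace-delimited token of each header line. The function names and toy records are hypothetical.

```python
def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_records(lines, wanted_ids):
    """Yield only the records whose ID (first header token) is in wanted_ids."""
    wanted = set(wanted_ids)
    for header, seq in iter_fasta(lines):
        record_id = header.split(None, 1)[0] if header.strip() else ""
        if record_id in wanted:
            yield header, seq


def format_fasta(header, seq, width=60):
    """Format one record as FASTA text with wrapped sequence lines."""
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(rows) + "\n"


# Toy database standing in for the real FASTA file
database = [
    ">ID_1 first protein\n", "MKV\n", "LLA\n",
    ">ID_2 second protein\n", "MSS\n",
    ">ID_3 third protein\n", "MTT\n",
]
hits = list(extract_records(database, ["ID_1", "ID_3"]))
for header, seq in hits:
    print(format_fasta(header, seq), end="")
```

For the real database, pass an open file handle instead of the list, so the file is never loaded into memory all at once.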

D. Filter for unique sequences

  • checkuniquespecies.py
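
One simple way to filter for unique sequences is to keep the first record seen for each distinct sequence. The sketch below shows only that idea. It is not the logic of checkuniquespecies.py, whose name suggests it may also take species into account, and the function name and records are invented for the example.

```python
def drop_duplicate_sequences(records):
    """Keep the first (header, sequence) record for each distinct sequence.

    Sequences are compared case-insensitively; input order is preserved.
    """
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            continue
        seen.add(key)
        unique.append((header, seq))
    return unique


records = [
    ("ID_1", "MKVLLA"),
    ("ID_2", "MSSTT"),
    ("ID_3", "mkvlla"),  # same sequence as ID_1, different case
]
unique = drop_duplicate_sequences(records)
print([header for header, _ in unique])
```

Because the first occurrence wins, feeding the records in phmmer rank order keeps the best-scoring copy of each duplicated sequence.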

2. Make an MSA

  • Install mafft
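  • Command line instructions for running mafft, including the options for the different algorithms, are available on the MAFFT website. L-INS-i tends to be faster than E-INS-i
  • Example using L-INS-i and all 128 server threads: mafft --thread 128 --localpair --maxiterate 1000 pfk.fasta > pfk.aln (--localpair on its own, without --maxiterate 1000, skips the iterative refinement that L-INS-i includes)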
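
Before moving on to SCA, it can be worth checking that the mafft output is a valid alignment: every row must have the same number of columns. The sketch below does that check and reports the gap fraction per sequence. It assumes the alignment is in aligned-FASTA format (mafft's default output), and the function names are hypothetical.

```python
def read_alignment(lines):
    """Return a list of (header, aligned_sequence) from aligned-FASTA lines."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_summary(records):
    """Return (n_sequences, n_columns, {header: gap fraction}).

    Raises ValueError if the rows do not all have the same length.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"rows have differing lengths: {sorted(lengths)}")
    n_cols = lengths.pop()
    gaps = {h: (seq.count("-") / n_cols if n_cols else 0.0) for h, seq in records}
    return len(records), n_cols, gaps


# Toy alignment; for real data use: read_alignment(open("pfk.aln"))
toy = [">a", "MK-V", ">b", "M--V"]
n_seqs, n_cols, gaps = alignment_summary(read_alignment(toy))
print(n_seqs, n_cols, gaps)
```

Sequences with a very high gap fraction are often fragments and may be worth inspecting before running the pySCA processing step.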

3. Statistical Coupling Analysis

  • Follow the installation, alignment processing, and calculation steps on the pySCA website: https://ranganathanlab.gitlab.io/pySCA/install/
  • My code for visualizing and bootstrapping SCA data is on the server at share/PFK_Project/melody/pySCA/data