Sequence analysis of influenza hemagglutinin (HA) antibodies

This README describes the analysis in:
An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies

Env setup

If you manage environments with conda, create the environment as follows:

conda env create -f Ab_epitope/environment.yml

Dataset

CDR H3 analysis

  1. Extract CDR H3 sequences and references
    python3 script/parse_Ab_table.py

  2. Cluster CDR H3 sequences
    python3 script/CDRH3_clustering_optimal.py

  3. Analyze CDR H3 clustering results
    python3 script/analyze_CDRH3_cluster.py

  4. Analyze CDR H3 properties
    python3 script/analyze_CDRH3_property.py

  5. Create sequence logos for different CDR H3 clusters
    python3 script/CDRH3_seqlogo.py

  6. Plot CDR H3 property for HA head and stem antibodies
    Rscript script/plot_CDRH3_property.R
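
The six steps above can also be run in one go. The following is a minimal sketch, not part of the repository: it assumes the commands are launched from the repository root, that the scripts take no arguments (as shown above), and that each step depends on the output of earlier ones, which the numbering suggests but this README does not state. `CDRH3_STEPS` and `run_steps` are hypothetical names introduced for illustration.

```python
import subprocess

# Commands copied from the numbered CDR H3 steps above, in order.
CDRH3_STEPS = [
    ["python3", "script/parse_Ab_table.py"],
    ["python3", "script/CDRH3_clustering_optimal.py"],
    ["python3", "script/analyze_CDRH3_cluster.py"],
    ["python3", "script/analyze_CDRH3_property.py"],
    ["python3", "script/CDRH3_seqlogo.py"],
    ["Rscript", "script/plot_CDRH3_property.R"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    for number, command in enumerate(steps, start=1):
        print(f"[step {number}] {' '.join(command)}")
        # check=True makes subprocess.run raise CalledProcessError
        # when a command exits with a non-zero status.
        runner(command, check=True)
```

Calling `run_steps(CDRH3_STEPS)` from the repository root runs the steps sequentially; a failing script raises `subprocess.CalledProcessError`, so later steps do not run on incomplete results. The `runner` parameter exists only so the function can be exercised without launching the real scripts.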

Germline usage analysis

  1. Clonotype assignment
    python3 script/assign_clonotype.py

  2. Compute germline usage and extract public clonotypes
    python3 script/extract_public_clonotype_VDJ.py

  3. Extract IGHD4-17-encoded head antibodies
    python3 script/analyze_IGHD4-17.py

  4. Analyze the occurrence of the YGD motif in CDR H3
    python3 script/analyze_YGD_motif.py

  5. Plot VDJ gene usage
    Rscript script/plot_VDJgene_freq.R

  6. Plot IGHV/IGK(L)V pairing frequency
    Rscript script/plot_Vpair_heatmap.R

  7. Plot frequency of YGD motif
    Rscript script/plot_YGD_freq.R
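
To make the YGD motif steps concrete, here is a minimal sketch of one way to compute a motif frequency: the fraction of CDR H3 sequences that contain the motif at least once. This is an assumption about what such a frequency could mean, not a description of `script/analyze_YGD_motif.py`, which may count or group sequences differently. The function name `motif_frequency` and the example sequences are invented for illustration and do not come from the dataset.

```python
def motif_frequency(cdrh3_sequences, motif="YGD"):
    """Return the fraction of sequences that contain `motif` at least once."""
    # Normalize case and drop empty entries before counting.
    sequences = [seq.strip().upper() for seq in cdrh3_sequences if seq.strip()]
    if not sequences:
        return 0.0
    hits = sum(motif in seq for seq in sequences)
    return hits / len(sequences)


# Made-up placeholder sequences, for illustration only (not real data).
toy_cdrh3 = ["ARDYGDYFDY", "ARGGSYYFDY", "AKYGDWFDP", "ARDLGYW"]
print(motif_frequency(toy_cdrh3))  # 2 of the 4 toy sequences contain YGD -> 0.5
```

Computing the same fraction separately for different antibody groups would give per-group values of the kind a frequency plot could show.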

mBLM for specificity prediction

See Ab_epitope