This repository provides the resources, tools, and scripts created for the research paper "Selective promotion of oligonucleotides in the course of evolution" by Bernadette Mathew, Abhishek Halder, Nancy Jaiswal, Smruti Panda, Debjit Pramanik, Sreeram Chandra Murthy Peela, Abhishek Garg, Sadhana Tripathi, Prashant Gupta, Vandana Malhotra, Gaurav Ahuja, Debarka Sengupta.
Contents:
- Environment setup: The environment1.yml file is used to create the Python environment.
- Score generation: The "score_generation" folder contains the scripts for generating the two scores essential to this analysis, i.e., the kGain and LLR scores.
- Figures: The "figure" folder contains the scripts and data associated with the figures presented in the manuscript and supplementary materials.
Workflow for implementing the kGain scores:
Oligo Promotion
Here are the steps for computing the kGain score:
- For each variant, create two sequences, each 21 nucleotides long (assuming a k-mer length of 10).
- One sequence includes the variant allele at the center (11th position), while the other includes the reference allele at the same position.
- The left and right flanking regions are taken from the corresponding reference genome, depending on the organism.
- For each variant, generate a total of k (k = 10) rolling windows, where each window contains either the reference or alternate allele.
- Apply the rolling window method twice: once for the reference sequence and once for the sequence with the variant.
- This results in sets of k-mers for each variant, with each k-mer containing the position of the variant.
- Track the occurrence of each k-mer across the reference genome.
- For each k-mer, calculate the fold change between the genomic frequencies of the k-mers containing the alternate allele ($F^{\text{alt}}_{i(v)}$) and the reference allele ($F^{\text{ref}}_{i(v)}$).
- Compute the kGain score for each variant ($kGain_{v}$) by summing the natural logarithm of the fold changes for each k-mer across all windows.
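
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the "score_generation" folder: the function names (`variant_windows`, `count_kmers`, `kgain`) are hypothetical, raw k-mer counts stand in for genomic frequencies, only one strand is counted, and the `pseudocount` that guards against k-mers absent from the genome is an assumption that the description above does not mention.

```python
import math
from collections import Counter


def variant_windows(left_flank, allele, right_flank, k=10):
    """Return the k rolling windows (k-mers) that overlap the central allele."""
    seq = left_flank + allele + right_flank
    center = len(left_flank)  # 0-based index of the allele
    # Window starts run from (center - k + 1) to center, so every
    # window of length k contains the allele position.
    return [seq[s:s + k] for s in range(center - k + 1, center + 1)]


def count_kmers(genome, k=10):
    """Count every k-mer occurrence along one strand of the genome."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def kgain(ref_kmers, alt_kmers, counts, pseudocount=1):
    """Sum of ln(alt count / ref count) over paired windows.

    The pseudocount is an assumption added here to avoid division by zero.
    """
    return sum(
        math.log((counts[alt] + pseudocount) / (counts[ref] + pseudocount))
        for ref, alt in zip(ref_kmers, alt_kmers)
    )


# Toy example with k = 3 so the numbers are easy to follow.
genome = "ACGTACGTAC"
counts = count_kmers(genome, k=3)
ref_kmers = variant_windows("ACG", "T", "ACG", k=3)  # reference allele T
alt_kmers = variant_windows("ACG", "A", "ACG", k=3)  # alternate allele A
score = kgain(ref_kmers, alt_kmers, counts)
print(ref_kmers, alt_kmers, round(score, 4))
```

In this toy genome the reference k-mers each occur twice and the alternate k-mers never occur, so the score is negative. A positive score would indicate that the alternate allele creates k-mers that are more frequent in the genome than those it replaces.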
The kGain score is mathematically represented as:

$$kGain_{v} = \sum_{i=1}^{k} \ln\left(\frac{F^{\text{alt}}_{i(v)}}{F^{\text{ref}}_{i(v)}}\right)$$

Where: