
cellsemantics/Oligo_Promotion


Selective promotion of oligonucleotides in the course of evolution

This repository collects the resources, tools, and scripts created for the research paper "Selective promotion of oligonucleotides in the course of evolution" by Bernadette Mathew, Abhishek Halder, Nancy Jaiswal, Smruti Panda, Debjit Pramanik, Sreeram Chandra Murthy Peela, Abhishek Garg, Sadhana Tripathi, Prashant Gupta, Vandana Malhotra, Gaurav Ahuja, and Debarka Sengupta.


Contents:

  1. Environment setup: The environment1.yml file is used to create the Python environment.
  2. Score generation: The "score_generation" folder contains the scripts for generating the two scores essential to this analysis, i.e., the kGain and LLR scores.
  3. Figures: The "figure" folder contains the scripts and data associated with the figures presented in the manuscript and supplementary materials.

Workflow for computing the kGain scores:

[Figure: Oligo Promotion]

[Figure: snip_of_kgain]

Here are the steps for computing the kGain score:

1. Generate Sequences with SNV and Flanks:

  • For each variant, create two sequences, each 21 nucleotides long (assuming a k-mer length of 10).
  • One sequence includes the variant allele at the center (11th position), while the other includes the reference allele at the same position.
  • The left and right flanking regions are taken from the corresponding reference genome, depending on the organism.
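Step 1 can be sketched in Python as follows. This is a minimal illustration, not the repository's own script: the function name `build_flanked_sequences`, its parameters, and the use of a plain string for the reference sequence are assumptions made for the example. It takes k flanking nucleotides on each side of the SNV, which gives the 21-nucleotide sequences described above for k = 10.

```python
def build_flanked_sequences(genome: str, pos: int, ref: str, alt: str, k: int = 10):
    """Return (ref_seq, alt_seq), each 2*k + 1 nt long, centred on the SNV.

    genome : reference sequence as a plain string (hypothetical input format)
    pos    : 0-based position of the SNV within `genome`
    ref    : reference allele (single nucleotide)
    alt    : alternate allele (single nucleotide)
    """
    genome = genome.upper()
    ref, alt = ref.upper(), alt.upper()

    # Both flanks must fit entirely inside the reference sequence.
    if pos - k < 0 or pos + k >= len(genome):
        raise ValueError("variant is too close to the sequence end for full flanks")
    # Sanity check: the stated reference allele should match the genome.
    if genome[pos] != ref:
        raise ValueError("reference allele does not match the genome at this position")

    left = genome[pos - k:pos]            # k nucleotides upstream
    right = genome[pos + 1:pos + k + 1]   # k nucleotides downstream
    return left + ref + right, left + alt + right
```

With k = 10, the allele sits at index 10 (the 11th position) of both returned sequences, and the two sequences differ only at that index.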

2. Generate k-mers Using Rolling Windows:

  • For each variant, generate a total of k (k = 10) rolling windows, where each window contains either the reference or alternate allele.
  • Apply the rolling window method twice: once for the reference sequence and once for the sequence with the variant.
  • This results in sets of k-mers for each variant, with each k-mer containing the position of the variant.
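The rolling-window step might look like the sketch below. The helper name `variant_kmers` is hypothetical, and the code assumes the input is an odd-length sequence with the allele at its centre, as produced in step 1. It returns only the k windows of length k that overlap the central position.

```python
def variant_kmers(seq: str, k: int = 10):
    """Return the k k-mers of `seq` that overlap its central position."""
    center = len(seq) // 2  # index of the allele, e.g. 10 for a 21-nt sequence

    # The first window ends on the allele; the last window starts on it.
    first_start = center - k + 1
    last_start = center
    if first_start < 0 or last_start + k > len(seq):
        raise ValueError("sequence is too short for k windows around its centre")

    return [seq[s:s + k] for s in range(first_start, last_start + 1)]
```

Calling the function once on the reference sequence and once on the variant sequence gives two lists of equal length, in which the i-th entries cover the same genomic window and differ only in the allele.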

3. Compute kGain Score:

  • Track the occurrence of each k-mer across the reference genome.
  • For each window, calculate the fold change between the genomic frequency of the k-mer containing the alternate allele ($F^{\text{alt}}_{i(v)}$) and that of the k-mer containing the reference allele ($F^{\text{ref}}_{i(v)}$).
  • Compute the kGain score for each variant ($kGain_{v}$) by summing the natural logarithm of the fold changes for each k-mer across all windows.

The kGain score is mathematically represented as:

$$kGain_{v} = \sum_{i=1}^{k} \ln\left(\frac{F^{\text{alt}}_{i(v)}}{F^{\text{ref}}_{i(v)}}\right)$$

Where:

  • $kGain_{v}$ is the score for variant $v$.

  • $F^{\text{alt}}_{i(v)}$ is the frequency of the k-mer containing the alternate allele in the $i$th window.

  • $F^{\text{ref}}_{i(v)}$ is the frequency of the k-mer containing the reference allele in the $i$th window.

  • $k$ is the total number of k-mers generated using the rolling window method.
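The scoring step can be illustrated with the short sketch below. It is not the repository's implementation: the function names `count_kmers` and `kgain` are hypothetical, k-mers are counted on a single strand of an in-memory string, and the `pseudocount` parameter is an assumption added here to avoid taking the logarithm of zero when a k-mer is absent from the genome. The description above does not say how zero frequencies are handled.

```python
import math
from collections import Counter


def count_kmers(genome: str, k: int = 10) -> Counter:
    """Count every overlapping k-mer on one strand of `genome`."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def kgain(ref_kmers, alt_kmers, counts, pseudocount: float = 1.0) -> float:
    """Sum ln(F_alt / F_ref) over paired windows.

    `pseudocount` is an assumption of this sketch (not from the paper):
    it is added to both frequencies so that unseen k-mers do not yield log(0).
    """
    if len(ref_kmers) != len(alt_kmers):
        raise ValueError("reference and alternate k-mer lists must be paired")

    score = 0.0
    for ref_kmer, alt_kmer in zip(ref_kmers, alt_kmers):
        f_ref = counts[ref_kmer] + pseudocount  # Counter returns 0 for unseen k-mers
        f_alt = counts[alt_kmer] + pseudocount
        score += math.log(f_alt / f_ref)
    return score
```

Under this definition, a positive score means the alternate-allele k-mers are, on aggregate, more frequent in the reference genome than the reference-allele k-mers, and a negative score means the opposite. For a full genome, a dedicated k-mer counter would likely be more practical than an in-memory `Counter`.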
