Analysis code for Why do pathway methods work better than they should?

bence-szalai/why-pathway-methods-work

This repository contains the analysis code for the bioRxiv preprint "Why do pathway methods work better than they should?"

Abstract

Pathway analysis methods are frequently applied to cancer gene expression data to identify dysregulated pathways. Most of these methods infer changes in pathway activity from the gene expression of pathway members. However, pathways are made up of signaling proteins, and it is the activity of these proteins, not their abundance, that defines the activity of the pathway; moreover, the association between gene expression and protein activity is limited and not well characterised. Other methods infer pathway activity from the expression of genes whose transcription is regulated by the pathway of interest, which appears to be a more adequate proxy of activity. Despite these potential limitations, membership-based pathway methods are widely used and often yield statistically significant results.
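To make the difference between the two families of methods concrete, here is a minimal Python sketch, assuming NumPy and a genes × samples expression matrix. `membership_score` averages the z-scored expression of a pathway's member genes, while `target_score` sums the signed z-scores of genes assumed to respond transcriptionally to the pathway. The function names, the MAPK-flavoured gene lists, and the target weights are hypothetical illustrations, not the scoring used in the preprint or its notebooks.

```python
import numpy as np


def zscore_rows(expr):
    """Standardise each gene (row) across samples."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes would otherwise divide by zero
    return (expr - mean) / std


def _row_indices(genes, wanted):
    """Row positions of the requested genes that are present in the matrix."""
    idx = [genes.index(g) for g in wanted if g in genes]
    if not idx:
        raise ValueError("none of the requested genes are in the matrix")
    return idx


def membership_score(expr, genes, members):
    """Membership-based score: mean z-score of the pathway's member genes."""
    z = zscore_rows(expr)
    return z[_row_indices(genes, members)].mean(axis=0)


def target_score(expr, genes, weights):
    """Target-based score: weighted sum of z-scores of pathway-responsive genes.

    weights maps gene -> signed weight (e.g. +1 if the gene is assumed to be
    up-regulated on pathway activation, -1 if down-regulated).
    """
    z = zscore_rows(expr)
    idx = _row_indices(genes, weights)
    w = np.array([weights[genes[i]] for i in idx], dtype=float)
    return w @ z[idx]


if __name__ == "__main__":
    # Hypothetical toy data: 6 genes x 5 samples of random expression values.
    genes = ["EGFR", "KRAS", "BRAF", "FOS", "EGR1", "DUSP6"]
    expr = np.random.default_rng(0).normal(size=(len(genes), 5))
    print(membership_score(expr, genes, ["EGFR", "KRAS", "BRAF"]))
    print(target_score(expr, genes, {"FOS": 1, "EGR1": 1, "DUSP6": 1}))
```

Because every gene is z-scored across samples, both scores are relative: they rank samples within one matrix rather than giving an absolute activity level.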

Here, we submit that membership-based pathway methods are effective not because the gene expression of pathway members correlates with the activity of the pathway, but because pathway member gene sets overlap with the sets of genes regulated by transcription factors (regulons). This implies that pathway methods do not report the activity of the pathway of interest, but rather the downstream effects of changes in transcription factor activity.
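The overlap between a pathway gene set and transcription factor regulons can be quantified with standard set-similarity measures. The sketch below assumes the Jaccard index and the overlap (Szymkiewicz-Simpson) coefficient; the README does not say which measure `2_gene_set_similarity.ipynb` uses, and the gene and regulon names are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|) (0.0 if either set is empty)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def best_regulon_match(pathway, regulons, metric=jaccard):
    """Return (tf, similarity) for the regulon most similar to a pathway gene set.

    Raises ValueError if regulons is empty.
    """
    return max(
        ((tf, metric(pathway, targets)) for tf, targets in regulons.items()),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    pathway = {"G1", "G2", "G3", "G4"}
    regulons = {"TF_A": {"G1", "G2", "G5"}, "TF_B": {"G6", "G7"}}
    print(best_regulon_match(pathway, regulons))  # ('TF_A', 0.4)
    print(best_regulon_match(pathway, regulons, overlap_coefficient))  # ('TF_A', 0.666...)
```

The overlap coefficient is less sensitive to size differences than the Jaccard index, which matters when a small regulon sits almost entirely inside a large pathway gene set.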

To support our hypothesis, we show that the higher the overlap of pathway gene sets with transcription factor regulons, the higher their information value. Furthermore, removing the overlapping genes reduces the information content of pathway gene sets, but not vice versa. These results suggest that the output of classical pathway analysis methods should be interpreted with caution, and that methods inferring activity from pathway-regulated genes should be prioritised.
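The overlap-removal comparison amounts to two set differences: strip regulon genes from pathway gene sets, and strip pathway genes from regulons. Below is a minimal sketch, assuming overlap is removed against the union of all sets on the other side (the preprint may instead remove only genes shared with particular regulons). `remove_overlap` and `min_size` are hypothetical names, and re-scoring the pruned sets' information content is not shown because the README does not specify the informativeness measure.

```python
def remove_overlap(gene_sets, other_sets, min_size=0):
    """Drop every gene that occurs in any of other_sets from each set in gene_sets.

    Sets left with fewer than min_size genes are discarded, since very small
    gene sets are usually excluded from enrichment-style scoring.
    """
    shared = set().union(*other_sets.values())
    pruned = {name: set(genes) - shared for name, genes in gene_sets.items()}
    return {name: genes for name, genes in pruned.items() if len(genes) >= min_size}


if __name__ == "__main__":
    pathways = {"P1": {"G1", "G2", "G3", "G4"}}
    regulons = {"TF_A": {"G1", "G2", "G5"}, "TF_B": {"G6", "G7"}}
    # Pathway gene sets without regulon genes: P1 keeps only G3 and G4.
    print(remove_overlap(pathways, regulons))
    # The reverse direction: TF_A keeps only G5, TF_B is unchanged.
    print(remove_overlap(regulons, pathways))
    # With a size cut-off of 2, the pruned TF_A regulon is dropped entirely.
    print(remove_overlap(regulons, pathways, min_size=2))
```

Scoring the original and pruned sets with the same method, in both directions, is what allows the asymmetry described above to be tested.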

Description of notebooks

  • 1_data_preprocessing_and_generation.ipynb
  • 2_gene_set_similarity.ipynb
  • 3_informative_scores.ipynb
  • 4_informative_vs_similarity.ipynb
  • 5_overlap_removed.ipynb
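The numeric prefixes suggest the notebooks are meant to be run in order, with later notebooks presumably relying on outputs of earlier ones; the README does not state this, so treat it as an assumption. A minimal sketch, assuming Jupyter with nbconvert is installed, that sorts the numbered notebooks and executes each one in place via `jupyter nbconvert --to notebook --execute --inplace`. The helper names are hypothetical, and `dry_run=True` only prints the commands.

```python
import re
import subprocess
from pathlib import Path

# Matches names such as "1_data_preprocessing_and_generation.ipynb".
NOTEBOOK_PATTERN = re.compile(r"^(\d+)_.+\.ipynb$")


def order_notebooks(paths):
    """Keep notebooks named '<number>_<name>.ipynb' and sort them by that number."""
    numbered = []
    for p in map(Path, paths):
        match = NOTEBOOK_PATTERN.match(p.name)
        if match:
            numbered.append((int(match.group(1)), p.name, p))
    return [p for _, _, p in sorted(numbered)]


def run_notebooks(notebooks, dry_run=True):
    """Execute each notebook in place with nbconvert, or just print the commands."""
    commands = [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)]
        for nb in notebooks
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing notebook
    return commands


if __name__ == "__main__":
    run_notebooks(order_notebooks(Path(".").glob("*.ipynb")), dry_run=True)
```

Sorting on the integer prefix rather than the file name keeps a hypothetical `10_…` notebook after `9_…`, which plain string sorting would not.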

Libraries used:
