overcoming memory issue with gseapy.algorithm.enrichment_score #146

oesterei · 2022-02-15T14:25:20Z

Hello,

My work frequently involves generating a null distribution of NES values for a given reference signature. When using prerank(), I kept running into the following error after 50-100 query sets:

MemoryError: Unable to allocate 22.7 MiB for an array with shape (1001, 29396) and data type float64

Thank you for creating gseapy.algorithm.enrichment_score, which allowed me to run all 1000 query sets without encountering the memory error. However, unlike prerank(), gseapy.algorithm.enrichment_score does not return an NES or nominal p-value for each query set, so I expanded the code per the Subramanian 2005 paper (https://www.pnas.org/content/pnas/102/43/15545.full.pdf). I offer it here for review and possible incorporation into the master code for gseapy.algorithm.enrichment_score.

CODE:
import os
import numpy as np
import pandas as pd
import gseapy as gp

def prerankGSEA(refsigdf, queryset, path):
    """Compute one NES per query set (one row of queryset) against the reference signature."""
    randomNES = []
    count = 0
    gene_list = list(refsigdf.index.values)
    correl_vector = list(refsigdf['Tscore'])
    for index, row in queryset.iterrows():
        es, esnull, hit_ind, RES = gp.algorithm.enrichment_score(
            gene_list, correl_vector, row, weighted_score_type=1, nperm=1000)
        # Split the permutation null into negative and non-negative scores
        negesnull = []
        posesnull = []
        for i in esnull:
            if i < 0:
                negesnull.append(i)
            else:
                posesnull.append(i)
        # Normalize ES by the mean of the null scores with the same sign
        if es < 0:
            meannegesnull = np.mean(negesnull)
            NES = es / abs(meannegesnull)
        else:
            meanposesnull = np.mean(posesnull)
            NES = es / abs(meanposesnull)
        if count % 20 == 0:
            print('Item being analyzed = {}'.format(count))
        randomNES.append(NES)
        count = count + 1
    tempdf = pd.DataFrame(randomNES, columns=["column"])
    # os.path.join avoids the '\r' escape that path + '\randomNES.txt' would produce
    saveloc = os.path.join(path, 'randomNES.txt')
    tempdf.to_csv(saveloc, index=False)
    return randomNES

# Import files
verifysigdf = pd.read_csv('Tscoredata1.txt', low_memory=False, delimiter="\t", index_col=0)
verifysigdf.drop(verifysigdf.columns[1], inplace=True, axis=1)
randomsigdf = pd.read_csv('randompaneldata.gmt', low_memory=False, delimiter="\t", header=0, index_col=0)
randomsigdf.drop(randomsigdf.columns[0], inplace=True, axis=1)

# Random model
cwd = os.getcwd()
path = os.path.join(cwd, "output")
os.makedirs(path, exist_ok=True)  # make sure the output folder exists before writing
dfrandomNES = prerankGSEA(verifysigdf, randomsigdf, path)
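
For reference, the normalization step in the function above can be isolated into a small NumPy-only helper that also returns a nominal p-value. This is only a sketch under assumptions: the name normalize_es is hypothetical (it is not part of gseapy), and the p-value is taken as the fraction of same-sign null scores at least as extreme as the observed ES, which is my reading of the Subramanian 2005 procedure rather than gseapy's own implementation.

```python
import numpy as np

def normalize_es(es, esnull):
    """Hypothetical helper: return (NES, nominal p-value) for one observed ES.

    es     : observed enrichment score
    esnull : permutation null enrichment scores for the same gene set
    """
    esnull = np.asarray(esnull, dtype=float)
    # Keep only the null scores on the same side of zero as the observed ES.
    same_sign = esnull[esnull < 0] if es < 0 else esnull[esnull >= 0]
    if same_sign.size == 0:
        # No same-sign permutations: NES and p-value are undefined.
        return float("nan"), float("nan")
    mean_abs = np.abs(same_sign).mean()
    nes = es / mean_abs if mean_abs > 0 else float("nan")
    # Fraction of same-sign null scores at least as extreme as the observed ES.
    if es < 0:
        pval = np.mean(same_sign <= es)
    else:
        pval = np.mean(same_sign >= es)
    return float(nes), float(pval)

# Usage with a synthetic null (assumed values, for illustration only).
rng = np.random.default_rng(0)
demo_null = rng.normal(0.0, 0.3, size=1000)
demo_nes, demo_p = normalize_es(0.6, demo_null)
print(demo_nes, demo_p)
```

Inside prerankGSEA, a call such as normalize_es(es, esnull) could stand in for the explicit positive/negative loops if both values are wanted per query set.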

Tscoredata1.txt
randompaneldata.txt
NOTE: randompaneldata.txt should be renamed randompaneldata.gmt prior to use

Full implementation can be found in the GeneCompare.zip file located here: https://github.com/oesterei/GSEABasedPrograms

zqfang (Maintainer) · 2022-07-26T23:19:07Z

This is now fixed by the Rust binding of GSEApy (>= v0.13.0).
