overcoming memory issue with gseapy.algorithm.enrichment_score #146
oesterei
started this conversation in
Show and tell
Replies: 1 comment
-
This is now fixed by the Rust binding of GSEApy (>= v0.13.0).
-
Hello,
My work frequently involves generating a null distribution of NES values for a given reference signature. When using prerank(), I kept running into the following error after 50-100 query sets:
MemoryError: Unable to allocate 22.7 MiB for an array with shape (1001, 29396) and data type float64
Thank you for creating gseapy.algorithm.enrichment_score, which allowed me to run all 1000 query sets without hitting the memory error. However, unlike prerank(), gseapy.algorithm.enrichment_score does not return an NES or a nominal p-value for each query set, so I expanded the code following Subramanian et al. 2005 (https://www.pnas.org/content/pnas/102/43/15545.full.pdf). I offer it here for review and possible incorporation into the master code for gseapy.algorithm.enrichment_score.
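For reference, the normalization described here can be written out as below. This is a sketch of the sign-specific scheme from the paper, not a quotation of it: the symbol ES_π (the enrichment score of one permutation in the null) is notation introduced for this note, and grouping zero with the positive side is an assumption chosen to match the code in this post.

```latex
\mathrm{NES} =
\begin{cases}
\dfrac{\mathrm{ES}}{\operatorname{mean}\{\mathrm{ES}_\pi : \mathrm{ES}_\pi \ge 0\}} & \text{if } \mathrm{ES} \ge 0,\\[2ex]
\dfrac{\mathrm{ES}}{\left|\operatorname{mean}\{\mathrm{ES}_\pi : \mathrm{ES}_\pi < 0\}\right|} & \text{if } \mathrm{ES} < 0,
\end{cases}
\qquad
p =
\begin{cases}
\dfrac{\#\{\pi : \mathrm{ES}_\pi \ge \mathrm{ES}\}}{\#\{\pi : \mathrm{ES}_\pi \ge 0\}} & \text{if } \mathrm{ES} \ge 0,\\[2ex]
\dfrac{\#\{\pi : \mathrm{ES}_\pi \le \mathrm{ES}\}}{\#\{\pi : \mathrm{ES}_\pi < 0\}} & \text{if } \mathrm{ES} < 0.
\end{cases}
```

Dividing by the mean of the same-signed null scores keeps the sign of ES, so a negative ES yields a negative NES.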
CODE:
    import os
    import numpy as np
    import pandas as pd
    import gseapy as gp

    def prerankGSEA(refsigdf, queryset, path):
        randomNES = []
        count = 0
        # Ranked gene list and its ranking metric from the reference signature
        gene_list = list(refsigdf.index.values)
        correl_vector = list(refsigdf['Tscore'])
        for index, row in queryset.iterrows():
            # ES for this query set plus a null of 1000 permuted ES values
            es, esnull, hit_ind, RES = gp.algorithm.enrichment_score(
                gene_list, correl_vector, row, weighted_score_type=1, nperm=1000)
            # Split the null distribution by sign
            negesnull = []
            posesnull = []
            for i in esnull:
                if i < 0:
                    negesnull.append(i)
                else:
                    posesnull.append(i)
            # Normalize ES by the mean of the null scores that share its sign
            if es < 0:
                meannegesnull = np.mean(negesnull)
                NES = es / abs(meannegesnull)
            else:
                meanposesnull = np.mean(posesnull)
                NES = es / abs(meanposesnull)
            if count % 20 == 0:
                print('Item being analyzed = {}'.format(count))
            randomNES.append(NES)
            count = count + 1
        tempdf = pd.DataFrame(randomNES, columns=["column"])
        # os.path.join avoids backslash escapes such as '\r' in the file name
        saveloc = os.path.join(path, 'randomNES.txt')
        tempdf.to_csv(saveloc, index=False)
        return randomNES

    # Import files
    verifysigdf = pd.read_csv('Tscoredata1.txt', low_memory=False, delimiter="\t", index_col=0)
    verifysigdf.drop(verifysigdf.columns[1], inplace=True, axis=1)
    randomsigdf = pd.read_csv('randompaneldata.gmt', low_memory=False, delimiter="\t", header=0, index_col=0)
    randomsigdf.drop(randomsigdf.columns[0], inplace=True, axis=1)

    # Random model
    cwd = os.getcwd()
    path = os.path.join(cwd, "output")
    # Create the output folder if it does not exist yet
    os.makedirs(path, exist_ok=True)
    dfrandomNES = prerankGSEA(verifysigdf, randomsigdf, path)

Tscoredata1.txt
randompaneldata.txt
NOTE: randompaneldata.txt should be renamed randompaneldata.gmt prior to use
Full implementation can be found in the GeneCompare.zip file located here: https://github.com/oesterei/GSEABasedPrograms
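The normalization step, together with the nominal p-value mentioned in the post, can be sketched with NumPy alone once an observed ES and its permutation null are available. The code shown in this post computes only the NES, so the p-value rule here (the fraction of same-signed null scores at least as extreme as the observed ES) is an assumption based on the sign-specific convention, not the author's implementation. The function name `nes_and_nominal_pval` is hypothetical.

```python
import numpy as np

def nes_and_nominal_pval(es, esnull):
    """Return (NES, nominal p-value) for one observed ES and its null.

    Hypothetical helper: both values use only the part of the null
    distribution that shares the sign of the observed ES.
    """
    esnull = np.asarray(esnull, dtype=float)
    if es >= 0:
        # Zero is grouped with the positive side, as in the posted code.
        same_sign = esnull[esnull >= 0]
        n_extreme = np.count_nonzero(same_sign >= es)
    else:
        same_sign = esnull[esnull < 0]
        n_extreme = np.count_nonzero(same_sign <= es)
    if same_sign.size == 0:
        # No permutation shares the sign of ES, so neither value is defined.
        return float("nan"), float("nan")
    nes = es / abs(same_sign.mean())
    pval = n_extreme / same_sign.size
    return float(nes), float(pval)

# Usage with a tiny made-up null distribution (illustrative values only).
example_null = [-0.4, -0.2, 0.1, 0.3, 0.5]
nes_pos, p_pos = nes_and_nominal_pval(0.3, example_null)
nes_neg, p_neg = nes_and_nominal_pval(-0.4, example_null)
print(nes_pos, p_pos, nes_neg, p_neg)
```

Returning NaN when no permuted score shares the sign of ES avoids taking the mean of an empty list, which would otherwise raise a NumPy warning and yield NaN implicitly.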