Unpickling Error in gene_set_path #76

Open
bioinfonewguy opened this issue Jul 2, 2024 · 3 comments

@bioinfonewguy

Hello,

Thanks for your help on my previous query.

I'm currently working with the replogle_rpe1_essential dataset, and I have added a few unseen genes just to assess how well the model performs with them. My CSV contains all of the genes in the replogle dataset and the new genes I am interested in.

However, I am getting the following error:

```python
from gears import PertData, GEARS

# Initialize PertData with your custom gene list
pert_data = PertData('./data', gene_set_path='/path/to/gene_list.csv')

# Load the dataset
pert_data.load(data_name='replogle_rpe1_essential')

# Prepare the data split
pert_data.prepare_split(split='simulation', seed=1)

# Create dataloaders
pert_data.get_dataloader(batch_size=32, test_batch_size=128)

# Initialize the GEARS model
gears_model = GEARS(pert_data, device='cpu')  # Use 'cpu' for Mac, or 'cuda' if you have a compatible GPU

# Initialize the model architecture
gears_model.model_initialize(hidden_size=64, uncertainty=True)
```

```
Found local copy...
Found local copy...
Traceback (most recent call last):

  Cell In[2], line 7
    pert_data.load(data_name='replogle_rpe1_essential')

  File /opt/anaconda3/envs/pyg_env/lib/python3.9/site-packages/gears/pertdata.py:183 in load
    self.set_pert_genes()

  File /opt/anaconda3/envs/pyg_env/lib/python3.9/site-packages/gears/pertdata.py:109 in set_pert_genes
    essential_genes = pickle.load(f)

UnpicklingError: invalid load key, 'A'.
```

@jackbrougher

Not 100% positive because of the formatting above, but I believe I'm hitting the same issue.

I tried changing `set_pert_genes` to read in a DataFrame, but no luck. I think the error is here, but I'm unable to pin down how to change it:

```python
def set_pert_genes(self):
    """
    Set the list of genes that can be perturbed and are to be included in
    perturbation graph
    """

    if self.gene_set_path is not None:
        # If gene set specified for perturbation graph, use that
        path_ = self.gene_set_path
        self.default_pert_graph = False
        with open(path_, 'rb') as f:
            essential_genes = pickle.load(f)
```

@toby-clark4

If you're still having this issue, I think it could be because you are passing a .csv rather than a .pkl.
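The traceback in the original post is consistent with this. As a minimal sketch (the file name and gene names below are placeholders, not taken from the actual CSV), calling `pickle.load` on a plain-text file whose first character is `A` fails in the same way, because `A` is not a valid pickle opcode:

```python
import os
import pickle
import tempfile

# Write a tiny text file that starts with the byte 'A'.
# The gene names are placeholders chosen only to reproduce the error.
csv_path = os.path.join(tempfile.mkdtemp(), "gene_list.csv")
with open(csv_path, "w") as f:
    f.write("AAA1\nBBB2\n")

# Reading the CSV with pickle.load, as set_pert_genes does with gene_set_path.
try:
    with open(csv_path, "rb") as f:
        pickle.load(f)
    error = None
except pickle.UnpicklingError as exc:
    error = exc

print(type(error).__name__, error)
```

So the error most likely says nothing about the genes themselves, only that the file is not a pickle.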

The format of the default GO is a dictionary of sets with genes and their GO terms:

[image: screenshot of the default GO dictionary]

So, if you add your genes of interest with their terms and save with pickle.dump it should work. This code runs without errors:

[image: screenshot of the code]
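A minimal sketch of the approach described above, not the code from the screenshot. It assumes the pickle should hold a dictionary mapping each gene to a set of GO terms, as stated in this comment; the gene symbols, GO IDs, and output path are all placeholders, and whether GEARS accepts exactly this structure is not verified here:

```python
import os
import pickle
import tempfile

# Stand-in for the default gene -> GO-term mapping.
# Gene symbols and GO IDs are placeholders, not real annotations.
gene_to_go = {
    "GENE_A": {"GO:0000001", "GO:0000002"},
    "GENE_B": {"GO:0000003"},
}

# Hypothetical genes of interest with their GO terms.
new_genes = {
    "GENE_C": {"GO:0000001"},
    "GENE_A": {"GO:0000004"},  # already present: terms are merged, not replaced
}

for gene, terms in new_genes.items():
    gene_to_go.setdefault(gene, set()).update(terms)

# Pickles are binary: write with 'wb' and read back with 'rb'.
out_path = os.path.join(tempfile.mkdtemp(), "gene_set.pkl")
with open(out_path, "wb") as f:
    pickle.dump(gene_to_go, f)

# Round-trip check: the file now loads without an UnpicklingError.
with open(out_path, "rb") as f:
    reloaded = pickle.load(f)

print(sorted(reloaded))
```

The resulting .pkl path would then be passed as `gene_set_path` in place of the .csv path used in the original post.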

@bioinfonewguy
Author

Thanks, @toby-clark4, this seems like it should work. Just to confirm: the new list for the .pkl file should contain all of the genes from the perturbation dataset (replogle) plus any genes of interest, correct?
