[BUG] stuck on dds.deseq2() #338

Open
Erikado4 opened this issue Nov 19, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@Erikado4

Erikado4 commented Nov 19, 2024

Every time I create a DeseqDataSet, whether from my own data (as counts/metadata dataframes or straight from my AnnData object) or from the test data in the Getting Started docs, and then run dds.deseq2(), the RAM usage shoots up and the kernel crashes.

The parameters for DeseqDataSet in the tutorials and the current version do not match.

Really would've liked to use this tool :(

@Erikado4 Erikado4 added the bug Something isn't working label Nov 19, 2024
@BorisMuzellec
Collaborator

Hi @Erikado4, I'm going to need a bit more information to be able to help you.

Could you fill in the bug template below?

Describe the bug
A clear and concise description of what the bug is.
NB: for questions about pydeseq2 that are not related to a bug, please open a topic on the scverse ecosystem Discourse forum.

To Reproduce
Provide snippets of code and steps on how to reproduce the behavior.
Please also specify the version you are using.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Version [e.g. 0.02]

Additional context
Add any other context about the problem here.

@Erikado4
Author

Erikado4 commented Nov 20, 2024

I use the same code from the tutorial:
[https://pydeseq2.readthedocs.io/en/latest/auto_examples/plot_pandas_io_example.html#sphx-glr-auto-examples-plot-pandas-io-example-py]
Loading data and saving results with pandas and pickle

Except design="~condition" is no longer a recognized parameter, so I changed it to design_factors="condition".

Then with the final dds.deseq2() it is stuck here indefinitely.

image

import os
import pickle as pkl

import pandas as pd

from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats

DATA_PATH = "https://raw.githubusercontent.com/owkin/PyDESeq2/main/datasets/synthetic/"
counts_df = pd.read_csv(os.path.join(DATA_PATH, "test_counts.csv"), index_col=0)
print(counts_df)

counts_df = counts_df.T
print(counts_df)

metadata = pd.read_csv(os.path.join(DATA_PATH, "test_metadata.csv"), index_col=0)
print(metadata)

genes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]
counts_df = counts_df[genes_to_keep]

inference = DefaultInference(n_cpus=8)
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design_factors="condition",
    refit_cooks=True,
    inference=inference,
)

dds.deseq2()

pydeseq2 0.4.12
Model: Precision 7960 Tower
OS: Arch Linux x86_64
Kernel: 6.11.5-arch1-1
Shell: zsh
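
The parameter mismatch described above (design vs. design_factors) is presumably a version difference between the "latest" docs and the installed 0.4.12. A minimal sketch for checking which keyword an installed version accepts, using only the standard library; `accepts_param` and `installed_version` are hypothetical helper names, not part of pydeseq2:

```python
import inspect
from importlib import metadata


def accepts_param(func, name):
    """Return True if the callable `func` declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


def installed_version(package="pydeseq2"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Usage (assumes pydeseq2 is installed in the current environment):
# from pydeseq2.dds import DeseqDataSet
# print(installed_version())
# print(accepts_param(DeseqDataSet, "design"))
# print(accepts_param(DeseqDataSet, "design_factors"))
```

If the installed version only accepts design_factors, the tutorial pages for that specific version (rather than "latest") are the ones to follow.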

@Erikado4
Author

Erikado4 commented Nov 20, 2024

Update: I tried to run each step individually from the Step-by-Step tutorial (again with the synthetic data) and it seems to be getting stuck on dds.fit_genewise_dispersions()

image
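
One way to narrow down where a call like this hangs is to have Python dump the stack of every thread after a timeout. The sketch below uses the standard-library faulthandler module; `dump_stack_if_stuck` and `demo` are hypothetical helpers, and the sleep is only a stand-in for the stuck call. Note that with a process-based joblib backend, the dump only covers the parent process, not the workers.

```python
import faulthandler
import sys
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def dump_stack_if_stuck(timeout_s, file=None):
    """Write all thread stacks to `file` if the body runs longer than `timeout_s`.

    `file` must be a real file with a file descriptor (defaults to sys.stderr).
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=target)
    try:
        yield
    finally:
        # Cancel the pending dump if the body finished in time.
        faulthandler.cancel_dump_traceback_later()


def demo():
    """Simulate a slow call and return the text that faulthandler wrote."""
    with tempfile.TemporaryFile(mode="w+") as fh:
        with dump_stack_if_stuck(0.2, file=fh):
            time.sleep(0.6)  # stand-in for a call that hangs
        fh.seek(0)
        return fh.read()


# Usage (assumes `dds` was built as in the snippet above):
# with dump_stack_if_stuck(60):
#     dds.fit_genewise_dispersions()
```

The dump goes to stderr by default, so in a notebook it may end up in the terminal that launched the kernel rather than in the cell output.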

@BorisMuzellec
Collaborator

BorisMuzellec commented Nov 21, 2024

Hi @Erikado4,

Thanks for providing the details.

I'm a bit lost here because I can't reproduce the issue on my machine (macOS). I'm assuming this bug has something to do with the fact that you're using Arch Linux.

Given where the code is stuck, I think the problem is either the joblib parallelization or fit_lin_mu, which calls the linear regression from scikit-learn (in which case the problem could come from low-level linear algebra libraries like BLAS, but this is a wild guess).

Could you try the code below and tell me if anything different happens?

import os
import pickle as pkl

import pandas as pd

from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats

DATA_PATH = "https://raw.githubusercontent.com/owkin/PyDESeq2/main/datasets/synthetic/"
counts_df = pd.read_csv(os.path.join(DATA_PATH, "test_counts.csv"), index_col=0)
print(counts_df)

counts_df = counts_df.T
print(counts_df)

metadata = pd.read_csv(os.path.join(DATA_PATH, "test_metadata.csv"), index_col=0)
print(metadata)

genes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]
counts_df = counts_df[genes_to_keep]

inference = DefaultInference(n_cpus=8, backend="threading")
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design_factors="condition",
    refit_cooks=True,
    inference=inference,
)

dds.deseq2()

It's the same code, but with a different joblib backend (inference = DefaultInference(n_cpus=8, backend="threading")).
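
To probe the BLAS guess mentioned above separately from the joblib backend, one option is to cap the native thread pools before numpy is imported. This is only a sketch under that assumption: `cap_blas_threads` is a hypothetical helper, and the environment variables are the usual ones read by OpenMP, OpenBLAS and MKL. Whether they have any effect depends on which BLAS numpy is linked against.

```python
import os

# Environment variables commonly read by OpenMP, OpenBLAS and MKL at load time.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def cap_blas_threads(n_threads=1):
    """Set the thread-count variables and return the values now in effect.

    This must run before numpy (and anything importing it) is first imported,
    because the libraries read these variables when they are loaded.
    """
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n_threads)
    return {var: os.environ[var] for var in BLAS_THREAD_VARS}


# Usage: call this at the very top of a fresh session, then run the
# pandas / pydeseq2 imports and the snippet above unchanged.
# cap_blas_threads(1)
```

If the hang disappears with a single native thread, that points towards thread oversubscription or the BLAS build rather than pydeseq2 itself.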

@Erikado4
Author

image

@Erikado4
Author

image

@Erikado4
Author

Thanks for trying to help out!
I get this error now

@BorisMuzellec
Collaborator

OK, so it seems that the threading backend isn't compatible with the max_num_thread argument that is set in the default inference :/.

I'm a bit at a loss for ideas here. One other thing you could try is keeping the default backend but setting n_cpus=1:

import os
import pickle as pkl

import pandas as pd

from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats

DATA_PATH = "https://raw.githubusercontent.com/owkin/PyDESeq2/main/datasets/synthetic/"
counts_df = pd.read_csv(os.path.join(DATA_PATH, "test_counts.csv"), index_col=0)
print(counts_df)

counts_df = counts_df.T
print(counts_df)

metadata = pd.read_csv(os.path.join(DATA_PATH, "test_metadata.csv"), index_col=0)
print(metadata)

genes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]
counts_df = counts_df[genes_to_keep]

inference = DefaultInference(n_cpus=1)
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design_factors="condition",
    refit_cooks=True,
    inference=inference,
)

dds.deseq2()
