[BUG] stuck on dds.deseq2() #338

Erikado4 · 2024-11-19T18:16:32Z

Every time I create a DeseqDataSet, whether from my own data (as counts/metadata dataframes or straight from my AnnData object) or from the test data in the Getting Started docs, and then run dds.deseq2(), the RAM shoots up and the kernel crashes.

The parameters for DeseqDataSet in the tutorials and the current version do not match.

Really would've liked to use this tool :(

BorisMuzellec · 2024-11-20T08:14:40Z

Hi @Erikado4, I'm going to need a bit more information to be able to help you.

Could you fill in the bug template below?

Describe the bug
A clear and concise description of what the bug is.
NB: for questions about pydeseq2 that are not related to a bug, please open a topic on the scverse ecosystem Discourse forum.

To Reproduce
Provide snippets of code and steps on how to reproduce the behavior.
Please also specify the version you are using.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

OS: [e.g. iOS]
Version [e.g. 0.02]

Additional context
Add any other context about the problem here.

Erikado4 · 2024-11-20T15:00:23Z

I use the same code from the tutorial:
[https://pydeseq2.readthedocs.io/en/latest/auto_examples/plot_pandas_io_example.html#sphx-glr-auto-examples-plot-pandas-io-example-py]
Loading data and saving results with pandas and pickle

Except design="~condition" is no longer a recognized parameter, so I changed it to design_factors="condition".

Then with the final dds.deseq2() it is stuck here indefinitely.

import os
import pickle as pkl

import pandas as pd

from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats

DATA_PATH = "https://raw.githubusercontent.com/owkin/PyDESeq2/main/datasets/synthetic/"
counts_df = pd.read_csv(os.path.join(DATA_PATH, "test_counts.csv"), index_col=0)
print(counts_df)

counts_df = counts_df.T
print(counts_df)

metadata = pd.read_csv(os.path.join(DATA_PATH, "test_metadata.csv"), index_col=0)
print(metadata)

genes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]
counts_df = counts_df[genes_to_keep]

inference = DefaultInference(n_cpus=8)
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design_factors="condition",
    refit_cooks=True,
    inference=inference,
)

dds.deseq2()

pydeseq2 0.4.12
Model: Precision 7960 Tower
OS: Arch Linux x86_64
Kernel: 6.11.5-arch1-1
Shell: zsh

Erikado4 · 2024-11-20T15:45:05Z

Update: I tried to run each step individually from the Step-by-Step tutorial (again with the synthetic data) and it seems to be getting stuck on dds.fit_genewise_dispersions()
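One way to confirm where a call is blocked, rather than inferring it from which step never returns, is to have Python dump its stack after a timeout. The sketch below uses only the standard-library faulthandler module. The helper name run_with_watchdog and the log file name are hypothetical, chosen for illustration, and are not part of pydeseq2.

```python
import faulthandler


def run_with_watchdog(func, timeout_s=60.0, log_path="hang_traceback.log"):
    """Call func() and, while it is still running after timeout_s seconds,
    write the stack of every thread in this process to log_path."""
    with open(log_path, "w") as log_file:
        # repeat=True dumps again every timeout_s seconds until cancelled.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
        try:
            return func()
        finally:
            # Stop the watchdog whether func() returned or raised.
            faulthandler.cancel_dump_traceback_later()
```

For example, run_with_watchdog(dds.fit_genewise_dispersions, timeout_s=120) would write the main process's stack to the log every two minutes for as long as the step hangs. The dump only covers the current process, so worker processes started by joblib are not included.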

BorisMuzellec · 2024-11-21T09:19:02Z

Hi @Erikado4,

Thanks for providing the details.

I'm a bit lost here because I can't reproduce the issue on my machine (macOS). I'm assuming this bug has something to do with the fact you're using Arch Linux.

Given where the code is stuck, I think it's either the joblib parallelization that is the problem, or fit_lin_mu, which calls the linear regression from scikit-learn (in which case the problem could come from low-level linear algebra libraries like BLAS, but this is a wild guess).

Could you try the code below and tell me if anything different happens?

import os
import pickle as pkl

import pandas as pd

from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats

DATA_PATH = "https://raw.githubusercontent.com/owkin/PyDESeq2/main/datasets/synthetic/"
counts_df = pd.read_csv(os.path.join(DATA_PATH, "test_counts.csv"), index_col=0)
print(counts_df)

counts_df = counts_df.T
print(counts_df)

metadata = pd.read_csv(os.path.join(DATA_PATH, "test_metadata.csv"), index_col=0)
print(metadata)

genes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]
counts_df = counts_df[genes_to_keep]

inference = DefaultInference(n_cpus=8, backend="threading")
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design_factors="condition",
    refit_cooks=True,
    inference=inference,
)

dds.deseq2()

This is the same script as yours, except that it uses a different joblib backend (inference = DefaultInference(n_cpus=8, backend="threading")).
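To probe the BLAS guess mentioned above separately from the joblib backend, one option is to restrict the native thread pools to a single thread before numpy or scikit-learn are imported. This is a minimal sketch, assuming the installed BLAS honors the usual environment variables (OpenMP, OpenBLAS, MKL). The helper limit_native_threads is hypothetical and not part of pydeseq2.

```python
import os

# Environment variables commonly read by OpenMP, OpenBLAS and MKL when the
# native library is first loaded.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_native_threads(n_threads=1):
    """Set the thread-count variables and return what was set."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Must run before `import numpy`, `import sklearn` or `import pydeseq2`,
# because the libraries read these variables only once, at load time.
settings = limit_native_threads(1)
print(settings)
```

In a notebook, this means restarting the kernel and running this cell first. If the hang disappears with single-threaded BLAS, that points at the linear algebra layer rather than at joblib.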

Erikado4 · 2024-11-21T15:37:47Z

Erikado4 · 2024-11-21T15:38:08Z

Erikado4 · 2024-11-21T15:39:10Z

Thanks for trying to help out!
I get this error now

BorisMuzellec · 2024-11-29T07:55:47Z

OK, so it seems that the threading backend isn't compatible with the max_num_thread argument that is set in the default inference :/.

I'm a bit at a loss for ideas here. One other thing you could try is keeping the default backend but setting n_cpus=1:

import os
import pickle as pkl

import pandas as pd

from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats

DATA_PATH = "https://raw.githubusercontent.com/owkin/PyDESeq2/main/datasets/synthetic/"
counts_df = pd.read_csv(os.path.join(DATA_PATH, "test_counts.csv"), index_col=0)
print(counts_df)

counts_df = counts_df.T
print(counts_df)

metadata = pd.read_csv(os.path.join(DATA_PATH, "test_metadata.csv"), index_col=0)
print(metadata)

genes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]
counts_df = counts_df[genes_to_keep]

inference = DefaultInference(n_cpus=1)
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata,
    design_factors="condition",
    refit_cooks=True,
    inference=inference,
)

dds.deseq2()
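Since the original symptom was a kernel crash from runaway memory, it may help to run each attempt in a separate Python process with a timeout and a soft memory cap, so that a hang or a memory blow-up does not take the notebook down with it. The sketch below uses only the standard library and assumes a Unix system (the resource module is not available on Windows). The helper run_script_guarded and its default limits are hypothetical and not part of pydeseq2.

```python
import resource
import subprocess
import sys


def run_script_guarded(script, timeout_s=300.0, max_bytes=8 * 1024**3):
    """Run `script` with `python -c` in a child process.

    Returns (status, output) where status is "ok", "failed" or "timeout".
    """

    def _cap_memory():
        # Runs in the child just before exec: lower only the soft limit on
        # address space, never exceeding the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            preexec_fn=_cap_memory,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr
```

Passing the reproduction script above as a string would then report "timeout" if dds.deseq2() never returns, or "failed" with a traceback in the output if the memory cap is hit, instead of crashing the kernel.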

Erikado4 added the "bug" (Something isn't working) label on Nov 19, 2024
