Empty batch error loading training data into memory #33
Hey! Could you paste your dataspec.yaml here? Also, make sure the chromosome names are consistent across the bigwig, bed and fasta files.
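One way to act on the chromosome-name suggestion is to compare the sets of chromosome names each input declares. The sketch below is plain Python. It assumes you have already pulled the names out of each file, for example with `pyBigWig.open(path).chroms()` for bigWigs or `pyfaidx.Fasta(path).keys()` for the FASTA. `bed_chroms` and `find_chrom_mismatches` are hypothetical helpers, not part of BPNet or kipoi.

```python
import gzip
from typing import Dict, Iterable, Set


def bed_chroms(path: str) -> Set[str]:
    """Collect chromosome names (first column) from a plain or gzipped BED file."""
    opener = gzip.open if path.endswith(".gz") else open
    chroms = set()
    with opener(path, "rt") as fh:
        for line in fh:
            # Skip blank lines and comment/track/browser header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chroms.add(line.split("\t", 1)[0])
    return chroms


def find_chrom_mismatches(sources: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Report chromosome names that appear in a source but not in the reference.

    The first entry of `sources` is treated as the reference (e.g. the FASTA).
    Sources with no unknown names are left out of the result.
    """
    names = {label: set(chroms) for label, chroms in sources.items()}
    reference_label = next(iter(names))
    reference = names[reference_label]
    return {
        label: chroms - reference
        for label, chroms in names.items()
        if label != reference_label and chroms - reference
    }


# Hypothetical example: the BED file uses Ensembl-style names ("1") instead of UCSC-style ("chr1"),
# so "peaks.bed" is reported with the names it uses that the FASTA does not contain.
print(find_chrom_mismatches({
    "fasta": ["chr1", "chr2", "chrX"],
    "peaks.bed": ["1", "2", "X"],
    "counts.pos.bw": ["chr1", "chr2", "chrX"],
}))
```

An empty result means every name in the peak file and bigWigs also exists in the FASTA. It does not check the reverse direction, which is usually harmless.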
Hi! Thanks for the response.
However, I have also tried with a reduced dataspec (and moved the peak BED file into the inputs directory and gzipped it, just in case that was somehow an issue):
Try to debug the dataset by loading `bpnet.datasets.StrandedProfile` using the same data spec. You can obtain the dataspec with: … Also, maybe try switching to absolute paths (although I doubt this will help). I think I know what could possibly be wrong: … Note that you should also adjust the held-out chromosomes here.
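With only around 400 peaks, a chromosome-based train/validation/test split can leave one split empty or nearly empty. This is especially likely if the held-out chromosome names do not match the naming used in the peak file. A quick sanity check is to count peaks per split before training.

`split_peak_counts` is a hypothetical helper. The chromosome lists in the example are placeholders, not necessarily BPNet's defaults, so substitute whatever held-out chromosomes your training run actually uses.

```python
from collections import Counter
from typing import Dict, Iterable, List


def split_peak_counts(
    peak_chroms: Iterable[str],
    valid_chroms: List[str],
    test_chroms: List[str],
) -> Dict[str, int]:
    """Count how many peaks fall into the train, valid and test splits.

    Peaks on a chromosome listed in `valid_chroms` or `test_chroms` are held out;
    every other peak counts toward training.
    """
    valid, test = set(valid_chroms), set(test_chroms)
    overlap = valid & test
    if overlap:
        raise ValueError(f"chromosomes in both valid and test: {sorted(overlap)}")
    counts = Counter()
    for chrom in peak_chroms:
        if chrom in valid:
            counts["valid"] += 1
        elif chrom in test:
            counts["test"] += 1
        else:
            counts["train"] += 1
    return {split: counts[split] for split in ("train", "valid", "test")}


# Placeholder data: the chromosome of each peak (first column of the peak BED file).
peaks = ["chr1", "chr1", "chr6", "chr6", "chr6", "chr14"]
print(split_peak_counts(peaks, valid_chroms=["chr2", "chr3"], test_chroms=["chr1"]))
# {'train': 4, 'valid': 0, 'test': 2}
```

A split with zero peaks, like `valid` in this toy example, is one plausible route to an empty batch. If the peak file used "1" instead of "chr1", nothing would be held out at all.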
So I can successfully load the yaml with
I did some digging and it seems the issue is coming from the
Hi there,
I am currently trying to train BPNet on some human ChIP-seq data we have for an RNA Pol III transcription factor. We have multiple cell lines that I would like to use as tasks for the model. It's probably important to note that Pol III has very few targets in the human genome, mostly tRNA genes, 5S rRNA and a few others. The same is true of the TF we are working with, most of whose targets overlap with Pol III targets. In total we get around 400 peaks in our stem cell model (similar to what others have seen), and we see great resolution and enrichment at the expected sites, so we are confident the data is good.
I have set up BPNet in a conda environment and am running it from a notebook that uses that environment as the kernel. With this setup I am able to train on the test ChIP-seq data packaged with BPNet. However, when I try to use our data I get the following error:
I manually edited that `data_utils.py` script from kipoi to print the `batch` variable. It turns out it's an empty list, hence the `IndexError`. With the BPNet test data, by contrast, it is correctly filled with one-hot encoded sequences and count data. I have checked the input data and can't find anything obviously wrong; it all looks quite similar to the BPNet data. I generated the stranded bigwig files as described in the FAQ section of the example notebook, and the summit files come from MACS2. Any help would be much appreciated!
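The `IndexError` is what you would expect from any collate function that looks at the first sample to decide how to combine a batch. The simplified sketch below is *not* kipoi's actual `data_utils.py` code. `collate` is a hypothetical stand-in that mimics that pattern, and its guard turns the opaque error into a message that points back at the dataset.

```python
from typing import Any, List

import numpy as np


def collate(batch: List[Any]) -> Any:
    """Stack a list of samples into arrays, recursing into dicts and lists/tuples."""
    if len(batch) == 0:
        # Without this guard, batch[0] below raises "IndexError: list index out of range".
        raise ValueError(
            "Empty batch: the dataset yielded no samples. "
            "Check chromosome names, held-out chromosomes and the peak file."
        )
    first = batch[0]
    if isinstance(first, dict):
        return {key: collate([sample[key] for sample in batch]) for key in first}
    if isinstance(first, (list, tuple)):
        return [collate(list(items)) for items in zip(*batch)]
    return np.stack([np.asarray(sample) for sample in batch])


# Two toy samples shaped like (one-hot sequence, counts); the shapes are illustrative only.
samples = [{"seq": np.zeros((4, 4)), "counts": np.ones(2)} for _ in range(2)]
out = collate(samples)
print(out["seq"].shape, out["counts"].shape)  # (2, 4, 4) (2, 2)
```

The guard does not fix anything by itself. It only makes clear that the problem lies upstream, in which intervals the dataset produces, rather than in the collate step.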
Drew