Error in step 3. Combining and annotating the blast files with orthogroup info ... #148

Open
htorrado opened this issue Apr 5, 2024 · 11 comments


htorrado commented Apr 5, 2024

Hello jtlovell,

Thanks for all your work on this package!

I'm using GENESPACE v1.3.1 in a conda environment with R 4.1.2 and have run it successfully on your test dataset, so the installation should be working as intended.
When I use my own dataset, everything starts off well (e.g. all gene IDs are recognized and match exactly), but in step 3 I get the error message below. I was hoping you might have ideas or suggestions for how to resolve this and proceed.

Combining and annotating the blast files with orthogroup info ...
# Chunk 1 / 1 (10:47:03) ...
Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  Item 1 of input is not a data.frame, data.table or list
In addition: Warning message:
In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  all scheduled cores encountered errors in user code

When I try to run it with just one core the error becomes:

Combining and annotating the blast files with orthogroup info ...
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  :
  Join results in 4687192 rows; more than 907698 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

The second error message seems related to the merge in step 2.3 of the annotate_blast command (the merge with the bed information). I tried adding "allow.cartesian" to that merge call, but with just 1 core the whole R session gets killed, and when it is parallelized (using 10 cores) I receive the following message:

Warning message:
In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  scheduled cores 1, 2 did not deliver results, all values of the jobs will be affected

Could this produce a merged file that is not compatible with some of your downstream code?

Thank you very much in advance!!
Best,
Héctor


jkfo002 commented Apr 9, 2024

I've run into the same problem as @htorrado. Do you have any suggestions?

Owner

jtlovell commented Apr 9, 2024

What version are y'all using? This is an error that popped up every now and then with duplicated gene IDs in <v1.1, but I had hoped I'd resolved it.
The difference in errors between 1 and >1 cores is a function of how R handles parallelization. You are right, it's an issue with the merge.
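
One way to test the duplicated-gene-ID hypothesis before rerunning is to scan each bed file for IDs that occur more than once. The sketch below is a minimal standalone Python check, not part of GENESPACE. It assumes tab-separated bed files with the gene ID in the fourth column; the function names `duplicated_ids` and `inner_join_rows` are hypothetical. The second function shows why duplicates trigger the "Join results in ... rows" error: a key present `n` times on one side and `m` times on the other contributes `n * m` rows to the join.

```python
from collections import Counter


def duplicated_ids(bed_lines, id_col=3):
    """Return {gene_id: count} for every ID that appears more than once.

    Assumes tab-separated lines with the gene ID in column `id_col`
    (0-based); blank lines, comments, and short lines are skipped.
    """
    counts = Counter()
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= id_col:
            continue
        counts[fields[id_col]] += 1
    return {gene_id: n for gene_id, n in counts.items() if n > 1}


def inner_join_rows(left_keys, right_keys):
    """Number of rows an inner join on these keys would produce."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Each key contributes (occurrences on left) * (occurrences on right).
    return sum(n * right[key] for key, n in left.items())
```

Calling `duplicated_ids(open(path))` on each file in the bed directory should return an empty dict if every ID is unique. A non-empty result would be consistent with the oversized join reported above, though it would not by itself prove that duplicates are the cause.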

@htorrado
Author

I'm using GENESPACE v1.3.1 in R 4.1.2

Thanks for your help!


jkfo002 commented Apr 10, 2024

I'm using GENESPACE v1.3.1 and R 4.2.0.

I tried filtering out the fragmented scaffolds and it now seems to run successfully. Could that have been the cause?
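
For anyone wanting to try the same workaround, here is a rough sketch of one way to drop genes on small scaffolds from a parsed bed table. The thread does not describe the exact filtering that was done, so everything here is an assumption: the function name `drop_small_scaffolds`, the `min_genes` threshold, and the (chr, start, end, id) row layout.

```python
from collections import Counter


def drop_small_scaffolds(bed_rows, min_genes=5):
    """Keep only rows on sequences that carry at least `min_genes` genes.

    `bed_rows` is a list of (chr, start, end, gene_id) tuples.
    Returns the kept rows and the set of kept gene IDs, so the matching
    peptide file can be subset to the same IDs.
    """
    rows = [row for row in bed_rows if len(row) >= 4]
    genes_per_seq = Counter(row[0] for row in rows)
    kept = [row for row in rows if genes_per_seq[row[0]] >= min_genes]
    kept_ids = {row[3] for row in kept}
    return kept, kept_ids
```

If the bed file is filtered this way, the peptide file presumably needs to be subset to the same gene IDs so that the two stay in sync.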

@jtlovell
Owner

@jkfo002 thanks for troubleshooting that. GENESPACE should deal with these without an issue. This is clearly a bug and needs to be fixed. Would you mind sharing your input /bed and /peptide directories from the run that caused the error? If so, please send me an email and we'll set up a private data transfer. email: jlovell[at]hudsonalpha[dot]org


jkfo002 commented Apr 12, 2024

@jtlovell Sorry for the late reply; I have sent you the data. By the way, can GENESPACE build gene-level synteny for a local region across multiple genomes?

@jtlovell
Owner

np. I'll try to get to it next week.
Re: local synteny ... do you mean something like this (Fig. 5.2 here)?


jkfo002 commented Apr 18, 2024

Actually, no. I think I need to zoom in on a small region of a chromosome and see the synteny of a gene cluster (maybe). @jtlovell


goshng commented May 18, 2024

I have the same issue. Any luck?


goshng commented May 18, 2024

I get the error with the test example as well. Here is the tail of the output. Thank you!

        ...human  : 468 genes in 15 OGs hit > 8 unique places
        ##############
        Annotation summaries (after exclusions):
        ...chicken: 17433 genes in 15257 OGs || 2158 genes in 423 arrays
        ...human  : 20205 genes in 15979 OGs || 3460 genes in 853 arrays

############################
3. Combining and annotating the blast files with orthogroup info ...
        # Chunk 1 / 1 ...
Error in rbindlist(mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  Item 1 of input is not a data.frame, data.table or list
In addition: Warning message:
In mclapply(1:nrow(chnk), mc.cores = nCores, function(i) { :
  all scheduled cores encountered errors in user code
> ls()
[1] "genomeRepo"   "gpar"         "gsParam"      "parsedPaths"  "path2mcscanx"
[6] "rawFiles"     "wd"
> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

...

other attached packages:
[1] GENESPACE_1.3.1

@jtlovell
Owner

What OrthoFinder version are you using? I've seen this issue pop up from other users but have been unable to recreate it myself.
