persistent temp folder not getting removed #11
Comments
I suggest using https://docs.python.org/3/library/tempfile.html, which will automatically clean up for you at the end of the process. I note that assemblerflow manually attempts to clean it up with |
@kristyhoran The temp folder is persisting because an error occurred. Standard chewBBACA behavior is to keep the temp folder when an error occurs: if the error happens in the middle of an allele call, the user can resume that allele call instead of starting over. Using Python's tempfile would therefore remove this feature. Do you agree, @tseemann? @kristyhoran, considering your error, are you giving chewBBACA |
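The two positions above are not necessarily incompatible. Below is a minimal sketch, assuming a hypothetical helper named `run_with_resumable_temp` (it is not part of chewBBACA, and this is not how chewBBACA manages its temp folder). `tempfile.mkdtemp` creates a uniquely named folder, and the helper deletes it only when the work finishes without an error, so a failed run still leaves its intermediate files behind.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_resumable_temp(work, parent=None):
    """Run work(temp_dir) and delete temp_dir only if work succeeds.

    `work` is any callable taking the temp folder path; `parent` is the
    directory in which the temp folder is created (system default if None).
    Both names are illustrative assumptions.
    """
    temp_dir = Path(tempfile.mkdtemp(prefix="allelecall_", dir=parent))
    try:
        result = work(temp_dir)
    except Exception:
        # Keep the folder so that a later run could pick up from it.
        print(f"Run failed; intermediate files kept in {temp_dir}")
        raise
    # Success: nothing to resume, so clean up.
    shutil.rmtree(temp_dir)
    return result
```

Because `mkdtemp` picks a fresh name on every call, parallel processes would not share one temp folder. The trade-off is that resuming requires the folder path to be recorded somewhere, for example in the log line printed above.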
@mickaelsilva Thanks for your reply. The -g input is the directory, not a list of files. I am trying to use chewBBACA in a nextflow pipeline, which runs multiple isolates in parallel, using a symlink to the schema directory. I have attempted --fr, which raises the error "temp folder not found", and --fc, which raises the error mentioned above. This is not a problem if I run chewbbaca as a standalone command; it is only a problem within a nextflow pipeline, where multiple processes access the same schemaDir. I have also tried removing the temp folder manually at the start of the pipeline, but since other processes may be accessing the folder at the same time, this also raises an error. One possible solution would be to copy the schemaDir into the process working dir, but this seems redundant and is also very time consuming when working with a large number of isolates. I understand the attractiveness of being able to resume an allele call instead of starting over, and this is probably not a problem when using chewbbaca alone, but it does raise the issue of running it within a pipeline. I am not sure what the optimal solution would be, but I would appreciate any suggestions or help you may have. Regards Kristy |
The key questions are:
|
Just a quick note that we are working on this and this issue is not forgotten |
Good to hear that there is work being done on this. If it is indeed being released soon, then I can skip the patches I was working on, which use a .lockfile to ensure that only one chewbbaca instance is using the database at any one time. But a question arises regarding how you are implementing this. Often when batch analysing data there will be ~identical isolates from a cluster. Suppose two processes are running concurrently: the first one finds a new allele and commits it to the file database in a final batch step, and the second process, just seconds behind the first, reaches the same "new" allele. The second process will then assign a new allele number to an allele that process 1 hasn't had time to write to the files yet. Or do you have a check for this somehow? There is a beauty in the simple file-based approach, but I fear data corruption is looming. Or maybe I misunderstood, and you are abandoning the files for a database instead? |
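The lock-file idea mentioned above can be sketched as follows. This is an illustrative assumption, not the actual patch and not chewBBACA code: the helper name `schema_lock`, the `.lockfile` location inside the schema directory, and the timeout values are all hypothetical. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists, so only one process at a time can create the lock.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def schema_lock(schema_dir, timeout=600.0, poll=0.5):
    """Hold an exclusive lock file inside schema_dir for the with-block."""
    lock_path = os.path.join(schema_dir, ".lockfile")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Creation fails with FileExistsError if another process
            # already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        # Record the owner's PID to help diagnose stale locks.
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield lock_path
    finally:
        os.remove(lock_path)
```

Wrapping both the allele lookup and the write of new alleles in one `with schema_lock(schema_dir):` block would serialise the two processes described above, at the cost of losing parallelism for that step. Two caveats: a process that is killed while holding the lock leaves a stale file that must be removed by hand, and exclusive creation may be unreliable on some network filesystems.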
Hello Jonas |
OK, patching with a lock file for now then. Thank you for the update |
We will close this issue when we have the new version implemented. |
A long overdue update on this. We are still working on decoupling allele identification from the commitment of new alleles to the local database (more for the purpose of allowing users to run test batches or sets of genomes of uncertain provenance and quality without "polluting" their database), but I would also like to call your attention to chewie-NS. The idea here is that every user will have their own local instance of chewBBACA that can be synced with the public or a private instance of chewie-NS when necessary or required. This means that even within the same institution different users can work independently and only adopt a common nomenclature through synchronization with chewie-NS when needed. This approach addresses the issue raised by @tseemann, although a future version of chewBBACA will indeed be able to perform allele calling without changes to the local database, as suggested above. |
Another quick update on this issue. For those of you interested in this please explore the option of running chewBBACA without identifying or storing novel alleles. You can do this by exploring the |
hi @jacarrico,
I've been trying to implement chewBBACA in a nextflow pipeline, using a schema generated with PrepExternalSchema (separately from the nextflow pipeline). When I run AlleleCall, a temp folder is created in my schemaDirectory, and this seems to lead to an error in subsequent runs with this schema. It raises a ValueError:
ValueError: '/listeria_db/lmo1074.fasta' is not in list
It should be noted that lmo1074.fasta is not an actual file in the listeria_db directory, so I am unsure a) where this file name even comes from and b) why the temp folder is persisting. I am using chewBBACA version 2.0.8. Thanks in advance for your time; I appreciate any help that you can give me.
Regards
Kristy Horan
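For context, the wording of the error above matches what Python's `list.index` raises when the requested item is absent from the list. The sketch below only illustrates that mechanism. It is an assumption, not chewBBACA's code, that a path recorded by an earlier run is being looked up in the current list of schema files. The helper name `find_locus` and the file names other than `lmo1074.fasta` are hypothetical.

```python
def find_locus(loci, locus_path):
    """Return the position of locus_path in loci, or None if it is absent.

    A bare loci.index(locus_path) raises ValueError for a missing entry;
    catching it lets the caller decide how to handle a stale path.
    """
    try:
        return loci.index(locus_path)
    except ValueError:
        return None


# Hypothetical current contents of the schema directory.
current_loci = ["/listeria_db/locusA.fasta", "/listeria_db/locusB.fasta"]

# A path that is not among the current schema files.
stale_path = "/listeria_db/lmo1074.fasta"

position = find_locus(current_loci, stale_path)
```

Here `position` is `None`, whereas calling `current_loci.index(stale_path)` directly would raise the `ValueError`.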