Entrez search support? #8

Open
peterjc opened this issue Oct 24, 2018 · 3 comments

Comments

peterjc commented Oct 24, 2018

This is outside the current scope of the tool, but would you consider adding NCBI Entrez search support as an alternative to supplying the accessions directly?

e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=opuntia%5BORGN%5D+accD&retmax=10&idtype=acc

(In Biopython, handle = Entrez.esearch(db="nucleotide", retmax=10, term="opuntia[ORGN] accD", idtype="acc") or similar)

This currently gives three accessions, EF590893.1, EF590892.1, HQ620723.1, which I can download with:

$ ncbi-acc-download EF590893.1 EF590892.1 HQ620723.1
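
The search step above can be sketched with only the Python standard library. This is a minimal sketch, not part of ncbi-acc-download: the function names `build_esearch_url`, `parse_accessions` and `search_accessions` are hypothetical, and the sample XML is a hand-written stand-in shaped like an esearch reply rather than captured output.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nucleotide", retmax=10):
    """Build an esearch URL that asks for accessions (idtype=acc)."""
    params = {"db": db, "term": term, "retmax": retmax, "idtype": "acc"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_accessions(xml_text):
    """Extract the identifiers listed under <IdList> in an esearch XML reply."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall("./IdList/Id")]


def search_accessions(term, db="nucleotide", retmax=10):
    """Run the search over the network and return the matching accessions."""
    with urllib.request.urlopen(build_esearch_url(term, db, retmax)) as response:
        return parse_accessions(response.read().decode("utf-8"))


# Hand-written stand-in for an esearch reply (assumed shape, not captured output).
SAMPLE_REPLY = """<eSearchResult>
  <Count>3</Count>
  <IdList>
    <Id>EF590893.1</Id>
    <Id>EF590892.1</Id>
    <Id>HQ620723.1</Id>
  </IdList>
</eSearchResult>"""

if __name__ == "__main__":
    print(build_esearch_url("opuntia[ORGN] accD"))
    print(parse_accessions(SAMPLE_REPLY))
```

The resulting list could then be passed to `ncbi-acc-download` as positional arguments, as in the command above.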

I would like to be able to do something like this to achieve the same result:

$ ncbi-acc-download -search "opuntia[ORGN] accD" -retmax 10
Author

peterjc commented Nov 19, 2018

Test example:

$ conda install entrez-direct
$ esearch -db nucleotide -query "its1 AND Phytophthora[Organism] AND 150:800[Sequence Length]" \
    | efetch -format fasta > ncbi_sample.fasta
$ grep -c "^>" ncbi_sample.fasta
2246
$ grep "^>" ncbi_sample.fasta | head
>MG255148.1 Phytophthora palmivora isolate TARI p98158 18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence
>LC159493.1 Phytophthora drechsleri genes for ITS1, 5.8S rRNA, ITS2, partial and complete sequence, isolate: PhWa20140918-2
>LC159492.1 Phytophthora drechsleri genes for ITS1, 5.8S rRNA, ITS2, 28S rRNA, partial and complete sequence, isolate: PhWa20140918-1
>LS479897.1 Phytophthora capsici genomic DNA sequence contains 18S rRNA gene, ITS1, 5.8S rRNA gene, ITS2, 28S rRNA gene, strain LL2480
>LS479193.1 Phytophthora infestans genomic DNA sequence contains 18S rRNA gene, ITS1, 5.8S rRNA gene, ITS2, 28S rRNA gene, strain XD15
>LS479173.1 Phytophthora infestans genomic DNA sequence contains 18S rRNA gene, ITS1, 5.8S rRNA gene, ITS2, 28S rRNA gene, strain 80029
>LS479172.1 Phytophthora infestans genomic DNA sequence contains 18S rRNA gene, ITS1, 5.8S rRNA gene, ITS2, 28S rRNA gene, strain 88069
>LS479171.1 Phytophthora infestans genomic DNA sequence contains 18S rRNA gene, ITS1, 5.8S rRNA gene, ITS2, 28S rRNA gene, strain XA-4
>LS479169.1 Phytophthora infestans genomic DNA sequence contains 18S rRNA gene, ITS1, 5.8S rRNA gene, ITS2, 28S rRNA gene, strain DN111
>LS479127.1 Phytophthora infestans genomic DNA sequence contains 18S rRNA gene, ITS1, 5.8S rRNA gene, ITS2, 28S rRNA gene, strain XD1314

Owner

kblin commented Jan 4, 2019

Just realized I forgot to comment on this, sorry about that. I'm not sure that this is the direction I want to go with the tool. I'll have to think about this a bit more.

Author

peterjc commented Jan 4, 2019

That's fine - I appreciate this is a shift in focus.

I can easily do what I want with entrez-direct, but it is not reliable. At busy times it frequently gives partial downloads (not all the records) and, judging by continuous integration results, it does not return an error code when this happens.

I was thinking expanding your tool made sense because of your existing sanity checking (number of records returned, basic formatting, etc).
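
A record-count check of the kind mentioned above could look roughly like this. It is a sketch under stated assumptions, not the tool's actual validation: `count_fasta_records` and `check_complete` are hypothetical names, and the expected count is assumed to come from the search itself (for example the `Count` value in the esearch reply).

```python
def count_fasta_records(lines):
    """Count FASTA header lines, the equivalent of grep -c "^>"."""
    return sum(1 for line in lines if line.startswith(">"))


def check_complete(lines, expected):
    """Return the record count, or raise if it differs from the expected one."""
    found = count_fasta_records(lines)
    if found != expected:
        raise ValueError(f"expected {expected} records, found {found}")
    return found


def check_file(path, expected):
    """Check a FASTA file on disk; returns an exit status (0 ok, 1 mismatch)."""
    try:
        with open(path) as handle:
            check_complete(handle, expected)
    except ValueError as err:
        # A non-zero status lets continuous integration notice a partial download.
        print(err)
        return 1
    return 0


# Tiny in-memory stand-in for a downloaded FASTA file (made-up records).
SAMPLE_FASTA = [">A.1 first record\n", "ACGT\n", ">B.1 second record\n", "GGCC\n"]

if __name__ == "__main__":
    print(check_complete(SAMPLE_FASTA, 2))
```

Failing loudly on a mismatch, instead of silently writing a short file, is the point of the check: a partial download then surfaces as an error rather than as a quietly smaller dataset.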
