package main;
our $SEE;
package Genbank_query;
use strict;
require Exporter;
our @ISA = qw (Exporter);
our @EXPORT = qw (ESearch EPost ESummary EFetch ELink);
=head1 NAME
package Genbank_query
=cut
=head1 DESCRIPTION
This package wraps the Entrez E-utilities that NCBI provides for querying GenBank and related databases.
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
=over 4
=item Utility Listing
* B<ESearch:> Searches and retrieves primary IDs (for use in EFetch, ELink and ESummary) and term translations, and optionally retains results for future use in the user's environment.
* B<EPost:> Posts a file containing a list of primary IDs for future use in the user's environment to use with subsequent search strategies.
* B<ESummary:> Retrieves document summaries from a list of primary IDs or from the user's environment.
* B<EFetch:> Retrieves records in the requested format from a list of one or more primary IDs or from the user's environment.
* B<ELink:> Checks for the existence of an external or Related Articles link from a list of one or more primary IDs. Retrieves primary IDs and relevancy scores for links to Entrez databases or Related Articles; creates a hyperlink to the primary LinkOut provider for a specific ID and database, or lists LinkOut URLs and Attributes for multiple IDs.
=back
User System Requirements
Do not overload NCBI's systems. Users intending to send numerous queries and/or retrieve large numbers of records from Entrez should comply with the following:
* Run retrieval scripts on weekends or between 9 PM and 5 AM ET weekdays for any series of more than 100 requests.
* Make no more than one request every 3 seconds.
* NCBI's Disclaimer and Copyright notice must be evident to users of your service. NLM does not claim the copyright on the abstracts in PubMed; however, journal publishers or authors may. NLM provides no legal advice concerning distribution of copyrighted materials; consult your legal counsel.
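The request-rate guideline above (no more than one request every 3 seconds) can be enforced on the client side. The following is a minimal sketch and not part of this package: make_throttle is a hypothetical helper, and its clock and sleep callbacks are injectable only so that the logic can be exercised without actually waiting.

```perl
    use strict;
    use warnings;

    # Hypothetical helper (not part of Genbank_query): returns a closure that
    # waits until at least $min_gap seconds have passed since its last call.
    sub make_throttle {
        my ($min_gap, $now, $sleep) = @_;
        $now   ||= sub { time };            # default clock: epoch seconds
        $sleep ||= sub { sleep $_[0] };     # default wait: built-in sleep
        my $last;                           # time of the previous request, if any
        return sub {
            my $t    = $now->();
            my $wait = defined $last ? $min_gap - ($t - $last) : 0;
            $wait = 0 if $wait < 0;
            $sleep->($wait) if $wait > 0;
            $last = $t + $wait;
            return $wait;                   # seconds waited before this request
        };
    }

    # Usage: call $pause->() immediately before each E-utility request.
    my $pause = make_throttle(3);
```

Keeping the throttle in the caller, rather than inside each utility function, lets one closure cover ESearch, EPost, ESummary and EFetch calls alike.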
Database primary IDs:
    Genome       Genome ID
    Nucleotide   GI number
    OMIM         MIM number
    PopSet       GI number
    Protein      GI number
    PubMed       PMID
    Structure    MMDB ID
    Taxonomy     TAXID
=cut
;
######################################################################################################################
=head1 DESCRIPTION
ESearch
Last updated: July 21, 2003
ESearch: Searches and retrieves primary IDs (for use in EFetch, ELink and ESummary) and term translations, and optionally retains results for future use in the user's environment.
URL parameters:
Utility parameters may be case sensitive; use lowercase characters in all parameters except for WebEnv.
Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?
Database:
db=database name
Current database values by category:
Literature databases:
omim - The OMIM database including the collective data is the property of the Johns Hopkins University, which holds the copyright.
pubmed - Journal publishers hold the copyright on the abstracts in PubMed. NLM provides no legal advice concerning distribution of copyrighted materials.
journals
Sequence databases:
genome
nucleotide
protein
popset
3D database:
structure
Taxonomy database:
taxonomy
History: Requests the utility to maintain results in the user's environment. Used in conjunction with WebEnv.
usehistory=y
Web Environment: Value previously returned in XML results from ESearch or EPost. This value may change with each utility call. If WebEnv is used, History search numbers can be included in an ESummary URL, e.g., term=cancer+AND+%23X (where %23 replaces # and X is the History search number).
WebEnv=WgHmIcDG]B`>> etc.
Query_key: The value used for a history search number or previously returned in XML results from ESearch or EPost.
query_key=6
Note: WebEnv is similar to the cookie that is set on a user's computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both WebEnv (cookie string) and query_key (history number) values will be returned in the results. Rather than using the retrieved PMIDs in an ESummary or EFetch URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query, but a sample URL would be as follows:
http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed
&WebEnv=%3D%5DzU%5D%3FIJIj%3CC%5E%5DA%3CT%5DEACgdn%3DF%5E%3Eh
GFA%5D%3CIFKGCbQkA%5E_hDFiFd%5C%3D
&query_key=6&retmode=html&rettype=medline&retmax=15
Tool: A string with no internal spaces that identifies the resource which is using Entrez links (e.g., tool=flybase). This argument is used to help NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant 'tool' argument for all requests using the utilities.
tool=
E-mail Address: If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.
email=
PubMed:
Search terms: This command uses search terms or phrases with or without Boolean operators. See the PubMed Help for information about search term qualification.
term=search strategy
For example:
term=asthma[mh]+OR+hay+fever[mh]
You may also qualify search terms using field=qualifier.
Search Field: Use this command to specify a specific search field.
field=
PubMed fields: affl, auth, ecno, jour, iss, mesh, majr, mhda, page, pdat, ptyp, si, subs, subh, tiab, word, titl, lang, uid, fltr, vol
Relative Dates: Limit items to a number of days immediately preceding today's date.
reldate=
For example:
reldate=90
reldate=365
Date Ranges: Limit results bounded by two specific dates. Both mindate and maxdate are required if date range limits are applied using these variables.
mindate=
maxdate=
For example:
mindate=2001
maxdate=2002/01/01
Date Type: Limit dates to a specific date field based on database.
datetype=
For example:
datetype=edat
Display Numbers:
retstart=x (x= sequential number of the first record retrieved - default=0 which will retrieve the first record)
retmax=y (y= number of items retrieved)
Retrieval Mode:
retmode=xml
Use your web browser's View Page Source function to display results.
Retrieval Type:
rettype=
PubMed values:
count
uilist (default)
Examples:
Search in PubMed for the term cancer for the entrez date from the last 60 days and retrieve the first 100 IDs and translations using the history parameter:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=60&datetype=edat&retmax=100&usehistory=y
Search in PubMed for the journal PNAS Volume 97, and retrieve 6 IDs starting at ID 7 using a tool parameter:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3
Search in Journals for the term obstetrics:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=journals&term=obstetrics
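The example URLs above are plain key=value pairs appended to the base URL. As a rough sketch of how such a URL could be assembled from a hash: build_eutils_url and its escaping rule are illustrative assumptions, not a description of this package's process_request. Spaces come out as %20 rather than the + used in the examples, and the escaping assumes single-byte (ASCII) values.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: join key=value pairs onto an E-utility base URL.
    # Characters outside the unreserved set are percent-encoded, so a WebEnv
    # string or a term such as "asthma[mh]" is passed through intact.
    sub build_eutils_url {
        my ($base_url, $vals) = @_;
        my @pairs;
        for my $key (sort keys %$vals) {        # sorted for a stable result
            my $val = $vals->{$key};
            $val =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
            push @pairs, "$key=$val";
        }
        return $base_url . join('&', @pairs);
    }

    my $url = build_eutils_url(
        'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?',
        { db => 'pubmed', term => 'PNAS[ta] AND 97[vi]', retstart => 6, retmax => 6 },
    );
    print "$url\n";
```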
=cut
;
=over 4
=item ESearch()
B<Description:> Search specified GenBank Database using search parameters.
B<Parameters:> ($UserAgent, $URLvals_href)
$UserAgent is a LWP::UserAgent object
$URLvals_href is a reference to a hash containing values for URL keys. Available keys include:
db
usehistory
WebEnv
query_key
tool
email
term
field
reldate
mindate
maxdate
datetype
retstart
retmax
retmode
rettype
see details above for expected contents.
B<Returns:> $text
$text includes the unparsed data returned from GenBank based on the search request.
=back
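Because ESearch() returns the response unparsed, the caller has to extract the values it needs. A minimal sketch, assuming the XML response carries Count, QueryKey, WebEnv and Id elements as in NCBI's eSearchResult output; parse_esearch and the sample document are illustrative only, and a real XML parser is the more robust choice.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: pull the commonly needed fields out of ESearch XML.
    # Fields that are absent from the response are left undefined.
    sub parse_esearch {
        my ($xml) = @_;
        my %r;
        ($r{count})     = $xml =~ m{<Count>(\d+)</Count>};
        ($r{query_key}) = $xml =~ m{<QueryKey>(\d+)</QueryKey>};
        ($r{WebEnv})    = $xml =~ m{<WebEnv>([^<]*)</WebEnv>};
        $r{ids}         = [ $xml =~ m{<Id>(\d+)</Id>}g ];
        return \%r;
    }

    # Made-up sample response, shortened for illustration.
    my $sample = '<eSearchResult><Count>2</Count><RetMax>2</RetMax>'
               . '<RetStart>0</RetStart><QueryKey>1</QueryKey>'
               . '<WebEnv>ABC123</WebEnv>'
               . '<IdList><Id>11850928</Id><Id>11482001</Id></IdList>'
               . '</eSearchResult>';
    my $res = parse_esearch($sample);
    print "found $res->{count}, first id $res->{ids}[0]\n";
```

The WebEnv and query_key values obtained this way can then be passed to ESummary() or EFetch() in place of an id list.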
=cut
sub ESearch {
    my ($ua, $URLvals_href) = @_;
    my $base_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?";
    # URL keys recognised for ESearch; handed to process_request() with the caller's values.
    my @valid_keys = qw(
        db usehistory WebEnv query_key tool email
        term field reldate mindate maxdate datetype
        retstart retmax retmode rettype
    );
    return process_request($ua, $base_url, \@valid_keys, $URLvals_href);
}
#######################################################################################################
=head1 DESCRIPTION
EPost
Updated: July 21, 2003
EPost: Posts a file containing a list of UIs for future use in the user's environment to use with subsequent search strategies.
URL parameters:
Utility parameters may be case sensitive; use lowercase characters in all parameters except for WebEnv.
Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?
Database:
db=database name
Current database values by category:
Literature databases:
omim - The OMIM database including the collective data is the property of the Johns Hopkins University, which holds the copyright.
pubmed - Journal publishers hold the copyright on the abstracts in PubMed. NLM provides no legal advice concerning distribution of copyrighted materials.
Sequence databases:
genome
nucleotide
protein
popset
*sequences- Composite name including nucleotide, protein, popset and genome.
3D database:
structure
Taxonomy database:
taxonomy
*Not yet available
Record Identifier: UIs required if web environment (i.e., WebEnv=) is not used.
id=11877539,11822933,11871444
Current values:
PubMed ID
MEDLINE UI
GI number
MMDB ID
TaxID
MIM number
Retrieval Mode:
retmode=xml (default)
Note: Use your web browser's View Page Source function to display results.
Web Environment: Value previously returned in XML results from ESearch. Web environment is required in place of a primary ID result list.
WebEnv=WgHmIcDG]B`>> etc.
Query_key: The value used for a history search number or previously returned in XML results from ESearch or EPost.
query_key=6
Note: WebEnv is similar to the cookie that is set on a user's computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both WebEnv (cookie string) and query_key (history number) values will be returned in the results. Rather than using the retrieved PMIDs in an EPost URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query.
Tool: A string with no internal spaces that identifies the resource which is using Entrez links (e.g., tool=flybase). This argument is used to help NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant 'tool' argument for all requests using the utilities.
tool=
E-mail Address: If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.
email=
PubMed Example:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=pubmed&id=11237011
=cut
=over 4
=item EPost()
B<Description:> Posts a file containing a list of UIs for future use in the user's environment to use with subsequent search strategies.
B<Parameters:> ($UserAgent, $URLvals_href)
$UserAgent is a LWP::UserAgent object
$URLvals_href is a reference to a hash containing values for URL keys. Available keys include:
db
id
retmode
WebEnv
query_key
tool
email
see details above for expected contents.
B<Returns:> $text
$text includes the unparsed data returned from GenBank based on the post request.
=back
=cut
####
sub EPost {
    my ($ua, $URLvals_href) = @_;
    my $base_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?";
    # URL keys recognised for EPost; handed to process_request() with the caller's values.
    my @valid_keys = qw(db id retmode WebEnv query_key tool email);
    return process_request($ua, $base_url, \@valid_keys, $URLvals_href);
}
;
#################################################################################################################
=head1 DESCRIPTION
ESummary
Last update: July 21, 2003
ESummary: Retrieves document summaries from a list of primary IDs or from the user's environment.
URL parameters:
Utility parameters may be case sensitive; use lowercase characters in all parameters except for WebEnv.
Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?
Database:
db=database name
Current database values: pubmed, protein, nucleotide, structure, genome, pmc, omim, taxonomy, books, probeset, domains, unists, cdd, snp, journals, unigene, popset
omim - The OMIM database including the collective data is the property of the Johns Hopkins University, which holds the copyright.
pubmed - Journal publishers hold the copyright on the abstracts in PubMed. NLM provides no legal advice.
sequence
History: Requests utility to maintain results in History server, used in conjunction with WebEnv.
usehistory=y
Web Environment: Value previously returned in XML results from ESearch and EPost and used with ESummary in place of primary ID result list.
WebEnv=WgHmIcDG]B`>> etc.
Query_key: The value used for a history search number or previously returned in XML results from ESearch or EPost.
query_key=6
Note: WebEnv is similar to the cookie that is set on a user's computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both WebEnv (cookie string) and query_key (history number) values will be returned in the results. Rather than using the retrieved PMIDs in an ESummary URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query.
Tool: A string with no internal spaces that identifies the resource which is using Entrez links (e.g. tool=igm or tool=flybase). This argument is used to help NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant 'tool' argument for all requests using the utilities.
tool=
E-mail Address: If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.
email=
PubMed
Record Identifier: Required if WebEnv is not used.
id=12345,92932
Current values:
PubMed ID
MEDLINE UI
Display Numbers: Used when the results from EPost or ESearch are maintained in the user's environment. The maximum number of retrieved records is 10,000.
retstart=x (x= sequential number of the first record retrieved - default=0 which will retrieve the first record)
retmax=y (y= number of records retrieved - default=20)
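To walk a large history result set, retstart is advanced by retmax on each request. A small sketch of the arithmetic: page_windows is a hypothetical helper, and the batch size of 500 is an arbitrary choice below the documented 10,000-record limit.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: list [retstart, retmax] pairs covering $total records
    # in batches of at most $batch (which must be a positive number).
    sub page_windows {
        my ($total, $batch) = @_;
        my @windows;
        for (my $start = 0; $start < $total; $start += $batch) {
            my $left = $total - $start;
            push @windows, [ $start, $left < $batch ? $left : $batch ];
        }
        return @windows;
    }

    # Print the retstart/retmax pair for each request of a 1200-record result.
    for my $w (page_windows(1200, 500)) {
        printf "retstart=%d&retmax=%d\n", @$w;
    }
```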
Retrieval Mode:
retmode=xml
Note: Use your web browser's View Page Source function to display results.
Example:
In PubMed display records for PMIDs 11850928 and 11482001 in xml retrieval mode:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=11850928,11482001&retmode=xml
In Journals display records for journal IDs 27731,439,735,905:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=journals&id=27731,439,735,905
Sequence Databases
Record Identifier: Required if WebEnv is not used.
id=28864546,28800981
Current values:
GI number
MMDB ID (Structure database)
TAX ID (Taxonomy database)
Example:
In Protein display records for GIs 28800982 and 28628843 in xml retrieval mode:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=protein&id=28800982,28628843&retmode=xml
In Nucleotide display records for GIs 28864546 and 28800981 in xml retrieval mode:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=28864546,28800981&retmode=xml
In Structure display records for MMDB IDs 19923 and 12120 in xml retrieval mode:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=structure&id=19923,12120&retmode=xml
In Taxonomy display records for TAXIDs 9913 and 30521 in xml retrieval mode:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=taxonomy&id=9913,30521&retmode=xml
In UniSTS display records for IDs 254085 and 254086 in xml retrieval mode:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=unists&id=254085,254086&retmode=xml
=cut
=over 4
=item ESummary()
B<Description:> Retrieves document summaries from a list of primary IDs or from the user's environment.
B<Parameters:> ($UserAgent, $URLvals_href)
$UserAgent is a LWP::UserAgent object
$URLvals_href is a reference to a hash containing values for URL keys. Available keys include:
db
usehistory
WebEnv
query_key
tool
email
id
retstart
retmax
retmode
see details above for expected contents.
B<Returns:> $text
$text includes the unparsed data returned from GenBank based on the summary request.
=back
=cut
sub ESummary {
    my ($ua, $URLvals_href) = @_;
    my $base_url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?";
    # URL keys recognised for ESummary; handed to process_request() with the caller's values.
    my @valid_keys = qw(
        db usehistory WebEnv query_key tool email
        id retstart retmax retmode
    );
    return process_request($ua, $base_url, \@valid_keys, $URLvals_href);
}
#############################################################################################################
=head1 DESCRIPTION
EFetch Overview
Last updated: July 21, 2003
EFetch: Retrieves records in the requested format from a list of one or more UIs or from the user's environment.
URL parameters:
Utility parameters may be case sensitive; use lowercase characters in all parameters except for WebEnv.
Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
EFetch for the Sequence Databases
Last Updated: July 21, 2003
EFetch documentation is also available for the Literature and Taxonomy databases.
EFetch: Retrieves records in the requested format from a list of one or more unique identifiers.
URL parameters:
Utility parameters may be case sensitive; use lowercase characters in all parameters except for WebEnv.
Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
Database
db=nucleotide
Current database values by category:
Sequence databases:
genome
nucleotide
protein
popset
sequences - Composite name including nucleotide, protein, popset and genome.
Web Environment: History link value previously returned in XML results from ESearch and used with EFetch in place of primary ID result list.
WebEnv=WgHmIcDG], etc.
Query_key: The value used for a history search number or previously returned in XML results from ESearch or EPost.
query_key=6
Note: WebEnv is similar to the cookie that is set on a user's computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both WebEnv (cookie string) and query_key (history number) values will be returned in the results. Rather than using the retrieved IDs in an ESummary or EFetch URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query.
Tool: A string with no internal spaces that identifies the resource which is using Entrez links (e.g., tool=flybase). This argument is used to help NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant 'tool' argument for all requests using the utilities.
tool=
E-mail Address: If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.
email=
B<Sequence Databases>
Record Identifier: IDs required if WebEnv is not used.
id=123,U12345,U12345.1,gb|U12345|
Current values:
NCBI sequence number (GI)
genome ID
accession
accession.version
fasta
seqid
Display Numbers:
retstart=x (x= sequential number of the first id retrieved - default=0 which will retrieve the first record)
retmax=y (y= number of items retrieved)
Sequence Strand, Start, Stop and Complexity Parameters
strand= what strand of DNA to show (1=plus or 2=minus)
seq_start= show sequence starting from this base number
seq_stop= show sequence ending on this base number
complexity= gi is often a part of a biological blob, containing other gis
complexity regulates the display:
0 - get the whole blob
1 - get the bioseq for gi of interest (default in Entrez)
2 - get the minimal bioseq-set containing the gi of interest
3 - get the minimal nuc-prot containing the gi of interest
4 - get the minimal pub-set containing the gi of interest
Retrieval Mode:
retmode=output format
Current values:
xml
html
text
asn.1
Retrieval Type:
rettype=output types based on database
Current values:
native (full record)
fasta
gb
gbwithparts
est
gss
gp
uilist
Type descriptions:
    native       Default format for viewing sequences.
    fasta        FASTA view of a sequence.
    gb           GenBank view for sequences; constructed sequences are shown as contigs (by pointing to their parts). Valid for nucleotides.
    gbwithparts  GenBank view for sequences; the sequence is always shown. Valid for nucleotides.
    est          EST report. Valid for sequences from the dbEST database.
    gss          GSS report. Valid for sequences from the dbGSS database.
    gp           GenPept view. Valid for proteins.
    seqid        Converts a list of GIs into a list of seqids.
    acc          Converts a list of GIs into a list of accessions.
Not all Retrieval Modes are possible with all Retrieval Types.
Sequence Options:
             native  fasta  gb   gbwithparts  est  gss  gp   seqid  acc
    xml      x       x*     x*   TBI          TBI  TBI  x*   TBI    TBI
    text     x       x      x*   x*           x*   x*   x*   x      x
    html     x       x      x*   x*           x*   x*   x*   x      x
    asn.1    x       n/a    n/a  n/a          n/a  n/a  n/a  x      n/a
x = retrieval mode available
* - existence of the mode depends on gi type
TBI - to be implemented (not yet available)
n/a - not available
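The availability table above lends itself to a lookup structure that a caller could consult before sending a request. The sketch below transcribes the table into a hash; mode_status is a hypothetical helper, and the cell values are copied from this documentation (dated 2003), so they may not match the current service.

```perl
    use strict;
    use warnings;

    # Column order of the table: one entry per rettype.
    my @TYPES = qw(native fasta gb gbwithparts est gss gp seqid acc);

    # One row per retmode, cells in the same order as @TYPES.
    my %ROWS = (
        'xml'   => [qw(x x*  x*  TBI TBI TBI x*  TBI TBI)],
        'text'  => [qw(x x   x*  x*  x*  x*  x*  x   x)],
        'html'  => [qw(x x   x*  x*  x*  x*  x*  x   x)],
        'asn.1' => [qw(x n/a n/a n/a n/a n/a n/a x   n/a)],
    );

    # Hypothetical helper: return the table cell for a retmode/rettype pair,
    # or undef when either value does not appear in the table.
    sub mode_status {
        my ($retmode, $rettype) = @_;
        my $row = $ROWS{$retmode} or return;
        for my $i (0 .. $#TYPES) {
            return $row->[$i] if $TYPES[$i] eq $rettype;
        }
        return;
    }

    print mode_status('asn.1', 'fasta'), "\n";   # cell for asn.1 / fasta
```

A caller might treat "x" as safe, "x*" as dependent on the GI type, and anything else as a reason not to send the request.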
Examples:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&complexity=0&rettype=fasta
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb&seq_start=1&seq_stop=9
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=fasta&seq_start=1&seq_stop=9&strand=2
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=popset&id=12829836&rettype=gp
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=8&rettype=gp
Entrez display format GBSeqXML:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb&retmode=xml
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=8&rettype=gp&retmode=xml
Entrez display format TinySeqXML:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=fasta&retmode=xml
B<EFetch for Literature Databases>
Last updated: July 21, 2003
EFetch: Retrieves records in the requested format from a list of one or more UIs or the user's environment.
EFetch documentation is also available for the Sequence and Taxonomy databases.
URL parameters:
Utility parameters may be case sensitive; use lowercase characters in all parameters except for WebEnv.
Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
Literature Database
db=pubmed
pubmed - Journal publishers hold the copyright on the abstracts in PubMed. NLM provides no legal advice concerning distribution of copyrighted materials.
journals
Web Environment: Value previously returned in XML results from ESearch and EPost and used with EFetch in place of a primary UI result list.
WebEnv=WgHmIcDG], etc.
Query_key: The value used for a history search number or previously returned in XML results from ESearch or EPost.
query_key=6
Note: WebEnv is similar to the cookie that is set on a user's computer when accessing PubMed on the web. If the parameter usehistory=y is included in an ESearch URL, both WebEnv (cookie string) and query_key (history number) values will be returned in the results. Rather than using the retrieved PMIDs in an ESummary or EFetch URL, you may simply use the WebEnv and query_key values to retrieve the records. WebEnv will change for each ESearch query.
Tool: A string with no internal spaces that identifies the resource which is using Entrez links (e.g., tool=flybase). This argument is used to help NCBI provide better service to third parties generating Entrez queries from programs. As with any query system, it is sometimes possible to ask the same question different ways, with different effects on performance. NCBI requests that developers sending batch requests include a constant 'tool' argument for all requests using the utilities.
tool=
E-mail Address: If you choose to provide an email address, we will use it to contact you if there are problems with your queries or if we are changing software interfaces that might specifically affect your requests. If you choose not to include an email address we cannot provide specific help to you, but you can still sign up for utilities-announce to receive general announcements.
email=
PubMed
Record Identifier: UIs required if WebEnv is not used.
id=11877539,11822933,11871444
Current values:
PubMed ID
MEDLINE UI
Display Numbers: Used when the results from EPost or ESearch are maintained in the user's environment. The maximum number of retrieved records is 10,000.
retstart=x (x= sequential number of the first id retrieved - default=0 which will retrieve the first record)
retmax=y (y= number of items retrieved - default=20)
Retrieval Mode:
retmode=output format
Current values:
xml
html
text
asn.1
Use your web browser's View Page Source function to display results in xml retrieval mode.
Retrieval Type:
rettype=output types based on database
Current values:
uilist
abstract
citation
medline
full (journals only)
Not all Retrieval Modes are possible with all Retrieval Types.
PubMed Options:
             uilist  abstract  citation  medline
    xml      x       x*        x*        x*
    text     x       x         x         x
    html     x       x         x         x
    asn.1    n/a     x*        x*        x
x = retrieval mode available
*returned retrieval type is the complete record in the retrieval mode
n/a - not available
Examples:
In PubMed display PMIDs 12345 and 9997 in html retrieval mode and abstract retrieval type:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=12345,9997&retmode=html&rettype=abstract
In PubMed display PMIDs from history statement in html retrieval mode and medline retrieval type (where x is replaced by WebEnv and query_key values):
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&WebEnv=xxxx&query_key=x&retmode=html&rettype=medline
In PubMed display PMIDs in xml retrieval mode:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933,11700088&retmode=xml
In Journals display records for journal IDs 22682,21698,1490:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=journals&id=22682,21698,1490&rettype=full
=cut
=over 4
=item EFetch()
B<Description:> Retrieves records in the requested format from a list of one or more UIs or from the user's environment.
B<Parameters:> ($UserAgent, $URLvals_href)
$UserAgent is a LWP::UserAgent object
$URLvals_href is a reference to a hash containing values for URL keys. Available keys include:
db
WebEnv
query_key
tool
email
id
retstart
retmax
strand
seq_start
seq_stop
complexity
retmode
rettype
see details above for expected contents.
B<Returns:> $text
$text includes the unparsed data returned from GenBank based on the fetch request.
=back
=cut
sub EFetch {
my ($ua, $URLvals_href) = @_;