Releases: GW-HIVE/filtered_nt
Filtered_NT v7.0
What's Changed
- Initialized RTD Dependencies by @AIbrahimv2 in #1
- Created html file by @AIbrahimv2 in #2
- Created docstrings by @AIbrahimv2 in #3
- Completed Docstrings by @AIbrahimv2 in #4
- Configured index.rst by @AIbrahimv2 in #5
- Python 2 -> Python 3 (#6) by @HadleyKing in #9
- Filtered nt v7.0 by @HadleyKing in #12
Full Changelog: 6.0...7.0
Filtered_NT v6.0
Filtered_NT v6.0 release notes
Downloaded Files
- nt file downloaded on 5/28/2018
  ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
  48,023,477 sequences
- names.dmp downloaded on 6/20/2018
  ftp://ftp.ncbi.nih.gov/pub/taxonomy/
  2,650,458 names, 1,787,228 scientific names
- ac2taxid files
  ftp://ftp.ncbi.nih.gov/pub/taxonomy/
#records file name
15,312,524 dead_nucl.accession2taxid
72,418,230 dead_prot.accession2taxid
66,864,528 dead_wgs.accession2taxid
76,986,031 nucl_est.accession2taxid
133,572,323 nucl_gb.accession2taxid
40,230,716 nucl_gss.accession2taxid
422,080,996 nucl_wgs.accession2taxid
417,656 pdb.accession2taxid
511,384,936 prot.accession2taxid
Filter statistics
The black list contains 59,765 unique taxonomy IDs.
Sequences from the black-listed sources were removed. The sources, the number of
associated taxonomy IDs, and the number of removed sequences are listed below.
blackListTaxonomyName   #taxids   #removed sequences
=====================   =======   ==================
unidentified                 17                5,237
uncultured                    2               27,936
unknown                       0                    0
unspecified                   1                    1
unclassified             45,047              751,699
other sequence           14,533               26,903
phage                       164                1,078
environmental sample          0                    0
vector                        1                    1
=====================   =======   ==================
total                    59,765              812,855
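The black-list step described above could be approximated as follows. This is only a sketch, not the code used to build this release: the substring-matching rule, the function name `blacklisted_taxids`, and the sample rows are assumptions made for illustration, and the actual filter may instead walk the taxonomy tree. It assumes the NCBI names dump layout of `|`-separated fields (tax ID, name, unique name, name class).

```python
# Black-list terms taken from the table above.
BLACKLIST_TERMS = [
    "unidentified", "uncultured", "unknown", "unspecified", "unclassified",
    "other sequence", "phage", "environmental sample", "vector",
]


def blacklisted_taxids(names_lines, terms=BLACKLIST_TERMS):
    """Collect tax IDs whose scientific name contains a black-list term.

    Hypothetical helper: plain substring matching is an assumption.
    """
    hits = {term: set() for term in terms}
    for line in names_lines:
        # Fields are separated by "|" and padded with tabs.
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4 or fields[3] != "scientific name":
            continue
        tax_id, name = int(fields[0]), fields[1].lower()
        for term in terms:
            if term in name:
                hits[term].add(tax_id)
    return hits


# Hand-written sample rows in the names dump layout (illustrative only).
sample = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "32644\t|\tunidentified\t|\t\t|\tscientific name\t|\n",
    "77133\t|\tuncultured bacterium\t|\t\t|\tscientific name\t|\n",
    "562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|\n",
    "562\t|\tE. coli\t|\t\t|\tcommon name\t|\n",
]

hits = blacklisted_taxids(sample)
for term, tax_ids in hits.items():
    print(f"{term}: {len(tax_ids)} taxids")
```

Sequences would then be dropped when the tax ID mapped to their accession (via the accession2taxid files) falls in one of these sets.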
The number of sequences in this Filtered_NT release is 47,210,622.
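The figures in these notes can be cross-checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the counts are copied from the filter statistics table and the nt download entry above. The per-category counts sum to the 59,765 unique black-listed taxonomy IDs and 812,855 removed sequences, and subtracting the removed sequences from the downloaded nt file gives the release size.

```python
# (taxids, removed sequences) per black-list category, copied from the table.
blacklist_stats = {
    "unidentified": (17, 5237),
    "uncultured": (2, 27936),
    "unknown": (0, 0),
    "unspecified": (1, 1),
    "unclassified": (45047, 751699),
    "other sequence": (14533, 26903),
    "phage": (164, 1078),
    "environmental sample": (0, 0),
    "vector": (1, 1),
}

downloaded_sequences = 48_023_477  # sequences in the downloaded nt file

total_taxids = sum(taxids for taxids, _ in blacklist_stats.values())
total_removed = sum(removed for _, removed in blacklist_stats.values())
remaining = downloaded_sequences - total_removed

print(f"taxids: {total_taxids:,}")      # taxids: 59,765
print(f"removed: {total_removed:,}")    # removed: 812,855
print(f"remaining: {remaining:,}")      # remaining: 47,210,622
```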