Releases: GW-HIVE/filtered_nt

Filtered_NT v7.0

09 Aug 00:55

What's Changed

Full Changelog: 6.0...7.0

Filtered_NT v6.0

22 Jul 19:59

Filtered_NT v6.0 release notes

Downloaded Files

  1. nt file downloaded on 5/28/2018
    ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/

    48,023,477 sequences

  2. names.dump downloaded on 6/20/2018
    ftp://ftp.ncbi.nih.gov/pub/taxonomy/

     2,650,458 names
     1,787,228 scientific names
    
  3. ac2taxid files
    ftp://ftp.ncbi.nih.gov/pub/taxonomy/

    #records       file name
     15,312,524    dead_nucl.accession2taxid
     72,418,230    dead_prot.accession2taxid
     66,864,528    dead_wgs.accession2taxid
     76,986,031    nucl_est.accession2taxid
    133,572,323    nucl_gb.accession2taxid
     40,230,716    nucl_gss.accession2taxid
    422,080,996    nucl_wgs.accession2taxid
        417,656    pdb.accession2taxid
    511,384,936    prot.accession2taxid

Filter statistics

The black list contains 59,765 unique taxonomy IDs.

Sequences from a given black list of sources were removed. The list of sources, the number of
associated taxonomy IDs, and the number of removed sequences are given below.

    blackListTaxonomyName   #taxids   #removed sequences
    =====================   =======   ==================
    unidentified                 17                 5237
    uncultured                    2                27936
    unknown                       0                    0
    unspecified                   1                    1
    unclassified              45047               751699
    other sequence            14533                26903
    phage                       164                 1078
    environmental sample          0                    0
    vector                        1                    1
    =====================   =======   ==================
    total                    59,765              812,855
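
These notes do not say how taxonomy IDs were matched to black list names. As a rough sketch only, the Python below assumes a case-insensitive substring match against scientific names in the NCBI names file, with each taxid assigned to the first matching keyword. The function name `blacklisted_taxids`, the matching rule, and the sample rows are illustrative assumptions, not the pipeline used for this release.

```python
import io

# Keywords taken from the table above.
BLACKLIST = [
    "unidentified", "uncultured", "unknown", "unspecified", "unclassified",
    "other sequence", "phage", "environmental sample", "vector",
]

def blacklisted_taxids(names_file, keywords=BLACKLIST):
    """Return {keyword: set of taxids} for scientific names containing a keyword.

    Assumes the NCBI taxonomy names layout: fields separated by tab-pipe-tab,
    in the order tax_id, name_txt, unique name, name class.
    """
    hits = {k: set() for k in keywords}
    for line in names_file:
        fields = [f.strip() for f in line.rstrip("\n").split("\t|")]
        if len(fields) < 4 or fields[3] != "scientific name":
            continue
        taxid, name = int(fields[0]), fields[1].lower()
        for k in keywords:
            if k in name:
                hits[k].add(taxid)
                break  # assumption: first matching keyword wins
    return hits

# Illustrative rows in the names file layout (not real release data).
sample = io.StringIO(
    "32644\t|\tunidentified\t|\t\t|\tscientific name\t|\n"
    "77133\t|\tuncultured bacterium\t|\t\t|\tscientific name\t|\n"
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"
)
hits = blacklisted_taxids(sample)
for keyword, taxids in hits.items():
    if taxids:
        print(keyword, sorted(taxids))
```

A real run would pass an open handle to the downloaded names file instead of `sample`. Removing sequences would additionally require mapping nt accessions to taxids through the accession2taxid files, which this sketch does not cover.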

The number of sequences in this Filtered_NT release is 47,210,622.
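
The figures in these notes can be cross-checked with simple arithmetic. The short Python check below copies the per-category counts from the filter statistics table and the sequence count of the downloaded nt file; the variable names are illustrative.

```python
# Per-category counts from the filter statistics table:
# name -> (number of taxids, number of removed sequences)
removed = {
    "unidentified": (17, 5237),
    "uncultured": (2, 27936),
    "unknown": (0, 0),
    "unspecified": (1, 1),
    "unclassified": (45047, 751699),
    "other sequence": (14533, 26903),
    "phage": (164, 1078),
    "environmental sample": (0, 0),
    "vector": (1, 1),
}

nt_sequences = 48_023_477  # sequences in the downloaded nt file

total_taxids = sum(taxids for taxids, _ in removed.values())
total_removed = sum(seqs for _, seqs in removed.values())
remaining = nt_sequences - total_removed

print(total_taxids)   # 59765
print(total_removed)  # 812855
print(remaining)      # 47210622
```

The per-category taxid counts sum to 59,765, the removed sequences sum to 812,855, and subtracting those from the 48,023,477 downloaded sequences gives the 47,210,622 sequences reported for this release.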