Investigate ways to reduce discord between taxonomies in different databases #144

Open
ColinKhoury opened this issue Apr 26, 2018 · 1 comment

Comments

@ColinKhoury
Collaborator

For each database we normalized species names against the GBIF Taxonomic Backbone, using the GBIF Species Lookup Tool (GBIF 2018c) and the GBIF Species API v1 (GBIF 2018b). We found a 96.0% match between the GBIF Taxonomic Backbone and WEP, and a 93.3% match between the Backbone and GENESYS and the Global Crop Wild Relative Occurrence Database.

We should explore ways to fix taxonomy so that matches are as close to 100% as possible. TNRS?
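
For reference, a minimal sketch of what a single name lookup against the GBIF Species API v1 could look like. It assumes the name-matching endpoint (`/species/match` with a `name` parameter) is the one used; the thread does not say which endpoint the project calls, and the helper names `build_match_url` and `fetch_match` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Name-matching endpoint of the GBIF Species API v1 (assumed to be the one used).
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def build_match_url(name):
    """Return the match URL for one scientific name (hypothetical helper)."""
    return GBIF_MATCH_URL + "?" + urllib.parse.urlencode({"name": name})


def fetch_match(name):
    """Query GBIF for one name and return the decoded JSON response.

    This performs a network call. The response is a dict that normally
    carries fields such as "matchType" and "rank".
    """
    with urllib.request.urlopen(build_match_url(name)) as resp:
        return json.load(resp)
```

Calling `fetch_match("Zea mays")` would return the Backbone's best match for that name; only `build_match_url` can be checked offline.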

@danipilze
Contributor

danipilze commented Apr 27, 2018

Sure, but please take into account that we'll never reach 100%, since not everything in the taxonomy field can be verified as a valid taxonomic name, or is even a word that looks like one.

Here are some highlights of the unmatched taxonomy:

image

image

Also, in the unmatched count I'm including the matches that weren't at species level (or lower), so records identified only to genus (or higher) level are counted there.

  • personal note: Maybe I should clarify that for the paper.

  • personal note: Also, I should try querying the API twice when we are querying hybrids, since the specific epithet may sometimes start with 'x' and sometimes with the '×' sign.
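
A rough sketch of the two points above: counting a record as matched only at species level or lower, and retrying hybrids with the other hybrid marker. All function names are hypothetical, the set of ranks treated as "species level or lower" is an assumption, and `lookup` stands for whatever function actually queries the API. Only a standalone `x` or `×` token is swapped; a marker glued to the epithet (e.g. `×piperita`) is not handled here.

```python
# Ranks treated as "species level or lower" (assumed set, adjust as needed).
SPECIES_LEVEL_RANKS = {"SPECIES", "SUBSPECIES", "VARIETY", "FORM"}


def counts_as_match(response):
    """True only if GBIF matched the name at species level or lower."""
    matched = response.get("matchType", "NONE") != "NONE"
    return matched and response.get("rank") in SPECIES_LEVEL_RANKS


def hybrid_variants(name):
    """Return the name plus a spelling with the hybrid marker swapped."""
    variants = [name]
    tokens = name.split()
    for marker, other in (("x", "×"), ("×", "x")):
        if marker in tokens:
            swapped = " ".join(other if t == marker else t for t in tokens)
            if swapped not in variants:
                variants.append(swapped)
    return variants


def first_match(name, lookup):
    """Try each spelling in turn; return the first species-level match."""
    for variant in hybrid_variants(name):
        response = lookup(variant)
        if counts_as_match(response):
            return response
    return None
```

For a non-hybrid name this still costs a single query, since `hybrid_variants` returns only the original spelling.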
