refine warnings on get_gbif_taxonomy #39
E.g., here is a place I would want a "louder" warning: I searched for "Leioproctus carinatus" (a valid species name) and the selected species was "capillatus", which I believe is a separate but also valid species name. This is different from "Hoplitis rubicrus", which matches with itself. The warning is still helpful, but I'm less concerned when the identical species name is matched with itself than when a totally different species matches :-)
Thanks for the feedback. More specific and louder warnings, as well as possibilities to interact directly with the function, would be great. As you probably figured, the problem is caused by fuzzy matching producing a match with the wrong valid taxon. If using option I considered switching off fuzzy matching by default, but misspellings are very frequent in data and would not be addressed otherwise (see #38). The function
Oh, I definitely think fuzzy matching is desirable (that's why I'm using your function!). I just think it would be nice to distinguish the two cases: when I do this in a minute with 600,000 rows of hand-entered data, I'm going to have a lot of warnings, but I really want to focus on the ones that are likely to mess me up. Thanks!
Another place the warnings could be clearer: I have a mix of valid and invalid names that are receiving a mix of these two warnings:
Loving get_gbif_taxonomy so far.
I just ran it on a list my colleagues maintain of about 20,000 "valid" names of hymenopteran species. About 16 came back with the warning " Selected first of multiple equally ranked concepts!". Of these, the majority meet the following condition:
scientificName == scientificNameStd.
However, the ones that do not (at the threshold I used) seem likely to be mismatched. It would be super helpful, I think, to provide a different warning for these two cases. When going through and manually checking results, it's great to have warnings in cases where the automation probably worked, but it's also nice to be able to focus easily on the ones most likely to be a problem. Thanks!
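To make the request concrete, here is a minimal sketch of the triage described above, written in Python for illustration only. It is not the package's implementation. `MatchResult`, `normalize`, and `triage` are hypothetical names, and the only fields assumed are the two mentioned in this thread (`scientificName` and `scientificNameStd`) plus the warning text. A warned row where the supplied name equals the selected name gets a quiet label; a warned row where they differ gets a loud one.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    # Hypothetical record type; fields mirror the columns named in this thread.
    scientific_name: str       # name as supplied by the user (scientificName)
    scientific_name_std: str   # name selected by the matcher (scientificNameStd)
    warning: str = ""          # warning text, empty when no warning was raised


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def triage(result: MatchResult) -> str:
    """Return 'ok', 'note' (warned, but matched itself) or 'check' (warned, different name)."""
    if not result.warning:
        return "ok"
    if normalize(result.scientific_name) == normalize(result.scientific_name_std):
        return "note"
    return "check"


# Placeholder warning text taken from the message quoted above; the pairing of
# this warning with these particular names is illustrative, not real output.
WARNING = "Selected first of multiple equally ranked concepts!"

results = [
    MatchResult("Hoplitis rubicrus", "Hoplitis rubicrus", WARNING),
    MatchResult("Leioproctus carinatus", "Leioproctus capillatus", WARNING),
]

# Keep only the rows that most likely need manual review.
needs_review = [r for r in results if triage(r) == "check"]
```

With 600,000 rows, filtering on the "check" label would leave only the cases where a different name was selected, which are the ones worth checking by hand first.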