Importing (big) occurrence downloads in R means constantly fighting parsing failures. This happens because some fields contain only NAs in the first rows, sometimes in the first hundreds of thousands of rows.
One trick is to increase the number of rows R uses to guess the column types (parameter guess_max in the read_delim() function). However, if the number of rows with NA is very high, parsing failures have to be solved by defining the type you expect to get. Doing this every time for each file is time-consuming. My idea is to write the type specifications for every occurrence data field once. There are 237 of them, as far as I can tell from my experience with occurrence downloads. I already made a list of almost 90 fields a few days ago and put them together in a gist: https://gist.github.com/damianooldoni/01da78e5e55617798804db1804434754. I know, it's boring (very boring!) but it will save time in the future. @peterdesmet: What do you think about putting it in the trias package?
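To illustrate the idea, here is a minimal sketch of reading a file with explicit column types instead of relying on type guessing. The object name `occ_col_types`, the selection of fields, and the types assigned to them are assumptions for illustration only, not the content of the gist.

```r
library(readr)

# Hypothetical subset of a field type specification. The names are
# Darwin Core terms; the types are assumptions made for this example.
occ_col_types <- cols(
  .default = col_character(),      # any field not listed is read as text
  gbifID = col_double(),
  decimalLatitude = col_double(),
  decimalLongitude = col_double(),
  individualCount = col_integer()
)

# Small tab-delimited file standing in for an occurrence download.
# The last two fields are empty in the first rows, as in the real problem.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "gbifID\tdecimalLatitude\tdecimalLongitude\tindividualCount\trecordNumber",
  "1\t51.05\t3.72\t\t",
  "2\t50.85\t4.35\t\t",
  "3\t51.22\t4.40\t5\tAB-123"
), tmp)

# With col_types given, the leading NAs no longer influence the column types
occ <- read_delim(tmp, delim = "\t", col_types = occ_col_types, na = c("", "NA"))
```

Writing the specification once and reusing it for every download avoids redefining types per file; setting `.default` to character is one possible choice so that fields not yet covered are at least read without failures.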
Adding it as such to TrIAS is one option, since you're almost there. Alternatively, you could have a look at finch, which is an R package for reading Darwin Core files. Maybe this is already implemented there, and if not, it might be a nice addition.
Based on the discussion in ropensci-archive/finch/issues/25, I would add the parsing types as an R file in the TrIAS package for now. What do you think? I will do it after PR #21 is done.