Specify type columns occurrence data beforehand #25

damianooldoni · 2018-12-07T11:15:47Z

Importing (big) occurrence downloads in R means constantly fighting parsing failures. This happens because some fields contain only NAs in the first rows, sometimes for hundreds of thousands of rows.
One trick is to increase the number of rows R uses to guess the column type (the guess_max parameter of the read_delim() function). However, if the number of leading rows with NA is very high, parsing failures have to be solved by defining the type you expect to get. Doing this every time for each file is time-consuming. My idea is to write the type specification for every field of an occurrence data file. There are 237 of them, as far as my experience with occurrence downloads tells me. I already made a list of almost 90 fields a few days ago and put them together in a gist: https://gist.github.com/damianooldoni/01da78e5e55617798804db1804434754. I know, it's boring (very boring!), but it will save time in the future.
@peterdesmet: What do you think about putting it in the trias package?
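
To make the idea concrete, here is a minimal sketch of a predefined type specification passed to read_delim() through col_types. The five fields and the types assigned to them are an illustrative assumption, not the contents of the gist, and the small temporary file only stands in for a real occurrence download.

```r
library(readr)

# Illustrative subset of field types (an assumption, not the gist's list).
# Fields not listed here fall back to character via `.default`.
occurrence_col_types <- cols(
  .default = col_character(),
  gbifID = col_character(),
  decimalLatitude = col_double(),
  decimalLongitude = col_double(),
  individualCount = col_integer()
)

# Tiny tab-delimited stand-in for an occurrence download.
# individualCount is empty in the first rows, as in the problem described above.
occ_file <- tempfile(fileext = ".txt")
writeLines(
  c(
    "gbifID\tscientificName\tdecimalLatitude\tdecimalLongitude\tindividualCount",
    "1001\tFallopia japonica\t51.05\t3.72\t",
    "1002\tFallopia japonica\t50.85\t4.35\t",
    "1003\tHeracleum mantegazzianum\t51.21\t4.40\t12"
  ),
  occ_file
)

# With explicit col_types, readr does not need to guess from the first rows.
occ <- read_delim(occ_file, delim = "\t", col_types = occurrence_col_types)
```

Because the types are declared up front, the leading empty values in individualCount no longer influence how the column is parsed, and guess_max does not need tuning per file.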


peterdesmet · 2018-12-08T20:10:56Z

Adding it as such to TrIAS is one option, since you're almost there. Or you could have a look at finch, an R package for reading Darwin Core files. Maybe it is already implemented there; if not, it might be a nice addition.

damianooldoni · 2018-12-08T23:09:44Z

Nice! Thanks. I will have a look and let you know.

damianooldoni · 2018-12-13T13:48:08Z

Based on the discussion in ropensci-archive/finch/issues/25, I would add the parsing types as an R file in the TrIAS package for now. What do you think? I will do it after PR #21 is done.

peterdesmet · 2018-12-18T08:03:13Z

Ok. Or maybe easiest to read all columns as text and only recast when necessary?
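
A minimal sketch of that alternative, assuming readr is used: every column is read as character, and only the columns needed for analysis are recast afterwards. The field names and the temporary file are illustrative assumptions.

```r
library(readr)

# Tiny tab-delimited stand-in for an occurrence download (illustrative data).
occ_file <- tempfile(fileext = ".txt")
writeLines(
  c(
    "gbifID\tscientificName\tdecimalLatitude\tindividualCount",
    "1001\tFallopia japonica\t51.05\t",
    "1002\tHeracleum mantegazzianum\t51.21\t12"
  ),
  occ_file
)

# Read everything as text: no type guessing, so no type-related parsing failures.
occ_text <- read_delim(
  occ_file,
  delim = "\t",
  col_types = cols(.default = col_character())
)

# Recast only the columns that are actually needed.
occ_text$decimalLatitude <- parse_double(occ_text$decimalLatitude)
occ_text$individualCount <- parse_integer(occ_text$individualCount)
```

The trade-off is that no list of 237 field types has to be maintained, but each analysis has to recast the columns it uses.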

damianooldoni mentioned this issue Dec 13, 2018

Parsing occurrence text files in DwC archive ropensci-archive/finch#25

Closed

damianooldoni self-assigned this Dec 21, 2018
