Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dl_read_d1_data() function #93

Closed
wants to merge 1 commit into from
Closed

Add dl_read_d1_data() function #93

wants to merge 1 commit into from

Conversation

drkrynstrng
Copy link
Contributor

I added a convenience function dl_read_d1_data() to both download and read data, while keeping the original functions. It's a simple wrapper around those functions. Fixes #92. We may want to think of a better function name though, and we will probably want a similar function to download and read all data in a package.

@brunj7
Copy link
Collaborator

brunj7 commented Nov 20, 2018

Hi @drkrynstrng ,

Thank you for the PR! The wrapper to directly ingest the data into R is a great idea; but before implementing it, we should look into how to use the DataONE file format type to trigger an appropriate function to read the data. In other words make read_d1_data() smarter and able to select an appropriate import function.

Another approach could be to split the wrapper functions into read_d1_tabular, read_d1_raster, ...

Let me know if you want to chat more about that!

@drkrynstrng
Copy link
Contributor Author

Ah, right. Yeah, it makes sense to figure that out first. File formats could be identified either from file extensions or system metadata, and then we could map formats to import functions using if...else statements. The code for the arcticdatautils::guess_format_id() function would be useful here: https://github.com/NCEAS/arcticdatautils/blob/master/R/formats.R

@mbjones
Copy link
Member

mbjones commented Nov 21, 2018

A dispatch method would best use the most specific information for deciding which loading function to use. The DataONE FormatId is the most specific, and should be preferred over mime type and file extension, which could be used as a fallback. Here's a small excerpt from the format list, which shows how formatId helps differentiate versions that share the same extension:

    <objectFormat>
        <formatId>CF-1.2</formatId>
        <formatName>NetCDF Climate and Forecast Metadata Convention, version 1.2</formatName>
        <formatType>DATA</formatType>
        <mediaType name="application/netcdf"/>
        <extension>nc</extension>
    </objectFormat>
    <objectFormat>
        <formatId>CF-1.3</formatId>
        <formatName>NetCDF Climate and Forecast Metadata Convention, version 1.3</formatName>
        <formatType>DATA</formatType>
        <mediaType name="application/netcdf"/>
        <extension>nc</extension>
    </objectFormat>
    <objectFormat>
        <formatId>CF-1.4</formatId>
        <formatName>NetCDF Climate and Forecast Metadata Convention, version 1.4</formatName>
        <formatType>DATA</formatType>
        <mediaType name="application/netcdf"/>
        <extension>nc</extension>
    </objectFormat>
    <objectFormat>
        <formatId>application/MATLAB-v7.3</formatId>
        <formatName>Mathworks MATLAB version 7.3 (R2006b or later) binary file - HDF5 compatible</formatName>
        <formatType>DATA</formatType>
        <mediaType name="application/octet-stream"/>
        <extension>mat</extension>
    </objectFormat>
    <objectFormat>
        <formatId>application/MATLAB-v7</formatId>
        <formatName>Mathworks MATLAB version 7 (R14 or later) binary file</formatName>
        <formatType>DATA</formatType>
        <mediaType name="application/octet-stream"/>
        <extension>mat</extension>
    </objectFormat>
    <objectFormat>
        <formatId>application/MATLAB-v6</formatId>
        <formatName>Mathworks MATLAB version 6 (R8 or later) binary file</formatName>
        <formatType>DATA</formatType>
        <mediaType name="application/octet-stream"/>
        <extension>mat</extension>
    </objectFormat>

I also spoke to @brunj7 about how a cast method like this might be best done in its own package so that it can be reused in many places. We could refactor later, but let's chat about the best route forward at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants