This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

[Experimental] Add sample datapackage.jsonld #101

Open
wants to merge 5 commits into
base: master


rht commented Jan 16, 2017

  • Contains build script for reproducibility
  • resources is normalized
  • Largely based on datapackage.json
  • Pending: add schema for the data skeleton within the file itself
  • arxiv.org's buildScript still pending (cc @davidar)

Closes: #86, #18, #32, #35

"References": {
  "npm": {
    "spec": "https://docs.npmjs.com/files/package.json",
    "example": "https://github.com/npm/npm/blob/latest/package.json" },
  "ipfs/archives": "https://github.com/ipfs/archives/issues/45",
  "cgtd": {
    "spec": "",
    "example": "https://github.com/ga4gh/cgtd/blob/master/tests/ALL/ALL-US.json" },
  "frictionlessdata": {
    "spec": "http://specs.frictionlessdata.io/data-packages/",
    "example": "https://github.com/datasets/gdp/blob/master/datapackage.json" },
  "json-schema": {
    "spec": "http://json-schema.org/documentation.html",
    "example": "http://json-schema.org/example2.html" },
  "nix": {
    "spec": "http://nixos.org/nix/manual/#ch-expression-language",
    "example": "https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/package-management/nix/default.nix" },
  "dat": {
    "spec": "http://docs.datproject.org/",
    "example": "" }
}

Author

rht commented Jan 16, 2017

Since github.com/cdnjs/cdnjs is a code repo, #82 is re-added via gx.

rm -r archive.org
",
"size": "1 GB",
"resources": {
Author

@rht rht Jan 16, 2017

"resources" is the nomenclature used in datapackage.json.

@@ -0,0 +1,38 @@
{
"name": "scholarpedia.org",
Author

REQUIRED in all of the references.

Author

rht commented Jan 16, 2017

The most effective way to find the common ground, I think, is to package datasets that have already been packaged in these various standards and see which choice covers the most cases.
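As a rough illustration of that comparison, here is a minimal Python sketch that counts which top-level fields are shared across several package descriptors. The three sample descriptors are hypothetical and heavily abbreviated (they only loosely mimic an npm `package.json`, a `datapackage.json`, and the sample in this PR), and the helper names `field_coverage` and `common_fields` are made up for this example.

```python
from collections import Counter


def field_coverage(descriptors):
    """Count how many descriptors use each top-level field."""
    counts = Counter()
    for descriptor in descriptors:
        counts.update(descriptor.keys())
    return counts


def common_fields(descriptors):
    """Return the fields present in every descriptor, sorted by name."""
    counts = field_coverage(descriptors)
    return sorted(k for k, n in counts.items() if n == len(descriptors))


# Hypothetical, abbreviated descriptors used only for illustration.
npm_like = {"name": "example", "version": "1.0.0", "license": "MIT"}
datapackage_like = {
    "name": "example",
    "license": {"type": "other-open"},
    "resources": [],
}
sketch_like = {"name": "scholarpedia.org", "size": "1 GB", "resources": {}}

samples = [npm_like, datapackage_like, sketch_like]
print(common_fields(samples))  # ['name']
```

With real descriptors from each standard, the same count would show which fields a common format needs to cover first.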

Author

rht commented Jan 17, 2017

@flyingzumwalt since you're the captain of this repo, RFCR? Is this not in the intended direction?

Author

rht commented Jan 17, 2017

I put in 2 versions of the manifest file. One is the output of ipfs add (as shown in the example output in ipfs/notes#205 (comment)); the other is in IPLD format inside the "resources" field.

@flyingzumwalt
Contributor

All of the issues you're closing pre-date my involvement. I will have to do a bit of reading in order to provide comments. That will have to wait until after the data.gov sprint. I'm currently maxed out dealing with that work.

flyingzumwalt self-requested a review January 17, 2017 17:06

Author

rht commented Jan 17, 2017

This PR packages data at a scale roughly 10e5 times smaller than data.gov. Delivering data.gov depends on having the manifest/datapackage.json/packfile implemented, and that work can be done in parallel using smaller datasets.

Author

rht commented Jan 17, 2017

The issues this PR closes basically list which datasets have been published to ipfs. The main concern is the datapackage.json format and the packmanifest format. I don't know how to put this more simply: I am decoupling the task of speccing these formats from the task of preparing 300TB of data in the first place.

@eminence
Collaborator

Just as a comment, here is the datapackage.json file I put together for the RFC archive:

{
    "last-synch": "2016-12-27T16:04:20.322511", 
    "name": "ipfs-ietf-rfc-archive", 
    "license": {
        "url": "http://trustee.ietf.org/license-info/IETF-TLP-4.pdf", 
        "type": "other-open"
    }, 
    "title": "IETF RFC Archive", 
    "sources": [
        {
            "web": "https://www.rfc-editor.org/retrieve/", 
            "name": "RFC Editor"
        }
    ], 
    "resources": [
        {
            "path": "rfc-index.txt", 
            "name": "rfc-index"
        }, 
        {
            "path": "rfc-data/", 
            "name": "rfc-data"
        }, 
        {
            "path": "update.py", 
            "name": "update-script"
        }
    ], 
    "ipfs-github-issue": "https://github.com/ipfs/archives/issues/18"
}

I based my format on http://specs.frictionlessdata.io/data-package/.

What are your thoughts about this format, or the differences between this and your proposal?
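For comparing the two formats, a small checker can help. The following is a minimal Python sketch, not an implementation of the Frictionless Data spec: it assumes only that a descriptor needs a non-empty `name` and a non-empty `resources` list whose entries each carry a `path`, as in the file above. The function name `check_descriptor` is hypothetical, and the sample is an abbreviated copy of the RFC archive descriptor.

```python
import json


def check_descriptor(descriptor):
    """Return a list of problems found in a datapackage-style descriptor.

    Assumed rules (for illustration only): a non-empty string 'name', and
    a non-empty list 'resources' whose items each have a 'path'.
    """
    problems = []
    name = descriptor.get("name")
    if not isinstance(name, str) or not name:
        problems.append("missing or empty 'name'")

    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        problems.append("'resources' should be a non-empty list")
        return problems

    for index, resource in enumerate(resources):
        if not isinstance(resource, dict) or "path" not in resource:
            problems.append(f"resource {index} has no 'path'")
    return problems


# Abbreviated copy of the RFC archive descriptor shown above.
sample = json.loads("""
{
    "name": "ipfs-ietf-rfc-archive",
    "title": "IETF RFC Archive",
    "resources": [
        {"path": "rfc-index.txt", "name": "rfc-index"},
        {"path": "rfc-data/", "name": "rfc-data"},
        {"path": "update.py", "name": "update-script"}
    ]
}
""")

print(check_descriptor(sample))
```

Note that a descriptor whose `resources` is an object rather than a list would be flagged by this check, which is one concrete difference between the two layouts.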
