Implement JSON-LD Integration Across Services #844

Open
6 tasks done
maxyzli opened this issue Nov 1, 2024 · 4 comments · May be fixed by #847
Labels: enhancement (New feature or request)
maxyzli (Collaborator) commented Nov 1, 2024

Library Service:

  • Introduce and implement the JSON-LD schema.

Index Service:

  • Test: ensure that the JSON-LD validation at the /validate route functions correctly.
  • Test: verify that parsing JSON-LD data does not alter the profile hash.

DataProxy Services:

  • Process incoming data into JSON-LD format.
  • Validate JSON-LD data.
  • Store validated data and send to Index.
@maxyzli maxyzli self-assigned this Nov 1, 2024
@geoffturk geoffturk added the enhancement New feature or request label Nov 1, 2024
@geoffturk geoffturk moved this from Backlog to To do in Murmurations Development Nov 1, 2024
geoffturk (Member) commented Nov 1, 2024

Murm Services currently takes in data from the following sources:

  1. KVM via their API
  2. CSV files via an import from Tools

That data is then transformed into JSON, validated against the schema(s) linked to it, and stored in the dataproxy.

For example, using the people_schema-v0.1.0 schema, the following data:

name,geolocation.lat,geolocation.lon
Geoff Turk,48.88111,2.38296

... would be transformed into the following JSON:

{
  "linked_schemas": ["people_schema-v0.1.0"],
  "name": "Geoff Turk",
  "geolocation": {
    "lat": 48.88111,
    "lon": 2.38296
  }
}

... and validated against the schema.
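The dotted-header flattening described above can be sketched as follows (a minimal sketch; the helper name `unflatten` is hypothetical, and the actual dataproxy import code may differ):

```python
def unflatten(row: dict) -> dict:
    """Expand dotted CSV headers (e.g. "geolocation.lat") into nested objects."""
    result = {}
    for key, value in row.items():
        parts = key.split(".")
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result

# One CSV row from the example above, with values already typed.
row = {"name": "Geoff Turk", "geolocation.lat": 48.88111, "geolocation.lon": 2.38296}
profile = {"linked_schemas": ["people_schema-v0.1.0"], **unflatten(row)}
```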

We want to take that JSON output of the profile and transform it into JSON-LD format:

{
  "@context": "https://library.murmurations.network/jsonld/people_schema.jsonld",
  "@type": "Person",
  "name": "Geoff Turk",
  "geolocation": {
    "@type": "GeoCoordinates",
    "lat": 48.88111,
    "lon": 2.38296
  },
  "linked_schemas": ["people_schema-v0.1.0"]
}

Rather than repeatedly including the JSON-LD context in each profile, we want to link to the context from the library.

https://library.murmurations.network/jsonld/people_schema.jsonld

{
    "@context": {
      "@vocab": "https://schema.org/",
      "murm": "https://murmurations.network/ns/",
      "name": "name",
      "geolocation": "location",
      "lat": "latitude",
      "lon": "longitude",
      "linked_schemas": "murm:linkedSchemas"
  }
}

Note the above context is just an example. The full context still needs to be mapped out and added to the library.

One thing to point out is that the @type field is not included in the context, but is instead included in the profile data. This is because the @type field has to be located within the data file; it can't be stored in the context.

To work around this, we can embed the type in the metadata of the schema or field that requires a @type declaration. For example, in the people_schema-v0.1.0 schema, the @type field can be declared as:

{
  "metadata": {
    "@type": "Person",
    "@context": "https://library.murmurations.network/jsonld/people_schema.jsonld",
    "schema": {
      "name": "people_schema-v0.1.0",
      ... etc ...
    }
  }
}

And the geolocation field can be declared as:

{
  "metadata": {
    "@type": "GeoCoordinates",
    "field": {
      "name": "geolocation",
      "version": "1.0.0"
    },
    ... etc ...
  }
}

There would need to be a preprocessing step to add the @type field to the profile data based on the metadata. Note also the inclusion of the @context field in the schema metadata, which would also need to be added to the profile during processing.
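That preprocessing step could look something like this (a sketch only, assuming the schema and field metadata shapes shown above; the function name and structure are hypothetical):

```python
def apply_jsonld_metadata(profile: dict, schema_meta: dict, field_meta: dict) -> dict:
    """Add @context and @type from the schema metadata, and add @type to any
    object-valued field whose field metadata declares one."""
    out = dict(profile)
    out["@context"] = schema_meta["metadata"]["@context"]
    out["@type"] = schema_meta["metadata"]["@type"]
    for name, meta in field_meta.items():
        if name in out and isinstance(out[name], dict) and "@type" in meta["metadata"]:
            out[name] = {"@type": meta["metadata"]["@type"], **out[name]}
    return out

# Metadata shapes taken from the examples above (abbreviated).
schema_meta = {"metadata": {
    "@type": "Person",
    "@context": "https://library.murmurations.network/jsonld/people_schema.jsonld",
    "schema": {"name": "people_schema-v0.1.0"}}}
field_meta = {"geolocation": {"metadata": {
    "@type": "GeoCoordinates",
    "field": {"name": "geolocation", "version": "1.0.0"}}}}

profile = {"linked_schemas": ["people_schema-v0.1.0"], "name": "Geoff Turk",
           "geolocation": {"lat": 48.88111, "lon": 2.38296}}
tagged = apply_jsonld_metadata(profile, schema_meta, field_meta)
```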

This is just one possible way to implement the JSON-LD transformation. It's worth exploring other options to see if there are any better approaches.

maxyzli (Collaborator, Author) commented Nov 7, 2024

Based on the discussion above, the planned steps are as follows:

  1. Add a JSON-LD route to the library and use the same schemaparser to import JSON-LD files into the Library Service. Ideally, each schema will have a corresponding JSON-LD context file.

  2. Place the JSON-LD information into the metadata of the existing schemas.

  3. Generate JSON-LD for the data obtained from CSV and KVM files, using the context and type from the schema’s metadata, and store it within the profile.

  4. Ensure that the JSON-LD files can be read normally when placed in the index. The JSON-LD tags need to be stripped before hashing so that the profile hash values remain unchanged.
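The hash invariance in step 4 could be checked along these lines (a sketch, assuming SHA-256 over canonical JSON; the index's actual hashing routine may canonicalize differently):

```python
import hashlib
import json

JSONLD_KEYS = {"@context", "@type", "@id"}

def strip_jsonld(doc):
    """Recursively remove JSON-LD keywords so tagged and untagged profiles
    reduce to the same plain JSON."""
    if isinstance(doc, dict):
        return {k: strip_jsonld(v) for k, v in doc.items() if k not in JSONLD_KEYS}
    if isinstance(doc, list):
        return [strip_jsonld(v) for v in doc]
    return doc

def profile_hash(doc) -> str:
    # Canonical form: sorted keys, no extra whitespace.
    canonical = json.dumps(strip_jsonld(doc), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

plain = {"linked_schemas": ["people_schema-v0.1.0"], "name": "Geoff Turk",
         "geolocation": {"lat": 48.88111, "lon": 2.38296}}
tagged = {"@context": "https://library.murmurations.network/jsonld/people_schema.jsonld",
          "@type": "Person", **plain}
tagged["geolocation"] = {"@type": "GeoCoordinates", **plain["geolocation"]}

# The JSON-LD tagging must not change the profile hash.
assert profile_hash(plain) == profile_hash(tagged)
```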

maxyzli (Collaborator, Author) commented Nov 8, 2024

> This is just one possible way to implement the JSON-LD transformation. It's worth exploring other options to see if there are any better approaches.

Question: If schema.org doesn’t provide the fields we need, such as murm:linkedSchemas, which we plan to place under the domain https://murmurations.network/ns/, should we create a separate repository to manage this? Or do you have another approach in mind for implementing this?

We will need to add a context.jsonld file at https://murmurations.network/ns/context.jsonld. Here is an example:

{
  "@context": {
    "@vocab": "https://schema.org/",
    "murm": "https://murmurations.network/ns/",
    "linkedSchemas": {
      "@id": "murm:linkedSchemas",
      "@type": "@id",
      "description": "A list of schemas against which a profile must be validated (schema names must be alphanumeric with underscore(_) spacers and dash(-) semantic version separator, e.g., my_data_schema-v1.0.0)"
    }
  }
}
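While the full context is still being mapped out, one simple sanity check would be to flag any profile terms that lack an explicit term definition (a sketch; with `@vocab` set, undefined terms still resolve against the schema.org vocabulary, so this only flags terms relying on that fallback):

```python
# Example context from the library discussion above (illustrative, not final).
context = {"@context": {
    "@vocab": "https://schema.org/",
    "murm": "https://murmurations.network/ns/",
    "name": "name",
    "geolocation": "location",
    "lat": "latitude",
    "lon": "longitude",
    "linked_schemas": "murm:linkedSchemas"}}

def uncovered_terms(profile: dict, ctx: dict) -> list:
    """Return top-level profile keys that are neither JSON-LD keywords
    nor explicitly defined terms in the context."""
    defined = set(ctx["@context"]) - {"@vocab"}
    return [k for k in profile if not k.startswith("@") and k not in defined]

profile = {"@type": "Person", "name": "Geoff Turk",
           "geolocation": {"lat": 48.88111, "lon": 2.38296},
           "linked_schemas": ["people_schema-v0.1.0"]}
```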

geoffturk (Member) commented

> should we create a separate repository to manage this?

My current thinking is that we should separate the context from validation, so yes, we should create a separate repo for context-related info, which can be deployed independently from changes to validation parameters. I'll set this up once I get feedback from @olisb on what to name it (either a third-level domain or a path under the root domain).

Project status: In progress