
3. Loading Data

Paula Gearon edited this page Jun 30, 2021 · 7 revisions

Data can be inserted into Asami using either entities or statements.

Entities

Entities are objects defined as maps of keys to values. This is a common shape for JSON or EDN data, which is often available in files or from data APIs.

EDN

Native EDN data can be loaded as a single object, or a sequence of objects.

As an example, consider the following EDN file named data.edn:

[{:id "bennet"
  :type "family"
  :name "Bennet"
  :children [{:name "Jane"}
             {:name "Elizabeth"}
             {:name "Mary"}
             {:name "Catherine"}
             {:name "Lydia"}]}
 {:id "bingley"
  :type "family"
  :name "Bingley"
  :children [{:name "Charles"}
             {:name "Caroline"}
             {:name "Louisa" :surname "Hurst"}]}
 {:id "fitzwilliam"
  :type "family"
  :name "Fitzwilliam"
  :children [{:name "Catherine" :surname "de Bourgh"}
             {:name "Anne" :surname "Darcy"}]}]

This can be loaded by parsing the file as EDN and transacting the result as :tx-data. As this is a small file, clojure.core/slurp is one way to read it into a string, which can then be parsed and inserted.

(require '[asami.core :as d])
(require '[clojure.edn :as edn])
(def data (edn/read-string (slurp "data.edn")))   ; parse the whole file as EDN
(def conn (d/connect "asami:mem://data"))          ; in-memory database named "data"
(d/transact conn {:tx-data data})
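Before transacting, it can help to check what the reader produced. This sketch parses an inline excerpt of data.edn with clojure.edn/read-string (the inline string is an assumption that stands in for the file) and inspects the result. It does not touch Asami at all.

```clojure
(require '[clojure.edn :as edn])

;; An inline excerpt of data.edn, so this sketch runs without the file on disk.
(def edn-text
  "[{:id \"bennet\" :type \"family\" :name \"Bennet\"
     :children [{:name \"Jane\"} {:name \"Elizabeth\"}]}
    {:id \"bingley\" :type \"family\" :name \"Bingley\"
     :children [{:name \"Charles\"} {:name \"Louisa\" :surname \"Hurst\"}]}]")

(def data (edn/read-string edn-text))

;; The parsed value is a vector of maps, the shape passed as :tx-data above.
(vector? data)                          ;; => true
(mapv :id data)                         ;; => ["bennet" "bingley"]
;; Nested :children values are parsed as vectors of maps.
(mapv :name (mapcat :children data))    ;; => ["Jane" "Elizabeth" "Charles" "Louisa"]
```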

JSON

JSON files can be loaded in the same way as EDN, usually with the parser converting keys to Clojure keywords. The JSON equivalent to the data above would be:

[{"id": "bennet",
  "type": "family",
  "name": "Bennet",
  "children": [{"name": "Jane"},
               {"name": "Elizabeth"},
               {"name": "Mary"},
               {"name": "Catherine"},
               {"name": "Lydia"}]},
 {"id": "bingley",
  "type": "family",
  "name": "Bingley",
  "children": [{"name": "Charles"},
               {"name": "Caroline"},
               {"name": "Louisa", "surname": "Hurst"}]},
 {"id": "fitzwilliam",
  "type": "family",
  "name": "Fitzwilliam",
  "children": [{"name": "Catherine", "surname": "de Bourgh"},
               {"name": "Anne", "surname": "Darcy"}]}]

Both EDN and JSON can be parsed directly from a file reader instead of from a string, which is faster and more memory efficient. This example loads JSON data using the Cheshire JSON library:

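(require '[asami.core :as d])
(require '[cheshire.core :as json])
(require '[clojure.java.io :as io])
;; the true argument tells Cheshire to convert JSON keys into keywords
(def data (json/parse-stream (io/reader "data.json") true))
(def conn (d/connect "asami:mem://data"))
(d/transact conn {:tx-data data})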
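If JSON has already been parsed with string keys (for instance by a parser called without a keyword option), clojure.walk/keywordize-keys produces the same keyword-keyed structure. This is an alternative to Cheshire's true flag, not something Asami requires; the in-memory excerpt of data.json below is an assumption made so the sketch runs on its own.

```clojure
(require '[clojure.walk :as walk])

;; An excerpt of data.json as a parser returns it when keys are left as strings.
(def raw
  [{"id" "bingley" "type" "family" "name" "Bingley"
    "children" [{"name" "Charles"} {"name" "Louisa" "surname" "Hurst"}]}])

;; keywordize-keys walks nested maps and vectors, turning each string key into a keyword.
(def data (walk/keywordize-keys raw))

(:name (first data))                     ;; => "Bingley"
(get-in data [0 :children 1 :surname])   ;; => "Hurst"
```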

Raw JSON

Note that the JSON example above passed true as a parameter when parsing the stream. This converts keys into keywords. While this is preferred, Asami can also use the raw strings as attributes. This may be necessary if some of the keys contain characters that are not legal in Clojure keywords. For instance, the file page.json contains:

{ "@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025",
  "@type": "sc:Canvas",
  "height": 3223,
  "images": [
    { "@id": "https://www.loc.gov/resource/dcmsiabooks.prideprejudice00aust_5/seq-25/",
      "@type": "oa:Annotation",
      "motivation": "sc:painting",
      "on": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025",
      "resource": {
        "@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025/full/pct:100/0/default.jpg",
        "@type": "dctypes:Image",
        "format": "image/jpeg",
        "height": 3223,
        "service": {
          "@context": "http://iiif.io/api/image/2/context.json",
          "@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025",
          "profile": "http://iiif.io/api/image/2/level2.json"},
        "width": 2040}}],
  "label": "Page 25",
  "metadata": [{"label": "Library of Congress Resource URL",
                "value": "https://www.loc.gov/resource/dcmsiabooks.prideprejudice00aust_5/?sp=25"}],
  "related": "https://www.loc.gov/resource/dcmsiabooks.prideprejudice00aust_5/?sp=25",
  "service": {
    "@context": "http://iiif.io/api/image/2/context.json",
    "@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025",
    "profile": "http://iiif.io/api/image/2/level2.json"},
  "thumbnail": {
    "@id": "https://tile.loc.gov/image-services/iiif/service:gdc:dcmsiabooks:pr:id:ep:re:ju:di:ce:00:au:st:_5:prideprejudice00aust_5:prideprejudice00aust_5_0025/full/pct:12.5/0/default.jpg",
    "format": "image/jpeg",
    "height": 402,
    "width": 255},
  "width": 2040}

This can be loaded as before, but without converting keys to keywords:

(require '[asami.core :as d])
(require '[cheshire.core :as json])
(require '[clojure.java.io :as io])
;; no key conversion, so keys such as "@id" stay as strings
(def data (json/parse-stream (io/reader "page.json")))
(def conn (d/connect "asami:mem://page"))
(d/transact conn {:tx-data data})

Note that this file contained a single object rather than a sequence. This is still valid and is treated as a sequence containing a single entity.

The resulting data now uses strings as attributes, which changes the format of queries. For instance, to ask the above data for the height and width of all JPEG images:

(d/q '[:find ?height ?width
       :where
       [?image "format" "image/jpeg"]
       [?image "height" ?height]
       [?image "width" ?width]]
     (d/db conn))
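As a cross-check, the maps that the three query patterns should match can be found in plain Clojure, without Asami. This sketch uses a trimmed copy of page.json (an assumption: only the fields needed here are kept) and a hypothetical jpeg-sizes helper that walks every nested map with tree-seq. In page.json, two maps carry "format" "image/jpeg": the full-size image resource and the thumbnail.

```clojure
;; A trimmed copy of page.json as parsed with string keys (only the fields used below).
(def page
  {"@type" "sc:Canvas" "height" 3223 "width" 2040
   "images" [{"@type" "oa:Annotation"
              "resource" {"@type" "dctypes:Image" "format" "image/jpeg"
                          "height" 3223 "width" 2040}}]
   "thumbnail" {"format" "image/jpeg" "height" 402 "width" 255}})

(defn jpeg-sizes
  "Hypothetical helper: collects [height width] from every nested map whose
  \"format\" is \"image/jpeg\", mirroring the patterns in the query."
  [data]
  (->> (tree-seq coll? seq data)   ; visits every map, vector and map entry
       (filter map?)               ; keep only the maps
       (filter #(= "image/jpeg" (get % "format")))
       (map (juxt #(get % "height") #(get % "width")))
       set))

(jpeg-sizes page)   ;; => #{[3223 2040] [402 255]}
```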

Statements

Asami statements can be represented directly as :db/add operations. These can also appear in a transaction sequence.

For instance, this data structure has two entities that refer to the same third entity:

[{:id "charles"
  :name "Charles"
  :home {:id "scarborough"
         :town "Scarborough"
         :county "Yorkshire"}}
 {:id "jane"
  :name "Jane"
  :home {:id "scarborough"}}]

This can be represented by adding the following statements:

[[:db/add :tg/node-1000 :id "charles"]
 [:db/add :tg/node-1000 :name "Charles"]
 [:db/add :tg/node-1001 :id "scarborough"]
 [:db/add :tg/node-1001 :town "Scarborough"]
 [:db/add :tg/node-1001 :county "Yorkshire"]
 [:db/add :tg/node-1000 :home :tg/node-1001]
 [:db/add :tg/node-1002 :id "jane"]
 [:db/add :tg/node-1002 :name "Jane"]
 [:db/add :tg/node-1002 :home :tg/node-1001]]

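This uses 3 keywords (:tg/node-1000, :tg/node-1001 and :tg/node-1002) to represent the three entities, with :tg/node-1001 shared through the :home attribute.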
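To make the correspondence between the two forms concrete, here is a hypothetical entities->statements helper that flattens the entity maps into :db/add statements. It is a sketch under simple assumptions (every shared entity has an :id, nested vectors are not expanded), not how Asami converts entities internally. Run on the entities shown above, it produces the same nine statements, in the same order.

```clojure
(defn entities->statements
  "Hypothetical helper, not Asami's own algorithm: flattens entity maps into
  [:db/add node attribute value] statements. Maps sharing an :id value get the
  same node keyword; nested maps become references to their node.
  Vector values are left as-is."
  [entities first-node]
  (let [ids     (atom {})           ; :id value -> node keyword
        counter (atom first-node)   ; next node number to allocate
        out     (atom [])]          ; statements in the order produced
    (letfn [(node-for [id]
              (or (and (some? id) (get @ids id))
                  (let [n (keyword "tg" (str "node-" @counter))]
                    (swap! counter inc)
                    (when (some? id) (swap! ids assoc id n))
                    n)))
            (add-entity [m]
              (let [node (node-for (:id m))]
                (doseq [[k v] m]
                  (let [value (if (map? v) (add-entity v) v)] ; nested map -> its node
                    (swap! out conj [:db/add node k value])))
                node))]
      (run! add-entity entities)
      ;; A repeated reference such as {:id "scarborough"} re-emits its :id statement.
      (vec (distinct @out)))))

(def entities
  [{:id "charles" :name "Charles"
    :home {:id "scarborough" :town "Scarborough" :county "Yorkshire"}}
   {:id "jane" :name "Jane" :home {:id "scarborough"}}])

(entities->statements entities 1000)
```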

It is also possible to mix these statements with entities:

[[:db/add :tg/node-1000 :id "charles"]
 [:db/add :tg/node-1000 :name "Charles"]
 [:db/add :tg/node-1001 :id "scarborough"]
 [:db/add :tg/node-1001 :town "Scarborough"]
 [:db/add :tg/node-1001 :county "Yorkshire"]
 [:db/add :tg/node-1000 :home :tg/node-1001]
 [:db/add :tg/node-1002 :id "jane"]
 [:db/add :tg/node-1002 :name "Jane"]
 [:db/add :tg/node-1002 :home :tg/node-1001]
 {:id "Elizabeth" :sister {:id "jane"}}]

Loading a file containing a sequence like this can be done by:

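(require '[asami.core :as d])
(require '[clojure.edn :as edn])
(require '[clojure.java.io :as io])
;; clojure.edn/read needs a java.io.PushbackReader, so the file reader is wrapped in one
(def data (with-open [r (java.io.PushbackReader. (io/reader "adds.edn"))]
            (edn/read r)))
(def conn (d/connect "asami:mem://data"))
(d/transact conn {:tx-data data})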
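The reading step can be tried without a file. In this sketch a java.io.StringReader (an assumption, standing in for the file reader) holds a short mixed sequence; it is wrapped in a PushbackReader because clojure.edn/read requires one. Since statements are vectors and entities are maps, the two kinds can be told apart after reading.

```clojure
(require '[clojure.edn :as edn])
(import '(java.io PushbackReader StringReader))

;; A short mixed sequence; the StringReader stands in for (io/reader "adds.edn").
(def adds-text
  "[[:db/add :tg/node-1002 :id \"jane\"]
    [:db/add :tg/node-1002 :name \"Jane\"]
    {:id \"Elizabeth\" :sister {:id \"jane\"}}]")

;; clojure.edn/read requires a java.io.PushbackReader, so the reader is wrapped first.
(def data
  (with-open [r (PushbackReader. (StringReader. adds-text))]
    (edn/read r)))

;; Statements are vectors and entities are maps.
(count (filter vector? data))   ;; => 2
(count (filter map? data))      ;; => 1
```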

Triples

When all of the data is in triple form, an alternative to :db/add statements is to send the triples directly, using :tx-triples instead of :tx-data.

The above data in triple form would appear as:

[[:tg/node-1000 :id "charles"]
 [:tg/node-1000 :name "Charles"]
 [:tg/node-1001 :id "scarborough"]
 [:tg/node-1001 :town "Scarborough"]
 [:tg/node-1001 :county "Yorkshire"]
 [:tg/node-1000 :home :tg/node-1001]
 [:tg/node-1002 :id "jane"]
 [:tg/node-1002 :name "Jane"]
 [:tg/node-1002 :home :tg/node-1001]]
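The :db/add statements and the triples differ only in the leading operation, so one form can be derived from the other. The statements->triples helper below is hypothetical and only a sketch. It refuses any other operation (such as :db/retract), because a plain triple carries no operation.

```clojure
(defn statements->triples
  "Hypothetical helper: drops :db/add from each statement, leaving
  [entity attribute value]. Throws if any other operation is present."
  [statements]
  (mapv (fn [[op e a v :as stmt]]
          (if (= :db/add op)
            [e a v]
            (throw (ex-info "Only :db/add statements can become triples"
                            {:statement stmt}))))
        statements))

(statements->triples [[:db/add :tg/node-1000 :id "charles"]
                      [:db/add :tg/node-1000 :home :tg/node-1001]])
;; => [[:tg/node-1000 :id "charles"] [:tg/node-1000 :home :tg/node-1001]]
```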

Loading is almost the same; only the key changes from :tx-data to :tx-triples:

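(require '[asami.core :as d])
(require '[clojure.edn :as edn])
(require '[clojure.java.io :as io])
;; clojure.edn/read needs a java.io.PushbackReader, so the file reader is wrapped in one
(def data (with-open [r (java.io.PushbackReader. (io/reader "triples.edn"))]
            (edn/read r)))
(def conn (d/connect "asami:mem://data"))
(d/transact conn {:tx-triples data})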

Internal Nodes

Nodes that identify entities are often represented using Asami Internal Nodes. These are serialized in EDN as #a/n[1234], where the number can be any positive long integer. Loading data that contains these elements requires the tag reader provided by asami.graph/node-reader.

The triples above could alternatively be written with these internal nodes:

[[#a/n[1000] :id "charles"]
 [#a/n[1000] :name "Charles"]
 [#a/n[1001] :id "scarborough"]
 [#a/n[1001] :town "Scarborough"]
 [#a/n[1001] :county "Yorkshire"]
 [#a/n[1000] :home #a/n[1001]]
 [#a/n[1002] :id "jane"]
 [#a/n[1002] :name "Jane"]
 [#a/n[1002] :home #a/n[1001]]]

This would then be loaded by specifying the reader:

(require '[asami.core :as d])
(require '[asami.graph :as g])
(require '[clojure.edn :as edn])
(require '[clojure.java.io :as io])
;; g/node-reader supplies the reader for #a/n tags; edn/read needs a PushbackReader
(def data (with-open [r (java.io.PushbackReader. (io/reader "triples.edn"))]
            (edn/read {:readers g/node-reader} r)))
(def conn (d/connect "asami:mem://data"))
(d/transact conn {:tx-triples data})
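The :readers option is ordinary clojure.edn behavior, so its mechanics can be shown without Asami. In this sketch StandInNode and stand-in-readers are hypothetical substitutes for Asami's node type and for asami.graph/node-reader; the reader function receives the vector that follows the #a/n tag. Without a :readers entry for a/n, clojure.edn/read-string throws an exception for the unknown tag.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical stand-in for Asami's internal node type; only shows how :readers works.
(defrecord StandInNode [id])

;; The tag a/n is followed by a vector such as [1000]; the reader fn receives that vector.
(def stand-in-readers {'a/n (fn [[n]] (->StandInNode n))})

(def triples
  (edn/read-string {:readers stand-in-readers}
                   "[[#a/n[1000] :id \"charles\"] [#a/n[1000] :home #a/n[1001]]]"))

(= (->StandInNode 1000) (ffirst triples))   ;; => true
(:id (nth (second triples) 2))              ;; => 1001
```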

Import/Export

Asami can also export data, which can then be imported into another store. Export works on a specific database value rather than on the connection:

(require '[asami.core :as d])
;; load up existing data
(def conn (d/connect "asami:local://existing"))
;; export this data to a file
(spit "export.edn" (d/export-str (d/db conn)))

The data can be imported into another database. Unlike export, import must be applied to a connection, where it updates the most recent database:

(require '[asami.core :as d])
(def conn2 (d/connect "asami:mem://newdata"))
(d/import-data conn2 (slurp "export.edn"))

Database-to-Database

Data may also be sent directly from one database to another. For instance, the value of a local database at some time in the past could be sent to an in-memory database for experimentation:

(require '[asami.core :as d])
(def conn-existing (d/connect "asami:local://existing"))
(def conn-new (d/connect "asami:mem://new"))
;; get a database from conn-existing for some time in the past
(def past-data (d/as-of (d/db conn-existing) #inst "2021-06-28T23:08:16.949-00:00"))
(d/import-data conn-new (d/export-data past-data))