4. Transactions

Paula Gearon edited this page Feb 20, 2022 · 3 revisions

Transactions are the mechanism for modifying a database. No previous version of the database is modified. Instead, a new database is created with the changes incorporated. The old and new databases share the majority of their data, so this process is fast and storage-efficient.

Transactions are executed against the Connection by which the database is accessed. This updates the latest database available in the connection, and all the previous databases remain available through the history.

Transact

The function used to modify a database is asami.core/transact. This takes 2 parameters:

  • The connection of the database to change.
  • The transaction data.

Asami supports both the original and updated forms of transaction data:

  • A sequence of entity structures and :db/add, :db/retract statements.
  • A map containing a key of :tx-data with a value of the above transaction sequence, or a key of :tx-triples which contains raw statements to be added.
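
As a rough sketch of these forms, the following transacts similar data three ways. The database URI, the :label attribute and the :example/node-1 node are invented for illustration, and the shape of the :tx-triples value (a sequence of 3-element vectors) is an assumption based on the description above.

```clojure
(require '[asami.core :as d])

;; Hypothetical in-memory database used only for this sketch.
(def db-uri "asami:mem://tx-forms-example")
(d/create-database db-uri)
(def conn (d/connect db-uri))

;; Original form: the transaction sequence is passed directly.
@(d/transact conn [{:db/ident "a" :label "first"}])

;; Updated form: a map with the sequence under :tx-data.
@(d/transact conn {:tx-data [{:db/ident "b" :label "second"}]})

;; Raw statements under :tx-triples (assumed shape: [node edge node] vectors).
@(d/transact conn {:tx-triples [[:example/node-1 :label "third"]]})

;; Collect every :label value in the latest database.
(def labels
  (set (d/q '[:find [?l ...] :where [?e :label ?l]] (d/db conn))))
```

Each call is dereferenced with @ so that the transaction has completed before the next one starts.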

The transact function returns a future (Clojure) or a delay (ClojureScript) which refers to the result of the transaction. The contents of this object can be accessed using the deref function, or the @ reader macro.

The data that is returned is a map with the following:

  • :db-before   The value of the database before the transaction.
  • :db-after   The value of the database after the transaction.
  • :tx-data   The sequence of graph modifications, each expressed as a Datom.
  • :tempids   A map of entity IDs that were specified for insertion, and the final node values that they eventually referred to.
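
A minimal sketch of reading this map, assuming a fresh in-memory database (the URI and the :message data are only illustrative). It queries :db-before and :db-after separately to show that the earlier database value is left untouched.

```clojure
(require '[asami.core :as d])

;; Hypothetical in-memory database used only for this sketch.
(def db-uri "asami:mem://tx-result-example")
(d/create-database db-uri)
(def conn (d/connect db-uri))

;; Dereference the returned future/delay to get the result map.
(def result
  @(d/transact conn {:tx-data [{:db/ident "greeting" :message "Hi Dad"}]}))

(def message-query '[:find [?m ...] :where [?e :message ?m]])

;; The database value from before the transaction has no messages...
(def before (d/q message-query (:db-before result)))

;; ...while the database value from after the transaction contains the new one.
(def after (d/q message-query (:db-after result)))
```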

Setup

Create a database and get it ready to transact data into it.

(require '[asami.core :as d])
(def db-uri "asami:mem://test-data")
(d/create-database db-uri)
(def conn (d/connect db-uri))

Transacting

Define some data and add it to the database:

(def data
  [{:message "Hi Dad" :time "2020-07-29T04:23:19.622-00:00"}
   {:message "Hello Daughter. Why are you still up?" :time "2020-07-29T04:23:53.906-00:00"}
   {:message "I'm writing docs for Asami" :time "2020-07-29T04:24:44.966-00:00"}
   {:message "What's Asami?" :time "2020-07-29T04:25:11.836-00:00"}])
(def tx (d/transact conn data))

Dereference the response and look at the structure with (clojure.pprint/pprint @tx):

{:db-before
 {:graph {:spo {}, :pos {}, :osp {}},
  :history [],
  :timestamp #inst "2020-07-29T04:26:26.050-00:00"},
 :db-after ....... ;; !!!BIG SCARY DATA STRUCTURE!!! DON'T WORRY ABOUT THIS PART!
 :tx-data
 (#datom [:tg/node-10498 :db/ident :tg/node-10498 1 true]
  #datom [:tg/node-10498 :tg/entity true 1 true]
  #datom [:tg/node-10498 :message "Hi Dad" 1 true]
  #datom [:tg/node-10498 :time "2020-07-29T04:23:19.622-00:00" 1 true]
  #datom [:tg/node-10499 :db/ident :tg/node-10499 1 true]
  #datom [:tg/node-10499 :tg/entity true 1 true]
  #datom [:tg/node-10499 :message "Hello Daughter. Why are you still up?" 1 true]
  #datom [:tg/node-10499 :time "2020-07-29T04:23:53.906-00:00" 1 true]
  #datom [:tg/node-10500 :db/ident :tg/node-10500 1 true]
  #datom [:tg/node-10500 :tg/entity true 1 true]
  #datom [:tg/node-10500 :message "I'm writing docs for Asami" 1 true]
  #datom [:tg/node-10500 :time "2020-07-29T04:24:44.966-00:00" 1 true]
  #datom [:tg/node-10501 :db/ident :tg/node-10501 1 true]
  #datom [:tg/node-10501 :tg/entity true 1 true]
  #datom [:tg/node-10501 :message "What's Asami?" 1 true]
  #datom [:tg/node-10501 :time "2020-07-29T04:25:11.836-00:00" 1 true]),
 :tempids
 #:tg{:node-10498 :tg/node-10498,
      :node-10499 :tg/node-10499,
      :node-10500 :tg/node-10500,
      :node-10501 :tg/node-10501}}

These structures will be explained below.

Transaction Data Structure

All data in a graph is of the form:

node-1 edge node-2

These tuples describe 2 nodes, and a directed, labeled edge between them. The second node can be a scalar value (such as a number or a string). Scalars can be treated as nodes in most cases, but should not have edges coming out of them.

In-memory graphs use keywords in the tg namespace to represent graph nodes.

With this approach, the first node can represent an object, and the edge/node-2 pairs can represent attribute/values for the object. When node-2 references another node rather than a scalar, then this can be an edge between objects.

Tuples like this are represented as Datoms. A Datom is an operation that describes these 3 elements, and whether they were added to or removed from the graph. For instance:

#datom [:tg/node-10502 :relates-to :tg/node-10499 1 true]

The first 3 elements are the first node, the edge, and the second node. The next value is the transaction number when the operation was executed. The final value is true to indicate that the data is added, or false to indicate that it is removed.

tx-data

The data passed to a transaction is a sequence of 3 types of elements:

  • A vector of the form: [:db/add node-1 edge node-2 ]
  • A vector of the form: [:db/retract node-1 edge node-2 ]
  • An entity object.

Entities are just Clojure/ClojureScript map-style objects, containing attributes and values. They can be nested, containing arrays or other entities.
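
The three element types can be sketched together as follows. This is only an illustration under assumptions: the database URI is made up, and the node is looked up with a query so that the :db/add and :db/retract statements can refer to it.

```clojure
(require '[asami.core :as d])

;; Hypothetical in-memory database used only for this sketch.
(def db-uri "asami:mem://statement-example")
(d/create-database db-uri)
(def conn (d/connect db-uri))

;; An entity object: creates a node with two attributes.
@(d/transact conn {:tx-data [{:db/ident "widget"
                              :inventory/label "Widget"
                              :inventory/stock-count 2187}]})

;; Look up the node that was allocated for the entity.
(def node
  (first (d/q '[:find [?e ...] :where [?e :db/ident "widget"]] (d/db conn))))

;; A :db/retract statement removes one tuple, and a :db/add statement adds another.
@(d/transact conn {:tx-data [[:db/retract node :inventory/stock-count 2187]
                             [:db/add node :inventory/stock-count 2186]]})

;; Only the newly added stock count should remain on the node.
(def counts
  (d/q '[:find [?c ...]
         :where [?e :db/ident "widget"] [?e :inventory/stock-count ?c]]
       (d/db conn)))
```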

Entity IDs

Because an entity is represented as a single node in the database, it may be necessary to reference the entity by the node. A pseudo property called :db/id is used to represent the ID of an entity.

For instance, if an inventory item has a node of :tg/node-1001 then the entity can be represented as:

{:db/id :tg/node-1001
 :inventory/label "Widget"
 :inventory/part-nr "THX-1138"
 :inventory/stock-count 2187}

This entity will then be represented with the following tuples:

:tg/node-1001 :inventory/label "Widget"
:tg/node-1001 :inventory/part-nr "THX-1138"
:tg/node-1001 :inventory/stock-count 2187

Temporary IDs

While it is possible to create entity nodes and use them directly, nodes are usually allocated automatically to represent each entity in the database. However, it may be necessary to refer to an entity created earlier in the same transaction, so as to create references between entities. This is done by allocating a temporary ID to an entity. Once the entity is allocated a node, all subsequent references to the temporary ID will be updated to refer to the node.

This also applies to edges created by :db/add statements:

{:db/id -1
 :inventory/label "Widget"
 :inventory/part-nr "THX-1138"
 :inventory/stock-count 2187}
{:db/id -2
 :inventory/label "Doohicky"
 :inventory/part-nr "AA-23"
 :inventory/stock-count 5
 :inventory/replaces -1}
[:db/add -1 :inventory/replaced-by -2]

These 2 entities will be represented with tuples like the following:

:tg/node-1031 :inventory/label "Widget"
:tg/node-1031 :inventory/part-nr "THX-1138"
:tg/node-1031 :inventory/stock-count 2187
:tg/node-1032 :inventory/label "Doohicky"
:tg/node-1032 :inventory/part-nr "AA-23"
:tg/node-1032 :inventory/stock-count 5
:tg/node-1032 :inventory/replaces :tg/node-1031
:tg/node-1031 :inventory/replaced-by :tg/node-1032
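
A runnable sketch of the same idea, trimmed to fewer attributes. The database URI is an assumption, and the actual node names allocated will differ from those shown above. The query follows both edges to confirm that the temporary IDs were resolved to the same pair of nodes.

```clojure
(require '[asami.core :as d])

;; Hypothetical in-memory database used only for this sketch.
(def db-uri "asami:mem://tempid-example")
(d/create-database db-uri)
(def conn (d/connect db-uri))

(def tx
  (d/transact conn
              {:tx-data [{:db/id -1
                          :inventory/label "Widget"
                          :inventory/part-nr "THX-1138"}
                         {:db/id -2
                          :inventory/label "Doohicky"
                          :inventory/part-nr "AA-23"
                          :inventory/replaces -1}
                         [:db/add -1 :inventory/replaced-by -2]]}))

;; Dereferencing waits for the transaction and exposes the allocated nodes.
(def tempids (:tempids @tx))

;; Follow the edges in both directions between the two entities.
(def replacements
  (d/q '[:find ?old-label ?new-label
         :where
         [?new :inventory/replaces ?old]
         [?old :inventory/replaced-by ?new]
         [?old :inventory/label ?old-label]
         [?new :inventory/label ?new-label]]
       (d/db conn)))
```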

Identity Values

Entities can have a value that is associated with their identity via the :db/ident property. Unlike :db/id this is a real attribute that is placed on the entity, although it is hidden when entities are retrieved via the entity function. As an alternative, Asami also supports an :id property. This serves the same role as :db/ident but is visible to the user and will be returned by the entity function.

Like the :db/id pseudo attribute, the :db/ident and :id attributes can be used to identify an entity during a transaction, or when retrieving an entity. This form of ID is often very useful for addressing entities in subsequent transactions. The :db/ident or :id attributes can be any data type.

The entities in the previous example can be rewritten to use :db/ident or :id instead. However, using these ident values in :db/add statements has greater potential to accidentally update a different entity than the one intended, so caution should be exercised when using these statements.

{:db/ident "widget"
 :inventory/label "Widget"
 :inventory/part-nr "THX-1138"
 :inventory/stock-count 2187}
{:db/ident :doohicky
 :inventory/label "Doohicky"
 :inventory/part-nr "AA-23"
 :inventory/stock-count 5
 :inventory/replaces {:db/ident "widget"}}
[:db/add "widget" :inventory/replaced-by :doohicky]

These 2 entities will be represented with tuples like the following:

:tg/node-1031 :db/ident "widget"
:tg/node-1031 :inventory/label "Widget"
:tg/node-1031 :inventory/part-nr "THX-1138"
:tg/node-1031 :inventory/stock-count 2187
:tg/node-1032 :db/ident :doohicky
:tg/node-1032 :inventory/label "Doohicky"
:tg/node-1032 :inventory/part-nr "AA-23"
:tg/node-1032 :inventory/stock-count 5
:tg/node-1032 :inventory/replaces :tg/node-1031
:tg/node-1031 :inventory/replaced-by :tg/node-1032
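
The difference in visibility between :db/ident and :id can be sketched like this. The database URI and the "gadget" entity are invented for the example, and the results in the comments follow the behavior described above rather than a guaranteed output format.

```clojure
(require '[asami.core :as d])

;; Hypothetical in-memory database used only for this sketch.
(def db-uri "asami:mem://ident-example")
(d/create-database db-uri)
(def conn (d/connect db-uri))

@(d/transact conn {:tx-data [{:db/ident "widget" :inventory/label "Widget"}
                             {:id "gadget" :inventory/label "Gadget"}]})

(def db (d/db conn))

;; Retrieved by :db/ident, which is hidden in the returned entity.
(def by-ident (d/entity db "widget"))

;; Retrieved by :id, which remains visible in the returned entity.
(def by-id (d/entity db "gadget"))
```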

:tempids

As the datoms are generated for insertion, a map is created to connect temporary :db/id values to the allocated node, as well as any :db/ident values that refer to nodes that get created. This map is then returned as the :tempids field in the result from transact.

Using this field it is possible to get the internal node of a created entity.

(def data [{:db/id -1 :name "Fitzwilliam" :home "Pemberley"}
           {:db/id -2 :name "Elizabeth" :home "Longbourn"}])
(def tx (d/transact conn {:tx-data data}))

;; get the IDs
(def will (get (:tempids @tx) -1))
(def liz (get (:tempids @tx) -2))

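;; retrieve the Fitzwilliam entity and print
;; (pprint is referred explicitly, in case the current namespace does not already do so)
(require '[clojure.pprint :refer [pprint]])
(prn will)
(pprint (d/entity (d/db conn) will))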

This displays the internal node ID for the Fitzwilliam entity, and then the retrieved entity:

:tg/node-10504
{:name "Fitzwilliam", :home "Pemberley"}

Multi-cardinality Properties

All properties in Asami are multi-cardinality (this differs from Datomic, in which properties are declared as either single- or multi-cardinality).

Entities are asserted using map structures, which do not allow the same attribute to be specified multiple times. This is managed by using a set of values for the property. The following example specifies multiple values for the :nickname property:

(require '[asami.core :as d])
(def db-uri "asami:mem://entity-data")
(d/create-database db-uri)
(def conn (d/connect db-uri))
(def data [{:db/ident "Catherine"
            :name "Catherine"
            :nickname #{"Cath", "Cathy", "Cate"}}])
(def tx (d/transact conn {:tx-data data}))

This will create triples like the following:

:tg/node-10896 :tg/entity true
:tg/node-10896 :db/ident "Catherine"
:tg/node-10896 :name "Catherine"
:tg/node-10896 :nickname "Cathy"
:tg/node-10896 :nickname "Cate"
:tg/node-10896 :nickname "Cath"

When retrieving entities, these values will be returned in a set:

(d/entity (d/db conn) "Catherine")

{:name "Catherine", :nickname #{"Cathy" "Cate" "Cath"}}
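
The individual values can also be read back with a query. This is a small self-contained sketch using a separate, hypothetical database URI so that it does not interfere with the example above.

```clojure
(require '[asami.core :as d])

;; Hypothetical in-memory database used only for this sketch.
(def db-uri "asami:mem://nickname-example")
(d/create-database db-uri)
(def conn (d/connect db-uri))

@(d/transact conn {:tx-data [{:db/ident "Catherine"
                              :name "Catherine"
                              :nickname #{"Cath" "Cathy" "Cate"}}]})

;; Each nickname is stored as its own triple, so the query binds ?n three times.
(def nicknames
  (d/q '[:find [?n ...]
         :where [?e :db/ident "Catherine"] [?e :nickname ?n]]
       (d/db conn)))
```

The order of the returned values is not specified, so it is safest to treat the result as a set.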

Entity Updates

Asami has no schema, and with an Open World Assumption it allows all attributes to be multi-cardinality. This means that transacting new data for an entity will lead to it being added to the entity, rather than modifying anything that is already there.

This can be shown with some simple entities:

(require '[asami.core :as d])
(def db-uri "asami:mem://entity-data")
(d/create-database db-uri)
(def conn (d/connect db-uri))
(def data [{:db/ident "will" :name "William" :home "Pemberley"}
           {:db/ident "liz" :name "Lizzy" :home "Longbourn"}])
(def tx (d/transact conn {:tx-data data}))

The first entity should be named "Fitzwilliam", but transacting the corrected name only adds to the entity, and does not replace the existing value. Entities represent multiple values for an attribute as a set of values, since each attribute is only allowed to appear once in a map.

To see the raw data, we can query for the names on this entity:

(d/q '[:find [?n ...] :where [?p :name ?n] [?p :db/ident "will"]] (d/db conn))
(d/entity (d/db conn) "will")
(def tx (d/transact conn {:tx-data [{:db/ident "will" :name "Fitzwilliam"}]}))
(d/q '[:find [?n ...] :where [?p :name ?n] [?p :db/ident "will"]] (d/db conn))
(d/entity (d/db conn) "will")

The first query and entity show:

("William")

{:name "William", :home "Pemberley"}

While the second query and entity show:

("William" "Fitzwilliam")

{:name #{"William" "Fitzwilliam"}, :home "Pemberley"}

We often want to update an attribute, rather than insert an additional value. To accomplish this, Asami accepts an annotation on attributes in transacted entities to indicate that an attribute is to be updated. This is done by appending the quote character (') to the attribute name. Using this, we can repeat the above operation to update Lizzy's name to Elizabeth.

(d/q '[:find [?n ...] :where [?p :name ?n] [?p :db/ident "liz"]] (d/db conn))
(def tx (d/transact conn {:tx-data [{:db/ident "liz" :name' "Elizabeth"}]}))
(d/q '[:find [?n ...] :where [?p :name ?n] [?p :db/ident "liz"]] (d/db conn))
(d/entity (d/db conn) "liz")

Note how the transacted data has an attribute of :name' rather than :name.

The first query shows names of: ("Lizzy")

The second query shows: ("Elizabeth")

Retrieving the entity shows that there is still only a single name:

{:home "Longbourn", :name "Elizabeth"}

Multiple fields may be loaded at once. Those which are annotated with a quote will be replaced, while all others are simply inserted.

Replacement Annotations

The use of a quote on a keyword is called a Replacement Annotation. It is safe to use a replacement annotation on an attribute that does not yet exist, making it possible to use them liberally even when creating an entity. It is not possible to annotate the :db/id or :db/ident attributes.

Replacement annotations result in the database finding existing datoms that use the annotated attribute and removing those datoms before inserting the new one. Because of this, we can use this to fix the problem of Fitzwilliam having 2 names after the last section:

(d/q '[:find [?n ...] :where [?p :name ?n] [?p :db/ident "will"]] (d/db conn))
(def tx (d/transact conn {:tx-data [{:db/ident "will" :name' "Fitzwilliam" :county "Derbyshire" :income' 10000}]}))
(d/q '[:find [?n ...] :where [?p :name ?n] [?p :db/ident "will"]] (d/db conn))
(d/entity (d/db conn) "will")

The first query returns: ("William" "Fitzwilliam")

After the transaction, we can see the names returned are: ("Fitzwilliam")

The final entity is:

{:home "Pemberley",
 :name "Fitzwilliam",
 :county "Derbyshire",
 :income 10000}

We can see the operations that made this change:

=> (pprint (:tx-data @tx))
(#datom [:tg/node-10498 :name "William" 4 false]
 #datom [:tg/node-10498 :name "Fitzwilliam" 4 false]
 #datom [:tg/node-10498 :name "Fitzwilliam" 4 true]
 #datom [:tg/node-10498 :county "Derbyshire" 4 true]
 #datom [:tg/node-10498 :income 10000 4 true])

We can see that the first 2 statements remove the existing values for the :name attribute, and the third statement then adds the new value (which happened to match one of the previous values).

We can also see that the :income attribute was added normally with no existing statements to be removed.

Append Annotation

A second form of update annotation is also provided for modifying arrays in an entity. This annotation is called an Appending Annotation and is done by adding the + character to the end of the attribute.

If an attribute refers to an array, then the array is a single value (which just happens to contain multiple values). Adding another value for that attribute gives the entity a second value for the attribute, and does not add to the array. If Lydia was not mentioned after eloping, then we could try to add her as an additional sister:

(def data [{:db/ident "jane" :name "Jane" :sisters ["Elizabeth" "Mary" "Catherine"]}])
(def tx (d/transact conn {:tx-data data}))
(def data [{:db/ident "jane" :sisters "Lydia"}])
(def tx (d/transact conn {:tx-data data}))
(d/entity (d/db conn) "jane")

This results in an entity of:

{:name "Jane", :sisters #{["Elizabeth" "Mary" "Catherine"] "Lydia"}}

This is almost certainly not what is wanted. Note that the :sisters attribute is a set of 2 values: the list of 3 strings, and a single string. Similarly, using the ' annotation would replace the array with the single string.

To add Lydia to the array, we need the + annotation. We can try this for Mary:

(def data [{:db/ident "mary" :name "Mary" :sisters ["Jane" "Elizabeth" "Catherine"]}])
(def tx (d/transact conn {:tx-data data}))
(def data [{:db/ident "mary" :sisters+ "Lydia"}])
(def tx (d/transact conn {:tx-data data}))
(d/entity (d/db conn) "mary")

This creates Mary with 3 sisters, and then updates the array to include Lydia. Retrieving this modified entity gives:

{:name "Mary", :sisters ("Jane" "Elizabeth" "Catherine" "Lydia")}

Annotation Rationale

Being an open world model, Asami is trying to allow flexibility and avoid prescribing how structures must exist. At the same time, Asami is trying to sensibly convert flexible data found in the graph to return entities.

One approach to avoiding the use of annotations is to introduce a schema. This is already used by other databases which are explicitly built for this model. A schema needs to be read from the database at least once, and requires extra validation of data. Schemas also:

  • Allow greater scope for data validation (pro)
  • Permit inferencing for insert and update operations (pro)
  • Restrict type flexibility (attributes can only refer to one data type) (con)
  • Require 2 transactions for new attributes (one to update the schema, one to use the attribute) (con)
  • Restrict attributes to a limited set (do not permit reified attributes) (con)

There are benefits and detriments to using a schema. Like RDF, Asami is schemaless.

One outcome of this choice is the need for these annotations. They allow the user to specify exactly how they want to update an entity, rather than relying on heuristics to choose.

Consider an attribute of :friends, and the possible ways it might appear in an entity:

{:name "Kristy" :friends ["Claudia" "Mary"]}

If a new friend of "Stacey" is to be added, then possible approaches are:

  • Append "Stacey" to the array.
  • Replace the array with a single value of "Stacey".
  • Assume that the first array is a group of friends, and Stacey will be in a new group. This may be single or an array.

The first one is the most likely to be correct, but there may be cases where it is not. More confusingly, a single value would be treated differently if there was no prior value for the :friends attribute, making the behavior inconsistent and state dependent.

By providing annotations, the user makes these choices explicitly:

  • {:friends "Stacey"} means that a new :friends value is added, even if one already exists.
  • {:friends' "Stacey"} means that any existing values for the :friends attribute will be removed, and replaced with "Stacey".
  • {:friends+ "Stacey"} means that the :friends attribute is an array, and "Stacey" will be appended. If no such array exists, then one is created.
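
The append and replace choices can be sketched against the Kristy entity. The database URI and the "kristy" ident are assumptions made for the example, and the expected results in the comments follow the behavior described in the earlier sections.

```clojure
(require '[asami.core :as d])

;; Hypothetical in-memory database used only for this sketch.
(def db-uri "asami:mem://friends-example")
(d/create-database db-uri)
(def conn (d/connect db-uri))

@(d/transact conn {:tx-data [{:db/ident "kristy"
                              :name "Kristy"
                              :friends ["Claudia" "Mary"]}]})

;; Appending annotation: "Stacey" is added to the end of the existing array.
@(d/transact conn {:tx-data [{:db/ident "kristy" :friends+ "Stacey"}]})
(def appended (:friends (d/entity (d/db conn) "kristy")))

;; Replacement annotation: the existing value is removed and replaced
;; with the single value "Stacey".
@(d/transact conn {:tx-data [{:db/ident "kristy" :friends' "Stacey"}]})
(def replaced (:friends (d/entity (d/db conn) "kristy")))
```

Using the unannotated :friends attribute instead would leave the array in place and add "Stacey" as a second, separate value.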