Comparing changes

base repository: threatgrid/asami
base: 1.2.2
head repository: threatgrid/asami
compare: main

Commits on Aug 21, 2020

  1. Started implementation

    Paula Gearon committed Aug 21, 2020
    SHA: ccf13c0

Commits on Aug 27, 2020

  1. Implemented the codec for the JVM

    Paula Gearon committed Aug 27, 2020
    SHA: 8ea920a

  2. Release 1.2.2

    Paula Gearon committed Aug 27, 2020
    SHA: b4f075a

Commits on Aug 28, 2020

  1. Changing decoding to be buffer based

    Paula Gearon committed Aug 28, 2020
    SHA: 3d98539

  2. implemented file mapping

    Paula Gearon committed Aug 28, 2020
    SHA: dc02475

  3. Initial implementation of paged reading

    Paula Gearon committed Aug 28, 2020
    SHA: 5015c0c

  4. Moved encoder/decoder into correct directory

    Paula Gearon committed Aug 28, 2020
    SHA: 032552b

  5. Moved flat into the correct directory

    Paula Gearon committed Aug 28, 2020
    SHA: 8c071ca

Commits on Aug 31, 2020

  1. Added tests for paging

    Paula Gearon committed Aug 31, 2020
    SHA: 5338d96

Commits on Sep 1, 2020

  1. Testing mapped file reads

    Paula Gearon committed Sep 1, 2020
    SHA: 5867519

  2. Updated reader to use a paged reader

    Paula Gearon committed Sep 1, 2020
    SHA: 40016c1

  3. Moved flat.clj to flat_file.clj

    Paula Gearon committed Sep 1, 2020
    SHA: f2ffd35

  4. Testing paged flat file storage

    Paula Gearon committed Sep 1, 2020
    SHA: dadbfe6

Commits on Sep 2, 2020

  1. Implemented and tested saving to a flat file

    Paula Gearon committed Sep 2, 2020
    SHA: 2049df2

  2. Added comment explaining file operations

    Paula Gearon committed Sep 2, 2020
    SHA: 021501c

  3. f04d769

  4. Addressed a write/read race

    Paula Gearon committed Sep 2, 2020
    SHA: 3ee6608

  5. Added graph-transact with transient graphs

    Paula Gearon committed Sep 2, 2020
    SHA: 9adfaa0

  6. streamlined read-byte path

    Paula Gearon committed Sep 2, 2020
    SHA: 33f84f5

Commits on Sep 3, 2020

  1. Fixed range bug in read-bytes-into

    Paula Gearon committed Sep 3, 2020
    SHA: cfd8158

  2. c9951b3

  3. 75dd93d

  4. aded1ae (created on GitHub.com and signed with GitHub's verified signature; the key has expired)

  5. Removed transient updates

    Paula Gearon committed Sep 3, 2020
    SHA: 9f42f06

  6. Updating to 1.2.3

    Paula Gearon committed Sep 3, 2020
    SHA: e2e2c42

  7. Merged with fix to deps number

    Paula Gearon committed Sep 3, 2020
    SHA: 30be1f5

Commits on Sep 4, 2020

  1. Updating

    Paula Gearon committed Sep 4, 2020
    SHA: 70e9a3e

  2. Extra checking when selecting a graph field

    Paula Gearon committed Sep 4, 2020
    SHA: 56eda41

Commits on Sep 5, 2020

  1. Added tests for nil data

    Paula Gearon committed Sep 5, 2020
    SHA: 717b291

  2. Fixed references to naga namespace in test data

    Paula Gearon committed Sep 5, 2020
    SHA: bffedef

  3. updated changelog

    Paula Gearon committed Sep 5, 2020
    SHA: 79ee295

Commits on Sep 8, 2020

  1. New version to drop references to dead code

    Paula Gearon committed Sep 8, 2020
    SHA: c825e32

  2. Doc changes for 1.2.5 release

    Paula Gearon committed Sep 8, 2020
    SHA: 2cd1d2b

  3. 0e3424f

  4. Removed transient updates

    Paula Gearon committed Sep 8, 2020
    SHA: b878369

  5. Remove unneeded require

    Paula Gearon committed Sep 8, 2020
    SHA: feebd87

Commits on Sep 15, 2020

  1. 79dde51

  2. 567c140

Commits on Sep 16, 2020

  1. ad48dbe

Commits on Sep 19, 2020

  1. fbc9af3

Commits on Sep 22, 2020

  1. Fixed the as-connection function to return the connection and not the database

    Paula Gearon committed Sep 22, 2020
    SHA: 6d511b3

Commits on Sep 23, 2020

  1. Starting to port block implementation

    Paula Gearon committed Sep 23, 2020
    SHA: 4f0fb24

  2. Added more management

    Paula Gearon committed Sep 23, 2020
    SHA: 973784e

Commits on Sep 24, 2020

  1. Updated to accept a write function for block implementations that don't autowrite

    Paula Gearon committed Sep 24, 2020
    SHA: 4912e83

Commits on Sep 26, 2020

  1. Adding to tests. Expanding BlockManager api

    Paula Gearon committed Sep 26, 2020
    SHA: 6bd932e

Commits on Sep 29, 2020

  1. Block file and block manager tests

    Paula Gearon committed Sep 29, 2020
    SHA: 8b279cd

Commits on Sep 30, 2020

  1. Compiling

    Paula Gearon committed Sep 30, 2020
    SHA: b4ae851

  2. fixed parameter passing

    Paula Gearon committed Sep 30, 2020
    SHA: 3b065ac

  3. 4cc0392

  4. Renamed test namespaces

    Paula Gearon committed Sep 30, 2020
    SHA: 1c888f1

Showing with 46,242 additions and 6,153 deletions.
  1. +1 −0 .gitignore
  2. +1 −1 .travis.yml
  3. +294 −2 CHANGELOG.md
  4. +97 −22 README.md
  5. +13,713 −0 book.txt
  6. +14 −0 deps.edn
  7. +33 −25 project.clj
  8. +5,331 −5,331 resources/test/data.edn
  9. +190 −0 src-native/asami/main.clj
  10. +4 −3 src/asami/analytics.cljc
  11. +247 −0 src/asami/cache.cljc
  12. +170 −110 src/asami/common_index.cljc
  13. +245 −181 src/asami/core.cljc
  14. +35 −0 src/asami/durable/block/block_api.cljc
  15. +163 −0 src/asami/durable/block/bufferblock.clj
  16. +291 −0 src/asami/durable/block/file/block_file.clj
  17. +13 −0 src/asami/durable/block/file/util.clj
  18. +23 −0 src/asami/durable/block/file/voodoo.clj
  19. +72 −0 src/asami/durable/codec.cljc
  20. +73 −0 src/asami/durable/common.cljc
  21. +49 −0 src/asami/durable/common_utils.cljc
  22. +357 −0 src/asami/durable/decoder.clj
  23. +212 −0 src/asami/durable/decoder.cljs
  24. +441 −0 src/asami/durable/encoder.clj
  25. +157 −0 src/asami/durable/encoder.cljs
  26. +329 −0 src/asami/durable/flat_file.clj
  27. +261 −0 src/asami/durable/graph.cljc
  28. +37 −0 src/asami/durable/macros.cljc
  29. +174 −0 src/asami/durable/pool.cljc
  30. +233 −0 src/asami/durable/resolver.cljc
  31. +285 −0 src/asami/durable/store.cljc
  32. +448 −0 src/asami/durable/tree.cljc
  33. +604 −0 src/asami/durable/tuples.cljc
  34. +193 −0 src/asami/entities.cljc
  35. +24 −0 src/asami/entities/general.cljc
  36. +187 −0 src/asami/entities/reader.cljc
  37. +278 −0 src/asami/entities/writer.cljc
  38. +89 −7 src/asami/graph.cljc
  39. +63 −33 src/asami/index.cljc
  40. +52 −1 src/asami/internal.cljc
  41. +101 −65 src/asami/memory.cljc
  42. +47 −22 src/asami/multi_graph.cljc
  43. +56 −0 src/asami/peer.clj
  44. +50 −11 src/asami/planner.cljc
  45. +242 −0 src/asami/projection.cljc
  46. +249 −81 src/asami/query.cljc
  47. +54 −0 src/asami/sandbox.cljc
  48. +16 −2 src/asami/storage.cljc
  49. +85 −0 test-native/asami/main_test.clj
  50. +29 −29 test/asami/{test_analytics.cljc → analytics_test.cljc}
  51. +891 −0 test/asami/api_test.cljc
  52. +68 −0 test/asami/cache_test.cljc
  53. +274 −17 test/asami/{test_core_query.cljc → core_query_test.cljc}
  54. +322 −65 test/asami/{test_api.cljc → durable/api_test.cljc}
  55. +117 −0 test/asami/durable/block/blockfile_test.clj
  56. +156 −0 test/asami/durable/block/blockmanager_test.cljc
  57. +32 −0 test/asami/durable/block/test_util.cljc
  58. +503 −0 test/asami/durable/block_tree_test.cljc
  59. +444 −0 test/asami/durable/codec_test.cljc
  60. +115 −0 test/asami/durable/flat_test.cljc
  61. +288 −0 test/asami/durable/graph_test.cljc
  62. +46 −0 test/asami/durable/idx_codec_test.cljc
  63. +207 −0 test/asami/durable/object_codec_test.clj
  64. +118 −0 test/asami/durable/pages_test.clj
  65. +237 −0 test/asami/durable/pool_test.cljc
  66. +422 −0 test/asami/durable/store_test.cljc
  67. +135 −0 test/asami/durable/test_utils.cljc
  68. +277 −0 test/asami/durable/transitive_test.cljc
  69. +525 −0 test/asami/durable/tuples_test.cljc
  70. +23 −0 test/asami/entities/helper_stub.cljc
  71. +463 −0 test/asami/entities/test_entity.cljc
  72. +29 −3 test/asami/{test_memory_index.cljc → memory_index_test.cljc}
  73. +197 −0 test/asami/multi_graph_test.cljc
  74. +3 −3 test/asami/{test_planner.cljc → planner_test.cljc}
  75. +75 −22 test/asami/{test_query_internals.cljc → query_internals_test.cljc}
  76. +16 −0 test/asami/race_condition_test.clj
  77. +0 −98 test/asami/test_multi_graph.cljc
  78. +126 −19 test/asami/{test_transitive.cljc → transitive_test.cljc}
  79. +13,713 −0 test/resources/pride_and_prejudice.txt
  80. +8 −0 update.sh
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -9,3 +9,4 @@ pom.xml.asc
/.nrepl-port
.hgignore
.hg/
/.cpcache
2 changes: 1 addition & 1 deletion .travis.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
language: clojure
lein: 2.9.4
lein: 2.9.6

before_script:
- mkdir -p out/asami
296 changes: 294 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,259 @@
# Change Log

## [Unreleased]
## [2.2.4] - 2022-02-22
### Added
- The `:db/ident` and `:id` attributes may now use compound keys.

### Fixed
- Fixed a problem where resolving a pattern on disk would lead to an error if the resolution was empty.

## [2.2.3] - 2021-11-10
### Added
- The `asami.core/entity` function now accepts Connections as well as Databases. This will select the latest database from the connection.

### Changed
- The third argument to the `asami.storage.Database/entity` function is now required. This change should not affect most users, as this is an internal function and should be accessed with `asami.core/entity`, which has not changed.

### Fixed
- Using an entity `:id` will now correctly select and update entities in transactions.

## [2.2.2] - 2021-10-19
### Added
- Added a new `:input-limit` option to transact. When included, the transaction will attempt to keep the input at or below this number of triples.
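
A minimal sketch of the option in use; the connection URL and limit value here are illustrative, and the option map follows the `:tx-data` form shown in the README below:
```clojure
(require '[asami.core :as d])

(def conn (d/connect "asami:mem://limit-demo"))

;; Ask the transaction to keep the input at or below 1000 triples.
;; The limit value is illustrative.
@(d/transact conn {:tx-data [{:movie/title "Explorers"}]
                   :input-limit 1000})
```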

## [2.2.1] - 2021-10-16
### Fixed
- Accepting java.time.Instant objects on the `since` and `as-of` database functions.

## [2.2.0] - 2021-10-10
### Added
- `transact-async` has been added. On the JVM this will return a future that has not yet been completed. There is currently no difference between `transact` and `transact-async` for ClojureScript.
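
A sketch of the asynchronous form, with an illustrative connection URL and data:
```clojure
(require '[asami.core :as d])

(def conn (d/connect "asami:mem://async-demo"))

;; On the JVM this returns a future that may not yet have completed.
(def tx (d/transact-async conn {:tx-data [{:movie/title "Demolition Man"}]}))

;; Deref to wait for the transaction and inspect the resulting database.
(:db-after @tx)
```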

### Changed
- Transaction results now include datoms that reflect the changes to the database, and not the changes requested of the database.
- Calling `delete-database` on an in-memory database URL will clear any open connections of data. If data is transacted into the connection, it will be reattached to the database registry as if it had been freshly opened.

## [2.1.3] - 2021-09-10
### Added
- OR expressions no longer require the same variable bindings for each sub-expression

## [2.1.2] - 2021-08-18
### Added
- Queries now test that projected variables in the `:find` clause appear in the `:where` clause.
- In-memory graphs now have transaction IDs (not yet exposed in public APIs).

### Fixed
- User plans now use the correct return type, so complex queries don't cause errors while testing.
- Multiple bindings in the `:in` clause will now combine correctly during testing.

## [2.1.1] - 2021-06-28
### Added
- Added `asami.core/export-str`. This is a shortcut to convert a data export to a string, suitable for `clojure.core/spit`, since the default will write this as a LazySeq label.
- `asami.core/import-data` now accepts an `opts` argument, the same way that `edn` readers do. This is used when reading from `edn` text.
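
A sketch of a round trip through these functions, assuming `export-str` accepts a connection in the same way `export-data` does; the file name is illustrative:
```clojure
(require '[asami.core :as d])

(def conn (d/connect "asami:mem://export-demo"))
@(d/transact conn {:tx-data [{:movie/title "Sneakers"}]})

;; export-str renders the data export as a string, suitable for spit.
(spit "export.edn" (d/export-str conn))

;; import-data reads the edn text back into a new connection.
;; An opts map may also be supplied, as for edn readers.
(def conn2 (d/connect "asami:mem://import-demo"))
(d/import-data conn2 (slurp "export.edn"))
```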

### Changed
- Single-ended bindings on transitive constraints applied to memory graphs now use the unbound form internally to gain log(n) complexity.
- Cleanups to internal code.

### Fixed
- In memory transitive attribute constraints have been fixed for the case of an unbound constraint for a given attribute.
- Boolean values saved to disk no longer create errors when reading back `false`.

## [2.1.0] - 2021-06-22
### Added
- Transactions include resource usage, so resources (files) can be cleaned up after failure.

### Fixed
- Recursive entities now correctly cut the recursion when extracting entities from the graph.
- Shutdown now releases connections. This truncates files to used space only.

### Changed
- Nodes are now serialized using vector notation and without a redundant space. Deserialization of the previous node format is still supported.
- Number serialization is now more compact.
- Arrays and maps can now be serialized as values. This will show up in the API soon.
- Indexes now share file resources. This has removed 4 files.

## [2.0.6] - 2021-06-10
### Added
- A new CLI for loading data and executing queries. This builds natively with GraalVM.
- Exposing `now` and `instant?` from `asami.core`
- Serialization of arrays and maps. This will enable upcoming features.
- An `:id` attribute can now be used as a synonym for `:db/ident`. These values are not removed from entities, unlike the `:db` attributes.

### Fixed
- Aggregates correctly group even when selections are not sensible.

### Changed
- Integrating entity code fully into Asami. This is no longer imported from Zuko.
- Minor efficiency improvements to in-memory updates and reads.
- Booleans are now encoded into IDs and no longer serialized.

## [2.0.5] - 2021-05-27
### Changed
- Entity conversion to statements is no longer recursive on arrays. This allows for larger arrays.

## [2.0.4] - 2021-05-07
### Fixed
- Fixed missing functions on read-only trees

## [2.0.3] - 2021-05-06
### Fixed
- Fixed problem where the internal node IDs were not initializing from saved transactions.
- Imports now update the node ID generator to allocate IDs after imported data.

## [2.0.2] - 2021-04-28
### Fixed
- Removed multiple extensions for filenames.
- Fixed grouping for some aggregate queries.

## [2.0.1] - 2021-04-20
### Fixed
- Entity references to top-level entities no longer delete the referenced entity when the reference changes.

### Added
- Expanded handling of graph conversion for query objects, including getting the latest database from a connection.

### Changed
- Updated to Zuko 0.6.4.
- Top level entities now include `:tg/owns` edges to all sub structures that they own.

## [2.0.0] - 2021-04-08
### Changed
- Updated to Zuko 0.6.2, core.cache 1.0.207, and ClojureScript 1.10.844.

## [2.0.0-alpha9] - 2021-04-02
### Added
- Locking the transaction file during writes to prevent multiple processes from trying to modify it concurrently.

### Changed
- Opening files no longer allows variations on paths using . and ..

## [2.0.0-alpha8] - 2021-03-30
### Fixed
- Addressed concurrency bugs found in the JVM, for both memory-based and durable storage.

### Changed
- Updated to Zuko 0.6.0. This introduces new portable i/o operations.

## [2.0.0-alpha7] - 2021-03-20
### Changed
- Updated to Zuko 0.5.1. This allows arbitrary keytypes for entities.

## [2.0.0-alpha6] - 2021-03-19
### Changed
- Updated to Zuko 0.5.0. This means that entities without a temporary ID do not map their new IDs back to themselves in the `:tempids` of transactions.
- Zuko no longer brings in the unneeded Cheshire and JacksonXML dependencies.
- Cleaned up reflection in the durable layer, with a 35% speed improvement.

## [2.0.0-alpha5] - 2021-03-18
### Added
- `count-triple` implemented to scan index tree with reduced block access.

## [2.0.0-alpha4] - 2021-03-17
### Added
- Supporting lookup refs in transactions (thanks to @mk)
- Supporting transitive attributes for durable graphs.

### Fixed
- Fixed some transitive attribute operations that did not handle zero-steps correctly.

### Changed
- Updated to Zuko 0.4.6. This adds lookup refs to entities in transactions.

## [2.0.0-alpha3] - 2021-03-10
### Fixed
- Fixed bug that ignored :db/retract statements.

## [2.0.0-alpha2] - 2021-03-09
### Added
- Internal node type. This avoids the need for interning keywords as nodes.
- Added the `asami.Peer` class. This is very early access.

### Changed
- Updated to Zuko 0.4.4. This shifts the function whitelist into Zuko, and reduces the number of functions referenced in ClojureScript.
- Added functions for `and` and `or`.

### Fixed
- Functions from `clojure.string` can now be accessed in Clojure.

## [2.0.0-alpha] - 2021-03-05
### Added
- Durable storage provisioned on mapped files.
- Projection styles now work on aggregates
- `count`, `count-distinct` and `sample` can work on wildcards.

### Fixed
- `count` now de-duplicates, and `count-distinct` has been introduced.

## [1.2.15] - 2021-02-19
### Fixed
- Bugfix for multigraph entities

## [1.2.14] - 2021-02-18
### Changed
- Removed Clojurescript from the dependency tree of the generated artifacts.

## [1.2.13] - 2021-02-03
### Added
- Some Trace and Debug level logging for transactions and queries.

### Changed
- Moved to Zuko 0.4.0.

## [1.2.12] - 2021-01-19
### Added
- Bindings and Filters are now restricted by default. Introduced `asami.query/*override-restrictions*` flag to avoid restrictions.
- Can now filter by operations that are retrieved or calculated per row.
- Added internal API for Connections to expose their transaction ID.
- Added extra API schema in the Connection sources
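
A sketch of lifting the default restrictions via the new dynamic var; here `db` is assumed to be a database value obtained with `(d/db conn)`, and `my-pred` is a hypothetical predicate function:
```clojure
(require '[asami.core :as d]
         '[asami.query :as query])

;; Bindings and filters are restricted by default; rebinding the
;; dynamic var disables the restrictions for the enclosed queries.
(binding [query/*override-restrictions* true]
  (d/q '[:find ?title
         :where [?m :movie/title ?title]
                [(my-pred ?title)]]
       db))
```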

## [1.2.11] - 2021-01-12
### Fixed
- Updated schema definition of `core/transact`. This only affected code with schema validation.

## [1.2.10] - 2021-01-11
### Added
- New update-fn argument accepted in `asami.core/transact` to allow direct graph update operations.

## [1.2.9] - 2021-01-07
### Fixed
- Auto-generated connection URIs were malformed, and have been fixed.

## [1.2.8] - 2020-12-14
### Changed
- Updated to Zuko 0.3.3 for performance improvement in loading entities.

## [1.2.7] - 2020-12-03
### Added
- Added support for variables to be used as functions in filters. Previously this was only possible in bindings.

## [1.2.6] - 2020-11-09
### Added
- Added `nested?` flag to the `entity` function.
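
A sketch of the flag, assuming `db` and `movie-id` come from earlier queries as in the README examples below:
```clojure
(require '[asami.core :as d])

;; With nested? set to true, referenced sub-entities are returned
;; inline rather than as references.
(d/entity db movie-id true)
```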

## [1.2.5] - 2020-09-08
### Fixed
- Removed references to dead library code in tests.

## [1.2.4] - 2020-09-04
### Added
- Allowing naga.store/Storage to be used in a query.
- Added support for nil entries, via Zuko.

### Fixed
- Changing to Zuko 0.3.1 for bugfix.

## [1.2.3] - 2020-09-03
### Added
- Supporting empty arrays in entities (via Zuko update)

### Changed
- Change to internal APIs for improved transactions

## [1.2.2] - 2020-08-27
### Fixed
- Fixed use of macro as a value in CLJS

### Added
- `delete-database` function.
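
A sketch of the new function; the URL is illustrative:
```clojure
(require '[asami.core :as d])

;; Removes the database identified by the URL; for an in-memory
;; database this removes it from the registry.
(d/delete-database "asami:mem://dbname")
```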

@@ -53,6 +306,45 @@
### Added
- Introduced Update Annotations

[Unreleased]: https://github.com/threatgrid/asami/compare/1.2.1...HEAD
[Unreleased]: https://github.com/threatgrid/asami/compare/2.2.3...HEAD
[2.2.4]: https://github.com/threatgrid/asami/compare/2.2.3...2.2.4
[2.2.3]: https://github.com/threatgrid/asami/compare/2.2.2...2.2.3
[2.2.2]: https://github.com/threatgrid/asami/compare/2.2.1...2.2.2
[2.2.1]: https://github.com/threatgrid/asami/compare/2.2.0...2.2.1
[2.2.0]: https://github.com/threatgrid/asami/compare/2.1.3...2.2.0
[2.1.3]: https://github.com/threatgrid/asami/compare/2.1.2...2.1.3
[2.1.2]: https://github.com/threatgrid/asami/compare/2.1.1...2.1.2
[2.1.1]: https://github.com/threatgrid/asami/compare/2.1.0...2.1.1
[2.1.0]: https://github.com/threatgrid/asami/compare/2.0.6...2.1.0
[2.0.6]: https://github.com/threatgrid/asami/compare/2.0.5...2.0.6
[2.0.5]: https://github.com/threatgrid/asami/compare/2.0.4...2.0.5
[2.0.4]: https://github.com/threatgrid/asami/compare/2.0.3...2.0.4
[2.0.3]: https://github.com/threatgrid/asami/compare/2.0.2...2.0.3
[2.0.2]: https://github.com/threatgrid/asami/compare/2.0.1...2.0.2
[2.0.1]: https://github.com/threatgrid/asami/compare/2.0.0...2.0.1
[2.0.0]: https://github.com/threatgrid/asami/compare/2.0.0-alpha9...2.0.0
[2.0.0-alpha9]: https://github.com/threatgrid/asami/compare/2.0.0-alpha8...2.0.0-alpha9
[2.0.0-alpha8]: https://github.com/threatgrid/asami/compare/2.0.0-alpha7...2.0.0-alpha8
[2.0.0-alpha7]: https://github.com/threatgrid/asami/compare/2.0.0-alpha6...2.0.0-alpha7
[2.0.0-alpha6]: https://github.com/threatgrid/asami/compare/2.0.0-alpha5...2.0.0-alpha6
[2.0.0-alpha5]: https://github.com/threatgrid/asami/compare/2.0.0-alpha4...2.0.0-alpha5
[2.0.0-alpha4]: https://github.com/threatgrid/asami/compare/2.0.0-alpha3...2.0.0-alpha4
[2.0.0-alpha3]: https://github.com/threatgrid/asami/compare/2.0.0-alpha2...2.0.0-alpha3
[2.0.0-alpha2]: https://github.com/threatgrid/asami/compare/2.0.0-alpha...2.0.0-alpha2
[2.0.0-alpha]: https://github.com/threatgrid/asami/compare/1.2.14...2.0.0-alpha
[1.2.15]: https://github.com/threatgrid/asami/compare/1.2.14...1.2.15
[1.2.14]: https://github.com/threatgrid/asami/compare/1.2.13...1.2.14
[1.2.13]: https://github.com/threatgrid/asami/compare/1.2.12...1.2.13
[1.2.12]: https://github.com/threatgrid/asami/compare/1.2.11...1.2.12
[1.2.11]: https://github.com/threatgrid/asami/compare/1.2.10...1.2.11
[1.2.10]: https://github.com/threatgrid/asami/compare/1.2.9...1.2.10
[1.2.9]: https://github.com/threatgrid/asami/compare/1.2.8...1.2.9
[1.2.8]: https://github.com/threatgrid/asami/compare/1.2.7...1.2.8
[1.2.7]: https://github.com/threatgrid/asami/compare/1.2.6...1.2.7
[1.2.6]: https://github.com/threatgrid/asami/compare/1.2.5...1.2.6
[1.2.5]: https://github.com/threatgrid/asami/compare/1.2.4...1.2.5
[1.2.4]: https://github.com/threatgrid/asami/compare/1.2.3...1.2.4
[1.2.3]: https://github.com/threatgrid/asami/compare/1.2.2...1.2.3
[1.2.2]: https://github.com/threatgrid/asami/compare/1.2.1...1.2.2
[1.2.1]: https://github.com/threatgrid/asami/compare/1.2.0...1.2.1
[1.2.0]: https://github.com/threatgrid/asami/compare/1.1.0...1.2.0
119 changes: 97 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,39 @@
> This repository is no longer being maintained. For ongoing development, please see: https://github.com/quoll/asami.
# asami [![Build Status](https://travis-ci.org/threatgrid/asami.svg?branch=main)](https://travis-ci.org/threatgrid/asami) [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](CODE_OF_CONDUCT.md)

An in-memory graph database, for Clojure and ClojureScript.
A graph database, for Clojure and ClojureScript.

The latest version is:

[![Clojars Project](http://clojars.org/org.clojars.quoll/asami/latest-version.svg)](http://clojars.org/org.clojars.quoll/asami)

Asami is a _schemaless_ database, meaning that data may be inserted with no predefined schema. This flexibility has advantages and disadvantages. It is easier to load and evolve data over time without a schema. However, functionality like upsert and basic integrity checking is not available in the same way as with a graph with a predefined schema.
## Goals
Asami is both similar to and different from other graph databases. Some of the goals of the project are:
- **Schema-less data**. Data can be loaded without prior knowledge of its structures.
- **Stable**. Storage uses immutable structures to ensure that writes cannot lead to data corruption.
- **Multiplatform**. Asami runs on the Java Virtual Machine and on JavaScript platforms (browsers, node.js, etc).
- **Ease of setup**. Asami managed storage requires no provisioning, and can be created in a single statement.
- **Pluggable**. Storage is a pluggable system that allows for multiple storage types, both local and remote.
- **Analytics**. Graph analytics are provided by using internal mechanisms for efficiency.

Asami is a _schemaless_ database, meaning that data may be inserted with no predefined schema. This flexibility has advantages and disadvantages. It is easier to load and evolve data over time without a schema. However, functionality like upsert and basic integrity checking is not available in the same way as with a graph with a predefined schema. Optional schemas are on the roadmap to help with this.

Asami also follows an _Open World Assumption_ model, in the same way that [RDF](http://www.w3.org/TR/rdf-primer) does. In practice, this has very little effect on the database, beyond what being schemaless provides.

Asami has a query API that looks very similar to a simplified Datomic. More details are available in the [Query documentation](https://github.com/threatgrid/asami/wiki/Querying).
If you are new to graph databases, then please read our [Introduction page](https://github.com/threatgrid/asami/wiki/2.-Introduction).

Asami has a query API that looks very similar to a simplified Datomic. More details are available in the [Query documentation](https://github.com/threatgrid/asami/wiki/6.-Querying).

## Features
There are several other graph databases available in the Clojure ecosystem, each with its own focus. Asami is characterized by the following:
- Clojure and ClojureScript: Asami runs identically in both systems.
- Schema-less: Asami does not require a schema to insert data.
- Query planner: Queries are analyzed to find an efficient execution plan. This can be turned off.
- Analytics: Supports fast graph traversal operations, such as transitive closures, and can identify subgraphs.
- Integrated with Loom: Asami graphs are valid Loom graphs, via [Asami-Loom](https://github.com/threatgrid/asami-loom).
- Schema-less: Asami does not require a schema to insert data.
- Open World Assumption: Related to being schema-less, Asami borrows semantics from [RDF](http://www.w3.org/TR/rdf-primer) to lean towards an open world model.
- Pluggable Storage: Like Datomic, storage in Asami can be implemented in multiple ways. There are currently 2 in-memory graph systems, with durable storage on the way.
- Pluggable Storage: Like Datomic, storage in Asami can be implemented in multiple ways. There are currently 2 in-memory graph systems, and durable storage available on the JVM.

## Usage
### Installing
@@ -28,7 +43,7 @@ Asami can be made available to clojure by adding the following to a `deps.edn` f
```clojure
{
:deps {
org.clojars.quoll/asami {:mvn/version "1.2.1"}
org.clojars.quoll/asami {:mvn/version "2.2.4"}
}
}
```
@@ -37,11 +52,14 @@ This makes Asami available to a repl that is launched with the `clj` or `clojure

Alternatively, Asami can be added for the Leiningen build tool by adding this to the `:dependencies` section of the `project.clj` file:
```clojure
[org.clojars.quoll/asami "1.2.1"]
[org.clojars.quoll/asami "2.2.4"]
```

### Important Note for databases before 2.1.0
Asami 2.1.0 now uses fewer files to manage data. This makes it incompatible with previous versions. To port data from an older store to a new one, use the `asami.core/export-data` function on a database on the previous version of Asami, and `asami.core/import-data` to load the data into a new connection.

### Running
The [Asami API](https://github.com/threatgrid/asami/wiki/Asami-API) tries to look a little like Datomic.
The [Asami API](https://github.com/threatgrid/asami/wiki/7.-Asami-API) tries to look a little like Datomic.

Once a repl has been configured for Asami, the following can be copy/pasted to test the API:
```clojure
@@ -68,21 +86,23 @@ Once a repl has been configured for Asami, the following can be copy/pasted to t
:movie/genre "animation/adventure"
:movie/release-year 1995}])

(d/transact conn {:tx-data first-movies})
@(d/transact conn {:tx-data first-movies})
```
The [`transact`](https://github.com/threatgrid/asami/wiki/Asami-API#transact) operation returns an object that can be _dereferenced_ (via `clojure.core/deref` or the `@` macro) to provide information about the state of the database before and after the transaction. (A _future_ in Clojure, or a _delay_ in ClojureScript). Note that the transaction data can be provided as the `:tx-data` in a map object if other paramters are to be provided, or just as a raw sequence without the wrapping map.
The [`transact`](https://github.com/threatgrid/asami/wiki/7.-Asami-API#transact) operation returns an object that can be _dereferenced_ (via `clojure.core/deref` or the `@` macro) to provide information about the state of the database before and after the transaction. (A _future_ in Clojure, or a _delay_ in ClojureScript). Note that the transaction data can be provided as the `:tx-data` in a map object if other parameters are to be provided, or just as a raw sequence without the wrapping map.

For more information about loading data and executing `transact` see the [Transactions documentation](https://github.com/threatgrid/asami/wiki/Transactions).
For more information about loading data and executing `transact` see the [Transactions documentation](https://github.com/threatgrid/asami/wiki/4.-Transactions).

With the data loaded, a database value can be retrieved from the database and then queried:
With the data loaded, a database value can be retrieved from the database and then queried.

**NB:** The `transact` operation will be executed asynchronously on the JVM. Retrieving a database immediately after executing a `transact` will not retrieve the latest database. If the updated database is needed, then perform the `deref` operation as shown above, since this will wait until the operation is complete.

```clojure
(def db (d/db conn))

(d/q '[:find ?movie-title
:where [?m :movie/title ?movie-title]] db)
```
This returns the a sequence of results, with each result being a sequence of the selected vars in the `:find` clause (just `?movie-title` in this case):
This returns a sequence of results, with each result being a sequence of the selected vars in the `:find` clause (just `?movie-title` in this case):
```
(["Explorers"]
["Demolition Man"]
@@ -97,7 +117,7 @@ A more complex query could be to get the title, year and genre for all movies af
[?m :movie/genre ?genre]
[(> ?year 1990)]] db)
```
Entities found in a query can be extracted back out as objects using the [`entity`](https://github.com/threatgrid/asami/wiki/Asami-API#entity) function. For instance, the following is a repl session that looks up the movies released in 1995, and then gets the associated entities:
Entities found in a query can be extracted back out as objects using the [`entity`](https://github.com/threatgrid/asami/wiki/7.-Asami-API#entity) function. For instance, the following is a repl session that looks up the movies released in 1995, and then gets the associated entities:
```clojure
;; find the entity IDs. This variation in the :find clause asks for a list of just the ?m variable
=> (d/q '[:find [?m ...] :where [?m :movie/release-year 1995]] db)
;; a sequence of entity IDs is returned

;; get the entities for those IDs
=> (map #(d/entity db %) *1)
(#:movie{:title "Johnny Mnemonic",
         :genre "cyber-punk/action",
         :release-year 1995})
```
See the [Query Documentation](https://github.com/threatgrid/asami/wiki/6.-Querying) for more information on querying.

Refer to the [Entity Structure documentation](https://github.com/threatgrid/asami/wiki/5.-Entity-Structure) to understand how entities are stored and how to construct queries for them.

### Local Storage
The above code uses an in-memory database, specified with a URL of the form `asami:mem://dbname`. Creating a database on disk is done the same way, but with the URL scheme changed to `asami:local://dbname`. This would create a database in the `dbname` directory. Local databases do not use keywords as entity IDs, as keywords use up memory, and a local database could be gigabytes in size. Instead, these are `InternalNode` objects. These can be created with `asami.graph/new-node`, or by using the readers in `asami.graph`. For instance, if the above code were all done with a local graph instead of a memory graph:
```clojure
=> (d/q '[:find [?m ...] :where [?m :movie/release-year 1995]] db)
(#a/n "3" #a/n "4")

;; get a single entity
=> (require '[asami.graph :as graph])
=> (d/entity db (graph/new-node 4))
#:movie{:title "Toy Story", :genre "animation/adventure/comedy", :release-year 1995}

;; nodes can also be read from a string, with the appropriate reader
=> (set! *data-readers* graph/node-reader)
=> (d/entity db #a/n "4")
#:movie{:title "Toy Story", :genre "animation/adventure/comedy", :release-year 1995}
```

### Updates
The _Open World Assumption_ allows each attribute to be multi-arity. In a _Closed World_ database an object may be loaded to replace those attributes that can only appear once. To do the same thing with Asami, annotate the attributes to be replaced with a quote character at the end of the attribute name.
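As a sketch of the annotation (the entity and attribute names are illustrative), the trailing quote requests replacement rather than accumulation:

```clojure
;; without the quote, :movie/release-year would simply gain a second value;
;; with the trailing quote, any existing value is replaced
@(d/transact conn {:tx-data [{:db/ident "example-movie"
                              :movie/release-year' 1996}]})
```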
Addressing nodes by their internal ID can be cumbersome. They can also be addressed with a `:db/ident` value supplied when the entity is transacted, and the node allocated for the entity can then be found in the `:tempids` map of the transaction result:
```clojure
(def sense (get (:tempids @tx) "sense"))
(d/entity (d/db conn) sense)
```
This returns the new movie. The `:db/ident` attribute does not appear in the entity:
```clojure
#:movie{:title "Sense and Sensibility", :genre "drama/romance", :release-year 1996}
```
The release year of this movie is incorrectly set to the release in the USA, and not the original release. Updating the entity with an annotated attribute fixes this, after which the entity shows the corrected year:
```clojure
=> (d/entity (d/db conn) sense)
#:movie{:title "Sense and Sensibility", :genre "drama/romance", :release-year 1995}
```
More details are provided in [Entity Updates](https://github.com/threatgrid/asami/wiki/4.-Transactions#entity-updates).

## Analytics
Asami also has some support for graph analytics. These all operate on the _graph_ part of a database value, which can be retrieved with the `asami.core/graph` function.

**NB:** `local` graphs on disk are not yet supported. These will be available soon.

Start by populating a graph with the cast of ["The Flintstones"](https://www.imdb.com/title/tt0053502/). So that we can refer to entities after they have been created, we can provide them with temporary ID values. These are just negative numbers, and can be used elsewhere in the transaction to refer to the same entity. We will also avoid the `:tx-data` wrapper in the transaction:
```clojure
(require '[asami.core :as d])
```

If functions are provided to Loom, then they can be used to provide labels for the graph when it is rendered:

```clojure
(loom-io/view (graph db) :fmt :pdg :alg :sfpd :edge-label edge-label :node-label node-label)
```
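The temporary IDs described above can be sketched as follows (the attribute names are illustrative). Each negative `:db/id` can be referenced elsewhere in the same transaction, and the `:tempids` map of the result resolves them to the allocated nodes:

```clojure
(def tx @(d/transact conn
                     [{:db/id -1 :character/name "Fred Flintstone"}
                      {:db/id -2 :character/name "Wilma Flintstone"
                       :character/spouse -1}]))

;; look up the node that was allocated for Fred
(get (:tempids tx) -1)
```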

## Command Line Tool
A command line tool is available to load data into an Asami graph and query it. This requires [GraalVM CE 21.1.0](https://www.graalvm.org/) or later, and the [`native-image`](https://www.graalvm.org/reference-manual/native-image/#install-native-image) executable.
Leiningen needs to see GraalVM on the path first, so if there are problems with building, check that this is the case.

To build from sources:

```bash
lein with-profile native uberjar
lein with-profile native native
```

This will create a binary called `asami` in the `target` directory. Execute with the `-?` flag for help:

```
$ ./target/asami -?
Usage: asami URL [-f filename] [-e query] [--help | -?]
-? | --help: This help
URL: the URL of the database to use. Must start with asami:mem://, asami:multi:// or asami:local://
-f filename: loads the filename into the database. A filename of "-" will use stdin.
Data defaults to EDN. Filenames ending in .json are treated as JSON.
-e query: executes a query. "-" (the default) will read from stdin instead of a command line argument.
Multiple queries can be specified as edn (vector of query vectors) or ; separated.
Available EDN readers:
internal nodes - #a/n "node-id"
regex - #a/r "[Tt]his is a (regex|regular expression)"
```

#### Example:
Loading a JSON file and querying for keys (attributes) that are strings containing spaces:

```bash
asami asami:mem://tmp -f data.json -e ':find ?a :where [?e ?a ?v][(string? ?a)][(re-find #a/r " " ?a)]'
```

The command will also work on `local` stores, which means that they can be loaded once and then queried multiple times.

## License

Copyright © 2016-2021 Cisco Systems
Copyright © 2015-2022 Paula Gearon

Portions of src/asami/cache.cljc are Copyright © Rich Hickey

Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.
13,713 changes: 13,713 additions & 0 deletions book.txt


14 changes: 14 additions & 0 deletions deps.edn
@@ -0,0 +1,14 @@
;; NOTE: This file is only used for development. Dependencies here are not Asami dependencies.
{:deps
{org.clojure/clojurescript {:mvn/version "1.10.879"}
org.clojure/clojure {:mvn/version "1.10.3"}
prismatic/schema {:mvn/version "1.1.12"}
org.clojure/core.cache {:mvn/version "1.0.217"}
org.clojars.quoll/zuko {:mvn/version "0.6.5"}
org.clojure/data.priority-map {:mvn/version "1.0.0"}
cheshire/cheshire {:mvn/version "5.10.0"}
tailrecursion/cljs-priority-map {:mvn/version "1.2.1"}
criterium/criterium {:mvn/version "0.4.6"}
time-literals/time-literals {:mvn/version "0.1.5"}
tick/tick {:mvn/version "0.4.24-alpha"}}
}
58 changes: 33 additions & 25 deletions project.clj
@@ -1,30 +1,38 @@
(defproject org.clojars.quoll/asami "2.2.4"
:description "An in memory graph store for Clojure and ClojureScript"
:url "http://github.com/threatgrid/asami"
:license {:name "Eclipse Public License"
:url "http://www.eclipse.org/legal/epl-v10.html"}
:dependencies [[org.clojure/clojure "1.10.3"]
[prismatic/schema "1.1.12"]
[org.clojure/core.cache "1.0.217"]
[org.clojars.quoll/zuko "0.6.5"]
[org.clojars.quoll/qtest "0.1.1"]
[org.clojure/data.priority-map "1.0.0"]
[tailrecursion/cljs-priority-map "1.2.1"]]
:plugins [[lein-cljsbuild "1.1.8"]]
:profiles {:dev {:dependencies [[org.clojure/clojurescript "1.11.4"]]}
:uberjar {:aot [asami.peer]}
:native {:plugins [[lein-shell "0.5.0"]]
:source-paths ["src" "src-native"]
:aot :all
:main asami.main
:dependencies [[cheshire "5.10.0"]]
:aliases {"native" ["shell"
"native-image" "--report-unsupported-elements-at-runtime"
"--initialize-at-build-time" "--no-server"
"-jar" "./target/${:uberjar-name:-${:name}-${:version}-standalone.jar}"
"-H:Name=./target/${:name}"
"-H:TraceClassInitialization=\"java.io.FilePermission\""]}}
:test-native {:source-paths ["src" "src-native" "test-native"]
:test-paths ["test-native"]
:dependencies [[cheshire "5.10.0"]]}}
:cljsbuild {:builds {:dev {:source-paths ["src"]
:compiler {:output-to "out/asami/core.js"
:optimizations :simple
:pretty-print true}}
:test {:source-paths ["src" "test"]
:compiler {:output-to "out/asami/test_memory.js"
:optimizations :simple
:pretty-print true}}}
:test-commands {"unit" ["node" "out/asami/test_memory.js"]}})
10,662 changes: 5,331 additions & 5,331 deletions resources/test/data.edn


190 changes: 190 additions & 0 deletions src-native/asami/main.clj
@@ -0,0 +1,190 @@
(ns ^{:doc "Entry point for CLI"
:author "Paula Gearon"}
asami.main
(:require [asami.core :as asami]
[asami.graph :as graph]
[cheshire.core :as json]
[clojure.edn :as edn]
[clojure.java.io :as io]
[clojure.pprint :refer [pprint]]
[clojure.string :as s])
(:import [java.io PushbackReader]
[java.nio CharBuffer])
(:gen-class))

(set! *warn-on-reflection* true)

(def eof
(Object.))

(defn eof? [x]
(identical? x eof))

(def reader-opts
{:eof eof
:readers (assoc graph/node-reader 'a/r re-pattern)})

(defn process-args
[args]
(loop [result {:interactive? false
:query "-"}
[a & [arg & rargs :as rem] :as args] args]
(if-not (seq args)
result
(let [[r more] (cond
(#{"-e" "-q"} a) [(assoc result :query arg) rargs]
(= a "-f") [(assoc result :file arg) rargs]
(#{"-?" "--help"} a) [(assoc result :help true) rem]
(= a "--interactive") [(assoc result :interactive? true) rem]
(s/starts-with? a "asami:") [(assoc result :url a) rem])]
(recur r more)))))
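
;; Example (illustrative): the options map produced for a typical command line.
(comment
  (process-args ["asami:mem://tmp" "-f" "data.json" "-e" ":find ?a"])
  ;; => {:interactive? false, :query ":find ?a",
  ;;     :url "asami:mem://tmp", :file "data.json"}
  )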

(defn read-data-file
[f]
(if (s/ends-with? f ".json")
(json/parse-string (slurp f))
(let [text (slurp (if (= f "-") *in* f))]
(edn/read-string reader-opts text))))

(defn load-data-file
[conn f]
(:db-after @(asami/transact conn {:tx-data (read-data-file f)})))

(defn derive-database [{:keys [file url]}]
(let [conn (asami/connect url)]
(if file
(load-data-file conn file)
(asami/db conn))))

(gen-class
:name "asami.PBR"
:extends java.io.PushbackReader
:prefix "pbr-"
:init init
:state qstate
:constructors {[java.io.Reader clojure.lang.IDeref] [java.io.Reader]}
:exposes-methods {read readSuper})

(defn pbr-init
[reader query-acc]
[[reader] {:dbl-newline (atom false)
:query-acc query-acc}])

(defn pbr-read
"Returns characters from the reader for an edn parser.
When queries are not being accumulated, then pass through"
[^asami.PBR this]
(let [{:keys [dbl-newline query-acc]} (.qstate this)]
(if-not @query-acc
(.readSuper this) ;; not accumulating. Pass through.

;; check if this is returning 2 characters after a newline was pressed.
(if @dbl-newline
(do ;; The first character was already returned.
(reset! dbl-newline false)
(int \newline))
(let [c (.readSuper this)]
(case c ;; intercept \; and \newline characters
59 0 ;; \; character
10 (do ;; \newline character
(swap! dbl-newline not) ;; remember that a second character will be needed
0) ;; This is the first of 2 characters returned
c)))))) ;; pass through by default

(defn derive-input
"Convert command line options into the input stream required"
[{:keys [interactive? query]}]
(if (or (= query "-") interactive?)
*in*
(.getBytes ^String query)))

(defn separator?
"Check for a magic 'empty' symbol that is used to indicate a query separator"
[s]
(= (symbol "\0") s))

(defn repl [input db prompt]
;; function to print a prompt after every newline
(let [prompt-fn (if (some? prompt)
(fn [] (print prompt) (flush))
(constantly nil))
;; executes a parsed edn query and prints the result
execute (fn [query]
(try
(pprint (asami/q query db))
(catch Exception e
(printf "Error executing query %s: %s\n" (pr-str query) (ex-message e)))))
;; maintain state to accumulate queries not wrapped in an edn structure
query-acc (atom nil)
;; create the input reader, providing the query accumulation state
stream (asami.PBR. (io/reader input) query-acc)]
(loop []
;; prompt when necessary
(when-not @query-acc (prompt-fn))
;; get the next query from the input
(let [query (try (edn/read reader-opts stream) (catch Exception e e))]
(cond
(instance? Exception query)
(do (printf "Error: %s\n" (ex-message query))
(println "Type: " (type query))
(.printStackTrace ^Exception query)
nil)

(eof? query)
(when-let [q @query-acc]
(execute q)
nil)

(separator? query)
(do (execute @query-acc)
(reset! query-acc nil)
(recur))

:else
(do
(if @query-acc ;; check if a query is being accumulated
(swap! query-acc conj query) ;; add to the accumulated query

;; else, a complete query structure
(if (or (sequential? query) (map? query)) ;; check if this a complete query
(execute query)
(reset! query-acc [query]))) ;; otherwise, start accumulating a new query
(recur)))))))

(defn print-usage
[]
(println "Usage: asami URL [-f filename] [-e query] [--help | -?] [--interactive]\n")
(println)
(println "URL: The URL of the database to use. Must start with asami:mem://, asami:multi:// or asami:local://")
(println)
(println "Options:")
(println " -?, --help This help")
(println)
(println " -f FILENAME Loads the file FILENAME into the database. A FILENAME of \"-\" will use STDIN.")
(println " Data defaults to EDN. A FILENAME ending in .json is treated as JSON.")
(println " -e QUERIES Executes queries in the string QUERIES. QUERIES are specified as EDN and, thus, multiple queries may be separated with the usual EDN whitespace characters.")
(println " When this option is not provided, queries will read from STDIN instead of a command line argument.")
(println)
(println "Available EDN readers:")
(println " internal nodes - #a/n \"node-id\"")
(println " regex - #a/r \"[Tt]his is a (regex|regular expression)\""))

(defn -main
[& args]
(let [{:keys [help interactive? url] :as options} (process-args args)]
(when help
(print-usage)
(System/exit 0))

(when-not url
(println "Database URL must be specified")
(System/exit 1))

(let [db (derive-database options)
input (derive-input options)
prompt (when interactive?
(println "Unwrapped queries may be separated with ;")
"?- ")]
(repl input db prompt))

(System/exit 0)))
7 changes: 4 additions & 3 deletions src/asami/analytics.cljc
@@ -52,8 +52,9 @@
"Returns all subgraphs for a given graph"
[graph :- GraphType]
(letfn [(to-graph [entities]
(let [tx 0
edges (->> (resolve-triple graph '?s '?p '?o)
(filter (comp entities first)))]
(reduce (fn [g [s p o]] (graph-add g s p o tx)) (new-graph graph) edges)))]
(let [groups (subgraph-entities graph)]
(map to-graph groups))))
247 changes: 247 additions & 0 deletions src/asami/cache.cljc
@@ -0,0 +1,247 @@
; Copyright (c) Rich Hickey. All rights reserved.
; Copyright (c) Cisco Systems. All rights reserved.
; The use and distribution terms for this software are covered by the
; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
; which can be found in the file epl-v10.html at the root of this distribution.
; By using this software in any fashion, you are agreeing to be bound by
; the terms of this license.
; You must not remove this notice, or any other, from this software.

; Portions of this file have been copied from
; https://github.com/clojure/core.cache/blob/master/src/main/clojure/clojure/core/cache.clj

(ns ^{:doc "Duplicates LRU Cache functionality from clojure.core.cache.
Duplicated because that cache is not available for ClojureScript
and the namespace includes JVM specific functionality."
:author "Paula Gearon"}
asami.cache
(:require
#?(:clj [clojure.data.priority-map :as priority-map]
:cljs [tailrecursion.priority-map :as priority-map])))

;; (set! *warn-on-reflection* true)

(defprotocol CacheProtocol
"This is the protocol describing the basic cache capability."
(lookup [cache e]
[cache e not-found]
"Retrieve the value associated with `e` if it exists, else `nil` in
the 2-arg case. Retrieve the value associated with `e` if it exists,
else `not-found` in the 3-arg case.")
(has? [cache e]
"Checks if the cache contains a value associated with `e`")
(hit [cache e]
"Is meant to be called if the cache is determined to contain a value associated with `e`")
(miss [cache e ret]
"Is meant to be called if the cache is determined to **not** contain a value associated with `e`")
(evict [cache e]
"Removes an entry from the cache")
(seed [cache base]
"Is used to signal that the cache should be created with a seed. The contract is that said cache
should return an instance of its own type."))

(defn- build-leastness-queue
[base limit start-at]
(into (priority-map/priority-map)
(concat (take (- limit (count base)) (for [k (range (- limit) 0)] [k k]))
(for [[k _] base] [k start-at]))))

#?(:clj
(defmacro defcache
"Defines common functionality across cache types. Not needed for this single LRU cache,
but this macro is staying in case other caches are also ported.
A similar macro can be used in ClojureScript, but the macro needs to be in the same
namespace at the CacheProtocol, and while this works it also causes warnings when the
:clj protocol is parsed during compilation."
[type-name fields & specifics]
(let [[base & _] fields
base-field (with-meta base {:tag 'clojure.lang.IPersistentMap})]
`(deftype ~type-name [~@fields]
~@specifics

clojure.lang.ILookup
(valAt [this# key#]
(lookup this# key#))
(valAt [this# key# not-found#]
(if (has? this# key#)
(lookup this# key#)
not-found#))

java.lang.Iterable
(iterator [_#]
(.iterator ~base-field))

clojure.lang.IPersistentMap
(assoc [this# k# v#]
(miss this# k# v#))
(without [this# k#]
(evict this# k#))

clojure.lang.Associative
(containsKey [this# k#]
(has? this# k#))
(entryAt [this# k#]
(when (has? this# k#)
(clojure.lang.MapEntry. k# (lookup this# k#))))

clojure.lang.Counted
(count [this#]
(count ~base-field))

clojure.lang.IPersistentCollection
(cons [this# elem#]
(seed this# (conj ~base-field elem#)))
(empty [this#]
(seed this# (empty ~base-field)))
(equiv [this# other#]
(= other# ~base-field))

clojure.lang.Seqable
(seq [_#]
(seq ~base-field))))))

#?(:clj
(defcache LRUCache [cache lru tick limit]
CacheProtocol
(lookup [_ item]
(get cache item))
(lookup [_ item not-found]
(get cache item not-found))
(has? [_ item]
(contains? cache item))
(hit [_ item]
(let [tick+ (inc tick)]
(LRUCache. cache
(if (contains? cache item)
(assoc lru item tick+)
lru)
tick+
limit)))
(miss [_ item result]
(let [tick+ (inc tick)]
(if (>= (count lru) limit)
(let [k (if (contains? lru item)
item
(first (peek lru))) ;; minimum-key, maybe evict case
c (-> cache (dissoc k) (assoc item result))
l (-> lru (dissoc k) (assoc item tick+))]
(LRUCache. c l tick+ limit))
(LRUCache. (assoc cache item result) ;; no change case
(assoc lru item tick+)
tick+
limit))))
(evict [this key]
(if (contains? cache key)
(LRUCache. (dissoc cache key)
(dissoc lru key)
(inc tick)
limit)
this))
(seed [_ base]
(LRUCache. base
(build-leastness-queue base limit 0)
0
limit))
Object
(toString [_]
(str cache \, \space lru \, \space tick \, \space limit))))


#?(:cljs
(deftype LRUCache [cache lru tick limit]
CacheProtocol
(lookup [this key]
(get cache key))
(lookup [this key not-found]
(get cache key not-found))

(has? [_ item]
(contains? cache item))
(hit [_ item]
(let [tick+ (inc tick)]
(LRUCache. cache
(if (contains? cache item)
(assoc lru item tick+)
lru)
tick+
limit)))
(miss [_ item result]
(let [tick+ (inc tick)]
(if (>= (count lru) limit)
(let [k (if (contains? lru item)
item
(first (peek lru))) ;; minimum-key, maybe evict case
c (-> cache (dissoc k) (assoc item result))
l (-> lru (dissoc k) (assoc item tick+))]
(LRUCache. c l tick+ limit))
(LRUCache. (assoc cache item result) ;; no change case
(assoc lru item tick+)
tick+
limit))))
(evict [this key]
(if (contains? cache key)
(LRUCache. (dissoc cache key)
(dissoc lru key)
(inc tick)
limit)
this))
(seed [_ base]
(LRUCache. base
(build-leastness-queue base limit 0)
0
limit))
Object
(toString [_]
(str cache \, \space lru \, \space tick \, \space limit))

ILookup
(-lookup [this key]
(get cache key))
(-lookup [this key not-found]
(if (has? this key)
(get cache key)
not-found))

IIterable
(-iterator [_]
(-iterator cache))

IAssociative
(-assoc [this k v]
(miss this k v))
(-contains-key? [this k]
(has? this k))

IMap
(-dissoc [this k]
(evict this k))

ICounted
(-count [this]
(count cache))

IEmptyableCollection
(-empty [this]
(seed this (empty cache)))

IEquiv
(-equiv [this other]
(= other cache))

ISeqable
(-seq [_]
(seq cache))))


(defn lru-cache-factory
"Returns an LRU cache with the cache and usage-table initialized to `base` --
each entry is initialized with the same usage value.
This function takes an optional `:threshold` argument that defines the maximum number
of elements in the cache before the LRU semantics apply (default is 32)."
[base & {threshold :threshold :or {threshold 32}}]
{:pre [(number? threshold) (< 0 threshold)
(map? base)]}
(seed (LRUCache. {} (priority-map/priority-map) 0 threshold) base))
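
;; Example (illustrative): how the protocol operations combine, assuming a
;; threshold of 2. A hit refreshes an entry, and a miss at capacity evicts
;; the least-recently-used key.
(comment
  (let [c0 (lru-cache-factory {:a 1} :threshold 2)
        c1 (miss c0 :b 2)   ;; cache now holds :a and :b
        c2 (hit c1 :a)      ;; touching :a leaves :b least recently used
        c3 (miss c2 :d 4)]  ;; at the threshold, so :b is evicted
    [(lookup c3 :a) (has? c3 :b) (lookup c3 :d)]))
;; => [1 false 4]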
280 changes: 170 additions & 110 deletions src/asami/common_index.cljc


426 changes: 245 additions & 181 deletions src/asami/core.cljc


35 changes: 35 additions & 0 deletions src/asami/durable/block/block_api.cljc
@@ -0,0 +1,35 @@
(ns ^{:doc "Defines the protocols for allocating and manipulating blocks"
:author "Paula Gearon"}
asami.durable.block.block-api)

(defprotocol Block
"An abstraction over a block of raw binary data of fixed length"
(get-id [this] "Returns the ID of the block.")
(get-byte [this offset] "Returns the byte at a given offset within the block.")
(get-int [this offset] "Returns the integer at a given offset within the block. Offset is in Integers.")
(get-long [this offset] "Returns the long at a given offset within the block. Offset is in Longs.")
(get-bytes [this offset len] "Returns the bytes at a given offset within the block.")
(get-ints [this offset len] "Returns the ints at a given offset within the block. Offset is in Integers.")
(get-longs [this offset len] "Returns the longs at a given offset within the block. Offset is in Longs.")
(put-byte! [this offset value] "Modifies the byte at a given offset within the block.")
(put-int! [this offset value] "Modifies the integer at a given offset within the block. Offset is in Integers.")
(put-long! [this offset value] "Modifies the long at a given offset within the block. Offset is in Longs.")
(put-bytes! [this offset len values] "Modifies the bytes at a given offset within the block.")
(put-ints! [this offset len values] "Modifies the ints at a given offset within the block. Offset is in Integers.")
(put-longs! [this offset len values] "Modifies the longs at a given offset within the block. Offset is in Longs.")
(put-block!
[this offset src]
[this offset src src-offset length] "Copies the contents of one block into this block.")
(copy-over! [this src src-offset] "Replace the contents of this block with another (starting at an offset on the source)."))

(defprotocol BlockManager
"A mutating object for allocating blocks"
(allocate-block! [this] "Allocate a new block from the manager's resources.")
(copy-block! [this block] "Allocates a new block, initialized with a copy of another block.")
(write-block [this block] "Writes a block into the managed resources. Flushing is not expected.")
(get-block [this id] "Returns the block associated with an ID.")
(get-block-size [this] "Returns the size of blocks allocated by this manager")
(copy-to-tx [this block] "Returns a block that is in the current transaction, possibly returning the current block"))

(defprotocol CountedBlocks
(get-block-count [this] "Returns the number of blocks that this object has allocated, or nil if not managed by this object."))
163 changes: 163 additions & 0 deletions src/asami/durable/block/bufferblock.clj
@@ -0,0 +1,163 @@
(ns ^{:doc "Abstraction for blocks of raw data, keyed by ID. IDs represent the offset of the block."
:author "Paula Gearon"}
asami.durable.block.bufferblock
(:require [asami.durable.block.block-api :refer [Block put-block!]])
(:import [java.nio ByteBuffer IntBuffer LongBuffer]))

;; (set! *warn-on-reflection* true)

;; An implementation of Block that can have multiple readers,
;; but only a single writing thread
(defrecord BufferBlock
[id
^ByteBuffer bb ^IntBuffer ib ^LongBuffer lb
^ByteBuffer ro
size
byte-offset int-offset long-offset]

Block
(get-id [this] id)

(get-byte [this offset]
(.get ^ByteBuffer bb (int (+ byte-offset offset))))

(get-int [this offset]
(.get ^IntBuffer ib (int (+ int-offset offset))))

(get-long [this offset]
(.get ^LongBuffer lb (int (+ long-offset offset))))

(get-bytes [this offset len]
(let [^ByteBuffer tbb (.duplicate bb)
start (+ byte-offset offset)
arr (byte-array len)]
(doto tbb
(.position (int start))
(.limit (int (+ start len)))
(.get arr))
arr))

(get-ints [this offset len]
(let [^IntBuffer tib (.duplicate ib)
start (+ int-offset offset)
arr (int-array len)]
(doto tib
(.position (int start))
(.limit (int (+ start len)))
(.get arr))
arr))

(get-longs [this offset len]
(let [^LongBuffer tlb (.duplicate lb)
start (+ long-offset offset)
arr (long-array len)]
(doto tlb
(.position (int start))
(.limit (int (+ start len)))
(.get arr))
arr))

(put-byte! [this offset v]
(.put ^ByteBuffer bb (int (+ byte-offset offset)) (byte v))
this)

(put-int! [this offset v]
(.put ^IntBuffer ib (int (+ int-offset offset)) (int v))
this)

(put-long! [this offset v]
(.put ^LongBuffer lb (int (+ long-offset offset)) ^long v)
this)

;; a single writer allows for position/put

(put-bytes! [this offset len the-bytes]
(doto ^ByteBuffer bb
(.position (int (+ byte-offset offset)))
(.put ^bytes the-bytes (int 0) (int len)))
this)

(put-ints! [this offset len the-ints]
(doto ^IntBuffer ib
(.position (int (+ int-offset offset)))
(.put ^ints the-ints (int 0) (int len)))
this)

(put-longs! [this offset len the-longs]
(doto ^LongBuffer lb
(.position (int (+ long-offset offset)))
(.put ^longs the-longs (int 0) (int len)))
this)

(put-block!
[this offset {sbb :bb sbyte-offset :byte-offset :as src} src-offset length]
(let [p (+ sbyte-offset src-offset)
rsbb (.asReadOnlyBuffer ^ByteBuffer sbb)]
(doto rsbb
(.position (int p))
(.limit (int (+ p length))))
(doto ^ByteBuffer (.duplicate ^ByteBuffer bb)
(.position (int (+ byte-offset offset)))
(.put rsbb)))
this)

(put-block!
[this offset src]
(put-block! this offset src 0 (:size src)))

(copy-over!
[dest src offset]
(put-block! dest 0 src offset size)))


(defn- new-block
"Internal implementation for creating a BufferBlock using a set of buffers.
If lb is nil, then ib must also be nil"
[id ^ByteBuffer bb ib lb ro size byte-offset]
(assert (or (and ib lb) (not (or ib lb))) "int and long buffers must be provided or excluded together")
(let [ib (or ib (-> bb .rewind .asIntBuffer))
lb (or lb (-> bb .asLongBuffer))
ro (or ro (.asReadOnlyBuffer bb))
int-offset (bit-shift-right byte-offset 2)
long-offset (bit-shift-right byte-offset 3)]
(->BufferBlock id bb ib lb ro size byte-offset int-offset long-offset)))

(defn ^BufferBlock create-block
"Wraps provided buffers as a block"
([id size byte-offset byte-buffer ro-byte-buffer int-buffer long-buffer]
(new-block id byte-buffer int-buffer long-buffer ro-byte-buffer size byte-offset))
([id size byte-offset byte-buffer]
(new-block id byte-buffer nil nil nil size byte-offset)))


;; The following functions are ByteBuffer specfic,
;; and are not available on the general Block API

(defn ^ByteBuffer get-source-buffer
"Returns a read-only ByteBuffer for the block"
([^BufferBlock {:keys [ro bb]}] (or ro (.asReadOnlyBuffer ^ByteBuffer bb)))
([^BufferBlock b offset length]
(let [start (+ (:byte-offset b) offset)]
(doto ^ByteBuffer (get-source-buffer b)
(.limit (int (+ start length)))
(.position (int start))))))


(defn ^ByteBuffer copy-to-buffer!
  "Copies the contents of the block into the provided ByteBuffer."
  [^BufferBlock b ^ByteBuffer buffer offset]
(let [pos (+ (:byte-offset b) offset)]
(.put buffer ^ByteBuffer (doto (.asReadOnlyBuffer ^ByteBuffer (:bb b))
(.position (int pos))
(.limit (int (+ pos (.remaining buffer))))))
buffer))

(defn ^ByteBuffer slice
  "Returns a portion of a block as a ByteBuffer."
  [^BufferBlock b offset size]
(let [pos (+ (:byte-offset b) offset)]
(.slice (doto (.asReadOnlyBuffer ^ByteBuffer (:bb b))
(.position (int pos))
(.limit (int (+ pos size)))))))

(defn ^BufferBlock put-buffer! [^BufferBlock b offset ^ByteBuffer buffer]
(doto ^ByteBuffer (:bb b) (.position (int (+ (:byte-offset b) offset))) (.put buffer))
b)
291 changes: 291 additions & 0 deletions src/asami/durable/block/file/block_file.clj
@@ -0,0 +1,291 @@
(ns ^{:doc "A mapped file implementation of the Block abstraction"
:author "Paula Gearon"}
asami.durable.block.file.block-file
(:require [clojure.java.io :as io]
[asami.durable.common :refer [Transaction Closeable Forceable rewind! commit! close]]
[asami.durable.block.block-api :refer [CountedBlocks BlockManager copy-over! copy-block! allocate-block! get-id get-block-count]]
[asami.durable.block.bufferblock :refer [create-block]]
[asami.durable.block.file.voodoo :as voodoo]
[asami.cache :refer [lookup hit miss lru-cache-factory]])
(:import [java.io RandomAccessFile File]
[java.nio ByteBuffer IntBuffer LongBuffer MappedByteBuffer]
[java.nio.channels FileChannel FileChannel$MapMode]
[java.lang.ref SoftReference]))

;; (set! *warn-on-reflection* true)

(def region-size (* 8 1024 1024))

(def cache-size 1024)

(def retries 3)

(def ^:const null 0)

;; Each mapping is called a region, and will contain multiple blocks.
;; Blocks are expected to evenly divide into a region, though slack
;; space at the end of a region is permissible. The slack space will
;; be (mod region-size block-size).
;; - nr-blocks is the total number of blocks in a file
;; - block-size is the number of bytes in a block
;; - nr-mapped-regions is a cached value for the count of mapped-byte-buffers.
;; - mapped-byte-buffers is a seq of all regions.
;; - stride is the size of a region
;; - file is the File being mapped
;; - raf is the RandomAccessFile for the file
;; - fc is the FileChannel of the raf

(defrecord BlockFile [nr-blocks
block-size nr-mapped-regions
mapped-byte-buffers stride
file raf fc])

(declare set-nr-blocks!)

(defn open-block-file
"Opens a file for storing blocks. Returns a structure with the block file
and the RandomAccessFile that the block file uses. The file will need to be
closed when block files based on this initial block file are no longer needed.
  When init-nr-blocks is not nil, it holds the recorded number of blocks
  in the file."
[file block-size init-nr-blocks]
(let [file (io/file file)
raf (RandomAccessFile. file "rw")
^FileChannel fc (.getChannel raf)
nr-blocks (or init-nr-blocks (long (/ (.size fc) block-size)))
slack (mod region-size block-size)
stride (if (zero? slack) region-size (+ region-size (- block-size slack)))]
(set-nr-blocks! (->BlockFile 0 block-size 0 [] stride file raf fc) nr-blocks)))
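The stride and block-lookup arithmetic above can be sketched outside Clojure. This is a minimal Python mirror of the same calculations (the names are illustrative, not part of Asami):

```python
REGION_SIZE = 8 * 1024 * 1024  # mirrors region-size above

def stride_for(block_size):
    """Round the region size up so blocks divide it evenly (slack stays at the end)."""
    slack = REGION_SIZE % block_size
    return REGION_SIZE if slack == 0 else REGION_SIZE + (block_size - slack)

def locate(block_id, block_size):
    """Return (region_nr, byte_offset) for a block, as block-for computes them."""
    stride = stride_for(block_size)
    file_offset = block_id * block_size
    return file_offset // stride, file_offset % stride
```

For a power-of-two block size the stride is exactly the region size; an odd block size such as 24 bytes pushes the stride up to the next multiple of the block size.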

(defn- system-cleanup
"Prompt the system to clean up outstanding objects, thereby releasing unique resources
  for re-use. This is required for MappedByteBuffers, as Java NIO cannot release the
  resources explicitly without either putting a guard on every access (thereby
  compromising the speed advantage of memory mapping) or allowing continued access to
  memory that is no longer valid. Therefore, the resources must be released implicitly
  (by setting all references to null) and then calling this code to prompt the system
  to clean the resources up. Depending on the host OS, this method may need to be
  called several times. Linux typically requires only 1 or 2 invocations, while
  Windows regularly needs more than 2 and can require more than 6."
[]
(System/gc)
(try (Thread/sleep 100) (catch InterruptedException _))
(System/runFinalization))

(defn- retry-loop
"Retries a thunk, using a countdown and a cleanup thunk."
[action cleanup retries]
(loop [r retries]
(let [[response ex] (try [(action) nil] (catch Exception e [nil e]))]
(or response
(if (zero? r)
(throw ex)
(do
(cleanup)
(recur (dec r))))))))
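The same retry pattern (attempt, clean up, bounded retries) can be sketched in Python. Note one difference: unlike the Clojure version above, this sketch retries only on exceptions, not on falsey results:

```python
def retry_loop(action, cleanup, retries):
    """Run action; on failure run cleanup and try again, rethrowing after `retries` retries."""
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception:
            if attempt == retries:
                raise
            cleanup()

# The mapping code uses this shape to retry FileChannel.map,
# with system-cleanup as the cleanup thunk.
```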

(defn- file-size
"Gets the size of a block-file. Returns a size."
[{fc :fc}]
(.size ^FileChannel fc))

(defn- set-length!
"Sets the length of a block-file.
Returns the open block-file."
[{raf :raf :as block-file} ^long len]
(.setLength ^RandomAccessFile raf len)
block-file)

(defn- map-buffer
"Maps a buffer in a block-file. Returns a new block-file."
[{:keys [fc stride] :as block-file} region-nr]
(retry-loop
(fn []
(let [mbb (.map ^FileChannel fc FileChannel$MapMode/READ_WRITE (* region-nr stride) stride)]
(-> block-file
(update-in [:mapped-byte-buffers] conj mbb)
(assoc :nr-mapped-regions (inc region-nr)))))
system-cleanup
retries))

(defn map-file!
"Expands a block-file to one that is mapped to the required number of regions.
Returns a new block-file with the required mappings."
[{:keys [nr-mapped-regions stride mapped-byte-buffers] :as block-file} regions]
(let [mapped-size (if (> nr-mapped-regions 0) (+ (* (dec nr-mapped-regions) stride) stride) 0)
current-file-size (file-size block-file)
new-file-size (+ (* (dec regions) stride) stride)
        _ (when (< current-file-size mapped-size)
            (throw (ex-info (str "File has shrunk: " (:file block-file))
                            {:file (:file block-file)})))
        block-file (if (< current-file-size new-file-size)
                     (set-length! block-file new-file-size)
                     block-file)]

(loop [bf block-file region-nr nr-mapped-regions]
(if (>= region-nr regions)
bf
(recur (map-buffer bf region-nr) (inc region-nr))))))

(defn set-nr-blocks!
"Updates the number of blocks mapped in a block file. Returns the new block-file."
[{:keys [nr-blocks block-size nr-mapped-regions stride] :as block-file} new-nr]
(if (= new-nr nr-blocks)
block-file
(let [block-file (assoc block-file :nr-blocks new-nr)]
(if (< new-nr nr-blocks)
block-file
      (let [regions (if (<= new-nr 0) 0 (inc (long (/ (* (dec new-nr) block-size) stride))))]
(if (> regions nr-mapped-regions)
(map-file! block-file regions)
block-file))))))

(defn get-nr-blocks
"Returns the number of blocks"
[{:keys [nr-blocks]}]
nr-blocks)

(defn force-file
"Ensures all cached data is written to disk. This returns synchronously after all data is written."
[{:keys [mapped-byte-buffers] :as block-file}]
(doseq [^MappedByteBuffer b mapped-byte-buffers] (.force b))
block-file)

(defn block-for
"Returns the byte buffer that references the given block."
[{:keys [nr-blocks block-size stride mapped-byte-buffers] :as block-file} block-id]
(when (< block-id 0) (throw (ex-info "Bad block ID" {:id block-id})))
(when (>= block-id nr-blocks)
(throw (ex-info "Block ID out of range" {:id block-id :max-id (dec nr-blocks)})))
(let [file-offset (* block-id block-size)
region-nr (int (/ file-offset stride))
offset (mod file-offset stride)]
(create-block block-id block-size offset (nth mapped-byte-buffers region-nr))))

(defn copy-block
"Allocates a new block with a copy of the original block."
[{:keys [mapped-byte-buffers block-size stride] :as block-file} {:keys [byte-offset ro] :as block} new-block-id]
(let [new-file-offset (* new-block-id block-size)
new-region-nr (int (/ new-file-offset stride))
new-byte-offset (mod new-file-offset stride)
^ByteBuffer new-buffer (nth mapped-byte-buffers new-region-nr)]
(.limit ^ByteBuffer ro (int (+ byte-offset block-size)))
(.position ^ByteBuffer ro (int byte-offset))
(.position new-buffer (int new-byte-offset))
(.put new-buffer ^ByteBuffer ro)
    (create-block new-block-id block-size new-byte-offset new-buffer)))

(defn unmap
"Throw away mappings. This is dangerous, as it invalidates all instances.
Only to be used when closing the file for good."
[{:keys [mapped-byte-buffers block-size nr-blocks raf] :as block-file}]
(set-length! block-file (* block-size nr-blocks))
(voodoo/release mapped-byte-buffers)
(.close ^RandomAccessFile raf))

(defn clear!
[{:keys [block-size stride mapped-byte-buffers file raf fc] :as block-file}]
(voodoo/release mapped-byte-buffers)
(set-length! block-file 0)
(->BlockFile 0 block-size 0 [] stride file raf fc))

(def LN2 (Math/log 2))

(defn log2 [x] (max 0 (/ (Math/log x) LN2)))

(defn pow2
"Raise 2 to the power of x, with a floor value of 1."
[x]
(if (<= x 0) 1 (bit-shift-left 1 x)))

(def power-increment
"Defines how many bits behind the region magnitude to increment the number of regions by.
4 bits behind means that it starts at incrementing by 1, until size 32. Then 2 until 64.
Then 4 until 128, and so on."
4)

(defn next-size-increment
"Determine the next number of blocks that the file should move up to.
The size increment of the file increases as the size of the file increases"
[{:keys [nr-blocks block-size stride] :as block-file}]
(let [blocks-per-region (long (/ stride block-size))
full-regions (long (/ nr-blocks blocks-per-region))
new-regions (pow2 (- (long (log2 full-regions)) power-increment))]
(* blocks-per-region (+ full-regions new-regions))))
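Assuming the interpretation in the power-increment docstring, the growth schedule can be mirrored in Python (the guard for zero regions is an assumption here, since log2 of 0 is undefined):

```python
import math

POWER_INCREMENT = 4  # mirrors power-increment above

def pow2(x):
    """2^x with a floor value of 1, as in the Clojure pow2."""
    return 1 if x <= 0 else 1 << x

def growth_regions(full_regions):
    """How many regions next-size-increment adds to the file."""
    magnitude = int(math.log2(full_regions)) if full_regions > 0 else 0
    return pow2(magnitude - POWER_INCREMENT)
```

The file grows one region at a time until it reaches 32 regions, then 2 at a time until 64, then 4, and so on, keeping the number of remapping operations roughly logarithmic in the file size.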


(defrecord ManagedBlockFile [state]
BlockManager
(allocate-block! [this]
(let [{block-id :next-id} (vswap! state update :next-id inc)]
(when (>= block-id (:nr-blocks (:block-file @state)))
(vswap! state update :block-file #(set-nr-blocks! % (next-size-increment %))))
(block-for (:block-file @state) block-id)))

(copy-block! [this block]
(let [new-block (allocate-block! this)]
(copy-over! new-block block 0)))

;; this operation is a no-op
(write-block [this block] this)

(get-block [this id]
(let [s (deref state)]
(if (and (= null id) (= (:next-id s) -1)) ;; asking for the null block on an empty file
(allocate-block! this)
(let [^SoftReference block-ref (lookup (:block-cache s) id)]
(if-let [block (and block-ref
(if-let [b (.get block-ref)]
(do
(vswap! state update :block-cache hit id)
b)))]
block
(let [block (block-for (:block-file s) id)]
(vswap! state update :block-cache miss id (SoftReference. block))
block))))))

(get-block-size [this]
(:block-size (:block-file @state)))

(copy-to-tx [this block]
(if (<= (get-id block) (:commit-point @state))
(copy-block! this block)
block))

CountedBlocks
(get-block-count [this]
(get-nr-blocks (:block-file @state)))

Transaction
(rewind! [this]
(vswap! state #(assoc % :next-id (:commit-point %)))
this)

(commit! [this]
(vswap! state #(assoc % :commit-point (:next-id %)))
(force-file (:block-file @state))
this)

Forceable
(force! [this]
(force-file (:block-file @state)))

Closeable
(close [this]
(let [{:keys [block-file next-id]} @state]
(force-file block-file)
(unmap (assoc block-file :nr-blocks (inc next-id)))))

(delete! [this]
(let [{{file :file} :block-file} @state]
(.delete ^File file))))

(defn create-managed-block-file
[filename block-size nr-blocks]
(let [block-file (open-block-file filename block-size nr-blocks)
next-id (dec (:nr-blocks block-file))]
(when (and nr-blocks (= next-id nr-blocks))
(throw (ex-info "Inconsistent reopening of block file" {:set-blocks nr-blocks :file-blocks (:nr-blocks block-file)})))
(->ManagedBlockFile (volatile! {:block-file block-file
:next-id next-id
:commit-point next-id
:block-cache (lru-cache-factory {} :threshold cache-size)}))))
13 changes: 13 additions & 0 deletions src/asami/durable/block/file/util.clj
@@ -0,0 +1,13 @@
(ns ^{:doc "Utilities for file access"
:author "Paula Gearon"}
asami.durable.block.file.util
(:import [java.io File]))

(defn temp-dir
"Gets the temporary directory, or the current directory if none is found."
[] (System/getProperty "java.io.tmpdir" "."))

(defn temp-file
  "Returns a File referring to a temporary path that has not been created."
[nm] (File. (temp-dir) nm))

23 changes: 23 additions & 0 deletions src/asami/durable/block/file/voodoo.clj
@@ -0,0 +1,23 @@
(ns ^{:doc "Utilities to help clean up memory usage of mapped files on MS Windows"
:author "Paula Gearon"}
asami.durable.block.file.voodoo
(:require [clojure.string :as s])
(:import [java.security AccessController PrivilegedAction]))

(def windows? (s/includes? (s/lower-case (System/getProperty "os.name" "")) "win"))

(defn clean [obj]
(when obj
(AccessController/doPrivileged
(proxy [PrivilegedAction] []
(run [_]
(try
(let [get-cleaner-method (.getMethod (class obj) "cleaner" (make-array Class 0))
_ (.setAccessible get-cleaner-method true)
cleaner (.invoke get-cleaner-method obj (make-array Object 0))]
(.clean cleaner))
(catch Exception e (println "non-fatal buffer cleanup error"))))))))

(defn release [mapped]
(when windows?
(doseq [b mapped] (clean b))))
72 changes: 72 additions & 0 deletions src/asami/durable/codec.cljc
@@ -0,0 +1,72 @@
(ns ^{:doc "Common encoding and decoding values"
:author "Paula Gearon"}
asami.durable.codec
#?(:clj (:import [java.nio.charset Charset])))


(def ^:const byte-mask 0xFF)
(def ^:const data-mask 0x0FFFFFFFFFFFFFFF)
(def ^:const sbytes-shift 48)
(def ^:const len-nybble-shift 56)

(def utf8 #?(:clj (Charset/forName "UTF-8")
:cljs "UTF-8"))


;; Encapsulated IDs contain all of their information inline, without requiring additional storage.
;; The data type is contained in the top 4 bits. The remaining 60 bits hold the data:
;; Top 4 bits:
;; 1 0 0 0: long
;; 1 1 0 0: Date
;; 1 0 1 0: Instant
;; 1 1 1 0: Short String
;; 1 0 0 1: Short Keyword
;; 1 1 0 1: Internal Node - asami.graph/InternalNode
;; 1 0 1 1: boolean - This leaves a 58 bit space for something else

(def ^:const long-type-code 0x8)
(def ^:const date-type-code 0xC)
(def ^:const inst-type-code 0xA)
(def ^:const sstr-type-code 0xE)
(def ^:const skey-type-code 0x9)
(def ^:const node-type-code 0xD)
(def ^:const bool-type-code 0xB)

(def ^:const long-type-mask (bit-shift-left long-type-code 60))
(def ^:const date-type-mask (bit-shift-left date-type-code 60))
(def ^:const inst-type-mask (bit-shift-left inst-type-code 60))
(def ^:const sstr-type-mask (bit-shift-left sstr-type-code 60))
(def ^:const skey-type-mask (bit-shift-left skey-type-code 60))
(def ^:const node-type-mask (bit-shift-left node-type-code 60))
(def ^:const bool-type-mask (bit-shift-left bool-type-code 60))


(def ^:const boolean-false-bits bool-type-mask)
(def ^:const boolean-true-bits (bit-or bool-type-mask (bit-shift-left 0x8 56)))
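As a worked example of the layout above, this Python sketch packs and unpacks encapsulated IDs for longs and short strings. (Python integers are unbounded, so the set top bit does not make these values negative as it does for a JVM long.)

```python
DATA_MASK = 0x0FFFFFFFFFFFFFFF
LONG_TYPE_CODE = 0x8
SSTR_TYPE_CODE = 0xE

def encapsulate_long(n):
    """Type nybble 1000 in the top 4 bits, 60 bits of data below it."""
    return (LONG_TYPE_CODE << 60) | (n & DATA_MASK)

def encapsulate_sstr(s):
    """Type nybble 1110, a length nybble at bit 56, then up to 7 UTF-8 bytes from bit 48 down."""
    bs = s.encode("utf-8")
    assert len(bs) <= 7, "short strings only"
    packed = (SSTR_TYPE_CODE << 60) | (len(bs) << 56)
    for i, b in enumerate(bs):
        packed |= b << (48 - 8 * i)
    return packed

def extract_sstr(packed):
    """Reverse of encapsulate_sstr, matching the shifts used by the decoder."""
    length = (packed >> 56) & 0xF
    return bytes((packed >> (48 - 8 * i)) & 0xFF for i in range(length)).decode("utf-8")
```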

;; Header/Body description
;; Header tries to use as many bits for length data as possible. This cuts into the bits available for type data.
;; Byte 0
;; 0xxxxxxx String type, length of up to 127.
;; 10xxxxxx URI type, length of up to 63
;; 1100xxxx Keyword type, length of up to 15
;; For these 3 types, all remaining bytes are the data body.
;; 1101xxxx Long value. xxxx encodes the number of bytes
;; 111ytttt Data is of type described in tttt.
;; Length is variable-length encoded as follows:
;; When y=0
;; Byte 1
;; xxxxxxxx The length of the data, 0-255
;;
;; When y=1
;; Length is variable-length encoded
;; Bytes 1-2
;; 0xxxxxxx xxxxxxxx Length of the data, 256-32kB
;; 1xxxxxxx xxxxxxxx Indicates a 4-byte length 32kB-32GB
;; Bytes 3-4
;; zzzzzzzz zzzzzzzz When Byte 1 started with 1, then included with bytes 1-2 to provide 32GB length

;; NOTE: reconsidering using the y bit from byte 0 to indicate
;; that byte 1 is extra type information. This would allow for
;; short numerical types, types of URL that start with http://
;; and https:// etc.
73 changes: 73 additions & 0 deletions src/asami/durable/common.cljc
@@ -0,0 +1,73 @@
(ns ^{:doc "A common namespace for protocols and constants that are referenced
from multiple files and/or between Clojure and ClojureScript files."
:author "Paula Gearon"}
asami.durable.common)

(def ^:const long-size "Number of bytes in a Long value"
#?(:clj Long/BYTES :cljs 8 #_BigInt64Array.BYTES_PER_ELEMENT))

(def ^:const int-size "Number of bytes in an Integer value"
#?(:clj Integer/BYTES :cljs 4 #_Int32Array.BYTES_PER_ELEMENT))

(def ^:const short-size "Number of bytes in a Short value"
#?(:clj Short/BYTES :cljs 2 #_Int16Array.BYTES_PER_ELEMENT))

(def ^:const max-long "Maximum value that can be safely represented as a long"
#?(:clj Long/MAX_VALUE :cljs (.-MAX_SAFE_INTEGER js/Number)))

(defprotocol Forceable
(force! [this] "Ensures that all written data is fully persisted"))

(defprotocol Lockable
(lock! [this] "Locks this resource. This may be a process lock or a thread lock, depending on the resource")
(unlock! [this] "Unlocks this resource."))

(defprotocol Closeable
(close [this] "Closes and invalidates all associated resources")
(delete! [this] "Remove any persistent resources"))

(defprotocol Transaction
(rewind! [this] "Revert to the last commit point. Any blocks allocated since the last commit will be invalid.")
(commit! [this] "Commits all blocks allocated since the last commit. These blocks are now read-only."))

(defprotocol TxData
(get-tx-data [this] "Returns the data for a transaction in a vector of long values"))

(defprotocol TxStore
(acquire-lock! [this] "Acquires a lock object for the transaction. Once acquired, this resource MUST be freed!")
(append-tx! [this tx] "Writes a transaction record. The record is a seq of longs")
(get-tx [this id] "Retrieves a transaction record by ID")
(latest [this] "Retrieves the last transaction record")
(tx-count [this] "Retrieves the count of transaction records")
(find-tx [this timestamp] "Finds the transaction number for a timestamp"))

(defprotocol DataStorage
(find-object [pool id] "Retrieves an object by ID")
(find-id [pool object] "Retrieves an ID for an object")
(write! [pool object] "Retrieves an ID for an object, writing it if necessary. Returns a pair of the ID and the next version of the store. Idempotent.")
(at [pool t] "Retrieve the data at a particular transaction."))

(defprotocol Paged
(refresh! [this] "Refreshes the buffers")
(read-byte [this offset] "Returns a byte from underlying pages")
(read-short [this offset] "Returns a short from underlying pages. Offset in bytes.")
(read-long [this offset] "Returns a long from underlying pages. Offset in bytes. Unlike other data types, these may not straddle boundaries")
(read-bytes [this offset length] "Reads length bytes and returns as an array.")
(read-bytes-into [this offset bytes] "Fills a byte array with data from the paged object"))

(defprotocol FlatStore
(write-object! [this obj] "Writes an object to storage. Returns an ID")
(get-object [this id] "Reads an object from storage, based on an ID"))

(defprotocol FlatRecords
(append! [this v] "Writes a vector of long values. Returns an ID")
(get-record [this id] "Reads a record of long values from storage, based on an ID")
(next-id [this] "Returns the next ID that this store will return"))

(defprotocol TupleStorage
(tuples-at [this root] "Returns this tuples index at a different root")
(write-new-tx-tuple! [this tuple] "Adds a new tuple to the index in the current TX")
(write-tuple! [this tuple] "Adds a tuple to the index")
(delete-tuple! [this tuple] "Removes a tuple from the index. Returns both the index and the final element of the tuple")
(find-tuples [this tuple] "Finds a tuples seq, returning a co-ordinate")
(count-tuples [this tuple] "Finds and counts the size of a tuples seq"))
49 changes: 49 additions & 0 deletions src/asami/durable/common_utils.cljc
@@ -0,0 +1,49 @@
(ns ^{:doc "A common namespace for utility functions for storage. "
:author "Paula Gearon"}
asami.durable.common-utils
(:require [clojure.string :as string]
#?(:clj [asami.durable.block.file.block-file :as block-file])
#?(:clj [clojure.java.io :as io]))
#?(:clj (:import [java.io File])))


(def ^:const current-dir ".")
(def ^:const parent-dir "..")
(def ^:const directory-property "base.dir")
(def ^:const directory-env "ASAMI_BASE_DIR")

#?(:clj
(defn get-directory
([name] (get-directory name true))
([name test?]
(let [[root & path-elements] (string/split name #"/")]
(when (or (= parent-dir root) (some #{current-dir parent-dir} path-elements))
(throw (ex-info "Illegal path present in database name" {:database name})))
(let [clean-path (cons root (remove empty? path-elements))
clean-name (string/join File/separatorChar clean-path)
base-dir (or (System/getProperty directory-property) (System/getenv directory-env))
d (if base-dir
(io/file base-dir clean-name)
(io/file clean-name))]
(when test?
(if (.exists d)
(when-not (.isDirectory d)
(throw (ex-info (str "'" d "' already exists as a file") {:name name :path (.getAbsolutePath d)})))
(when-not (.mkdirs d)
(throw (ex-info (str "Unable to create directory '" clean-name "'") {:path (.getAbsolutePath d)})))))
d)))))

(defn create-block-manager
"Creates a block manager"
[name manager-name block-size nr-blocks]
#?(:clj
(let [d (get-directory name)]
(block-file/create-managed-block-file (.getPath (io/file d manager-name)) block-size nr-blocks))))

(defn named-storage
"A common function for opening storage with a given name. Must be provided with a storage constructor and the name.
The root id indicates an index root, and may be nil for an empty index.
The block count refers to the count of blocks in the storage."
[storage-constructor name root-id block-count]
#?(:clj
(storage-constructor name root-id block-count)))
357 changes: 357 additions & 0 deletions src/asami/durable/decoder.clj
@@ -0,0 +1,357 @@
(ns ^{:doc "Encodes and decodes data for storage. Clojure implementation"
:author "Paula Gearon"}
asami.durable.decoder
(:require [clojure.string :as s]
[asami.graph :as graph]
[asami.durable.common :refer [read-byte read-bytes read-short]]
[asami.durable.codec :refer [byte-mask data-mask sbytes-shift len-nybble-shift utf8
long-type-code date-type-code inst-type-code
sstr-type-code skey-type-code node-type-code
boolean-false-bits boolean-true-bits]])
(:import [clojure.lang Keyword BigInt]
[java.math BigInteger BigDecimal]
[java.net URI]
[java.time Instant]
[java.util Date UUID]
[java.nio ByteBuffer]
[java.nio.charset Charset]))

;; (set! *warn-on-reflection* true)

(defn decode-length
"Reads the header to determine length.
ext: if true (bit is 0) then length is a byte, if false (bit is 1) then length is in either a short or an int
pos: The beginning of the data. This has skipped the type byte.
returns: a pair of the header length and the data length."
[ext paged-rdr ^long pos]
(if ext
[Byte/BYTES (bit-and 0xFF (read-byte paged-rdr pos))]
(let [len (read-short paged-rdr pos)]
(if (< len 0)
      (let [len2 (read-short paged-rdr (+ pos Short/BYTES))]
[Integer/BYTES (bit-or
(bit-shift-left (int (bit-and 0x7FFF len)) Short/SIZE)
len2)])
[Short/BYTES len]))))
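The variable-length header can be exercised byte-for-byte in a Python sketch (reading the second short at pos + 2, as the format requires):

```python
def decode_length(ext, buf, pos):
    """Return (header_bytes, data_length).
    ext=True: one unsigned byte. ext=False: a big-endian short; when its sign
    bit is set, its low 15 bits combine with the next short into a 31-bit length."""
    if ext:
        return 1, buf[pos]
    short = int.from_bytes(buf[pos:pos + 2], "big", signed=True)
    if short < 0:
        second = int.from_bytes(buf[pos + 2:pos + 4], "big", signed=False)
        return 4, ((short & 0x7FFF) << 16) | second
    return 2, short
```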

(defn decode-length-node
"Reads the header to determine length.
data: The complete buffer to decode, including the type byte.
returns: the length, or a lower bound on the length"
[^bytes data]
(let [b0 (aget data 0)]
(cond ;; test for short format objects
(zero? (bit-and 0x80 b0)) b0 ;; short string
(zero? (bit-and 0x40 b0)) (bit-and 0x3F b0) ;; short URI
(zero? (bit-and 0x20 b0)) (bit-and 0x0F b0) ;; short keyword OR number
;; First byte contains only the type information. Give a large number = 63
:default 0x3F)))

;; Readers are given the length and a position. They then read data into a type

(defn read-str
[paged-rdr ^long pos ^long len]
(String. ^bytes (read-bytes paged-rdr pos len) ^Charset utf8))

(defn read-uri
[paged-rdr ^long pos ^long len]
(URI/create (read-str paged-rdr pos len)))

(defn read-keyword
[paged-rdr ^long pos ^long len]
(keyword (read-str paged-rdr pos len)))

(defn read-long
"Raw reading of big-endian bytes into a long"
^long [paged-rdr ^long pos ^long len]
(let [^bytes b (read-bytes paged-rdr pos len)]
(areduce b i ret 0 (bit-or (bit-shift-left ret Byte/SIZE) (bit-and 0xFF (aget b i))))))

;; decoders operate on the bytes following the initial type byte information
;; if the data type has variable length, then this is decoded first

(defn long-decoder
[ext paged-rdr ^long pos]
(let [b (ByteBuffer/wrap (read-bytes paged-rdr pos Long/BYTES))]
[(.getLong b 0) Long/BYTES]))

(defn double-decoder
[ext paged-rdr ^long pos]
(let [b (ByteBuffer/wrap (read-bytes paged-rdr pos Long/BYTES))]
[(.getDouble b 0) Long/BYTES]))

(defn string-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)]
[(read-str paged-rdr (+ pos i) len) (+ i len)]))

(defn uri-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)]
[(read-uri paged-rdr (+ pos i) len) (+ i len)]))

(defn bigint-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)
b (read-bytes paged-rdr (+ i pos) len)]
[(bigint (BigInteger. ^bytes b)) (+ i len)]))

(defn bigdec-decoder
[ext paged-rdr ^long pos]
(let [[s len] (string-decoder ext paged-rdr pos)]
[(bigdec s) len]))

(defn date-decoder
[ext paged-rdr ^long pos]
[(Date. ^long (first (long-decoder ext paged-rdr pos))) Long/BYTES])

(def ^:const instant-length (+ Long/BYTES Integer/BYTES))

(defn instant-decoder
[ext paged-rdr ^long pos]
(let [b (ByteBuffer/wrap (read-bytes paged-rdr pos instant-length))
epoch (.getLong b 0)
sec (.getInt b Long/BYTES)]
[(Instant/ofEpochSecond epoch sec) instant-length]))

(defn keyword-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)]
[(read-keyword paged-rdr (+ pos i) len) (+ i len)]))

(def ^:const uuid-length (* 2 Long/BYTES))

(defn uuid-decoder
[ext paged-rdr ^long pos]
(let [b (ByteBuffer/wrap (read-bytes paged-rdr pos uuid-length))
low (.getLong b 0)
high (.getLong b Long/BYTES)]
[(UUID. high low) uuid-length]))

(defn blob-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)]
[(read-bytes paged-rdr (+ i pos) len) (+ i len)]))

(defn xsd-decoder
[ext paged-rdr ^long pos]
(let [[s len] (string-decoder ext paged-rdr pos)
sp (s/index-of s \space)]
    [[(URI/create (subs s 0 sp)) (subs s (inc sp))] len]))

(defn default-decoder
"This is a decoder for unsupported data that has a string constructor"
[ext paged-rdr ^long pos]
(let [[s len] (string-decoder ext paged-rdr pos)
sp (s/index-of s \space)
class-name (subs s 0 sp)]
(try
(let [c (Class/forName class-name)
cn (.getConstructor c (into-array Class [String]))]
[(.newInstance cn (object-array [(subs s (inc sp))])) len])
(catch Exception e
(throw (ex-info (str "Unable to construct class: " class-name) {:class class-name}))))))

(declare typecode->decoder read-object-size)

(defn seq-decoder
"This is a decoder for sequences of data. Use a vector as the sequence."
[ext paged-rdr ^long pos]
;; read the length of the header and the length of the seq data
(let [[i len] (decode-length ext paged-rdr pos)
start (+ i pos)
end (+ start len)
;; get the 0 byte. This contain info about the types in the seq
b0 (read-byte paged-rdr start)
decoder (if (zero? b0)
;; heterogeneous types. Full header on every element. Read objects with size.
read-object-size
;; homogeneous types. The header is only written once
                  (if (= 0xD0 (bit-and 0xF0 b0)) ;; homogeneous numbers
(let [num-len (bit-and 0x0F b0)] ;; get the byte length of all the numbers
;; return a function that deserializes the number and pairs it with the length
#(vector (read-long %1 %2 num-len) num-len))
(if-let [tdecoder (typecode->decoder (bit-and 0x0F b0))] ;; reader for type
;; the standard decoder already returns a deserialized value/length pair
#(tdecoder true %1 %2)
(throw (ex-info "Illegal datatype in array" {:type-code (bit-and 0x0F b0)})))))]
;; iterate over the buffer deserializing until the end is reached
(loop [s [] offset (inc start)]
(if (>= offset end)
[s (+ i len)] ;; end of the buffer, return the seq and the number of bytes read
(let [[o obj-len] (decoder paged-rdr offset)] ;; deserialize, then step forward
(recur (conj s o) (+ offset obj-len)))))))

(defn map-decoder
"A decoder for maps. Returns the map and the bytes read."
[ext paged-rdr ^long pos]
;; read the map as one long seq, then split into pairs
(let [[s len] (seq-decoder ext paged-rdr pos)
m (into {} (map vec (partition 2 s)))]
[m len]))

(def typecode->decoder
"Map of type codes to decoder functions. Returns object and bytes read."
{0 long-decoder
1 double-decoder
2 string-decoder
3 uri-decoder
4 seq-decoder
5 map-decoder
6 bigint-decoder
7 bigdec-decoder
8 date-decoder
9 instant-decoder
10 keyword-decoder
11 uuid-decoder
12 blob-decoder
13 xsd-decoder})

(def ^:const type-nybble-shift 60)

(def ^:const nybble-mask 0xF)
(def ^:const long-nbit 0x0800000000000000)
(def ^:const lneg-bits -0x1000000000000000) ;; 0xF000000000000000

(defn extract-long
"Extract a long number from an encapsulating ID"
^long [^long id]
(let [l (bit-and data-mask id)]
(if (zero? (bit-and long-nbit l))
l
(bit-or lneg-bits l))))

(defn as-byte
[n]
(if (zero? (bit-and 0x80 n))
(byte n)
(byte (bit-or -0x100 n))))

(defn extract-sstr
"Extract a short string from an encapsulating ID"
[^long id]
(let [len (bit-and (bit-shift-right id len-nybble-shift) nybble-mask)
abytes (byte-array len)]
(doseq [i (range len)]
(aset abytes i
(->> (* i Byte/SIZE)
(- sbytes-shift)
(bit-shift-right id)
(bit-and byte-mask)
as-byte
byte)))
(String. ^bytes abytes 0 len ^Charset utf8)))

(defn extract-node
[id]
(asami.graph.InternalNode. (bit-and data-mask id)))

(defn unencapsulate-id
"Converts an encapsulating ID into the object it encapsulates. Return nil if it does not encapsulate anything."
[^long id]
(when (> 0 id)
(case id
-0x5000000000000000 false ;; boolean-false-bits
-0x4800000000000000 true ;; boolean-true-bits
(let [tb (bit-and (bit-shift-right id type-nybble-shift) nybble-mask)]
(case tb
0x8 (extract-long id) ;; long-type-code
0xC (Date. (extract-long id)) ;; date-type-code
0xA (Instant/ofEpochMilli (extract-long id)) ;; inst-type-code
0xE (extract-sstr id) ;; sstr-type-code
0x9 (keyword (extract-sstr id)) ;; skey-type-code
0xD (extract-node id) ;; node-type-code
nil)))))

(defn encapsulated-node?
[^long id]
(let [top-nb (bit-and (bit-shift-right id type-nybble-shift) nybble-mask)]
(or (= top-nb skey-type-code) (= top-nb node-type-code))))

(defn type-info
"Returns the type information encoded in a header-byte"
[b]
(cond
(zero? (bit-and 0x80 b)) 2 ;; string
(zero? (bit-and 0x40 b)) 3 ;; uri
(zero? (bit-and 0x20 b)) 10 ;; keyword
;; if short uris are permitted in the future then change to the URI code (3) here
:default (bit-and 0xF b)))

(defn partials-len
"Determine the number of bytes that form a partial character at the end of a UTF-8 byte array.
The len argument is the defined length of the full string, but that may be greater than the bytes provided."
([^bytes bs] (partials-len bs (alength bs)))
([^bytes bs len]
(let [end (dec (min len (alength bs)))]
(when (>= end 0)
(loop [t 0]
(if (= 4 t) ;; Safety limit. Should not happen for well formed UTF-8
t
(let [b (aget bs (- end t))]
(if (zero? (bit-and 0x80 b)) ;; single char that can be included
t
(if (zero? (bit-and 0x40 b)) ;; extension char that may be truncated
(recur (inc t))
(cond
(= 0xC0 (bit-and 0xE0 b)) (if (= 1 t) 0 (inc t)) ;; 2 bytes
(= 0xE0 (bit-and 0xF0 b)) (if (= 2 t) 0 (inc t)) ;; 3 bytes
(= 0xF0 (bit-and 0xF8 b)) (if (= 3 t) 0 (inc t)) ;; 4 bytes
:default (recur (inc t)))))))))))) ;; this should not happen for well formed UTF-8
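The back-scan over UTF-8 continuation bytes can be illustrated with a Python sketch of the same rule: walk back from the end counting continuation bytes, and decide at the first leading byte whether the final character is complete:

```python
def partial_tail_len(bs):
    """Number of trailing bytes that form an incomplete UTF-8 character (0 if none)."""
    t = 0
    while t < 4 and t < len(bs):
        b = bs[-1 - t]
        if b & 0x80 == 0:          # ASCII byte ends a complete character
            return t
        if b & 0xC0 == 0x80:       # continuation byte: keep scanning backwards
            t += 1
            continue
        # leading byte: complete only when followed by the right number of continuations
        if b & 0xE0 == 0xC0:
            return 0 if t == 1 else t + 1   # 2-byte sequence
        if b & 0xF0 == 0xE0:
            return 0 if t == 2 else t + 1   # 3-byte sequence
        if b & 0xF8 == 0xF0:
            return 0 if t == 3 else t + 1   # 4-byte sequence
        t += 1                              # malformed; treat like a continuation
    return t
```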

(defn string-style-compare
"Compare the string form of an object with bytes that store the string form of an object"
[left-s ^bytes right-bytes]
(let [rbc (alength right-bytes) ;; length of all bytes
full-length (decode-length-node right-bytes)
;; get the length of the bytes used in the string
rlen (min full-length (dec rbc))
;; look for partial chars to be truncated, starting at the end.
;; string starts 1 byte in, after the header, so start at inc of the string byte length
trunc-len (partials-len right-bytes (inc rlen))
right-s (String. right-bytes 1 (int (- rlen trunc-len)) ^Charset utf8)
;; only truncate the LHS if the node does not contain all of the string data
left-side (if (<= full-length (dec rbc))
left-s
(subs left-s 0 (min (count left-s) (count right-s))))]
(compare left-side right-s)))

(defn long-bytes-compare
"Compare data from 2 values that are the same type. If the data cannot give a result
then return 0. Operates on an array, expected to be in an index node."
[type-left left-header left-body left-object right-bytes]
(case (byte type-left)
2 (string-style-compare left-object right-bytes) ;; String
3 (string-style-compare (str left-object) right-bytes) ;; URI
10 (string-style-compare (subs (str left-object) 1) right-bytes) ;; Keyword
;; otherwise, skip the type byte in the right-bytes, and raw compare left bytes to right bytes
(or
(first (drop-while zero? (map compare left-body (drop 1 right-bytes)))) ;; includes right header and body
0)))

(defn read-object-size
  "Reads an object from a paged-reader, at id=pos. Returns both the object and its length."
[paged-rdr ^long pos]
(let [b0 (read-byte paged-rdr pos)
ipos (inc pos)]
(cond ;; test for short format objects
;; calculate the length for short format objects, and increment by 1 to include the intro byte
(zero? (bit-and 0x80 b0)) [(read-str paged-rdr ipos b0) (inc b0)]
(zero? (bit-and 0x40 b0)) (let [len (bit-and 0x3F b0)]
[(read-uri paged-rdr ipos len) (inc len)])
;; First byte contains only the type information. Increment the returned length to include b0
(= 0xE0 (bit-and 0xE0 b0)) (update ((typecode->decoder (bit-and 0x0F b0) default-decoder)
(zero? (bit-and 0x10 b0)) paged-rdr ipos)
1 inc)
;; high nybble is 1100 for keywords or 1101 for long number
:default (let [read-fn (if (zero? (bit-and 0x30 b0)) read-keyword read-long)
len (bit-and 0x0F b0)]
[(read-fn paged-rdr ipos len) (inc len)]))))

(defn read-object
"Reads an object from a paged-reader, at id=pos"
[paged-rdr ^long pos]
(first (read-object-size paged-rdr pos)))

;; the test for zero here is the y bit described in asami.durable.codec
;; This may need to change if the y bit is repurposed.
212 changes: 212 additions & 0 deletions src/asami/durable/decoder.cljs
@@ -0,0 +1,212 @@
(ns ^{:doc "Encodes and decodes data for storage. ClojureScript implementation"
:author "Joel Holdbrooks"}
asami.durable.decoder
(:require [clojure.string :as s]
[asami.durable.common :refer [read-byte read-bytes read-short]])
(:import [goog.math Long Integer]
[goog Uri]))

(def ^{:private true}
BYTE_BYTES 1)

(def ^{:private true}
  SHORT_BYTES 2)

(def ^{:private true}
SHORT_SIZE 16)

(def ^{:private true}
INTEGER_BYTES 4)

(def ^{:private true}
LONG_BYTES 8)

;; temporary stub
(defn type-info [data] 0)

(defn decode-length
  "Reads the header to determine length.
  ext: if true (the header's e bit was 0) then the length is a single byte,
  otherwise the length is in a short, or in an int when the short's sign bit is set.
  Returns a pair of the number of bytes read and the length."
  [ext paged-rdr ^long pos]
  (if ext
    (let [raw (read-byte paged-rdr pos)]
      [BYTE_BYTES (bit-and 0xFF raw)])
    (let [len (read-short paged-rdr pos)]
      (if (< len 0)
        ;; the second short follows the first; mask it to its unsigned value
        (let [len2 (bit-and 0xFFFF (read-short paged-rdr (+ pos SHORT_BYTES)))]
          [INTEGER_BYTES (bit-or
                          (bit-shift-left (int (bit-and 0x7FFF len)) SHORT_SIZE)
                          len2)])
        [SHORT_BYTES len]))))
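As a worked example of the length formats (a sketch in Python for illustration; it assumes the second short of the int form directly follows the first, and that the first short's sign bit flags the 4-byte form):

```python
def decode_length(buf, ext):
    """Return (bytes consumed, length) for the header length formats."""
    if ext:                       # e bit was 0: single unsigned length byte
        return 1, buf[0] & 0xFF
    hi = (buf[0] << 8) | buf[1]   # big-endian short
    if hi & 0x8000:               # sign bit set: 31-bit length over 4 bytes
        return 4, ((hi & 0x7FFF) << 16) | (buf[2] << 8) | buf[3]
    return 2, hi
```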

;; Readers are given the length and a position. They then read data into a type

(defn bytes->str [bytes]
(let [decoder (js/TextDecoder.)]
(.decode decoder bytes)))

(defn read-str
[paged-rdr ^long pos ^long len]
(bytes->str (read-bytes paged-rdr pos len)))

(defn read-uri
[paged-rdr ^long pos ^long len]
(Uri/parse (read-str paged-rdr pos len)))

(defn read-keyword
[paged-rdr ^long pos ^long len]
(keyword (read-str paged-rdr pos len)))

;; decoders operate on the bytes following the initial type byte information
;; if the data type has variable length, then this is decoded first

(defn bytes->int
{:private true}
[bytes] ;; `bytes` is assumed to be a `js/Array` like object.
(let [bytes (to-array bytes)]
(bit-or (bit-shift-left (bit-and (aget bytes 0) 0xFF) 24)
(bit-shift-left (bit-and (aget bytes 1) 0xFF) 16)
(bit-shift-left (bit-and (aget bytes 2) 0xFF) 8)
(bit-shift-left (bit-and (aget bytes 3) 0xFF) 0))))

(defn bytes->long
[bytes] ;; `bytes` is assumed to be a `js/Array` like object.
(let [high-bits (bytes->int (.slice bytes 0 4))
low-bits (bytes->int (.slice bytes 4 8))]
(.fromBits Long low-bits high-bits)))

(defn long-decoder
[ext paged-rdr ^long pos]
(bytes->long (read-bytes paged-rdr pos LONG_BYTES)))

#_
(defn double-decoder
[ext paged-rdr ^long pos]
(let [b (ByteBuffer/wrap (read-bytes paged-rdr pos Long/BYTES))]
(.getDouble b 0)))

(defn string-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)]
(read-str paged-rdr (+ pos i) len)))

(defn uri-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)]
(read-uri paged-rdr (+ pos i) len)))

#_
(defn bigint-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)
b (read-bytes paged-rdr (+ i pos) len)]
(bigint (BigInteger. b))))

#_
(defn bigdec-decoder
[ext paged-rdr ^long pos]
(bigdec (string-decoder ext paged-rdr pos)))

(defn date-decoder
[ext paged-rdr ^long pos]
;; Note: `.toNumber` may not be safe here.
(js/Date. (.toNumber (long-decoder ext paged-rdr pos))))

(def ^:const instant-length (+ LONG_BYTES INTEGER_BYTES))

(defn instant-decoder
[ext paged-rdr ^long pos]
(let [b (read-bytes paged-rdr pos instant-length)
;; epoch (.getLong b 0) ;; Ignored for now.
sec (bytes->long b)]
(js/Date. (* 1000 sec))))

(defn keyword-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)]
(read-keyword paged-rdr (+ pos i) len)))

(def ^:const uuid-length
(* 2 LONG_BYTES))

(defn uuid-decoder
[ext paged-rdr ^long pos]
(let [b (read-bytes paged-rdr pos uuid-length)
hex (s/join (map (fn [n]
(let [v (bit-and 0xFF n) ;; mask to an unsigned byte value
h (.toString v 16)]
(if (<= v 0xf)
(str 0 h)
h)))
b))]
(if-let [[_ a b c d e] (re-matches #"(.{8})(.{4})(.{4})(.{4})(.{12})" hex)]
(uuid (str a "-" b "-" c "-" d "-" e))
;; TODO: error handling
)))
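The hex assembly above can be checked against a small sketch (Python for illustration; note that each byte must be masked to its unsigned value before formatting):

```python
def uuid_str(bs):
    """Format 16 bytes as a canonical 8-4-4-4-12 UUID string."""
    hx = "".join(f"{b & 0xFF:02x}" for b in bs)
    return "-".join([hx[:8], hx[8:12], hx[12:16], hx[16:20], hx[20:]])
```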

#_
(defn blob-decoder
[ext paged-rdr ^long pos]
(let [[i len] (decode-length ext paged-rdr pos)]
(read-bytes paged-rdr (+ i pos) len)))

#_
(defn xsd-decoder
[ext paged-rdr ^long pos]
(let [s (string-decoder ext paged-rdr pos)
sp (s/index-of s \space)]
[(Uri/parse (subs s 0 sp)) (subs s (inc sp))]))

(defn default-decoder
"This is a decoder for unsupported data that has a string constructor"
[ext paged-rdr ^long pos]
(throw (ex-info "Not implemented" {}))
#_
(let [s (string-decoder ext paged-rdr pos)
sp (s/index-of s \space)
class-name (subs s 0 sp)]
(try
(let [c (Class/forName class-name)
cn (.getConstructor c (into-array Class [String]))]
(.newInstance cn (object-array [(subs s (inc sp))])))
(catch Exception e
(throw (ex-info (str "Unable to construct class: " class-name) {:class class-name}))))))

(def typecode->decoder
"Map of type codes to decoder functions"
{0 long-decoder
;; 1 double-decoder
2 string-decoder
3 uri-decoder
;; 6 bigint-decoder
;; 7 bigdec-decoder
8 date-decoder
9 instant-decoder
10 keyword-decoder
11 uuid-decoder
;; 12 blob-decoder
;; 13 xsd-decoder
})

(defn long-bytes-compare
"Compare data from 2 values that are the same type. If the data cannot give a result
then return 0. Operates on an array, expected to be in an index node."
[type-left left-header left-body left-object right-bytes]
0)

(defn read-object
"Reads an object from a paged-reader, at id=pos"
[paged-rdr ^long pos]
(let [b0 (read-byte paged-rdr pos)
ipos (inc pos)]
(cond
(zero? (bit-and 0x80 b0)) (read-str paged-rdr ipos b0)
(zero? (bit-and 0x40 b0)) (read-uri paged-rdr ipos (bit-and 0x3F b0))
(zero? (bit-and 0x20 b0)) (read-keyword paged-rdr ipos (bit-and 0x1F b0))
:default ((typecode->decoder (bit-and 0x0F b0) default-decoder)
(zero? (bit-and 0x10 b0)) paged-rdr ipos))))

;; temporary stubs
(defn unencapsulate-id [x])

(defn encapsulated-node? [id])

(defn decode-length-node [b])
441 changes: 441 additions & 0 deletions src/asami/durable/encoder.clj

Large diffs are not rendered by default.

157 changes: 157 additions & 0 deletions src/asami/durable/encoder.cljs
@@ -0,0 +1,157 @@
(ns ^{:doc "Encodes and decodes data for storage. ClojureScript implementation"
:author "Paula Gearon and Joel Holdbrooks"}
asami.durable.encoder
(:require [clojure.string :as s])
(:import [goog.math Long Integer]
[goog Uri]))

;; (set! *warn-on-reflection* true)

(def ^{:private true} LONG_BYTES 8)

(defn byte-array [size-or-seq]
(if (number? size-or-seq)
(js/Int8Array. (js/ArrayBuffer. size-or-seq))
(.from js/Int8Array size-or-seq)))

(def type->code
{Long (byte 0)
;; Double (byte 1)
js/String (byte 2)
Uri (byte 3) ;; 4 & 5 are reserved for http and https URLs
;; BigInt (byte 6)
;; BigInteger (byte 6)
;; BigDecimal (byte 7)
js/Date (byte 8)
;; Instant (byte 9)
Keyword (byte 10)
UUID (byte 11)
;; :blob (byte 12)
;; :xsd (byte 13)
;; :pojo (byte 14)
})

(defprotocol FlatFile
(header [this len] "Returns a byte array containing a header")
(body [this] "Returns a byte array containing the encoded data"))

(defn type-code
"Returns a code number for an object"
[o]
(throw (ex-info "Not implemented" {}))
#_
(if (bytes? o)
[(type->code :blob) identity]
(if-let [encoder (get @registered-xsd-types (type o))]
[(type->code :xsd) encoder]
(if (str-constructor? (type o))
[(type->code :pojo) (fn [obj] (.getBytes (str (.getName (type o)) " " obj) utf8))]
(throw (ex-info (str "Don't know how to encode a: " (type o)) {:object o}))))))

(defn general-header
"Takes a type number and a length, and encodes to the general header style.
  Lengths 0-255   [2r1110tttt length]
  Lengths 256-32k [2r1111tttt (high-byte length) (low-byte length)]
  Lengths 32k-2G  [2r1111tttt (byte3 length, high bit set) (byte2 length) (byte1 length) (byte0 length)]"
  [t len]
  (cond
    (<= len 0xFF)
    (byte-array [(bit-or 0xE0 t) len])
    (<= len 0x7FFF)
    (byte-array [(bit-or 0xF0 t) (bit-shift-right len 8) (bit-and 0xFF len)])
    :default
    ;; set the high bit of the leading length byte so the decoder can
    ;; distinguish this 4-byte length from the 2-byte form
    (byte-array [(bit-or 0xF0 t)
                 (bit-or 0x80 (bit-and 0xFF (bit-shift-right len 24)))
                 (bit-and 0xFF (bit-shift-right len 16))
                 (bit-and 0xFF (bit-shift-right len 8))
                 (bit-and 0xFF len)])))
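A sketch of the resulting byte layouts (Python for illustration; this assumes, per the decoder's length handling, that the 4-byte form marks its leading length byte with the high bit):

```python
def general_header(t, length):
    """Encode a type code t (0-15) and a length into the general header."""
    if length <= 0xFF:
        return [0xE0 | t, length]                      # 2r1110tttt + 1 byte
    if length <= 0x7FFF:
        return [0xF0 | t, length >> 8, length & 0xFF]  # 2r1111tttt + short
    return [0xF0 | t,                                  # 2r1111tttt + int
            0x80 | ((length >> 24) & 0xFF),            # flag the 4-byte form
            (length >> 16) & 0xFF,
            (length >> 8) & 0xFF,
            length & 0xFF]
```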

(defn int->bytes
{:private true}
[i]
[(bit-and (bit-shift-right i 24) 0xFF)
(bit-and (bit-shift-right i 16) 0xFF)
(bit-and (bit-shift-right i 8) 0xFF)
(bit-and (bit-shift-right i 0) 0xFF)])

(defn long->bytes
{:private true}
[l]
(byte-array (concat (int->bytes (.getHighBits l))
(int->bytes (.getLowBits l)))))

(defn str->bytes
{:private true}
[s]
(let [encoder (js/TextEncoder.)]
(.encode encoder s)))

(extend-protocol FlatFile
string
(header [this len]
(if (< len 0x80)
(byte-array [len])
(general-header (type->code js/String) len)))
(body [this]
(str->bytes this))

Uri
(header [this len]
(if (< len 0x40)
(byte-array [(bit-or 0x80 len)])
(general-header (type->code Uri) len)))
(body [this]
(str->bytes (.toString this)))

Keyword
(header [this len]
(if (< len 0x20)
(byte-array [(bit-or 0xC0 len)])
(general-header (type->code Keyword) len)))
(body [this]
(let [nms (namespace this)
n (name this)]
(str->bytes (if nms (str nms "/" n) n))))

js/Date
(header [this len]
(assert (= len LONG_BYTES))
(byte-array [(bit-or 0xE0 (type->code js/Date))]))
(body [this]
(body (.getTime this)))

UUID
(header [this len]
(byte-array [(bit-or 0xE0 (type->code UUID))]))
(body [^UUID this]
(let [[a b c d e] (.split (str this) "-")
;; the first three fields hold the most significant 64 bits
most-significant-bits (.fromString Long (str a b c) 16)
least-significant-bits (.fromString Long (str d e) 16)]
(byte-array
(concat (int->bytes (.getHighBits most-significant-bits))
(int->bytes (.getLowBits most-significant-bits))
(int->bytes (.getHighBits least-significant-bits))
(int->bytes (.getLowBits least-significant-bits))))))

number
;; TODO: numeric encoding is not yet implemented here
;; (the js/Date body above depends on this extension)
(header [this len])
(body [this])

object
(header [this len]
(let [tc (or (type->code (type this))
(first (type-code this)))]
(general-header tc len)))
(body [this]
(if-let [tc (type->code (type this))]
(str->bytes (str this))
(if-let [[_ encoder] (type-code this)]
(encoder this)))))

(defn to-bytes
"Returns a tuple of byte arrays, representing the header and the body"
[o]
(let [b (body o)]
[(header o (.-length b)) b]))

(defn encapsulate-id [x])
329 changes: 329 additions & 0 deletions src/asami/durable/flat_file.clj
@@ -0,0 +1,329 @@
(ns ^{:doc "Manages a memory-mapped file that holds write once data"
:author "Paula Gearon"}
asami.durable.flat-file
(:require [clojure.java.io :as io]
[clojure.string :as string]
[asami.durable.common :refer [Paged refresh! read-byte read-bytes-into read-long
FlatStore write-object! get-object force!
TxStore Closeable Forceable get-tx tx-count long-size
FlatRecords]]
[asami.durable.common-utils :as common-utils]
[asami.durable.encoder :as encoder]
[asami.durable.decoder :as decoder])
(:import [java.io RandomAccessFile File]
[java.nio ByteBuffer]
[java.nio.channels FileChannel FileChannel$MapMode]))

;; (set! *warn-on-reflection* true)

(def read-only FileChannel$MapMode/READ_ONLY)

(def ^:const default-region-size "Default region of 1GB" 0x40000000)

(defprotocol Clearable
(clear! [this] "Clears out any resources which may be held"))

(defn read-setup
[{:keys [regions region-size] :as paged-file} offset byte-count]
(let [region-nr (int (/ offset region-size))
region-offset (mod offset region-size)]
;; the requested data is not currently mapped, so refresh
(when (>= region-nr (count @regions))
(refresh! paged-file))
(when (>= region-nr (count @regions))
(throw (ex-info "Accessing data beyond the end of file"
{:max (count @regions) :region region-nr :offset offset})))
(let [region (nth @regions region-nr)
region-size (.capacity ^ByteBuffer region)
end (+ byte-count region-offset)
[region region-size] (if (and (= (inc region-nr) (count @regions))
(>= end region-size))
(do
(refresh! paged-file)
(let [^ByteBuffer r (nth @regions region-nr)]
[r (.capacity r)]))
[region region-size])]
(when (> end region-size)
(when (= region-nr (dec (count @regions)))
(refresh! paged-file))
(when (>= region-nr (dec (count @regions)))
(throw (ex-info "Accessing trailing data beyond the end of file"
{:region-size region-size :region-offset region-offset}))))
[region region-offset])))
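The region arithmetic used throughout this namespace reduces to a divmod on the file offset (a trivial sketch in Python for illustration):

```python
def region_coords(offset, region_size):
    """Map an absolute file offset to (region index, offset within region)."""
    return offset // region_size, offset % region_size
```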

;; These functions do update the PagedFile state, but only to expand the mapped region.
(defrecord PagedFile [^RandomAccessFile f regions region-size]
Paged
(refresh! [this]
(letfn [(remap [mappings]
(let [existing (if-let [tail (last mappings)]
(if (< (.capacity ^ByteBuffer tail) region-size)
(butlast mappings)
mappings))
unmapped-offset (* region-size (count existing))
^FileChannel fchannel (.getChannel f)
_ (.force fchannel true)
flength (.length f)
new-maps (map
(fn [offset]
(.map fchannel read-only offset (min region-size (- flength offset))))
(range unmapped-offset flength region-size))]
(into [] (concat existing new-maps))))]
(swap! regions remap)))

(read-byte [this offset]
;; finds a byte in a region
(let [[region region-offset] (read-setup this offset 1)]
(.get ^ByteBuffer region (int region-offset))))

(read-short [this offset]
;; when the 2 bytes occur in the same region, read a short
;; if the bytes straddle regions, then read both bytes and combine into a short
(let [[region region-offset] (read-setup this offset 2)]
(if (= region-offset (dec region-size))
(short (bit-or (bit-shift-left (.get ^ByteBuffer region (int region-offset)) 8)
(bit-and 0xFF (read-byte this (inc offset)))))
(.getShort ^ByteBuffer region region-offset))))

(read-long [this offset]
;; Unlike other types, a long is required to exist entirely in a region
(let [[region region-offset] (read-setup this offset long-size)]
(.getLong ^ByteBuffer region region-offset)))

(read-bytes [this offset len]
(read-bytes-into this offset (byte-array len)))

(read-bytes-into [this offset bytes]
;; when the bytes occur entirely in a region, then return a slice of the region
;; if the bytes straddle 2 regions, create a new buffer, and copy the bytes from both regions into it
(let [region-nr (int (/ offset region-size))
region-offset (mod offset region-size)
array-len (count bytes)]
;; the requested data is not currently mapped, so refresh
(when (>= region-nr (count @regions))
(refresh! this))
(when (> array-len region-size)
(throw (ex-info "Data size beyond size limit"
{:requested array-len :limit region-size})))
(when (>= region-nr (count @regions))
(throw (ex-info "Accessing data beyond the end of file"
{:max (count @regions) :region region-nr :offset offset})))
(letfn [(read-bytes [attempt]
(let [region (nth @regions region-nr)
region-size (.capacity ^ByteBuffer region)]
(if (>= region-offset region-size)
(if (< attempt 1)
(do
(refresh! this)
(recur 1))
(throw (ex-info "Accessing trailing data beyond the end of file"
{:region-size region-size :region-offset region-offset})))

;; check if the requested data is all in the same region
(if (> (+ region-offset array-len) region-size)
(do ;; data straddles 2 regions
(when (>= (inc region-nr) (count @regions))
(throw (ex-info "Accessing data beyond the end of file"
{:max (count @regions) :region region-nr :offset offset})))
(let [nregion (nth @regions (inc region-nr))
fslice-size (- region-size region-offset)
nslice-size (- array-len fslice-size)]
(if (> nslice-size (.capacity ^ByteBuffer nregion))
(if (< attempt 1)
(do
(refresh! this)
(recur 1))
(throw (ex-info "Accessing data beyond the end of file"
{:size nslice-size :limit (.capacity ^ByteBuffer nregion)})))
(do
(doto (.asReadOnlyBuffer ^ByteBuffer region)
(.position (int region-offset))
(.get ^bytes bytes 0 (int fslice-size)))
(doto (.asReadOnlyBuffer ^ByteBuffer nregion)
(.get ^bytes bytes (int fslice-size) (int nslice-size)))
bytes))))
(do
(doto (.asReadOnlyBuffer ^ByteBuffer region)
(.position (int region-offset))
(.get ^bytes bytes))
bytes)))))]
(read-bytes 0))))
Clearable
(clear! [this] (reset! regions nil)))

(defn paged-file
"Creates a paged file reader"
([f] (paged-file f default-region-size))
([f region-size]
(let [p (->PagedFile f (atom nil) region-size)]
(refresh! p)
p)))

;; rfile: A file that will only be appended to
;; paged: A paged reader for the file
(defrecord FlatFile [^RandomAccessFile rfile f paged]
FlatStore
(write-object!
[this obj]
(let [id (.getFilePointer rfile)
[hdr data] (encoder/to-bytes obj)]
(.write rfile ^bytes hdr)
(.write rfile ^bytes data)
id))
(get-object
[this id]
(decoder/read-object paged id))

Forceable
(force! [this]
(.force (.getChannel ^RandomAccessFile rfile) true))

Closeable
(close [this]
(force! this)
(clear! paged)
(.close rfile))

(delete! [this]
(.delete ^File f)))


(defn tx-file-size
[^RandomAccessFile rfile tx-size]
(let [fsize (.getFilePointer rfile)]
(when-not (zero? (mod fsize tx-size))
(throw (ex-info "Corrupted transaction file" {:file-size fsize :tx-size tx-size})))
fsize))

(defrecord TxFile [^RandomAccessFile rfile f paged tx-size]
TxStore
(append-tx!
[this {:keys [timestamp tx-data] :as tx}]
(let [sz (.getFilePointer rfile)]
(.writeLong ^RandomAccessFile rfile ^long timestamp)
(doseq [t tx-data]
(.writeLong ^RandomAccessFile rfile ^long t))
(long (/ sz tx-size))))

(get-tx
[this id]
(let [offset (* tx-size id)
timestamp (read-long paged offset)
tx-data (mapv #(read-long paged (+ (* long-size %) offset)) (range 1 (/ tx-size long-size)))]
{:timestamp timestamp
:tx-data tx-data}))

(latest
[this]
(let [fsize (tx-file-size rfile tx-size)
id (dec (long (/ fsize tx-size)))]
(when (<= 0 id)
(get-tx this id))))

(tx-count
[this]
(long (/ (tx-file-size rfile tx-size) tx-size)))

(find-tx
[this timestamp]
(loop [low 0 high (tx-count this)]
(if (= (inc low) high)
low
(let [mid (long (/ (+ low high) 2))
mts (read-long paged (* tx-size mid))
c (compare mts timestamp)]
(cond
(zero? c) mid
(> 0 c) (recur mid high)
(< 0 c) (recur low mid))))))

(acquire-lock! [this] (.lock (.getChannel rfile)))

Forceable
(force! [this]
(.force (.getChannel rfile) true))

Closeable
(close [this]
(force! this)
(clear! paged)
(.close rfile))

(delete! [this]
(.delete ^File f)))
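The find-tx method above performs a binary search over the fixed-size transaction records, returning the index of the transaction at or before the requested timestamp. A sketch over an in-memory timestamp list (Python for illustration):

```python
def find_tx(timestamps, target):
    """Index of a transaction at the target timestamp, or the last one
    before it (0 when the target precedes every transaction)."""
    low, high = 0, len(timestamps)
    while low + 1 != high:
        mid = (low + high) // 2
        if timestamps[mid] == target:
            return mid
        if timestamps[mid] < target:
            low = mid    # target is in the upper half
        else:
            high = mid   # target is in the lower half
    return low
```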

(defrecord RecordsFile [^RandomAccessFile rfile f paged record-size]
FlatRecords
(append!
[this data]
(assert (= (* long-size (count data)) record-size))
(let [sz (.getFilePointer rfile)]
(doseq [t data]
(.writeLong ^RandomAccessFile rfile ^long t))
(long (/ sz record-size))))

(get-record
[this id]
(let [offset (* record-size id)]
(mapv #(read-long paged (+ (* long-size %) offset))
(range (/ record-size long-size)))))

(next-id
[this]
(long (/ (.getFilePointer rfile) record-size)))

Forceable
(force! [this]
(.force (.getChannel rfile) true))

Closeable
(close [this]
(force! this)
(clear! paged)
(.close rfile))

(delete! [this]
(.delete ^File f)))

(defn- file-store
"Creates and initializes an append-only file and a paged reader."
[name fname size]
(let [directory (common-utils/get-directory name)
f (io/file directory fname)
raf (RandomAccessFile. f "rw")
file-length (.length raf)]
(when-not (zero? file-length)
(.seek raf file-length))
[raf f (paged-file raf size)]))

(defn flat-store
"Creates a flat file store. This wraps an append-only file and a paged reader."
[group-name name]
(let [[raf f paged] (file-store group-name name default-region-size)]
(->FlatFile raf f paged)))

(defn block-file
[group-name name record-size]
(let [region-size (* record-size (int (/ default-region-size record-size)))]
(file-store group-name name region-size)))

(defn tx-store
"Creates a transaction store. This wraps an append-only file and a paged reader."
[group-name name payload-size]
(let [tx-size (+ long-size payload-size)
[raf f paged] (block-file group-name name tx-size)]
(->TxFile raf f paged tx-size)))

(defn record-store
"Creates a record store. This wraps an append-only file and a paged reader.
The record size is measured in bytes."
[group-name name record-size]
(let [[raf f paged] (block-file group-name name record-size)]
(->RecordsFile raf f paged record-size)))

(defn store-exists?
"Checks if the resources for a file have been created already"
[group-name name]
(let [d (common-utils/get-directory group-name false)
f (io/file d name)]
(and (.exists d) (.exists f))))
261 changes: 261 additions & 0 deletions src/asami/durable/graph.cljc
@@ -0,0 +1,261 @@
(ns ^{:doc "This implements the Graph over durable storage"
:author "Paula Gearon"}
asami.durable.graph
(:require [asami.graph :as graph]
[asami.common-index :as common-index :refer [?]]
[asami.durable.common :as common :refer [TxData Transaction Closeable
find-tuples tuples-at write-new-tx-tuple!
write-tuple! delete-tuple!
find-object find-id write! at latest rewind! commit!
close delete! append! next-id max-long]]
[asami.durable.common-utils :as common-utils]
[asami.durable.pool :as pool]
[asami.durable.tuples :as tuples]
[asami.durable.resolver :as resolver :refer [get-from-index get-transitive-from-index]]
#?(:clj [asami.durable.flat-file :as flat-file])
[asami.durable.block.block-api :as block-api]
[zuko.node :as node]
[zuko.logging :as log :include-macros true]))

;; (set! *warn-on-reflection* true)

;; names to use when index files are all separate
(def spot-name "eavt")
(def post-name "avet")
(def ospt-name "veat")
(def tspo-name "teav") ;; a flat file transaction index

;; names to use when index files are shared
(def index-name "stmtidx.bin")
(def block-name "stmt.bin")

(declare ->BlockGraph)

(defn square [x] (* x x))
(defn cube [x] (* x x x))

(defrecord BlockGraph [spot post ospt tspo pool node-allocator id-checker tree-block-manager tuple-block-manager]
graph/Graph
(new-graph
[this]
(throw (ex-info "Cannot create a new graph without new storage parameters" {:type "BlockGraph"})))

(graph-add [this subj pred obj]
(when (zero? graph/*default-tx-id*)
(throw (ex-info "Transaction info is required for durable graphs" {:operation :graph-add})))
(graph/graph-add this subj pred obj graph/*default-tx-id*))
(graph-add
[this subj pred obj tx-id]
(let [[s new-pool] (write! pool subj)
[p new-pool] (write! new-pool pred)
[o new-pool] (write! new-pool obj)
stmt-id (next-id tspo)]
(if-let [new-spot (write-new-tx-tuple! spot [s p o stmt-id])]
;; new statement, so insert it into the other indices and return a new BlockGraph
(let [new-post (write-tuple! post [p o s stmt-id])
new-ospt (write-tuple! ospt [o s p stmt-id])
sid (append! tspo [tx-id s p o])]
(assert (= stmt-id sid))
;; ensure that any imported internal nodes were not outside of range
(graph/id-check subj id-checker)
(graph/id-check pred id-checker)
(graph/id-check obj id-checker)
;; return the updated graph
(assoc this
:spot new-spot
:post new-post
:ospt new-ospt
:pool new-pool))
;; The statement already existed. The pools SHOULD be identical, but check in case they're not
(if (identical? pool new-pool)
this
(do
(log/warn "A statement existed that used an element not found in the data pool")
(assoc this :pool new-pool))))))

(graph-delete
[this subj pred obj]
(or
(if-let [s (find-id pool subj)]
(if-let [p (find-id pool pred)]
(if-let [o (find-id pool obj)]
(let [[new-spot t] (delete-tuple! spot [s p o])]
(when t ;; serves as a proxy for (not (identical? spot new-spot))
(let [[new-post] (delete-tuple! post [p o s t])
[new-ospt] (delete-tuple! ospt [o s p t])]
;; the statement stays in tspo
(assoc this
:spot new-spot
:post new-post
:ospt new-ospt)))))))
this))

(graph-transact
[this tx-id assertions retractions]
(common-index/graph-transact this tx-id assertions retractions (volatile! [[] [] {}])))

(graph-transact
[this tx-id assertions retractions generated-data]
(common-index/graph-transact this tx-id assertions retractions generated-data))

(graph-diff
[this other]
(when-not (= (type this) (type other))
(throw (ex-info "Unable to compare diffs between graphs of different types" {:this this :other other})))
;; for every subject, look at the attribute-value sequence in the other graph, and skip that subject if they match
(let [subjects (map first (find-tuples spot []))]
(remove (fn [s] (= (find-tuples spot [s]) (find-tuples (:spot other) [s]))) subjects)))

(resolve-triple
[this subj pred obj]
(let [[plain-pred trans-tag] (common-index/check-for-transitive pred)
get-id (fn [e] (if (symbol? e) e (find-id pool e)))]
(or
(if-let [s (get-id subj)]
(if-let [o (get-id obj)]
(if plain-pred
(when-let [p (get-id plain-pred)]
(log/trace "transitive resolving [" s " " p " " o "]")
(get-transitive-from-index this trans-tag s p o))
(when-let [p (get-id pred)]
(log/trace "resolving [" s " " p " " o "]")
(get-from-index this s p o)))))
[])))

(count-triple
[this subj pred obj]
(let [[plain-pred trans-tag] (common-index/check-for-transitive pred)
get-id (fn [e] (if (symbol? e) e (find-id pool e)))]
(or
(if-let [s (get-id subj)]
(if-let [o (get-id obj)]
(if plain-pred
(when-let [p (get-id plain-pred)]
(log/trace "transitive counting [" s " " p " " o "]")
(let [varc (count (filter symbol? [s p o]))]
;; make some worst-case estimates rather than actual counts
(case varc
;; assuming every use of the predicate is in a chain between the ends
0 (resolver/count-from-index this '?s p '?o)
1 (if (symbol? p)
;; maximum is a chain of the entire graph between 2 points
(resolver/count-from-index this '?s '?p '?o)
;; maximum is a chain of every use of this predicate
(resolver/count-from-index this '?s p '?o))
2 (if (symbol? p)
;; maximum is an entire subgraph attached to the subject or object
(square (resolver/count-from-index this '?s '?p '?o))
;; maximum is every possible connection of nodes that use this predicate
;; factorials are too large, so use cube
(cube (resolver/count-from-index this '?s p '?o)))
;; this is every node connected to every node in the same subgraphs
;; cannot be resolved, so give an unreasonable number
3 max-long))
(count (get-transitive-from-index this trans-tag s p o)))
(when-let [p (get-id pred)]
(log/trace "counting [" s " " p " " o "]")
(resolver/count-from-index this s p o)))))
0)))

node/NodeAPI
(data-attribute [_ _] :tg/first)
(container-attribute [_ _] :tg/contains)
(new-node [_] (node-allocator))
(node-id [_ n] (graph/node-id n))
(node-type? [_ _ n] (graph/node-type? n))
(find-triple [this [e a v]] (graph/resolve-triple this e a v))

Transaction
(rewind! [this]
(when tree-block-manager (rewind! tree-block-manager))
(when tuple-block-manager (rewind! tuple-block-manager))
(let [spot* (rewind! spot)
post* (rewind! post)
ospt* (rewind! ospt)
;; tspo does not currently rewind
pool* (rewind! pool)]
(assoc this
:spot spot*
:post post*
:ospt ospt*
:pool pool*)))

(commit! [this]
(when tree-block-manager (commit! tree-block-manager))
(when tuple-block-manager (commit! tuple-block-manager))
(let [spot* (commit! spot)
post* (commit! post)
ospt* (commit! ospt)
;; tspo does not currently commit
pool* (commit! pool)]
(assoc this
:spot spot*
:post post*
:ospt ospt*
:pool pool*)))


TxData
(get-tx-data [this]
{:r-spot (:root-id spot)
:r-post (:root-id post)
:r-ospt (:root-id ospt)
:r-pool (:root-id pool)
:nr-index-node (block-api/get-block-count tree-block-manager)
:nr-index-block (block-api/get-block-count tuple-block-manager)
:nr-pool-node (block-api/get-block-count pool)})

Closeable
(close [this]
(doseq [resource [spot post ospt tspo pool]]
(close resource))
(when tree-block-manager (close tree-block-manager))
(when tuple-block-manager (close tuple-block-manager)))

(delete! [this]
(doseq [resource [spot post ospt tspo pool]]
(delete! resource))
(when tree-block-manager (delete! tree-block-manager))
(when tuple-block-manager (delete! tuple-block-manager))))

(defn graph-at
"Returns a graph based on another graph, but with a different set of index roots. This returns a historical graph.
graph: The graph to base this on. The same index references will be used.
new-tx: An unpacked transaction, containing each of the tree roots for the indices."
[{:keys [spot post ospt] :as graph}
{:keys [r-spot r-post r-ospt r-pool] :as new-tx}]
(assoc graph
:spot (tuples-at spot r-spot)
:post (tuples-at post r-post)
:ospt (tuples-at ospt r-ospt)))

(defn new-block-graph
"Creates a new BlockGraph object, under a given name. If the resources for that name exist, then they are opened.
If the resources do not exist, then they are created.
name: the label of the location for the graph resources.
tx: The transaction record for this graph."
[name {:keys [r-spot r-post r-ospt r-pool]} node-allocator id-checker]
(let [spot-index (tuples/create-tuple-index name spot-name r-spot)
post-index (tuples/create-tuple-index name post-name r-post)
ospt-index (tuples/create-tuple-index name ospt-name r-ospt)
tspo-index #?(:clj (flat-file/record-store name tspo-name tuples/tuple-size-bytes) :cljs nil)
data-pool (pool/create-pool name r-pool nil)]
(->BlockGraph spot-index post-index ospt-index tspo-index data-pool node-allocator id-checker nil nil)))

(defn new-merged-block-graph
"Creates a new BlockGraph object, under a given name. If the resources for that name exist, then they are opened.
If the resources do not exist, then they are created.
name: the label of the location for the graph resources.
tx: The transaction record for this graph."
[name {:keys [r-spot r-post r-ospt r-pool nr-index-node nr-index-block nr-pool-node]} node-allocator id-checker]
;; NOTE: Tree nodes blocks must hold the tuples payload and the tree node header
(let [tree-block-manager (common-utils/create-block-manager name index-name tuples/tree-block-size nr-index-node)
tuple-block-manager (common-utils/create-block-manager name block-name tuples/block-bytes nr-index-block)
spot-index (tuples/create-tuple-index-for-managers "SPO" tree-block-manager tuple-block-manager r-spot)
post-index (tuples/create-tuple-index-for-managers "POS" tree-block-manager tuple-block-manager r-post)
ospt-index (tuples/create-tuple-index-for-managers "OSP" tree-block-manager tuple-block-manager r-ospt)
tspo-index #?(:clj (flat-file/record-store name tspo-name tuples/tuple-size-bytes) :cljs nil)
data-pool (pool/create-pool name r-pool nr-pool-node)]
(->BlockGraph spot-index post-index ospt-index tspo-index data-pool node-allocator id-checker
tree-block-manager tuple-block-manager)))

37 changes: 37 additions & 0 deletions src/asami/durable/macros.cljc
@@ -0,0 +1,37 @@
(ns ^{:doc "Macro definitions"
:author "Paula Gearon"}
asami.durable.macros
(:require [asami.durable.common :refer [lock! unlock!]]))

(defmacro with-lock
"Uses a lock for a block of code"
[lock & body]
`(try
(lock! ~lock)
~@body
(finally (unlock! ~lock))))

(defmacro assert-args
[& pairs]
`(do (when-not ~(first pairs)
(throw (IllegalArgumentException.
(str (first ~'&form) " requires " ~(second pairs) " in " ~'*ns* ":" (:line (meta ~'&form))))))
~(let [more (nnext pairs)]
(when more
(list* `assert-args more)))))

(defmacro with-open*
"Duplicates the with-open macro from clojure.core."
[bindings & body]
(assert-args
(vector? bindings) "a vector for its binding"
(even? (count bindings)) "an even number of forms in binding vector")
(cond
(= (count bindings) 0) `(do ~@body)
(symbol? (bindings 0)) `(let ~(subvec bindings 0 2)
(try
(with-open* ~(subvec bindings 2) ~@body)
(finally
(when ~(bindings 0)
(. ~(bindings 0) close)))))
:else (throw (ex-info "with-open only allows Symbols in bindings" {:bindings bindings}))))
174 changes: 174 additions & 0 deletions src/asami/durable/pool.cljc
@@ -0,0 +1,174 @@
(ns ^{:doc "Data pool with blocks"
:author "Paula Gearon"}
asami.durable.pool
(:require [asami.durable.common :refer [DataStorage Closeable Forceable Transaction
long-size get-object write-object! get-object
find-tx get-tx append! commit! rewind! force! close delete!]]
[asami.durable.common-utils :as common-utils]
[asami.durable.tree :as tree]
[asami.durable.encoder :as encoder :refer [to-bytes comparable]]
[asami.durable.decoder :as decoder :refer [type-info long-bytes-compare]]
[asami.durable.block.block-api :refer [get-long get-byte get-bytes put-byte! put-bytes! put-long! get-id
CountedBlocks get-block-count]]
[asami.cache :refer [lookup hit miss lru-cache-factory]]
#?(:clj [asami.durable.flat-file :as flat-file])))

;; (set! *warn-on-reflection* true)

(def ^:const index-name "Name of the index file" "idx.bin")

(def ^:const data-name "Name of the data file" "data.bin")

(def ^:const data-offset 0)

(def ^:const id-offset-long 2)

(def ^:const id-offset (* id-offset-long long-size))

(def ^:const payload-len (- id-offset data-offset))

(def ^:const tree-node-size "Number of bytes used in the index nodes" (* (inc id-offset-long) long-size))

(defn get-object-ref
[node]
(get-long node id-offset-long))

(defn index-writer
[node [[header body] id]]
(let [remaining (dec payload-len)]
(put-byte! node data-offset (aget ^bytes header 0))
(when (> remaining 0)
(put-bytes! node 1 (min remaining (alength ^bytes body)) body))
(put-long! node id-offset-long id)))

(defn pool-comparator-fn
"Returns a function that can compare data to what is found in a node"
[data-store]
(fn [[type-byte header body object] node]
(let [node-type (type-info (get-byte node data-offset))
c (compare type-byte node-type)]
(if (zero? c)
(let [nc (long-bytes-compare (byte type-byte) header body object (get-bytes node data-offset payload-len))]
(if (zero? nc)
;; There is an optimization opportunity here: if one of the strings is shorter than the
;; node payload length and matches the header of the other string, then they match
;; and this next step is performed. Instead, a +/- 1 could be returned in that case.
(let [stored-data (get-object data-store (get-object-ref node))]
(compare (comparable object) (comparable stored-data)))
nc))
c))))

(declare ->ReadOnlyPool)

(defn find*
[{:keys [data index]} object]
(let [[header body] (to-bytes object)
node (tree/find-node index [^byte (type-info (aget ^bytes header 0)) header body object])]
(when (and node (not (vector? node)))
(get-object-ref node))))

(defrecord ReadOnlyPool [data index root-id cache]
DataStorage
(find-object
[this id]
(or
(decoder/unencapsulate-id id)
(get-object data id)))

(find-id
[this object]
(or
(encoder/encapsulate-id object)
(find* this object)))

(write! [this object]
(throw (ex-info "Unsupported Operation" {:cause "Read Only" :operation "write"})))

(at [this new-root-id]
(->ReadOnlyPool data (tree/at index new-root-id) new-root-id cache)))

(defrecord DataPool [data index root-id cache]
DataStorage
(find-object
[this id]
(let [value (decoder/unencapsulate-id id)]
(if (nil? value)
(get-object data id)
value)))

(find-id
[this object]
(or
(encoder/encapsulate-id object)
(find* this object)))

(write! [this object]
(if-let [id (encoder/encapsulate-id object)]
[id this]
(if-let [id (lookup @cache object)]
(do
(swap! cache hit object)
[id this])
(let [[header body :as object-data] (to-bytes object)
location (tree/find-node index [^byte (type-info (aget ^bytes header 0)) header body object])]
(if (or (nil? location) (vector? location))
(let [id (write-object! data object)
;; Note that this writer takes a different format from the comparator!
;; That's OK, since this `add` function does not require the location to be found again
;; and the writer will format the data correctly
next-index (tree/add index [object-data id] index-writer location)]
(swap! cache miss object id)
[id (assoc this :index next-index :root-id (get-id (:root next-index)))])
(let [id (get-object-ref location)]
(swap! cache miss object id)
[id this]))))))

(at [this new-root-id]
(->ReadOnlyPool data (tree/at index new-root-id) new-root-id cache))

CountedBlocks
(get-block-count
[this]
(get-block-count index))

Transaction
(commit! [this]
(force! data)
(let [{r :root :as next-index} (commit! index)]
;; root can be nil if only small values have been stored
(assoc this :index next-index :root-id (and r (get-id r)))))

(rewind! [this]
(let [{r :root :as next-index} (rewind! index)]
;; root can be nil if only small values have been stored
(assoc this :index next-index :root-id (and r (get-id r)))))

Closeable
(close [this]
(close index)
(close data))

(delete! [this]
(delete! index)
(delete! data)))

(def data-constructor #?(:clj flat-file/flat-store))

(def ^:const encap-cache-size 1024)

(defn open-pool
"Opens all the resources required for a pool, and returns the pool structure"
[name root-id block-count]
(let [data-store (data-constructor name data-name)
data-compare (pool-comparator-fn data-store)
index (tree/new-block-tree (fn
([] true)
([lname size] (common-utils/create-block-manager name lname size block-count)))
index-name tree-node-size data-compare root-id)
encap-cache (atom (lru-cache-factory {} :threshold encap-cache-size))]
(->DataPool data-store index root-id encap-cache)))

(defn create-pool
"Creates a datapool object"
([name] (create-pool name nil nil))
([name root-id block-count] (common-utils/named-storage open-pool name root-id block-count)))
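As a usage sketch (not part of this changeset): the pool created above can be exercised through the `DataStorage` protocol. The store name below is hypothetical, and the example assumes the `write!`, `find-id`, and `find-object` signatures shown in this diff.

```clojure
(require '[asami.durable.pool :as pool]
         '[asami.durable.common :refer [write! find-id find-object]])

;; create (or reopen) a pool under the "example-db" directory
(def p (pool/create-pool "example-db"))

;; write! returns the allocated id along with the updated pool value
(let [[id p'] (write! p "a stored string")]
  ;; the id resolves back to the original value
  (find-object p' id)
  ;; and the value resolves back to the same id
  (= id (find-id p' "a stored string"))))
```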
233 changes: 233 additions & 0 deletions src/asami/durable/resolver.cljc
@@ -0,0 +1,233 @@
(ns ^{:doc "Handles resolving patterns on a graph"
:author "Paula Gearon"}
asami.durable.resolver
(:require [asami.graph :refer [broad-node-type?]]
[asami.common-index :as common-index :refer [?]]
[asami.durable.decoder :as decoder]
[asami.durable.common :as common :refer [find-tuples count-tuples find-object]]
[clojure.set :as set]))

;; (set! *warn-on-reflection* true)

(defmulti get-from-index
"Lookup an index in the graph for the requested data.
Returns a sequence of unlabelled bindings. Each binding is a vector of binding values."
common-index/simplify)

(def v2 (fn [dp t] (vector (find-object dp (nth t 2)))))
(def v12 (fn [dp t] (vector
(find-object dp (nth t 1))
(find-object dp (nth t 2)))))
(def v21 (fn [dp t] (vector
(find-object dp (nth t 2))
(find-object dp (nth t 1)))))

;; Extracts the required index (idx), and looks up the requested fields.
;; If an embedded index is pulled out, then this is referred to as edx.
(defmethod get-from-index [:v :v :v]
[{idx :spot dp :pool} s p o]
(if (seq (find-tuples idx [s p o])) [[]] []))

(defmethod get-from-index [:v :v ?]
[{idx :spot dp :pool} s p o]
(map #(v2 dp %) (find-tuples idx [s p])))

(defmethod get-from-index [:v ? :v]
[{idx :ospt dp :pool} s p o]
(map #(v2 dp %) (find-tuples idx [o s])))

(defmethod get-from-index [:v ? ?]
[{idx :spot dp :pool} s p o]
(map #(v12 dp %) (find-tuples idx [s])))

(defmethod get-from-index [ ? :v :v]
[{idx :post dp :pool} s p o]
(map #(v2 dp %) (find-tuples idx [p o])))

(defmethod get-from-index [ ? :v ?]
[{idx :post dp :pool} s p o]
(map #(v21 dp %) (find-tuples idx [p])))

(defmethod get-from-index [ ? ? :v]
[{idx :ospt dp :pool} s p o]
(map #(v12 dp %) (find-tuples idx [o])))

(defmethod get-from-index [ ? ? ?]
[{idx :spot dp :pool} s p o]
;; use an unrolled map of find-object to speed up extraction
(map (fn [[s p o _]] [(find-object dp s) (find-object dp p) (find-object dp o)])
(find-tuples idx [])))

(defn zero-step
"Prepend a zero step value if the tag requests it"
[tag pool zero result]
(if (= :star tag)
(let [z [(find-object pool zero)]]
(cons z result))
result))

(def project-after-first #(subvec % 1 3))

(defmulti get-transitive-from-index
"Lookup an index in the graph for the requested data, and returns all data where the required predicate
is a transitive relationship. Unspecified predicates extend across the graph.
Returns a sequence of unlabelled bindings. Each binding is a vector of binding values."
common-index/trans-simplify)

(defn get-single-from-index
[idx data-pool tag st p srch]
(loop [seen? #{} starts [st] result []]
(let [step (for [st' starts n (map #(nth % 2) (find-tuples idx (srch st'))) :when (not (seen? n))] n)]
(if (empty? step)
(->> result
(map (fn [x] [(find-object data-pool x)]))
(zero-step tag data-pool st))
(recur (into seen? step) step (concat result step))))))

;; follows a predicate transitively from a node
(defmethod get-transitive-from-index [:v :v ?]
[{idx :spot pool :pool :as graph} tag s p o]
(get-single-from-index idx pool tag s p (fn [s'] [s' p])))

;; finds all transitive paths that end at a node
(defmethod get-transitive-from-index [ ? :v :v]
[{idx :post pool :pool :as graph} tag s p o]
(get-single-from-index idx pool tag o p (fn [o'] [p o'])))


(defn *stream-from
[selector knowns initial-node]
(letfn [(stream-from [node]
(let [next-nodes (selector node) ;; node is a subject/object, get all objects/subjects
next-nodes' (remove @knowns next-nodes)] ;; remove the knowns from the next nodes
(vswap! knowns into next-nodes') ;; add all these new nodes to the known set
(doseq [n next-nodes']
(stream-from n))))] ;; go to the next step for each node
(stream-from initial-node)))


(defn transitive-from
"Steps out from a provided node either forwards or backwards, to the next nodes in the required direction.
idx: The index to use for lookups in the required direction
pool: The datapool to turn local nodes (numbers) into global nodes (values)
tag: Indicates if the transitive operation is * or +
x: The starting node. A subject when going downstream, or an object when going upstream.
tproject: A function for projecting tuples of predicates and the next node from x.
ypos: The position in the index of the next nodes in the required direction.
subjects when starting at an object, and objects when starting at a subject.
rproject: A function for projecting the final result as a vector in the expected order for the operation."
[idx pool tag x tproject ypos rproject]
(let [f-obj #(find-object pool %)
x-val (f-obj x)
starred (= :star tag) ;; was the transitive operation * or +
tuples (map tproject (find-tuples idx [x]))
;; the following includes state, but this is in order to stay relatively lazy
all-pred (volatile! {})
knowns (volatile! #{})]
(for [[pred y] tuples
y' (do
(when-not (@all-pred pred) ;; when the predicate changes
(vreset! knowns #{y}) ;; start with a fresh set of known nodes
(vswap! all-pred assoc pred (f-obj pred))) ;; remember the new predicate & its global form
;; accumulate all nodes up/down-stream from the object node
(*stream-from (fn [x] (into #{} (map #(nth % ypos) (find-tuples idx [x]))))
knowns y)
;; extract the accumulated nodes. Add the zero-step node if needed
(conj (if starred (conj @knowns x) @knowns) y))]
;; emit the global forms of the predicate and each object
(rproject (@all-pred pred) (f-obj y')))))

;; entire graph from a node
;; the predicates returned are the first step in the path
;; consider the entire path, as per the [:v ? :v] function
(defmethod get-transitive-from-index [:v ? ?]
[{idx :spot pool :pool :as graph} tag s p o]
(transitive-from idx pool tag s project-after-first 2 vector))

;; entire graph that ends at a node
(defmethod get-transitive-from-index [ ? ? :v]
[{idx :ospt pos :pos pool :pool :as graph} tag s p o]
(transitive-from idx pool tag o (fn [[_ s p]] [p s]) 1 (fn [pr sb] [sb pr])))

(defn ordered-collect
"Converts a sequence of key/value pairs, grouped by key, into a map of keys to sets of values.
Because the pairs arrive grouped by key, map lookups can be avoided."
[pairs]
(loop [[[k v :as p] & rpairs] pairs prev-key nil vls #{} result {}]
(if-not p
(if prev-key (assoc result prev-key vls) result)
(if (= k prev-key)
(recur rpairs prev-key (conj vls v) result)
(recur rpairs k (conj #{} v) (if prev-key (assoc result prev-key vls) result))))))
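For illustration, `ordered-collect` turns key-grouped pairs into a map of value sets:

```clojure
(ordered-collect [[:a 1] [:a 2] [:b 3]])
;; => {:a #{1 2}, :b #{3}}
```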

;; every node that can reach every node with a specified predicate
;; This result is in-memory. It can be done with lazy joins, but will be significantly slower
;; Revisit this if scalability becomes an issue
(defmethod get-transitive-from-index [ ? :v ?]
[{idx :post pool :pool :as graph} tag s p o]
(let [os-pairs (map project-after-first (find-tuples idx [p]))
result-index (loop [result (ordered-collect os-pairs)]
(let [next-result (common-index/step-by-predicate result)]
;; note: consider a faster comparison
(if (= next-result result)
result
(recur next-result))))]
(for [s' (keys result-index) :let [gs (find-object pool s')] o' (result-index s')]
[gs (find-object pool o')])))

;; finds a path between 2 nodes
(defmethod get-transitive-from-index [:v ? :v]
[{idx :spot pool :pool :as graph} tag s p o]
(let [edges-from (fn [n] ;; finds all property/value pairs from an entity
(map project-after-first (find-tuples idx [n])))
node-type? (fn [n] (or (decoder/encapsulated-node? n)
(broad-node-type? (find-object pool n))))
[path] (common-index/get-path-between idx edges-from node-type? tag s o)
find-in-pool #(find-object pool %)]
(if path
(vector (map #(mapv find-in-pool %) path))
[])))

;; every node that can reach every node
;; expensive and pointless, so throw exception
(defmethod get-transitive-from-index [ ? ? ?]
[graph tag s p o]
(throw (ex-info "Unable to do transitive closure with nothing bound" {:args [s p o]})))


(defmulti count-from-index
"This optimizes counting by traversing index nodes, and not iterating over tuples blocks."
common-index/simplify)

(defmethod count-from-index [:v :v :v]
[{idx :spot pool :pool :as graph} s p o]
(count-tuples idx [s p o]))

(defmethod count-from-index [:v :v ?]
[{idx :spot pool :pool :as graph} s p o]
(count-tuples idx [s p]))

(defmethod count-from-index [:v ? :v]
[{idx :ospt pool :pool :as graph} s p o]
(count-tuples idx [o s]))

(defmethod count-from-index [:v ? ?]
[{idx :spot pool :pool :as graph} s p o]
(count-tuples idx [s]))

(defmethod count-from-index [ ? :v :v]
[{idx :post pool :pool :as graph} s p o]
(count-tuples idx [p o]))

(defmethod count-from-index [ ? :v ?]
[{idx :post pool :pool :as graph} s p o]
(count-tuples idx [p]))

(defmethod count-from-index [ ? ? :v]
[{idx :ospt pool :pool :as graph} s p o]
(count-tuples idx [o]))

(defmethod count-from-index [ ? ? ?]
[{idx :spot pool :pool :as graph} s p o]
(count-tuples idx []))

285 changes: 285 additions & 0 deletions src/asami/durable/store.cljc
@@ -0,0 +1,285 @@
(ns ^{:doc "This implements the Block storage version of a Graph/Database/Connection"
:author "Paula Gearon"}
asami.durable.store
(:require [asami.storage :as storage :refer [ConnectionType DatabaseType UpdateData]]
[asami.graph :as graph]
[asami.internal :as i :refer [now instant? long-time]]
[asami.durable.common :as common
:refer [append-tx! commit! get-tx latest tx-count find-tx close delete!]]
[asami.durable.common-utils :as common-utils]
[asami.durable.macros :as m :include-macros true]
[asami.durable.pool :as pool]
[asami.durable.tuples :as tuples]
[asami.durable.graph :as dgraph]
[zuko.schema :refer [Triple]]
[asami.entities.general :refer [GraphType]]
[asami.entities.reader :as reader]
[schema.core :as s :include-macros true]
#?(:clj [asami.durable.flat-file :as flat-file])
#?(:clj [clojure.java.io :as io]))
#?(:clj (:import [java.util.concurrent.locks Lock ReentrantLock]
[java.io File]
[java.nio.channels FileLock])))

#?(:clj (set! *warn-on-reflection* true))

(def tx-name "tx.dat")

;; transactions contain tree roots for the 3 tree indices,
;; the tree root for the data pool,
;; the 3 block counts for the tuples index tree, the tuples blocks, and the data pool index tree
;; the internal node counter
(def tx-record-size (* 8 common/long-size))

(def TxRecord {(s/required-key :r-spot) (s/maybe s/Int)
(s/required-key :r-post) (s/maybe s/Int)
(s/required-key :r-ospt) (s/maybe s/Int)
(s/required-key :r-pool) (s/maybe s/Int)
(s/required-key :nr-index-node) (s/maybe s/Int)
(s/required-key :nr-index-block) (s/maybe s/Int)
(s/required-key :nr-pool-node) (s/maybe s/Int)
(s/required-key :nodes) s/Int
(s/required-key :timestamp) s/Int})

(def TxRecordPacked {(s/required-key :timestamp) s/Int
(s/required-key :tx-data) [(s/one s/Int "spot root id")
(s/one s/Int "post root id")
(s/one s/Int "ospt root id")
(s/one s/Int "pool root id")
(s/one s/Int "number of index nodes allocated")
(s/one s/Int "number of index blocks allocated")
(s/one s/Int "number of pool index nodes allocated")
(s/one s/Int "node id counter")]})

(s/defn pack-tx :- TxRecordPacked
"Packs a transaction into a vector for serialization"
[{:keys [r-spot r-post r-ospt r-pool nr-index-node nr-index-block nr-pool-node nodes timestamp]} :- TxRecord]
{:timestamp timestamp :tx-data [(or r-spot 0) (or r-post 0) (or r-ospt 0) (or r-pool 0)
(or nr-index-node 0) (or nr-index-block 0) (or nr-pool-node 0)
nodes]})

(s/defn unpack-tx :- TxRecord
"Unpacks a transaction vector into a structure when deserializing"
[{[r-spot r-post r-ospt r-pool
nr-index-node nr-index-block nr-pool-node nodes] :tx-data
timestamp :timestamp} :- TxRecordPacked]
(letfn [(non-zero [v] (and v (when-not (zero? v) v)))]
{:r-spot (non-zero r-spot)
:r-post (non-zero r-post)
:r-ospt (non-zero r-ospt)
:r-pool (non-zero r-pool)
:nr-index-node (non-zero nr-index-node)
:nr-index-block (non-zero nr-index-block)
:nr-pool-node (non-zero nr-pool-node)
:nodes nodes
:timestamp timestamp}))
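The pack/unpack pair above round-trips a transaction record, with nil roots serialized as 0 and restored to nil. A small sketch:

```clojure
(let [tx {:r-spot 3 :r-post 7 :r-ospt nil :r-pool 12
          :nr-index-node nil :nr-index-block 5 :nr-pool-node nil
          :nodes 42 :timestamp 1600000000000}]
  (= tx (unpack-tx (pack-tx tx))))
;; => true
```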

(s/defn new-db :- TxRecordPacked
[]
{:timestamp (long-time (now)) :tx-data [0 0 0 0 0 0 0 0]})

(declare ->DurableDatabase)

(s/defn as-of* :- DatabaseType
"Returns a database value for a provided t-value.
If t-val is the transaction number for an older database, then that database is returned. Otherwise this database is returned.
If t-val is a timestamp then returns the most recent database that was valid at that time."
[{{:keys [tx-manager] :as connection} :connection
bgraph :bgraph
timestamp :timestamp
t :t :as database} :- DatabaseType
t-val]
(if-let [new-t (cond
(instant? t-val) (let [requested-time (long-time t-val)]
(and (< requested-time timestamp)
(find-tx tx-manager requested-time)))
(int? t-val) (and (< t-val t) t-val)
:default (throw (ex-info (str "Unable to retrieve database for datatype " (type t-val))
{:value t-val :type (type t-val)})))]
(let [bounded-t (min (max 0 new-t) (dec (tx-count tx-manager)))
{new-ts :timestamp :as tx} (unpack-tx (get-tx tx-manager bounded-t))]
(->DurableDatabase connection (dgraph/graph-at bgraph tx) bounded-t new-ts))
database))

(s/defn since* :- (s/maybe DatabaseType)
"Returns the next database value after the provided t-value.
If t-val is a transaction number for an older database, then it returns the next database. If it refers to the current
database or later, then returns nil (even if more recent databases exist, since this database value is stateless).
If t-val is a timestamp, then it returns the next database after that time, unless the timestamp is at or after
the timestamp on the current database."
[{{tx-manager :tx-manager :as connection} :connection
timestamp :timestamp
bgraph :bgraph
t :t :as database} :- DatabaseType
t-val]
(letfn [(set-database [tx txid ts]
(->DurableDatabase connection (dgraph/graph-at bgraph tx) txid ts))
(db-for [txid]
(let [{ts :timestamp :as tx} (unpack-tx (get-tx tx-manager txid))]
(set-database tx txid ts)))]
;; check that the database isn't empty
(when (> (tx-count tx-manager) 0)
(cond
;; look for a since point by timestamp
(instant? t-val) (let [requested-time (long-time t-val)]
(when (< requested-time timestamp) ;; if at or after the final timestamp, then nil
(let [{fts :timestamp :as first-tx} (unpack-tx (get-tx tx-manager 0))]
(if (< requested-time fts) ;; before the first timestamp, so the first commit point
(set-database first-tx 0 fts)
(let [txid (inc (find-tx tx-manager requested-time))]
(db-for txid))))))
;; look for a since point by tx ID.
;; If it's at or after the time of the latest database, then return nil
(int? t-val) (when (< t-val t)
(let [txid (max 0 (inc t-val))]
(db-for txid)))
:default (throw (ex-info (str "Unable to retrieve database for datatype " (type t-val))
{:value t-val :type (type t-val)}))))))

(s/defn entity* :- (s/maybe {s/Keyword s/Any})
[{bgraph :bgraph :as database}
id
nested? :- s/Bool]
(reader/ident->entity bgraph id nested?))

(defrecord DurableDatabase [connection bgraph t timestamp]
storage/Database
(as-of [this t-val] (as-of* this t-val))
(as-of-t [this] t)
(as-of-time [this] timestamp)
(since [this t-val] (since* this t-val))
(since-t [this] t)
(graph [this] bgraph)
(entity [this id nested?] (entity* this id nested?)))

(s/defn db* :- DatabaseType
"Returns the most recent database value from the connection."
[{:keys [name tx-manager grapha] :as connection} :- ConnectionType]
(let [tx (latest tx-manager)
{:keys [r-spot r-post r-ospt timestamp]} (and tx (unpack-tx tx))
{:keys [spot post ospt] :as g} @grapha
tx-id (dec (common/tx-count tx-manager))]
(assert (= r-spot (:root-id spot)))
(assert (= r-post (:root-id post)))
(assert (= r-ospt (:root-id ospt)))
(->DurableDatabase connection g tx-id timestamp)))

(s/defn delete-database*
"Delete the graph, which will recursively delete all resources"
[{:keys [name grapha tx-manager] :as connection} :- ConnectionType]
(close @grapha)
(delete! @grapha)
(reset! grapha nil)
(close tx-manager)
(delete! tx-manager)
#?(:clj (when-let [d (common-utils/get-directory name)]
(.delete ^File d))
:cljs true))

(s/defn release*
"Closes the transaction manager, and the graph, which will recursively close all resources"
[{:keys [name grapha tx-manager] :as connection} :- ConnectionType]
(close @grapha)
(reset! grapha nil)
(close tx-manager))

(def DBsBeforeAfter [(s/one DatabaseType "db-before")
(s/one DatabaseType "db-after")])

;; Update functions return a Graph, and accept a Graph and an integer
(def UpdateFunction (s/=> GraphType GraphType s/Int))

(s/defn transact-update* :- DBsBeforeAfter
"Updates a graph according to a provided function. This will be done in a new, single transaction."
[{:keys [tx-manager grapha nodea] :as connection} :- ConnectionType
update-fn :- UpdateFunction]
;; multithreaded environments require exclusive writing for the graph
;; this also ensures no writing between the read/write operations of the update-fn
;; Locking is required, as opposed to using atoms, since I/O operations cannot be repeated.
(let [file-lock (volatile! nil)]
(m/with-lock connection
(m/with-open* #?(:clj [^FileLock file-lock (common/acquire-lock! tx-manager)]
:cljs [file-lock (common/acquire-lock! tx-manager)])
;; keep a reference of what the data looks like now
(let [{:keys [bgraph t timestamp] :as db-before} (db* connection)
;; figure out the next transaction number to use
tx-id (common/tx-count tx-manager)
;; do the modification on the graph
next-graph (update-fn @grapha tx-id)
;; step each underlying index to its new transaction point
graph-after (commit! next-graph)
;; get the metadata (tree roots) for all the transactions
new-timestamp (long-time (now))
tx (assoc (common/get-tx-data graph-after)
:nodes @nodea
:timestamp new-timestamp)]
;; save the transaction metadata
(common/append-tx! tx-manager (pack-tx tx))
;; update the connection to refer to the latest graph
(reset! grapha graph-after)
;; return the required database values
[db-before (->DurableDatabase connection graph-after tx-id new-timestamp)])))))

(s/defn transact-data* :- DBsBeforeAfter
"Removes a series of tuples from the latest graph, and asserts new tuples into the graph.
Updates the connection to the new graph."
([conn :- ConnectionType
updates! :- UpdateData
asserts :- [Triple] ;; triples to insert
retracts :- [Triple]] ;; triples to remove
(transact-update* conn (fn [graph tx-id] (graph/graph-transact graph tx-id asserts retracts updates!))))
([conn :- ConnectionType
updates! :- UpdateData
generator-fn]
(transact-update* conn
(fn [graph tx-id]
(let [[asserts retracts] (generator-fn graph)]
(graph/graph-transact graph tx-id asserts retracts updates!))))))

(s/defn get-url* :- s/Str
[{:keys [name]} :- ConnectionType]
(str "asami:local://" name))

(defrecord DurableConnection [name tx-manager grapha nodea lock]
storage/Connection
(get-name [this] name)
(get-url [this] (get-url* this))
(next-tx [this] (common/tx-count tx-manager))
(db [this] (db* this))
(delete-database [this] (delete-database* this))
(release [this] (release* this))
(transact-update [this update-fn] (transact-update* this update-fn))
(transact-data [this updates! asserts retracts] (transact-data* this updates! asserts retracts))
(transact-data [this updates! generator-fn] (transact-data* this updates! generator-fn))
common/Lockable
(lock! [this] #?(:clj (.lock ^Lock lock)))
(unlock! [this] #?(:clj (.unlock ^Lock lock))))

(s/defn db-exists? :- s/Bool
"Tests if this database exists by looking for the transaction file"
[store-name :- s/Str]
#?(:clj (flat-file/store-exists? store-name tx-name) :cljs nil))

(defn- create-lock
"Creates a lock object for the connection. This is a noop in ClojureScript"
[]
#?(:clj (ReentrantLock.)))

(s/defn create-database :- ConnectionType
"Opens a connection to the database at the named resource location.
If the database does not exist then it is created."
[name :- s/Str]
(let [ex (db-exists? name)
tx-manager #?(:clj (flat-file/tx-store name tx-name tx-record-size) :cljs nil)
;; initialize new databases with a transaction that indicates an empty store
_ (when-not ex (common/append-tx! tx-manager (new-db)))
tx (latest tx-manager)
unpacked-tx (and tx (unpack-tx tx))
node-ct (get unpacked-tx :nodes 0)
node-counter (atom node-ct)
node-allocator (fn [] (graph/new-node (swap! node-counter inc)))
;; the following function is called under locked conditions
id-checker (fn [id] (when (> id @node-counter) (reset! node-counter id)))
block-graph (dgraph/new-merged-block-graph name unpacked-tx node-allocator id-checker)]
(->DurableConnection name tx-manager (atom block-graph) node-counter (create-lock))))
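A hedged sketch of how a connection might be used (the store name is illustrative; the protocol methods are those implemented by `DurableConnection` above):

```clojure
(require '[asami.durable.store :as store]
         '[asami.storage :as storage])

(let [conn (store/create-database "example-db") ;; creates the store on first use
      db   (storage/db conn)]                   ;; latest database value
  ;; ... query against db ...
  (storage/release conn))                       ;; close files without deleting
```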

448 changes: 448 additions & 0 deletions src/asami/durable/tree.cljc

Large diffs are not rendered by default.

604 changes: 604 additions & 0 deletions src/asami/durable/tuples.cljc

Large diffs are not rendered by default.

193 changes: 193 additions & 0 deletions src/asami/entities.cljc
@@ -0,0 +1,193 @@
(ns ^{:doc "Entity to triple mapping for the transaction api.
This handles conversion of entities as well as managing updates."
:author "Paula Gearon"}
asami.entities
(:require [asami.storage :as storage :refer [DatabaseType]]
[asami.graph :as gr]
[asami.entities.general :refer [EntityMap GraphType]]
[asami.entities.writer :as writer :refer [Triple]]
[zuko.util :as util]
[zuko.node :as node]
#?(:clj [schema.core :as s]
:cljs [schema.core :as s :include-macros true])))


(defn ^:private annotated-attribute?
"Checks if an attribute has been annotated with a character"
[c a] ;; usually a keyword, but attributes can be other things
(and (keyword a) (= c (last (name a)))))

(def ^:private update-attribute?
"Checks if an attribute indicates that it should be updated"
(partial annotated-attribute? \'))

(def ^:private append-attribute?
"Checks if an attribute indicates that the data is an array that should be appended to"
(partial annotated-attribute? \+))

(defn- normalize-attribute
"Converts an updating attribute to its normalized form"
[a]
(if-not (keyword? a)
a
(let [n (name a)]
(keyword (namespace a) (subs n 0 (dec (count n)))))))
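For example, the annotation forms above normalize as follows (assuming the behavior of `normalize-attribute` as written):

```clojure
(normalize-attribute :name')   ;; => :name  (:name' marks a value replacement)
(normalize-attribute :items+)  ;; => :items (:items+ marks an append to an array)
```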

(s/defn ^:private contains-updates?
"Checks if any part of the object is to be updated"
[obj :- {s/Any s/Any}]
(let [obj-keys (keys obj)]
(or (some update-attribute? obj-keys)
(some append-attribute? obj-keys)
(some #(and (map? %) (contains-updates? %)) (vals obj)))))

(s/defn ^:private minus :- (s/maybe s/Num)
[limit :- (s/maybe s/Num)
n :- s/Num]
(when limit (- limit n)))

(s/defn ^:private entity-triples :- [(s/one [Triple] "New triples")
(s/one [Triple] "Retractions")
(s/one {s/Any s/Any} "New list of ID mappings")
(s/one #{s/Any} "Running total set of top-level IDs")]
"Creates the triples to be added and removed for a new entity.
graph: the graph the entity is to be added to
obj: The entity to generate triples for
existing-ids: When IDs are provided by the user, then they get mapped to the internal ID that is actually used.
This map contains a mapping of user IDs to the ID allocated for the entity
top-ids: The IDs of entities that are inserted at the top level. These are accumulated and this set
avoids the need to query for them."
[graph :- GraphType
{id :db/id ident :db/ident ident2 :id :as obj} :- EntityMap
existing-ids :- {s/Any s/Any}
top-ids :- #{s/Any}
limit :- (s/maybe s/Num)]
(let [[new-obj removals additions]
(if (contains-updates? obj)
(do
(when-not (or id ident ident2)
(throw (ex-info "Nodes to be updated must be identified with :db/id, :db/ident, or :id" obj)))
(let [node-ref (cond
id (and (seq (gr/resolve-triple graph id '?a '?v)) id)
ident (ffirst (gr/resolve-triple graph '?r :db/ident ident))
ident2 (ffirst (gr/resolve-triple graph '?r :id ident2)))
_ (when-not node-ref (throw (ex-info "Cannot update a non-existent node" (select-keys obj [:db/id :db/ident :id]))))
;; find the annotated attributes
obj-keys (keys obj)
update-attributes (set (filter update-attribute? obj-keys))
append-attributes (filter append-attribute? obj-keys)
;; map annotated attributes to the unannotated form
attribute-map (->> (concat update-attributes append-attributes)
(map (fn [a] [a (normalize-attribute a)]))
(into {}))
;; update attributes get converted, append attributes get removed
clean-obj (->> obj
(keep (fn [[k v :as e]] (if-let [nk (attribute-map k)] (when (update-attributes k) [nk v]) e)))
(into {}))
;; find existing attribute/values that match the updates
entity-av-pairs (gr/resolve-triple graph node-ref '?a '?v)
update-attrs (set (map attribute-map update-attributes))
;; determine what needs to be removed
removal-pairs (filter (comp update-attrs first) entity-av-pairs)
removals (mapcat (partial writer/existing-triples graph node-ref) removal-pairs)

;; find the lists that the appending attributes refer to
append-attrs (set (map attribute-map append-attributes))
;; find what should be the heads of lists, removing any that aren't list heads
attr-heads (->> entity-av-pairs
(filter (comp append-attrs first))
(filter #(seq (gr/resolve-triple graph (second %) :tg/first '?v))))
;; find any appending attributes that are not in use. These are new arrays
remaining-attrs (reduce (fn [attrs [k v]] (disj attrs k)) append-attrs attr-heads)
;; reassociate the object with any attributes that are for new arrays, making it a singleton array
append->annotate (into {} (map (fn [a] [(attribute-map a) a]) append-attributes))
new-obj (reduce (fn [o a] (assoc o a [(obj (append->annotate a))])) clean-obj remaining-attrs)
;; find tails function
find-tail (fn [node]
(if-let [n (ffirst (gr/resolve-triple graph node :tg/rest '?r))]
(recur n)
node))
;; create appending triples
append-triples (mapcat (fn [[attr head]]
(let [v (obj (append->annotate attr))
new-node (node/new-node graph)]
[[(find-tail head) :tg/rest new-node] [new-node :tg/first v] [head :tg/contains v]])) attr-heads)]
(if (and limit (> (count append-triples) limit))
(throw (ex-info "Limit reached" {:overflow true}))
[new-obj removals append-triples])))
[obj nil nil])

[triples ids new-top-ids] (writer/ident-map->triples graph
new-obj
existing-ids
top-ids
(minus limit (count additions)))

;; if updates occurred, new entity statements are redundant
triples (if (or (seq removals) (seq additions) (not (identical? obj new-obj)))
(remove #(= :tg/entity (second %)) triples)
triples)]
[(concat triples additions) removals ids new-top-ids]))

(defn- vec-rest
"Takes a vector and returns a vector of all but the first element. Same as (vec (rest s))"
[s]
#?(:clj (subvec (vec s) 1)
:cljs (vec (rest s))))

(defn- temp-id?
"Tests if an entity ID is a temporary ID"
[i]
(and (number? i) (neg? i)))
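For illustration, the two helpers above behave as follows:

```clojure
(vec-rest [:db/add :n1 :name "x"]) ;; => [:n1 :name "x"]
(temp-id? -1)                      ;; => true
(temp-id? 42)                      ;; => false
```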

(defn resolve-lookup-refs [graph i]
(or (and (writer/lookup-ref? i)
(ffirst (gr/resolve-triple graph '?r (first i) (second i))))
i))

(s/defn build-triples :- [(s/one [Triple] "Data to be asserted")
(s/one [Triple] "Data to be retracted")
(s/one {s/Any s/Any} "ID map of created objects")]
"Converts a set of transaction data into triples.
Returns a tuple containing [triples removal-triples tempids]"
([graph :- gr/GraphType
data :- [s/Any]]
(build-triples graph data nil))
([graph :- gr/GraphType
data :- [s/Any]
limit :- (s/maybe s/Num)]
(let [[retract-stmts new-data] (util/divide' #(= :db/retract (first %)) data)
ref->id (partial resolve-lookup-refs graph)
retractions (mapv (comp (partial mapv ref->id) rest) retract-stmts)
add-triples (fn [[acc racc ids top-ids :as last-result] obj]
(if (and limit (> (count acc) limit))
(reduced last-result)
(if (map? obj)
(try
(let [[triples rtriples new-ids new-top-ids] (entity-triples graph
obj
ids
top-ids
(minus limit (count acc)))]
[(into acc triples) (into racc rtriples) new-ids new-top-ids])
(catch #?(:clj Exception :cljs :default) e
(if-let [overflow (:overflow (ex-data e))]
(reduced last-result)
(throw e))))
(if (and (seqable? obj)
(= 4 (count obj))
(= :db/add (first obj)))
(or
(when (= (nth obj 2) :db/id)
(let [id (nth obj 3)]
(when (temp-id? id)
(let [new-id (or (ids id) (node/new-node graph))]
[(conj acc (assoc (vec-rest obj) 2 new-id))
racc
(assoc ids (or id new-id) new-id)
top-ids]))))
[(conj acc (mapv #(or (ids %) (ref->id %)) (rest obj))) racc ids top-ids])
(throw (ex-info (str "Bad data in transaction: " obj) {:data obj}))))))
[triples rtriples id-map top-level-ids] (reduce add-triples [[] retractions {} #{}] new-data)
triples (writer/backtrack-unlink-top-entities top-level-ids triples)]
[triples rtriples id-map])))
24 changes: 24 additions & 0 deletions src/asami/entities/general.cljc
@@ -0,0 +1,24 @@
(ns ^{:doc "Common functionality for the entity reader/writer namespaces"
:author "Paula Gearon"}
asami.entities.general
(:require [schema.core :as s :refer [=>]]
[clojure.string :as string]
[zuko.node :as node]
[naga.store :as store :refer [StorageType]]))

(def tg-ns "tg")

(def KeyValue [(s/one s/Any "Key") (s/one s/Any "Value")])

(def EntityMap {s/Any s/Any})


(def Result [(s/one s/Any "first") (s/optional s/Any "second") (s/optional s/Any "third")])
(def Pattern [(s/one s/Any "entity") (s/one s/Any "attribute") (s/one s/Any "value")])

;; The resolver function takes a single pattern argument, and returns a seq of Result
(def ResolverFn (=> [Result] [Pattern]))

(def GraphType (s/pred #(satisfies? node/NodeAPI %)))


187 changes: 187 additions & 0 deletions src/asami/entities/reader.cljc
@@ -0,0 +1,187 @@
(ns ^{:doc "Reads structured data from a graph."
:author "Paula Gearon"}
asami.entities.reader
(:require [asami.entities.general :as general :refer [tg-ns KeyValue EntityMap GraphType]]
[zuko.node :as node]
[schema.core :as s :refer [=>]]
[clojure.string :as string]))


(def MapOrList (s/cond-pre EntityMap [s/Any]))

(def NodeType s/Any) ;; No checking, but indicates a node in a graph

(defn get-tg-first
"Finds the tg/first property in a map, and gets the value."
[struct]
(let [first-pair? (fn [[k v :as p]]
(and (keyword? k)
(= tg-ns (namespace k))
(string/starts-with? (name k) "first")
p))]
(some first-pair? struct)))

(s/defn property-values :- [KeyValue]
"Return all the property/value pairs for a given entity in the store. "
[graph :- GraphType
entity :- s/Any]
(->> (node/find-triple graph [entity '?p '?o])
(remove #(= :tg/owns (first %)))))


(s/defn check-structure :- (s/maybe [KeyValue])
"Determines if a value represents a structure. If so, return the property/values for it.
Otherwise, return nil."
[graph :- GraphType
prop :- s/Any
v :- s/Any]
(when (and (not (#{:db/ident :db/id} prop)) (node/node-type? graph prop v))
(let [data (property-values graph v)]
data)))


(declare pairs->struct recurse-node)

(s/defn build-list :- [s/Any]
"Takes property/value pairs and if they represent a list node, returns the list.
Otherwise, returns nil."
[graph :- GraphType
seen :- #{NodeType}
pairs :- [KeyValue]]
;; convert the data to a map
(let [st (into {} pairs)]
;; if the properties indicate a list, then process it
(if-let [first-prop-elt (get-tg-first st)]
(let [remaining (:tg/rest st)
[_ first-elt] (recurse-node graph seen first-prop-elt)]
(assert first-elt)
(let [head-elt (if (= :tg/nil first-elt) nil first-elt)]
;; recursively build the list
(if remaining
(cons head-elt (build-list graph seen (property-values graph remaining)))
(list head-elt))))
(when (= :tg/list (:tg/type st)) []))))

(s/defn vbuild-list :- [s/Any]
"Calls build-list, converting to a vector as the final step"
[graph :- GraphType
seen :- #{NodeType}
pairs :- [KeyValue]]
(let [l (build-list graph seen pairs)]
(if (seq? l) (vec l) l)))

(def ^:dynamic *nested-structs* false)

(s/defn recurse-node :- s/Any
"Determines if the val of a map entry is a node to be recursed on, and loads if necessary.
If referring directly to a top level node, then short circuit and return the ID"
[graph :- GraphType
seen :- #{NodeType}
[prop v :as prop-val] :- KeyValue]
(if-let [pairs (check-structure graph prop v)]
(if (or (seen v)
(and (not *nested-structs*) (some #(= :tg/entity (first %)) pairs)))
[prop (if-let [[idd ident] (some (fn [[k v]] (if (#{:db/ident :id} k) [k v])) pairs)]
{idd ident}
{:db/id v})]
(let [next-seen (conj seen v)]
[prop (or (vbuild-list graph next-seen pairs)
(pairs->struct graph pairs next-seen))]))
(if (= :tg/empty-list v)
[prop []]
prop-val)))


(s/defn into-multimap
"Takes key/value tuples and inserts them into a map. If a key is duplicated, its values are collected into a set."
[xform kvs :- [[(s/one s/Any "Key") (s/one s/Any "Value")]]]
#?(:clj
(transduce xform
(fn
([m] (persistent! m))
([m [k v]]
(assoc! m k (if-let [[km vm] (find m k)]
(if (set? vm) (conj vm v) (hash-set vm v))
v))))
(transient {}) kvs)
:cljs
(transduce xform
(fn
([m] (persistent! m))
([m [k v]]
(assoc! m k (let [vm (get m k ::null)]
(if-not (= ::null vm)
(if (set? vm) (conj vm v) (hash-set vm v))
v)))))
(transient {}) kvs)))
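
A small illustration of the duplicate-key behavior (using `identity` as a pass-through xform; the data is invented):

```clojure
(into-multimap identity [[:a 1] [:a 2] [:b 3]])
;; the duplicated key :a collects its values into a set,
;; while the singly-occurring key :b keeps a plain value:
;; {:a #{1 2}, :b 3}
```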


(s/defn pairs->struct :- EntityMap
"Uses a set of property-value pairs to load up a nested data structure from the graph"
([graph :- GraphType
prop-vals :- [KeyValue]] (pairs->struct graph prop-vals #{}))
([graph :- GraphType
prop-vals :- [KeyValue]
seen :- #{NodeType}]
(if (some (fn [[k _]] (= :tg/first k)) prop-vals)
(vbuild-list graph seen prop-vals)
(into-multimap
(comp
(remove (comp #{:db/id :db/ident :tg/entity} first)) ;; INTERNAL PROPERTIES
(map (fn [[a v :as av]] (if (= :tg/nil v) [a nil] av)))
(map (partial recurse-node graph seen))
(map (fn [[a v :as av]] (if (seq? v) [a (vec v)] av))))
prop-vals))))


(s/defn ref->entity :- EntityMap
"Uses an id node to load up a nested data structure from the graph.
Accepts a value that identifies the internal node."
([graph :- GraphType
entity-id :- s/Any]
(ref->entity graph entity-id false nil))
([graph :- GraphType
entity-id :- s/Any
nested? :- s/Bool]
(ref->entity graph entity-id nested? nil))
([graph :- GraphType
entity-id :- s/Any
nested? :- s/Bool
exclusions :- (s/maybe #{(s/cond-pre s/Keyword s/Str)})]
(binding [*nested-structs* nested?]
(let [prop-vals (property-values graph entity-id)
pvs (if (seq exclusions)
(remove (comp exclusions first) prop-vals)
prop-vals)]
(pairs->struct graph pvs #{entity-id})))))


(s/defn ident->entity :- EntityMap
"Converts data in a database to a data structure suitable for JSON encoding
Accepts an internal node identifier to identify the entity object"
([graph :- GraphType
ident :- s/Any]
(ident->entity graph ident false))
([graph :- GraphType
ident :- s/Any
nested? :- s/Bool]
;; find the entity by its ident. Some systems will make the id the entity id,
;; and the ident will be separate, so look for both. Also supporting lookup by :id
(when-let [eid (or (and (seq (node/find-triple graph [ident '?a '?v])) ident)
(ffirst (node/find-triple graph ['?eid :db/ident ident]))
(ffirst (node/find-triple graph ['?eid :id ident])))]
(ref->entity graph eid nested?))))

(s/defn graph->entities :- [EntityMap]
"Pulls all top level entities out of a store"
([graph :- GraphType]
(graph->entities graph false nil))
([graph :- GraphType
nested? :- s/Bool]
(graph->entities graph nested? nil))
([graph :- GraphType
nested? :- s/Bool
exclusions :- (s/maybe #{s/Keyword})]
(->> (node/find-triple graph '[?e :tg/entity true])
(map first)
(map #(ref->entity graph % nested? exclusions)))))
278 changes: 278 additions & 0 deletions src/asami/entities/writer.cljc
@@ -0,0 +1,278 @@
(ns ^{:doc "Converts external data into a graph format (triples)."
:author "Paula Gearon"}
asami.entities.writer
(:require [asami.entities.general :as general :refer [tg-ns KeyValue EntityMap GraphType]]
[asami.entities.reader :as reader]
[zuko.node :as node]
[schema.core :as s :refer [=>]]
[clojure.string :as string]))

;; internal generated properties:
;; :tg/rest List structure
;; :tg/owns References sub entities
;; :tg/entity When true, then indicates a top level entity

;; The following 2 attributes may vary according to the database.
;; e.g. Datomic appends -s -l -d etc to these attributes for different datatypes
;; Asami uses these names without modification:
;; :tg/first Indicates a list member by position. Returned by node/data-attribute
;; :tg/contains Shortcut to list members. Returned by node/container-attribute

;; The following are graph nodes with special meaning:
;; :tg/empty-list A list without entries
;; :tg/nil a nil value

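
The list encoding described in the comments above can be pictured with a small sketch. The node names here are invented for illustration; in Asami, `node/data-attribute` and `node/container-attribute` resolve to `:tg/first` and `:tg/contains` respectively.

```clojure
;; the vector [:a :b] becomes a linked list of two nodes:
;; [node-1 :tg/first :a]        ; value held at the first cell
;; [node-1 :tg/rest  node-2]    ; link to the next cell
;; [node-2 :tg/first :b]
;; plus :tg/contains shortcuts from the head node to every member:
;; [node-1 :tg/contains :a]
;; [node-1 :tg/contains :b]
;; an empty collection is represented by the special node :tg/empty-list
```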

;; provides dynamic scope of the current contents of the graph
;; This approach has been adopted to avoid redundantly passing the graph down the callstack
(def ^:dynamic *current-graph* nil)

;; The following provide dynamic scope of accumulated state through the
;; conversion of entities into triples. This approach has been adopted for speed.
(def ^:dynamic *id-map* nil)

(def ^:dynamic *triples* nil)

(def ^:dynamic *limit* nil)

(def ^:dynamic *current-entity* nil)

(def ^:dynamic *top-level-entities* nil)

(def Triple [(s/one s/Any "Entity")
(s/one s/Any "attribute")
(s/one s/Any "value")])

(def identity-prop?
"Tests if a property is an identifier property"
#{:id :db/ident})

(declare value-triples map->triples)

(defn add-triples!
[op data]
(vswap! *triples* op data)
(when (and *limit*
(> (count @*triples*) *limit*))
(throw (ex-info "overflow" {:overflow true}))))

(defn list-triples
"Creates the triples for a list. Returns the head node and a list of the nodes representing the contents of the list."
[vlist]
(when (seq vlist)
(loop [list-ref nil, last-ref nil, value-nodes [], [v & vs :as val-list] vlist]
(if-not (seq val-list)
[list-ref value-nodes]
(let [node-ref (node/new-node *current-graph*)
_ (when last-ref
(add-triples! conj [last-ref :tg/rest node-ref]))
value-ref (value-triples v)]
(add-triples! conj [node-ref (node/data-attribute *current-graph* value-ref) value-ref])
(recur (or list-ref node-ref) node-ref (conj value-nodes value-ref) vs))))))

(s/defn value-triples-list
[vlist :- [s/Any]]
(if (seq vlist)
(let [[node value-nodes] (list-triples vlist)]
(doseq [vn value-nodes]
(add-triples! conj [node (node/container-attribute *current-graph* vn) vn]))
node)
:tg/empty-list))

(defn lookup-ref?
"Tests if i is a lookup ref"
[i]
(and (vector? i) (keyword? (first i)) (= 2 (count i))))
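
As the predicate shows, a lookup ref is exactly a two-element vector whose first element is a keyword:

```clojure
(lookup-ref? [:db/ident :my-entity]) ;=> true
(lookup-ref? [:id "x" :extra])       ;=> false (three elements)
(lookup-ref? '(:db/ident :x))        ;=> false (a list, not a vector)
```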

(defn resolve-ref
[[prop id]]
(or (and (= :db/id prop) (get @*id-map* id id))
(ffirst (node/find-triple *current-graph* ['?n prop id]))))

(defn top-level-entity?
[node]
(seq (node/find-triple *current-graph* [node :tg/entity true])))

(defn add-subentity-relationship
"Adds a sub-entity relationship for a provided node. Returns the node"
[node]
(when-not (or (= node *current-entity*)
(@*top-level-entities* node)
(= node :tg/empty-list))
(add-triples! conj [*current-entity* :tg/owns node]))
node)

(defn value-triples
"Converts a value into a list of triples.
Returns the entity ID of the data."
[v]
(cond
(lookup-ref? v) (or (resolve-ref v)
(value-triples-list v))
(sequential? v) (-> (value-triples-list v) add-subentity-relationship)
(set? v) (value-triples-list (seq v))
(map? v) (-> (map->triples v) add-subentity-relationship)
(nil? v) :tg/nil
:default v))

(s/defn property-vals
"Takes a property-value pair associated with an entity,
and builds triples around it"
[entity-ref :- s/Any
[property value] :- KeyValue]
(if (identity-prop? property)
(add-triples! conj [entity-ref property value])
(if (set? value)
(doseq [v value]
(let [vr (value-triples v)]
(add-triples! conj [entity-ref property vr])))
(let [v (value-triples value)]
(add-triples! conj [entity-ref property v])))))

(defn new-node
[id]
(let [next-id (node/new-node *current-graph*)]
(when id
(vswap! *id-map* assoc (or id next-id) next-id))
next-id))

(s/defn get-ref
"Returns the reference (a node-id) for an object, and a flag that is false if a new reference was generated"
[{id :db/id ident :db/ident ident2 :id :as data} :- EntityMap]
(if-let [r (@*id-map* id)] ;; an ID that is already mapped
[r false]
(let [idd (or ident ident2)]
(cond ;; a negative ID is a request for a new saved ID
(and (number? id) (neg? id)) (let [new-id (new-node id)]
(when idd
(vswap! *id-map* assoc idd new-id))
[new-id false])
;; Use the provided ID
id (if (node/node-type? *current-graph* :db/id id)
[id false]
(throw (ex-info ":db/id must be a value node type" {:db/id id})))
;; With no ID do an ident lookup
idd (if-let [r (@*id-map* idd)]
[r true]
(let [lookup (if ident
(node/find-triple *current-graph* ['?n :db/ident ident])
(node/find-triple *current-graph* ['?n :id ident2]))]
(if (seq lookup)
(let [read-id (ffirst lookup)]
(when (top-level-entity? read-id)
(vswap! *top-level-entities* conj read-id))
(vswap! *id-map* assoc idd read-id)
[read-id true]) ;; return the retrieved ref
[(new-node idd) false]))) ;; nothing retrieved so generate a new ref
;; generate an ID
:default [(new-node nil) false])))) ;; generate a new ref


(s/defn map->triples
"Converts a single map to triples. Returns the entity reference or node id.
The triples are built up statefully in the volatile *triples*."
[data :- EntityMap]
(let [[entity-ref ident?] (get-ref data)
data (dissoc data :db/id)
data (if ident? (dissoc data :db/ident) data)]
;; build up result in *triples*
;; duplicating the code on both branches of the condition,
;; in order to avoid an unnecessary binding on the stack
(if *current-entity*
(doseq [d data]
(property-vals entity-ref d))
(binding [*current-entity* entity-ref]
(vswap! *top-level-entities* conj entity-ref)
(doseq [d data]
(property-vals entity-ref d))))
entity-ref))


(defn name-for
"Convert an id (probably a number) to a keyword for identification"
[id]
(if (or (keyword? id) (node/node-type? *current-graph* :db/id id))
id
(node/node-label *current-graph* id)))


(s/defn ident-map->triples
"Converts a single map to triples for an ID'ed map"
([graph :- GraphType
j :- EntityMap]
(ident-map->triples graph j {} #{} nil))
([graph :- GraphType
j :- EntityMap
id-map :- {s/Any s/Any}
top-level-ids :- #{s/Any}
limit :- (s/maybe s/Num)]
(binding [*current-graph* graph
*id-map* (volatile! id-map)
*triples* (volatile! [])
*limit* limit
*top-level-entities* (volatile! top-level-ids)]
(let [derefed-id-map (ident-map->triples j)]
[@*triples* derefed-id-map @*top-level-entities*])))
([j :- EntityMap]
(let [node-ref (map->triples j)]
(if (:db/ident j)
(add-triples! conj [node-ref :tg/entity true])
(add-triples! into [[node-ref :db/ident (name-for node-ref)] [node-ref :tg/entity true]]))
@*id-map*)))

(defn backtrack-unlink-top-entities
"Goes back through generated triples and removes sub-entity links to entities that were later
determined to be top-level entities."
[top-entities triples]
(remove #(and (= :tg/owns (nth % 1)) (top-entities (nth % 2))) triples))
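
A sketch of the backtracking step with invented node names: a node that was later discovered to be a top-level entity has its `:tg/owns` link dropped, while other triples pass through.

```clojure
(backtrack-unlink-top-entities
  #{:tg/node-2}
  [[:tg/node-1 :name "a"]
   [:tg/node-1 :tg/owns :tg/node-2]])
;=> ([:tg/node-1 :name "a"])
```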

(s/defn entities->triples :- [Triple]
"Converts objects into a sequence of triples."
([graph :- GraphType
entities :- [EntityMap]]
(entities->triples graph entities {}))
([graph :- GraphType
entities :- [EntityMap]
id-map :- {s/Any s/Any}]
(binding [*current-graph* graph
*id-map* (volatile! id-map)
*triples* (volatile! [])
*top-level-entities* (volatile! #{})]
(doseq [e entities]
(ident-map->triples e))
;; backtrack to see if there were forward references to top level entities
(backtrack-unlink-top-entities @*top-level-entities* @*triples*))))


;; updating the store

(s/defn existing-triples
[graph :- GraphType
node-ref
[k v]]
(or
(if-let [subpv (reader/check-structure graph k v)]
(if-not (some #(= :tg/entity (first %)) subpv)
(cons [node-ref k v] (mapcat (partial existing-triples graph v) subpv))))
[[node-ref k v]]))

(s/defn entity-update->triples :- [(s/one [Triple] "assertions") (s/one [Triple] "retractions")]
"Takes a single structure and converts it into triples to be added and triples to be retracted to create a change"
[graph :- GraphType
node-ref ;; a reference for the structure to be updated
entity] ;; the structure to update the structure in the database to
(binding [*current-graph* graph
*id-map* (volatile! {})]
(let [pvs (reader/property-values graph node-ref)
old-struct (reader/pairs->struct graph pvs)
to-remove (remove (fn [[k v]] (if-let [newv (get entity k)] (= v newv))) old-struct)
pvs-to-remove (filter (comp (set (map first to-remove)) first) pvs)
triples-to-remove (mapcat (partial existing-triples graph node-ref) pvs-to-remove)

to-add (remove (fn [[k v]] (when-let [new-val (get old-struct k)] (= new-val v))) entity)
triples-to-add (binding [*triples* (volatile! [])
*top-level-entities* (volatile! #{})
*current-entity* node-ref]
(doseq [pvs to-add] (property-vals node-ref pvs))
(backtrack-unlink-top-entities @*top-level-entities* @*triples*))]
[triples-to-add triples-to-remove])))
96 changes: 89 additions & 7 deletions src/asami/graph.cljc
@@ -1,15 +1,21 @@
(ns ^{:doc "The graph index API."
:author "Paula Gearon"}
asami.graph
(:require #?(:clj [schema.core :as s]
:cljs [schema.core :as s :include-macros true])
[clojure.string :as string]))
(:require [schema.core :as s :include-macros true]
[clojure.string :as string]
#?(:cljs [cljs.reader :as reader]))
#?(:clj (:import [java.io Writer])))

(def ^:dynamic *default-tx-id* 0)

(defprotocol Graph
(new-graph [this] "Creates an empty graph of the same type")
(graph-add [this subj pred obj] "Adds triples to the graph")
(graph-add [this subj pred obj] [this subj pred obj tx] "Adds triples to the graph")
(graph-delete [this subj pred obj] "Removes triples from the graph")
(graph-transact
[this tx-id assertions retractions]
[this tx-id assertions retractions generated]
"Bulk operation to add and remove multiple statements in a single operation")
(graph-diff [this other] "Returns all subjects that have changed in this graph, compared to other")
(resolve-triple [this subj pred obj] "Resolves patterns from the graph, and returns unbound columns only")
(count-triple [this subj pred obj] "Resolves patterns from the graph, and returns the size of the resolution"))
@@ -26,18 +32,94 @@
[graph [s p o :as pattern]]
(count-triple graph s p o))


(defprotocol IdCheck
(id-check [o checker] "Checks an object with the provided checker"))

(extend-type #?(:clj Object :cljs object) IdCheck
(id-check [_ _]))

#?(:clj
(deftype InternalNode [^long id]
Object
(toString [_] (str "#a/n[" id "]"))
(equals [_ o] (and (instance? InternalNode o) (= id (.id ^InternalNode o))))
(hashCode [_] (hash id))
IdCheck
(id-check [_ checker] (checker id)))

:cljs
(deftype InternalNode [^long id]
Object
(toString [_] (str "#a/n[" id "]"))

IEquiv
(-equiv [_ o] (and (instance? InternalNode o) (= id (.-id o))))

IHash
(-hash [_] (hash id))

IPrintWithWriter
(-pr-writer [this writer _] (-write writer (str this)))

IdCheck
(id-check [_ checker] (checker id))))

#?(:clj
(defmethod clojure.core/print-method InternalNode [^InternalNode o ^Writer w]
(.write w "#a/n[")
(.write w (str (.id o)))
(.write w "]")))

(defprotocol NodeData
(node-read [data] "Reads an internal node out of data"))

#?(:clj
(extend-protocol NodeData
String
(node-read [s] (InternalNode. (Long/parseLong s)))
clojure.lang.Indexed
(node-read [v] (InternalNode. (nth v 0))))

:cljs
(extend-protocol NodeData
string
(node-read [s] (InternalNode. (long s)))
PersistentVector
(node-read [v] (InternalNode. (nth v 0)))))

;; can set this at a Clojure repl:
;; (set! *data-readers* graph/node-reader)
(def node-reader {'a/n node-read})

#?(:cljs (swap! reader/*tag-table* assoc 'a/n node-read))
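
With the `a/n` reader registered, serialized nodes round-trip through EDN. A minimal JVM sketch (assuming `clojure.edn` is available, as it is in any standard Clojure distribution):

```clojure
(require '[clojure.edn :as edn])

;; "#a/n[42]" parses as the tagged vector [42], which the
;; clojure.lang.Indexed extension of NodeData turns into an InternalNode
(def n (edn/read-string {:readers node-reader} "#a/n[42]"))
(str n) ;=> "#a/n[42]"
```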

;; common implementations of the NodeAPI functions
(def tg-ns "tg")
(def node-prefix "node-")
(def prefix-len (count node-prefix))

;; common implementations of the NodeAPI functions
(defn new-node [] (->> node-prefix gensym name (keyword tg-ns)))
(defn new-node
([] (->> node-prefix gensym name (keyword tg-ns)))
([id] (InternalNode. id)))

(defn node-id [n] (subs (name n) prefix-len))

(defn node-type? [n] (and (keyword? n) (= tg-ns (namespace n)) (string/starts-with? (name n) node-prefix)))
(defn node-type? [n]
(or
(instance? InternalNode n)
(and (keyword? n) (= tg-ns (namespace n)) (string/starts-with? (name n) node-prefix))))

(defn broad-node-type?
[n]
(or
(instance? InternalNode n)
(keyword? n)
(uri? n)
(uuid? n)))

(defn node-label
"Returns a keyword label for a node"
[n]
(keyword tg-ns (str "id-" (node-id n))))

96 changes: 63 additions & 33 deletions src/asami/index.cljc
@@ -5,26 +5,38 @@
[asami.common-index :as common :refer [? NestedIndex]]
[asami.analytics :as analytics]
[zuko.node :refer [NodeAPI]]
#?(:clj [schema.core :as s]
:cljs [schema.core :as s :include-macros true])))
[zuko.logging :as log :include-macros true]
[schema.core :as s :include-macros true]))

(s/defn index-add :- {s/Any {s/Any #{s/Any}}}
"Add elements to a 3-level index"
[idx :- {s/Any {s/Any #{s/Any}}}
(def Index {s/Any {s/Any {s/Any {(s/required-key :t) s/Int ;transaction id
(s/required-key :id) s/Int}}}}) ;statement id

(s/defn index-add :- Index
"Add elements to a 4-level index.
If the triple already exists, returns the given index unchanged."
[idx :- Index
a :- s/Any
b :- s/Any
c :- s/Any]
(update-in idx [a b] (fn [v] (if (seq v) (conj v c) #{c}))))
c :- s/Any
id :- s/Int
t :- s/Int]
(if-let [idxb (get idx a)]
(if-let [idxc (get idxb b)]
(if (get idxc c)
idx
(assoc idx a (assoc idxb b (assoc idxc c {:t t :id id}))))
(assoc idx a (assoc idxb b {c {:t t :id id}})))
(assoc idx a {b {c {:t t :id id}}})))

(s/defn index-delete :- (s/maybe {s/Any {s/Any #{s/Any}}})
"Remove elements from a 3-level index. Returns the new index, or nil if there is no change."
[idx :- {s/Any {s/Any #{s/Any}}}
(s/defn index-delete :- (s/maybe Index)
"Remove elements from a 4-level index. Returns the new index, or nil if there is no change."
[idx :- Index
a :- s/Any
b :- s/Any
c :- s/Any]
(if-let [idx2 (idx a)]
(if-let [idx3 (idx2 b)]
(let [new-idx3 (disj idx3 c)]
(let [new-idx3 (dissoc idx3 c)]
(if-not (identical? new-idx3 idx3)
(let [new-idx2 (if (seq new-idx3) (assoc idx2 b new-idx3) (dissoc idx2 b))
new-idx (if (seq new-idx2) (assoc idx a new-idx2) (dissoc idx a))]
@@ -37,14 +49,14 @@

;; Extracts the required index (idx), and looks up the requested fields.
;; If an embedded index is pulled out, then this is referred to as edx.
(defmethod get-from-index [:v :v :v] [{idx :spo} s p o] (if (get-in idx [s p o]) [[]] []))
(defmethod get-from-index [:v :v ?] [{idx :spo} s p o] (map vector (get-in idx [s p])))
(defmethod get-from-index [:v ? :v] [{idx :osp} s p o] (map vector (get-in idx [o s])))
(defmethod get-from-index [:v ? ?] [{idx :spo} s p o] (let [edx (idx s)] (for [p (keys edx) o (edx p)] [p o])))
(defmethod get-from-index [ ? :v :v] [{idx :pos} s p o] (map vector (get-in idx [p o])))
(defmethod get-from-index [ ? :v ?] [{idx :pos} s p o] (let [edx (idx p)] (for [o (keys edx) s (edx o)] [s o])))
(defmethod get-from-index [ ? ? :v] [{idx :osp} s p o] (let [edx (idx o)] (for [s (keys edx) p (edx s)] [s p])))
(defmethod get-from-index [ ? ? ?] [{idx :spo} s p o] (for [s (keys idx) p (keys (idx s)) o ((idx s) p)] [s p o]))
(defmethod get-from-index [:v :v :v] [{idx :spo} s p o] (if (some-> idx (get s) (get p) (get o) keys) [[]] []))
(defmethod get-from-index [:v :v ?] [{idx :spo} s p o] (map vector (some-> idx (get s) (get p) keys)))
(defmethod get-from-index [:v ? :v] [{idx :osp} s p o] (map vector (some-> idx (get o) (get s) keys)))
(defmethod get-from-index [:v ? ?] [{idx :spo} s p o] (let [edx (idx s)] (for [p (keys edx) o ((comp keys edx) p)] [p o])))
(defmethod get-from-index [ ? :v :v] [{idx :pos} s p o] (map vector (some-> idx (get p) (get o) keys)))
(defmethod get-from-index [ ? :v ?] [{idx :pos} s p o] (let [edx (idx p)] (for [o (keys edx) s ((comp keys edx) o)] [s o])))
(defmethod get-from-index [ ? ? :v] [{idx :osp} s p o] (let [edx (idx o)] (for [s (keys edx) p ((comp keys edx) s)] [s p])))
(defmethod get-from-index [ ? ? ?] [{idx :spo} s p o] (for [s (keys idx) p (keys (idx s)) o (keys ((idx s) p))] [s p o]))



@@ -62,29 +74,47 @@

(declare empty-graph)

(defrecord GraphIndexed [spo pos osp]
(defrecord GraphIndexed [spo pos osp next-stmt-id]
NestedIndex
(lowest-level-fn [this] identity)
(lowest-level-sets-fn [this] identity)
(lowest-level-set-fn [this] identity)
(mid-level-map-fn [this] identity)
(lowest-level-fn [this] keys)
(lowest-level-sets-fn [this] (partial map (comp set keys)))
(lowest-level-set-fn [this] (comp set keys))
(mid-level-map-fn [this] #(into {} (map (fn [[k v]] [k (set (keys v))]) %)))

Graph
(new-graph [this] empty-graph)
(graph-add [this subj pred obj]
(let [new-spo (index-add spo subj pred obj)]
(graph-add this subj pred obj gr/*default-tx-id*))
(graph-add [this subj pred obj tx]
(log/trace "insert: " [subj pred obj tx])
(let [id (or (:next-stmt-id this) 1)
new-spo (index-add spo subj pred obj id tx)]
(if (identical? spo new-spo)
this
(do
(log/trace "statement already existed")
this)
(assoc this :spo new-spo
:pos (index-add pos pred obj subj)
:osp (index-add osp obj subj pred)))))
:pos (index-add pos pred obj subj id tx)
:osp (index-add osp obj subj pred id tx)
:next-stmt-id (inc id)))))
(graph-delete [this subj pred obj]
(log/trace "delete " [subj pred obj])
(if-let [idx (index-delete spo subj pred obj)]
(assoc this :spo idx :pos (index-delete pos pred obj subj) :osp (index-delete osp obj subj pred))
this))
(assoc this
:spo idx
:pos (index-delete pos pred obj subj)
:osp (index-delete osp obj subj pred))
(do
(log/trace "statement did not exist")
this)))
(graph-transact [this tx-id assertions retractions]
(common/graph-transact this tx-id assertions retractions (volatile! [[] [] {}])))
(graph-transact [this tx-id assertions retractions generated-data]
(common/graph-transact this tx-id assertions retractions generated-data))
(graph-diff [this other]
(let [s-po (remove (fn [[s po]] (= po (get (:spo other) s)))
spo)]
(when-not (= (type this) (type other))
(throw (ex-info "Unable to compare diffs between graphs of different types" {:this this :other other})))
(let [s-po (remove (fn [[s po]] (= po (get (:spo other) s))) spo)]
(map first s-po)))
(resolve-triple [this subj pred obj]
(if-let [[plain-pred trans-tag] (common/check-for-transitive pred)]
@@ -103,4 +133,4 @@
(node-type? [_ _ n] (gr/node-type? n))
(find-triple [this [e a v]] (resolve-triple this e a v)))

(def empty-graph (->GraphIndexed {} {} {}))
(def empty-graph (->GraphIndexed {} {} {} nil))
53 changes: 52 additions & 1 deletion src/asami/internal.cljc
@@ -1,8 +1,45 @@
(ns ^{:doc "Common internal elements of storage"
:author "Paula Gearon"}
asami.internal
(:require [asami.graph :as graph]))
(:require [asami.graph :as graph]
[asami.cache :refer [lookup hit miss lru-cache-factory]])
#?(:clj (:import [java.util Date]
[java.time Instant])))

#?(:clj (set! *warn-on-reflection* true))

(defprotocol TimeType
(instant? [this] "Indicates if this object is a time type that supports an instant")
(long-time [this] "Returns a long value as the number of milliseconds since the epoch")
(to-timestamp [this] "Converts to a common time type. Useful for comparison"))

(extend-protocol TimeType
#?(:clj Date :cljs js/Date)
(instant? [_] true)
(long-time [this] (.getTime this))
(to-timestamp [this] this)

#?@(:clj
[Instant
(instant? [_] true)
(long-time [this] (.toEpochMilli this))
(to-timestamp [this] (Date. (.toEpochMilli this)))])

#?(:clj Object :cljs default)
(instant? [_] false)
(long-time [this] (throw (ex-info (str "Unable to convert " (type this) " to a time") {:object this})))
(to-timestamp [this] (throw (ex-info (str "Unable to convert " (type this) " to a time") {:object this}))))

(defn now
"Creates an object to represent the current time"
[]
#?(:clj (Date.)
:cljs (js/Date.)))

(defn instant
"Creates an instant from a long millisecond value"
[^long ms]
#?(:clj (Date. ms) :cljs (js/Date. ms)))

(def project-args {:new-node graph/new-node
:node-label graph/node-label})
@@ -12,3 +49,17 @@
(assoc project-args
:resolve-pattern (partial graph/resolve-pattern graph)))

(defn shallow-cache-1
"Builds a cached version of an arity-1 function that contains only a small number of cached items.
size: the number of results to cache.
f: The arity-1 function to cache results for."
[size f]
(let [cache (atom (lru-cache-factory {} :threshold size))]
(fn [arg]
(if-let [ret (lookup @cache arg)]
(do
(swap! cache hit arg)
ret)
(let [ret (f arg)]
(swap! cache miss arg ret)
ret)))))
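
A usage sketch with an invented counting function, showing that repeated calls are served from the small LRU cache rather than recomputed:

```clojure
(def calls (atom 0))
(def cached-inc (shallow-cache-1 4 (fn [x] (swap! calls inc) (inc x))))

;; the second call with the same argument is a cache hit,
;; so the wrapped function runs only once
[(cached-inc 1) (cached-inc 1) @calls] ;=> [2 2 1]
```

Note that because the cached result is truth-tested with `if-let`, a function returning `nil` or `false` will be recomputed on every call.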
166 changes: 101 additions & 65 deletions src/asami/memory.cljc
69 changes: 47 additions & 22 deletions src/asami/multi_graph.cljc
56 changes: 56 additions & 0 deletions src/asami/peer.clj
61 changes: 50 additions & 11 deletions src/asami/planner.cljc
242 changes: 242 additions & 0 deletions src/asami/projection.cljc
330 changes: 249 additions & 81 deletions src/asami/query.cljc
54 changes: 54 additions & 0 deletions src/asami/sandbox.cljc
18 changes: 16 additions & 2 deletions src/asami/storage.cljc
85 changes: 85 additions & 0 deletions test-native/asami/main_test.clj
58 changes: 29 additions & 29 deletions test/asami/test_analytics.cljc → test/asami/analytics_test.cljc
891 changes: 891 additions & 0 deletions test/asami/api_test.cljc
68 changes: 68 additions & 0 deletions test/asami/cache_test.cljc
291 changes: 274 additions & 17 deletions test/asami/test_core_query.cljc → test/asami/core_query_test.cljc
387 changes: 322 additions & 65 deletions test/asami/test_api.cljc → test/asami/durable/api_test.cljc
117 changes: 117 additions & 0 deletions test/asami/durable/block/blockfile_test.clj
156 changes: 156 additions & 0 deletions test/asami/durable/block/blockmanager_test.cljc
32 changes: 32 additions & 0 deletions test/asami/durable/block/test_util.cljc
503 changes: 503 additions & 0 deletions test/asami/durable/block_tree_test.cljc
444 changes: 444 additions & 0 deletions test/asami/durable/codec_test.cljc
115 changes: 115 additions & 0 deletions test/asami/durable/flat_test.cljc
288 changes: 288 additions & 0 deletions test/asami/durable/graph_test.cljc
46 changes: 46 additions & 0 deletions test/asami/durable/idx_codec_test.cljc
207 changes: 207 additions & 0 deletions test/asami/durable/object_codec_test.clj
118 changes: 118 additions & 0 deletions test/asami/durable/pages_test.clj
237 changes: 237 additions & 0 deletions test/asami/durable/pool_test.cljc
422 changes: 422 additions & 0 deletions test/asami/durable/store_test.cljc
135 changes: 135 additions & 0 deletions test/asami/durable/test_utils.cljc
277 changes: 277 additions & 0 deletions test/asami/durable/transitive_test.cljc
525 changes: 525 additions & 0 deletions test/asami/durable/tuples_test.cljc
23 changes: 23 additions & 0 deletions test/asami/entities/helper_stub.cljc
463 changes: 463 additions & 0 deletions test/asami/entities/test_entity.cljc
197 changes: 197 additions & 0 deletions test/asami/multi_graph_test.cljc
6 changes: 3 additions & 3 deletions test/asami/test_planner.cljc → test/asami/planner_test.cljc
16 changes: 16 additions & 0 deletions test/asami/race_condition_test.clj
98 changes: 0 additions & 98 deletions test/asami/test_multi_graph.cljc

This file was deleted.

145 changes: 126 additions & 19 deletions test/asami/test_transitive.cljc → test/asami/transitive_test.cljc
13,713 changes: 13,713 additions & 0 deletions test/resources/pride_and_prejudice.txt
8 changes: 8 additions & 0 deletions update.sh