Commit

Merge pull request #1 from codex-semantics-library/hashconsed-maps
Hashconsed maps nodes
dlesbre authored May 15, 2024
2 parents bef2736 + 7a92256 commit a702db3
Showing 8 changed files with 1,477 additions and 501 deletions.
35 changes: 30 additions & 5 deletions CHANGELOG.md
@@ -1,14 +1,39 @@
# Unreleased
# v0.10.0 - Unreleased

- Patricia Tree now support using negative keys. Tree are built using the bitwise representation
of integer, meaning they effectively use an unsigned order. Negative keys are
considered bigger than positive keys, `0` is the minimal number and `-1` the maximal one.
## Main changes

- Added hash-consed nodes and functors to build hash-consed maps and sets
- Negative keys are now supported; removed the `zarith` dependency.
- Fixed some bugs

## Detailed changes

**Breaking changes:**
- Renamed `MakeCustom` to `MakeCustomMap`, added new functor `MakeCustomSet`.
`MakeCustomMap` changed to take a new argument to specify the `'a value` type.
- Renamed `MakeCustomHeterogeneous` to `MakeCustomHeterogeneousMap`, added new functor
`MakeCustomHeterogeneousSet`.
- Renamed `NODE_WITH_ID.get_id` to `NODE_WITH_ID.to_int`; this allows using
instances of `NODE_WITH_ID` directly as a `KEY`.
- Renamed `VALUE` to `HETEROGENEOUS_VALUE`, added a `VALUE` module type (previously unnamed).
- Renamed `min_binding`, `max_binding`, `pop_minimum`, `pop_maximum`, `min_elt`
and `max_elt` to `unsigned_min_binding`, `unsigned_max_binding`,
`pop_unsigned_minimum`, `pop_unsigned_maximum`, `unsigned_min_elt`
and `unsigned_max_elt` respectively, to clarify that these functions consider
negative numbers as larger than positive ones.
- Fixed a bug where NodeWithId wasn't incrementing ids properly

**New features:**
- Added new interface `MAP_WITH_VALUE` which is the same as `MAP` but with a custom
type `'a value` instead of just `'a`.
- Added `HashconsedNode`, `HashconsedSetNode` as well as four functors to create
hash-consed heterogeneous/homogeneous maps/sets: `MakeHashconsedMap`, `MakeHashconsedSet`,
`MakeHashconsedHeterogeneousMap` and `MakeHashconsedHeterogeneousSet`.
- Negative keys are now supported. Trees are built using the bitwise representation
of integers, meaning they effectively use an unsigned order. Negative keys are
considered bigger than positive keys: `0` is the minimal number and `-1` the maximal one.
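
To make the unsigned order concrete, here is a minimal sketch in plain OCaml. The helper `unsigned_compare` is a hypothetical name used only for illustration; it is not part of the library's API.

```ocaml
(* Hypothetical helper, not part of PatriciaTree: compare two native ints
   as if their bit patterns were unsigned numbers. *)
let unsigned_compare (a : int) (b : int) : int =
  (* Adding [min_int] flips the sign bit (with wrap-around), which maps the
     unsigned order onto the usual signed order. *)
  compare (a + min_int) (b + min_int)

(* Non-negative keys come first, negative keys last, and -1 is the maximum. *)
let sorted = List.sort unsigned_compare [-1; 3; 0; -7; 2]
(* sorted = [0; 2; 3; -7; -1] *)
```

Sorting with this comparison reproduces the order described above: `0` is the smallest element and `-1` the largest.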

**Bug fixes:**
- Fixed a bug where `NodeWithId` wasn't incrementing ids properly
- `zarith` is no longer a dependency; the library now uses GCC's `__builtin_clz` as a faster
method of finding an integer's highest bit.
- Fixed a bug where `pop_minimum` and `pop_maximum` could throw a private exception
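
For readers unfamiliar with the "highest bit" operation mentioned above, here is a portable pure-OCaml sketch. It is a slow stand-in for illustration only, not the library's `__builtin_clz`-based code, and the name `highest_bit` is an assumption. In Patricia trees this operation is typically applied to the `lxor` of two keys, to find the most significant bit where they differ.

```ocaml
(* Hypothetical, illustrative version: returns an int in which only the
   highest set bit of [x] is kept, or 0 when [x] is 0. *)
let highest_bit (x : int) : int =
  (* Shift [rest] right until it is empty, moving [bit] left in lockstep. *)
  let rec loop bit rest =
    if rest = 0 then bit else loop (bit lsl 1) (rest lsr 1)
  in
  if x = 0 then 0 else loop 1 (x lsr 1)

(* 5 = 0b101 and 7 = 0b111 first differ at bit 0b010. *)
let diverge = highest_bit (5 lxor 7)
(* diverge = 2 *)
```

For a negative input the highest set bit is the sign bit, so `highest_bit (-1)` is `min_int`, consistent with the unsigned view of keys.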
51 changes: 43 additions & 8 deletions README.md
@@ -93,6 +93,11 @@ dune build @doc
be extended to store size information in nodes if needed.
- Exposes a common interface (`view`) to allow users to write their own pattern
matching on the tree structure without depending on the `NODE` being used.
- Hash-consed versions of heterogeneous/homogeneous maps/sets are
available. These provide constant time equality and comparison, and ensure
maps/sets with the same contents are always physically equal. This comes at the cost
of a constant overhead in memory usage (at worst, as hash-consing may allow memory gains) and a constant time overhead
when calling constructors.

## Quick overview

@@ -110,11 +115,35 @@ module MakeSet(Key: KEY) : SET with type elt = Key.t
module MakeHeterogeneousSet(Key: HETEROGENEOUS_KEY) : HETEROGENEOUS_SET
with type 'a elt = 'a Key.t
module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: VALUE) : HETEROGENEOUS_MAP
module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: HETEROGENEOUS_VALUE) :
HETEROGENEOUS_MAP
with type 'a key = 'a Key.t
and type ('k,'m) value = ('k,'m) Value.t
```

There are also [hash-consed](https://en.wikipedia.org/wiki/Hash_consing) versions
of these four functors: `MakeHashconsedMap`, `MakeHashconsedSet`,
`MakeHashconsedHeterogeneousMap` and `MakeHashconsedHeterogeneousSet`.
These uniquely number their nodes, and ensure nodes with the same contents are
always physically equal. With this unique numbering:
- `equal` and `compare` become constant time operations;
- two maps with the same bindings (where keys are compared by `KEY.to_int` and
values by `HASHED_VALUE.polyeq`) will always be physically equal;
- functions that benefit from sharing will see improved performance;
- constructors are slightly slower, as they now require a hash-table lookup;
- memory usage is increased: nodes store their tags inside themselves, and
a global hash-table of all built nodes must be maintained;
- hash-consed maps assume their values are immutable;
- **WARNING:** when using physical equality as `HASHED_VALUE.polyeq`,
some maps of different types may be given the same identifier. See the end of
the documentation of `HASHED_VALUE.polyeq` for details.
Note that this is the case in the default implementations `HashedValue`
and `HeterogeneousHashedValue`.
- All hash-consing functors are **generative**, since each functor call will
create a new hash-table to store the created nodes. Calling a functor
twice with the same arguments will lead to two numbering systems for identifiers,
and thus the types should not be considered compatible.
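
The properties listed above can be illustrated with a toy, self-contained sketch of hash-consing behind a generative functor. Everything here (`MakeHashconsedList`, its signature, the use of a plain `Hashtbl`) is a hypothetical stand-in written for illustration; it does not reproduce the library's nodes or its table implementation.

```ocaml
(* Toy hash-consed integer lists. [MakeHashconsedList] is a hypothetical
   functor written for illustration; it is not part of PatriciaTree. *)
module MakeHashconsedList () : sig
  type t
  val nil : t
  val cons : int -> t -> t
  val equal : t -> t -> bool   (* constant time: physical equality *)
  val id : t -> int            (* unique identifier of the node *)
  val to_list : t -> int list
end = struct
  type t = { id : int; node : node }
  and node = Nil | Cons of int * t

  (* One table and one counter per functor application: this is why the
     functor is generative. Keys are (head, id of tail). *)
  let table : (int * int, t) Hashtbl.t = Hashtbl.create 16
  let counter = ref 0

  let nil = { id = 0; node = Nil }

  let cons x tl =
    let key = (x, tl.id) in
    match Hashtbl.find_opt table key with
    | Some existing -> existing            (* reuse the node already built *)
    | None ->
      incr counter;
      let fresh = { id = !counter; node = Cons (x, tl) } in
      Hashtbl.add table key fresh;
      fresh

  let equal a b = a == b
  let id n = n.id
  let rec to_list n =
    match n.node with Nil -> [] | Cons (x, tl) -> x :: to_list tl
end

module A = MakeHashconsedList ()
module B = MakeHashconsedList ()

let l1 = A.cons 1 (A.cons 2 A.nil)
let l2 = A.cons 1 (A.cons 2 A.nil)
let same = A.equal l1 l2   (* true: both calls returned the same node *)
(* [A.equal l1 (B.cons 1 B.nil)] is rejected by the type checker:
   A.t and B.t are distinct types. *)
```

Note the trade-offs mirrored from the list above: `cons` pays for a table lookup, every node carries an `id`, and the table keeps nodes reachable. The sketch also relies on list contents being immutable, just as hash-consed maps assume immutable values.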

### Interfaces

Here is a brief overview of the various module types of our library:
@@ -135,18 +164,24 @@ Here is a brief overview of the various module types of our library:
These just consist of a type, a (polymorphic) equality function, and an
injective `to_int` coercion.

The heterogeneous map functor also has a `VALUE` parameter to specify the
The heterogeneous map functor also has a `HETEROGENEOUS_VALUE` parameter to specify the
`('a, 'b) value` type
- The internal representations of our tree can be customized to use different
internal `NODE`. Each node comes with its own private constructors and destructors,
as well as a cast to a uniform `view` type used for pattern matching.

A number of implementations are provided `SimpleNode` (exactly the `view` type),
`WeakNode` (node which only store weak pointer to its elements), `NodeWithId`
(node which contain a unique identifier), `SetNode` (node optimized for set,
doesn't store the `unit` value) and `WeakSetNode`.

Use the functors `MakeCustomHeterogeneous` and `MakeCustom` to build
A number of implementations are provided:
- `SimpleNode`: exactly the `NODE.view` type;
- `WeakNode`: only stores weak pointers to its elements;
- `NodeWithId`: node which contains a unique identifier;
- `SetNode`: optimized for sets, doesn't store the `unit` value;
- `WeakSetNode`: both a `WeakNode` and a `SetNode`;
- `HashconsedNode`: performs hash-consing (it also stores a unique identifier, but checks when
building a new node whether a node with similar content already exists);
- `HashconsedSetNode`: both a `HashconsedNode` and a `SetNode`.

Use the functors `MakeCustomMap` and `MakeCustomSet` (or their heterogeneous
versions `MakeCustomHeterogeneousMap` and `MakeCustomHeterogeneousSet`) to build
maps using these nodes, or any other custom nodes.

## Examples
2 changes: 1 addition & 1 deletion dune-project
@@ -23,7 +23,7 @@

(name patricia-tree)

(version 0.9.0)
(version 0.10.0)

(maintainers "Dorian Lesbre <[email protected]>")

98 changes: 67 additions & 31 deletions index.mld
@@ -2,7 +2,7 @@

This library contains a single module: {!PatriciaTree}.

This is version [0.9.0] of the library. It is known to work with OCaml versions
This is version [0.10.0] of the library. It is known to work with OCaml versions
ranging from [4.14] to [5.2].

This is an {{: https://ocaml.org/}OCaml} library that implements sets and maps as
@@ -44,8 +44,8 @@ dune build @doc
using the same function names when possible
and the same convention for order of arguments. This should allow switching to
and from Patricia Tree with minimal effort.}
{li The functor parameters ({!PatriciaTree.KEY} module) requires an injective [to_int : t -> int]
function instead of a [compare] function. {!PatriciaTree.KEY.to_int} should be fast,
{li The functor parameter ({{!PatriciaTree.KEY}[KEY]} module) requires an injective [to_int : t -> int]
function instead of a [compare] function. {{!PatriciaTree.KEY.to_int}[KEY.to_int]} should be fast
and injective.
This works well with {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consed} types.}
{li The Patricia Tree representation is stable, contrary to maps, inserting nodes
@@ -70,19 +70,24 @@ dune build @doc
by Jan Midtgaard.

It also affects functions like {{!PatriciaTree.BASE_MAP.unsigned_min_binding}[unsigned_min_binding]}
and {{!PatriciaTree.BASE_MAP.pop_unsigned_minimum}[pop_unsigned_minimum}. They will return the smallest
and {{!PatriciaTree.BASE_MAP.pop_unsigned_minimum}[pop_unsigned_minimum]}. They will return the smallest
positive integer if both positive and negative keys are present, and not the smallest negative,
as one might expect.}
{li Supports generic maps and sets: a ['m map] that maps ['k key] to [('k, 'm) value].
This is especially useful when using {{: https://v2.ocaml.org/manual/gadts-tutorial.html}GADTs}
for the type of keys. This is also sometimes called a dependent map.}
{li Allows easy and fast operations across different types of maps and set
which have the same type of keys (e.g. an intersection between a map and a set).}
{li Multiple choices for internal representation ({!PatriciaTree.NODE}), which allows for efficient
{li Multiple choices for internal representation ({{!PatriciaTree.NODE}[NODE]}), which allows for efficient
storage (no need to store a value for sets), or using weak nodes only (values removed from the tree if no other pointer to it exists). This system can also
be extended to store size information in nodes if needed.}
{li Exposes a common interface ({!type:PatriciaTree.NODE.view}) to allow users to write their own pattern
matching on the tree structure without depending on the {!PatriciaTree.NODE} being used.}}
matching on the tree structure without depending on the {{!PatriciaTree.NODE}[NODE]} being used.}
{li Additionally, hash-consed versions of heterogeneous/homogeneous maps/sets are
available. These provide constant time equality and comparison, and ensure
maps/sets with the same contents are always physically equal. This comes at the cost
of a constant overhead in memory usage (at worst, as hash-consing may allow
memory gains) and a constant time overhead when calling constructors.}}

{1 Quick overview}

@@ -91,60 +96,91 @@ dune build @doc
This library contains a single module, {!PatriciaTree}.
The functors used to build maps and sets are the following:
{ul
{li For homogeneous (non-generic) maps and sets: {!PatriciaTree.MakeMap} and
{!PatriciaTree.MakeSet}. These are similar to the standard library's maps and sets.
{li For homogeneous (non-generic) maps and sets: {{!PatriciaTree.MakeMap}[MakeMap]} and
{{!PatriciaTree.MakeSet}[MakeSet]}. These are similar to the standard library's maps and sets.
{[
module MakeMap(Key: KEY) : MAP with type key = Key.t
module MakeSet(Key: KEY) : SET with type elt = Key.t
]}}
{li For Heterogeneous (generic) maps and sets: {!PatriciaTree.MakeHeterogeneousMap}
and {!PatriciaTree.MakeHeterogeneousSet}.
{li For Heterogeneous (generic) maps and sets: {{!PatriciaTree.MakeHeterogeneousMap}[MakeHeterogeneousMap]}
and {{!PatriciaTree.MakeHeterogeneousSet}[MakeHeterogeneousSet]}.
{[
module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: VALUE) : HETEROGENEOUS_MAP
module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: HETEROGENEOUS_VALUE) :
HETEROGENEOUS_MAP
with type 'a key = 'a Key.t
and type ('k,'m) value = ('k,'m) Value.t
module MakeHeterogeneousSet(Key: HETEROGENEOUS_KEY) : HETEROGENEOUS_SET
with type 'a elt = 'a Key.t
]}}
}

{li
There are also {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consed} versions
of these four functors: {{!PatriciaTree.MakeHashconsedMap}[MakeHashconsedMap]}, {{!PatriciaTree.MakeHashconsedSet}[MakeHashconsedSet]},
{{!PatriciaTree.MakeHashconsedHeterogeneousMap}[MakeHashconsedHeterogeneousMap]} and {{!PatriciaTree.MakeHashconsedHeterogeneousSet}[MakeHashconsedHeterogeneousSet]}.
These uniquely number their nodes, and ensure {b nodes with the same contents are
always physically equal}. With this unique numbering:
- [equal] and [compare] become constant time operations;
- two maps with the same bindings (where keys are compared by {{!PatriciaTree.KEY.to_int}[KEY.to_int]} and
values by {{!PatriciaTree.HASHED_VALUE.polyeq}[HASHED_VALUE.polyeq]}) will always be physically equal;
- functions that benefit from sharing will see improved performance;
- constructors are slightly slower, as they now require a hash-table lookup;
- memory usage is increased: nodes store their tags inside themselves, and
a global hash-table of all built nodes must be maintained;
- hash-consed maps assume their values are immutable;
- {b WARNING:} when using physical equality as {{!PatriciaTree.HASHED_VALUE.polyeq}[HASHED_VALUE.polyeq]}, some maps of different
types may be given the same identifier. See the end of
the documentation of {{!PatriciaTree.HASHED_VALUE.polyeq}[HASHED_VALUE.polyeq]} for details.
Note that this is the case in the default implementations
{{!PatriciaTree.HashedValue}[HashedValue]}
and {{!PatriciaTree.HeterogeneousHashedValue}[HeterogeneousHashedValue]}.
- All hash-consing functors are {b generative}, since each functor call will
create a new hash-table to store the created nodes. Calling a functor
twice with the same arguments will lead to two numbering systems for identifiers,
and thus the types should not be considered compatible.
}}

{2 Interfaces}

Here is a brief overview of the various module types of our library:
{ul
{li {!PatriciaTree.BASE_MAP}: the underlying module type of all our trees (maps end sets). It
{li {{!PatriciaTree.BASE_MAP}[BASE_MAP]}: the underlying module type of all our trees (maps and sets). It
represents a ['b map] binding ['a key] to [('a,'b) value], as well as all
functions needed to manipulate them.

It can be accessed from any of the more specific maps types, thus providing a
unified representation, useful for cross map operations. However, for practical
purposes, it is often best to use the more specific interfaces:
{ul
{li {!PatriciaTree.HETEROGENEOUS_MAP} for heterogeneous maps (this is just [BASE_MAP] with a
{li {{!PatriciaTree.HETEROGENEOUS_MAP}[HETEROGENEOUS_MAP]} for heterogeneous maps (this is just {{!PatriciaTree.BASE_MAP}[BASE_MAP]} with a
[WithForeign] functor).}
{li {!PatriciaTree.MAP} for homogeneous maps, this interface is close to {{: https://ocaml.org/api/Map.S.html}[Stdlib.Map.S]}.}
{li {!PatriciaTree.HETEROGENEOUS_SET} for heterogeneous sets (sets of ['a elt]). These are just
{li {{!PatriciaTree.MAP}[MAP]} for homogeneous maps, this interface is close to {{: https://ocaml.org/api/Map.S.html}[Stdlib.Map.S]}.}
{li {{!PatriciaTree.HETEROGENEOUS_SET}[HETEROGENEOUS_SET]} for heterogeneous sets (sets of ['a elt]). These are just
maps to [unit], but with a custom node representation to avoid storing [unit] in
nodes.}
{li {!PatriciaTree.SET} for homogeneous sets, this interface is close to {{: https://ocaml.org/api/Set.S.html}[Stdlib.Set.S]}.}
{li {{!PatriciaTree.SET}[SET]} for homogeneous sets, this interface is close to {{: https://ocaml.org/api/Set.S.html}[Stdlib.Set.S]}.}
}}
{li The parameter of our functor are either {!PatriciaTree.KEY} or {!PatriciaTree.HETEROGENEOUS_KEY}.
{li The parameters of our functors are either {{!PatriciaTree.KEY}[KEY]} or {{!PatriciaTree.HETEROGENEOUS_KEY}[HETEROGENEOUS_KEY]}.
These just consist of a type, a (polymorphic) equality function, and an
injective [to_int] coercion.

The heterogeneous map functor also has a {!PatriciaTree.VALUE} parameter to specify the
The heterogeneous map functor also has a {{!PatriciaTree.HETEROGENEOUS_VALUE}[HETEROGENEOUS_VALUE]} parameter to specify the
[('a, 'b) value] type.}
{li The internal representations of our tree can be customized to use different
internal {!PatriciaTree.NODE}. Each node come with its own private constructors and destructors,
as well as a cast to a uniform {!type:PatriciaTree.NODE.view} type used for pattern matching.

A number of implementations are provided {!PatriciaTree.SimpleNode} (exactly the {!type:PatriciaTree.NODE.view} type),
{!PatriciaTree.WeakNode} (node which only store weak pointer to its elements), {!PatriciaTree.NodeWithId}
(node which contain a unique identifier), {!PatriciaTree.SetNode} (node optimized for set,
doesn't store the [unit] value) and {!PatriciaTree.WeakSetNode}.

Use the functors {!PatriciaTree.MakeCustomHeterogeneous} and {!PatriciaTree.MakeCustom} to build
internal {{!PatriciaTree.NODE}[NODE]}. Each node comes with its own private constructors and destructors,
as well as a cast to a uniform {{!type:PatriciaTree.NODE.view}[NODE.view]} type used for pattern matching.

A number of implementations are provided:
- {{!PatriciaTree.SimpleNode}[SimpleNode]}: exactly the {{!type:PatriciaTree.NODE.view}[NODE.view]} type;
- {{!PatriciaTree.WeakNode}[WeakNode]}: only stores weak pointers to its elements;
- {{!PatriciaTree.NodeWithId}[NodeWithId]}: node which contains a unique identifier;
- {{!PatriciaTree.SetNode}[SetNode]}: optimized for sets, doesn't store the [unit] value;
- {{!PatriciaTree.WeakSetNode}[WeakSetNode]}: both a {{!PatriciaTree.WeakNode}[WeakNode]} and a {{!PatriciaTree.SetNode}[SetNode]};
- {{!PatriciaTree.HashconsedNode}[HashconsedNode]}: performs hash-consing (it also stores a unique identifier, but checks when
building a new node whether a node with similar content already exists);
- {{!PatriciaTree.HashconsedSetNode}[HashconsedSetNode]}: both a {{!PatriciaTree.HashconsedNode}[HashconsedNode]} and a {{!PatriciaTree.SetNode}[SetNode]}.

Use the functors {{!PatriciaTree.MakeCustomMap}[MakeCustomMap]} and {{!PatriciaTree.MakeCustomSet}[MakeCustomSet]}
(or their heterogeneous versions {{!PatriciaTree.MakeCustomHeterogeneousMap}[MakeCustomHeterogeneousMap]} and
{{!PatriciaTree.MakeCustomHeterogeneousSet}[MakeCustomHeterogeneousSet]}) to build
maps using these nodes, or any other custom nodes.}
}

@@ -297,8 +333,8 @@ These are smaller and closer to OCaml's built-in [Map] and [Set], however:
- Our interface and implementation try to maximize the sharing between different
versions of the tree, and to benefit from this memory sharing. Theirs do not.
- These libraries work with older versions of OCaml ([>= 4.05] I believe), whereas
ours requires OCaml [>= 4.14] (for the new interface of [Ephemeron] used in
{!PatriciaTree.WeakNode}).
ours requires OCaml [>= 4.14] (for the new interface of {{: https://v2.ocaml.org/api/Ephemeron.html}[Ephemeron]} used in
{{!PatriciaTree.WeakNode}[WeakNode]}).

{2 dmap}

2 changes: 1 addition & 1 deletion patricia-tree.opam
@@ -1,6 +1,6 @@
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
version: "0.9.0"
version: "0.10.0"
synopsis:
"Patricia Tree data structure in OCaml for maps and sets. Supports generic key-value pairs"
maintainer: ["Dorian Lesbre <[email protected]>"]