diff --git a/CHANGELOG.md b/CHANGELOG.md index 06dd05b..31c7aad 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,14 +1,39 @@ -# Unreleased +# v0.10.0 - Unreleased -- Patricia Tree now support using negative keys. Tree are built using the bitwise representation - of integer, meaning they effectively use an unsigned order. Negative keys are - considered bigger than positive keys, `0` is the minimal number and `-1` the maximal one. +## Main changes + +- Added hash-consed nodes and functors to build hash-consed maps and sets +- Now support using negative keys, removed `zarith` dependency. +- Fixed some bugs + +## Detailed changes + +**Breaking changes:** +- Renamed `MakeCustom` to `MakeCustomMap`, added new functor `MakeCustomSet`. + `MakeCustomMap` changed to take a new argument to specify the `'a value` type. +- Renamed `MakeCustomHeterogeneous` to `MakeCustomHeterogeneousMap`, added new functor + `MakeCustomHeterogeneousSet`. +- Renamed `NODE_WITH_ID.get_id` to `NODE_WITH_ID.to_int`, this allows using + instances `NODE_WITH_ID` directly as a `KEY`. +- Renamed `VALUE` to `HETEROGENEOUS_VALUE`, added a `VALUE` module type (previously unnamed). - Renamed `min_binding`, `max_binding`, `pop_minimum`, `pop_maximum`, `min_elt` and `max_elt` to `unsigned_min_binding`, `unsigned_max_binding`, `pop_unsigned_minimum`, `pop_unsigned_maximum`, `unsigned_min_elt` and `unsigned_max_elt` respectively, to clarify that these functions consider negative numbers as larger than positive ones. -- Fixed a bug where NodeWithId wasn't incrementing ids properly + +**New features:** +- Added new interface `MAP_WITH_VALUE` which is the same as `MAP` but with a custom + type `'a value` instead of just `'a`. +- Added `HashconsedNode`, `HashconsedSetNode` as well as four functors to create + hash-consed heterogeneous/homogeneous maps/sets: `MakeHashconsedMap`, `MakeHashconsedSet`, + `MakeHashconsedHeterogeneousMap` and `MakeHashconsedHeterogeneousSet`. +- Now support using negative keys. Trees are built using the bitwise representation + of integer, meaning they effectively use an unsigned order. Negative keys are + considered bigger than positive keys, `0` is the minimal number and `-1` the maximal one. + +**Bug fixes:** +- Fixed a bug where `NodeWithId` wasn't incrementing ids properly - `zarith` is no longer a dependency, used GCC's `__builtin_clz` as a faster method of finding an integer's highest bit. - Fixed a bug where `pop_minimum` and `pop_maximum` could throw a private exception diff --git a/README.md b/README.md index 7016594..b5fe4fc 100644 --- a/README.md +++ b/README.md @@ -93,6 +93,11 @@ dune build @doc be extended to store size information in nodes if needed. - Exposes a common interface (`view`) to allow users to write their own pattern matching on the tree structure without depending on the `NODE` being used. +- hash-consed versions of heterogeneous/homogeneous maps/sets are + available. These provide constant time equality and comparison, and ensure + maps/set with the same constants are always physically equal. It comes at the cost + of a constant overhead in memory usage (at worst, as hash-consing may allow memory gains) and constant time overhead + when calling constructors. ## Quick overview @@ -110,11 +115,35 @@ module MakeSet(Key: KEY) : SET with type elt = Key.t module MakeHeterogeneousSet(Key: HETEROGENEOUS_KEY) : HETEROGENEOUS_SET with type 'a elt = 'a Key.t -module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: VALUE) : HETEROGENEOUS_MAP +module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: HETEROGENEOUS_VALUE) : + HETEROGENEOUS_MAP with type 'a key = 'a Key.t and type ('k,'m) value = ('k,'m) Value.t ``` +There are also [hash-consed](https://en.wikipedia.org/wiki/Hash_consing) versions +of these four functors: `MakeHashconsedMap`, `MakeHashconsedSet`, +`MakeHashconsedHeterogeneousMap` and `MakeHashconsedHeterogeneousSet`. +These uniquely number their nodes, and ensure nodes with the same contents are +always physically equal. With this unique numbering: +- `equal` and `compare` become constant time operations; +- two maps with the same bindings (where keys are compared by `KEY.to_int` and + values by `HASHED_VALUE.polyeq`) will always be physically equal; +- functions that benefit from sharing will see improved performance; +- constructors are slightly slower, as they now require a hash-table lookup; +- memory usage is increased: nodes store their tags inside themselves, and + a global hash-table of all built nodes must be maintained; +- hash-consed maps assume their values are immutable; +- **WARNING:** when using physical equality as `HASHED_VALUE.polyeq`, + some maps of different types may be given the same identifier. See the end of + the documentation of `HASHED_VALUE.polyeq` for details. + Note that this is the case in the default implementations `HashedValue` + and `HeterogeneousHashedValue`. +- All hash-consing functors are **generative**, since each functor call will + create a new hash-table to store the created nodes. Calling a functor + twice with same arguments will lead to two numbering systems for identifiers, + and thus the types should not be considered compatible. + ### Interfaces Here is a brief overview of the various module types of our library: @@ -135,18 +164,24 @@ Here is a brief overview of the various module types of our library: These just consist of a type, a (polymorphic) equality function, and an injective `to_int` coercion. - The heterogeneous map functor also has a `VALUE` parameter to specify the + The heterogeneous map functor also has a `HETEROGENEOUS_VALUE` parameter to specify the `('a, 'b) value` type - The internal representations of our tree can be customized to use different internal `NODE`. Each node come with its own private constructors and destructors, as well as a cast to a uniform `view` type used for pattern matching. - A number of implementations are provided `SimpleNode` (exactly the `view` type), - `WeakNode` (node which only store weak pointer to its elements), `NodeWithId` - (node which contain a unique identifier), `SetNode` (node optimized for set, - doesn't store the `unit` value) and `WeakSetNode`. - - Use the functors `MakeCustomHeterogeneous` and `MakeCustom` to build + A number of implementations are provided: + - `SimpleNode`: exactly the `NODE.view` type; + - `WeakNode`: only store weak pointer to its elements; + - `NodeWithId`: node which contains a unique identifier; + - `SetNode`: optimized for sets, doesn't store the [unit] value; + - `WeakSetNode`: both a `WeakNode` and as `SetNode` + - `HashconsedNode`: performs hash-consing (it also stores a unique identifier, but checks when + building a new node whether a node with similar content already exists); + - `HashconsedSetNode`: both a `HashconsedNode` and a `SetNode`. + + Use the functors `MakeCustomMap` and `MakeCustomSet` (or their heterogeneous + versions `MakeCustomHeterogeneousMap` and `MakeCustomHeterogeneousSet`) to build maps using these nodes, or any other custom nodes. ## Examples diff --git a/dune-project b/dune-project index 621fb83..7e65e45 100644 --- a/dune-project +++ b/dune-project @@ -23,7 +23,7 @@ (name patricia-tree) -(version 0.9.0) +(version 0.10.0) (maintainers "Dorian Lesbre ") diff --git a/index.mld b/index.mld index 1085e0d..b69727a 100644 --- a/index.mld +++ b/index.mld @@ -2,7 +2,7 @@ This library contains a single module: {!PatriciaTree}. -This is version [0.9.0] of the library. It is known to work with OCaml versions +This is version [0.10.0] of the library. It is known to work with OCaml versions ranging from [4.14] to [5.2]. This is an {{: https://ocaml.org/}OCaml} library that implements sets and maps as @@ -44,8 +44,8 @@ dune build @doc using the same function names when possible and the same convention for order of arguments. This should allow switching to and from Patricia Tree with minimal effort.} -{li The functor parameters ({!PatriciaTree.KEY} module) requires an injective [to_int : t -> int] - function instead of a [compare] function. {!PatriciaTree.KEY.to_int} should be fast, +{li The functor parameters ({{!PatriciaTree.KEY}[KEY]} module) requires an injective [to_int : t -> int] + function instead of a [compare] function. {{!PatriciaTree.KEY.to_int}[KEY.to_int]} should be fast, and injective. This works well with {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consed} types.} {li The Patricia Tree representation is stable, contrary to maps, inserting nodes @@ -70,7 +70,7 @@ dune build @doc by Jan Mitgaard. It also affects functions like {{!PatriciaTree.BASE_MAP.unsigned_min_binding}[unsigned_min_binding]} - and {{!PatriciaTree.BASE_MAP.pop_unsigned_minimum}[pop_unsigned_minimum}. They will return the smallest + and {{!PatriciaTree.BASE_MAP.pop_unsigned_minimum}[pop_unsigned_minimum]}. They will return the smallest positive integer of both positive and negative keys are present; and not the smallest negative, as one might expect.} {li Supports generic maps and sets: a ['m map] that maps ['k key] to [('k, 'm) value]. @@ -78,11 +78,16 @@ dune build @doc for the type of keys. This is also sometimes called a dependent map.} {li Allows easy and fast operations across different types of maps and set which have the same type of keys (e.g. an intersection between a map and a set).} -{li Multiple choices for internal representation ({!PatriciaTree.NODE}), which allows for efficient +{li Multiple choices for internal representation ({{!PatriciaTree.NODE}[NODE]}), which allows for efficient storage (no need to store a value for sets), or using weak nodes only (values removed from the tree if no other pointer to it exists). This system can also be extended to store size information in nodes if needed.} {li Exposes a common interface ({!type:PatriciaTree.NODE.view}) to allow users to write their own pattern - matching on the tree structure without depending on the {!PatriciaTree.NODE} being used.}} + matching on the tree structure without depending on the {{!PatriciaTree.NODE}[NODE]} being used.} +{li Additionally, hashconsed versions of heterogeneous/homogeneous maps/sets are + available. These provide constant time equality and comparison, and ensure + maps/set with the same constants are always physically equal. It comes at the cost + of a constant overhead in memory usage (at worst, as hash-consing may allow + memory gains) and constant time overhead when calling constructors.}} {1 Quick overview} @@ -91,29 +96,53 @@ dune build @doc This library contains a single module, {!PatriciaTree}. The functors used to build maps and sets are the following: {ul -{li For homogeneous (non-generic) maps and sets: {!PatriciaTree.MakeMap} and - {!PatriciaTree.MakeSet}. These are similar to the standard library's maps and sets. +{li For homogeneous (non-generic) maps and sets: {{!PatriciaTree.MakeMap}[MakeMap]} and + {{!PatriciaTree.MakeSet}[MakeSet]}. These are similar to the standard library's maps and sets. {[ module MakeMap(Key: KEY) : MAP with type key = Key.t module MakeSet(Key: KEY) : SET with type elt = Key.t ]}} -{li For Heterogeneous (generic) maps and sets: {!PatriciaTree.MakeHeterogeneousMap} - and {!PatriciaTree.MakeHeterogeneousSet}. +{li For Heterogeneous (generic) maps and sets: {{!PatriciaTree.MakeHeterogeneousMap}[MakeHeterogeneousMap]} + and {{!PatriciaTree.MakeHeterogeneousSet}[MakeHeterogeneousSet]}. {[ - module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: VALUE) : HETEROGENEOUS_MAP + module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: HETEROGENEOUS_VALUE) : + HETEROGENEOUS_MAP with type 'a key = 'a Key.t and type ('k,'m) value = ('k,'m) Value.t module MakeHeterogeneousSet(Key: HETEROGENEOUS_KEY) : HETEROGENEOUS_SET with type 'a elt = 'a Key.t ]}} -} - +{li + There are also {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consed} versions + of these four functors: {{!PatriciaTree.MakeHashconsedMap}[MakeHashconsedMap]}, {{!PatriciaTree.MakeHashconsedSet}[MakeHashconsedSet]}, + {{!PatriciaTree.MakeHashconsedHeterogeneousMap}[MakeHashconsedHeterogeneousMap]} and {{!PatriciaTree.MakeHashconsedHeterogeneousSet}[MakeHashconsedHeterogeneousSet]}. + These uniquely number their nodes, and ensures {b nodes with the same contents are + always physically equal}. With this unique numbering: + - [equal] and [compare] become constant time operations; + - two maps with the same bindings (where keys are compared by {{!PatriciaTree.KEY.to_int}[KEY.to_int]} and + values by {{!PatriciaTree.HASHED_VALUE.polyeq}[HASHED_VALUE.polyeq]}) will always be physically equal; + - functions that benefit from sharing will see improved performance; + - constructors are slightly slower, as they now require a hash-table lookup; + - memory usage is increased: nodes store their tags inside themselves, and + a global hash-table of all built nodes must be maintained; + - hash-consed maps assume their values are immutable; + - {b WARNING:} when using physical equality as {{!PatriciaTree.HASHED_VALUE.polyeq}[HASHED_VALUE.polyeq]}, some maps of different + types may be given the same identifier. See the end of + the documentation of {{!PatriciaTree.HASHED_VALUE.polyeq}[HASHED_VALUE.polyeq]} for details. + Note that this is the case in the default implementations + {{!PatriciaTree.HashedValue}[HashedValue]} + and {{!PatriciaTree.HeterogeneousHashedValue}[HeterogeneousHashedValue]}. + - All hash-consing functors are {b generative}, since each functor call will + create a new hash-table to store the created nodes. Calling a functor + twice with same arguments will lead to two numbering systems for identifiers, + and thus the types should not be considered compatible. +}} {2 Interfaces} Here is a brief overview of the various module types of our library: {ul -{li {!PatriciaTree.BASE_MAP}: the underlying module type of all our trees (maps end sets). It +{li {{!PatriciaTree.BASE_MAP}[BASE_MAP]}: the underlying module type of all our trees (maps end sets). It represents a ['b map] binding ['a key] to [('a,'b) value], as well as all functions needed to manipulate them. @@ -121,30 +150,37 @@ Here is a brief overview of the various module types of our library: unified representation, useful for cross map operations. However, for practical purposes, it is often best to use the more specific interfaces: {ul - {li {!PatriciaTree.HETEROGENEOUS_MAP} for heterogeneous maps (this is just [BASE_MAP] with a + {li {{!PatriciaTree.HETEROGENEOUS_MAP}[HETEROGENEOUS_MAP]} for heterogeneous maps (this is just {{!PatriciaTree.BASE_MAP}[BASE_MAP]} with a [WithForeign] functor).} - {li {!PatriciaTree.MAP} for homogeneous maps, this interface is close to {{: https://ocaml.org/api/Map.S.html}[Stdlib.Map.S]}.} - {li {!PatriciaTree.HETEROGENEOUS_SET} for heterogeneous sets (sets of ['a elt]). These are just + {li {{!PatriciaTree.MAP}[MAP]} for homogeneous maps, this interface is close to {{: https://ocaml.org/api/Map.S.html}[Stdlib.Map.S]}.} + {li {{!PatriciaTree.HETEROGENEOUS_SET}[HETEROGENEOUS_SET]} for heterogeneous sets (sets of ['a elt]). These are just maps to [unit], but with a custom node representation to avoid storing [unit] in nodes.} - {li {!PatriciaTree.SET} for homogeneous sets, this interface is close to {{: https://ocaml.org/api/Set.S.html}[Stdlib.Set.S]}.} + {li {{!PatriciaTree.SET}[SET]} for homogeneous sets, this interface is close to {{: https://ocaml.org/api/Set.S.html}[Stdlib.Set.S]}.} }} -{li The parameter of our functor are either {!PatriciaTree.KEY} or {!PatriciaTree.HETEROGENEOUS_KEY}. +{li The parameter of our functor are either {{!PatriciaTree.KEY}[KEY]} or {{!PatriciaTree.HETEROGENEOUS_KEY}[HETEROGENEOUS_KEY]}. These just consist of a type, a (polymorphic) equality function, and an injective [to_int] coercion. - The heterogeneous map functor also has a {!PatriciaTree.VALUE} parameter to specify the + The heterogeneous map functor also has a {{!PatriciaTree.HETEROGENEOUS_VALUE}[HETEROGENEOUS_VALUE]} parameter to specify the [('a, 'b) value] type.} {li The internal representations of our tree can be customized to use different - internal {!PatriciaTree.NODE}. Each node come with its own private constructors and destructors, - as well as a cast to a uniform {!type:PatriciaTree.NODE.view} type used for pattern matching. - - A number of implementations are provided {!PatriciaTree.SimpleNode} (exactly the {!type:PatriciaTree.NODE.view} type), - {!PatriciaTree.WeakNode} (node which only store weak pointer to its elements), {!PatriciaTree.NodeWithId} - (node which contain a unique identifier), {!PatriciaTree.SetNode} (node optimized for set, - doesn't store the [unit] value) and {!PatriciaTree.WeakSetNode}. - - Use the functors {!PatriciaTree.MakeCustomHeterogeneous} and {!PatriciaTree.MakeCustom} to build + internal {{!PatriciaTree.NODE}[NODE]}. Each node come with its own private constructors and destructors, + as well as a cast to a uniform {{!type:PatriciaTree.NODE.view}[NODE.view]} type used for pattern matching. + + A number of implementations are provided: + - {{!PatriciaTree.SimpleNode}[SimpleNode]}: exactly the {{!type:PatriciaTree.NODE.view}[NODE.view]} type; + - {{!PatriciaTree.WeakNode}[WeakNode]}: only store weak pointer to its elements; + - {{!PatriciaTree.NodeWithId}[NodeWithId]}: node which contains a unique identifier; + - {{!PatriciaTree.SetNode}[SetNode]}: optimized for sets, doesn't store the [unit] value; + - {{!PatriciaTree.WeakSetNode}[WeakSetNode]}: both a {{!PatriciaTree.WeakNode}[WeakNode]} and a {{!PatriciaTree.SetNode}[SetNode]} + - {{!PatriciaTree.HashconsedNode}[HashconsedNode]}: performs hash-consing (it also stores a unique identifier, but checks when + building a new node whether a node with similar content already exists); + - {{!PatriciaTree.HashconsedSetNode}[HashconsedSetNode]}: both a {{!PatriciaTree.HashconsedNode}[HashconsedNode]} and a {{!PatriciaTree.SetNode}[SetNode]}. + + Use the functors {{!PatriciaTree.MakeCustomMap}[MakeCustomMap]} and {{!PatriciaTree.MakeCustomSet}[MakeCustomSet]} + (or their heterogeneous versions {{!PatriciaTree.MakeCustomHeterogeneousMap}[MakeCustomHeterogeneousMap]} and + {{!PatriciaTree.MakeCustomHeterogeneousSet}[MakeCustomHeterogeneousSet]}) to build maps using these nodes, or any other custom nodes.} } @@ -297,8 +333,8 @@ These are smaller and closer to OCaml's built-in [Map] and [Set], however: - Our interface and implementation tries to maximize the sharing between different versions of the tree, and to benefit from this memory sharing. Theirs do not. - These libraries work with older version of OCaml ([>= 4.05] I believe), whereas - ours requires OCaml [>= 4.14] (for the new interface of [Ephemeron] used in - {!PatriciaTree.WeakNode}). + ours requires OCaml [>= 4.14] (for the new interface of {{: https://v2.ocaml.org/api/Ephemeron.html}[Ephemeron]} used in + {{!PatriciaTree.WeakNode}[WeakNode]}). {2 dmap} diff --git a/patricia-tree.opam b/patricia-tree.opam index d47812a..ddc68a1 100644 --- a/patricia-tree.opam +++ b/patricia-tree.opam @@ -1,6 +1,6 @@ # This file is generated by dune, edit dune-project instead opam-version: "2.0" -version: "0.9.0" +version: "0.10.0" synopsis: "Patricia Tree data structure in OCaml for maps and sets. Supports generic key-value pairs" maintainer: ["Dorian Lesbre "] diff --git a/patriciaTree.ml b/patriciaTree.ml index d15484b..afb6a4d 100644 --- a/patriciaTree.ml +++ b/patriciaTree.ml @@ -50,7 +50,13 @@ end module type NODE_WITH_ID = sig include NODE - val get_id: 'a t -> int + val to_int: 'a t -> int +end + +module type HASH_CONSED_NODE = sig + include NODE_WITH_ID + val equal : 'a t -> 'a t -> bool + val compare : 'a t -> 'a t -> int end module type BASE_MAP = sig @@ -276,87 +282,87 @@ end type (_, 'b) snd = Snd of 'b [@@unboxed] -(** The signature for maps with a single type for keys and values. *) -module type MAP = sig +module type MAP_WITH_VALUE = sig type key type 'a t + type 'a value module BaseMap : HETEROGENEOUS_MAP with type 'a t = 'a t and type _ key = key - and type ('a,'b) value = ('a,'b) snd + and type ('a,'b) value = ('a,'b value) snd val empty : 'a t val is_empty : 'a t -> bool - val unsigned_min_binding : 'a t -> (key * 'a) - val unsigned_max_binding : 'a t -> (key * 'a) - val singleton : key -> 'a -> 'a t + val unsigned_min_binding : 'a t -> (key * 'a value) + val unsigned_max_binding : 'a t -> (key * 'a value) + val singleton : key -> 'a value -> 'a t val cardinal : 'a t -> int - val is_singleton : 'a t -> (key * 'a) option - val find : key -> 'a t -> 'a - val find_opt : key -> 'a t -> 'a option + val is_singleton : 'a t -> (key * 'a value) option + val find : key -> 'a t -> 'a value + val find_opt : key -> 'a t -> 'a value option val mem : key -> 'a t -> bool val remove : key -> 'a t -> 'a t - val pop_unsigned_minimum : 'a t -> (key * 'a * 'a t) option - val pop_unsigned_maximum : 'a t -> (key * 'a * 'a t) option - val insert : key -> ('a option -> 'a) -> 'a t -> 'a t - val update : key -> ('a option -> 'a option) -> 'a t -> 'a t - val add : key -> 'a -> 'a t -> 'a t - val split : key -> 'a t -> 'a t * 'a option * 'a t - val iter : (key -> 'a -> unit) -> 'a t -> unit - val fold : (key -> 'a -> 'acc -> 'acc) -> 'a t -> 'acc -> 'acc - val fold_on_nonequal_inter : (key -> 'a -> 'a -> 'acc -> 'acc) -> + val pop_unsigned_minimum : 'a t -> (key * 'a value * 'a t) option + val pop_unsigned_maximum : 'a t -> (key * 'a value * 'a t) option + val insert : key -> ('a value option -> 'a value) -> 'a t -> 'a t + val update : key -> ('a value option -> 'a value option) -> 'a t -> 'a t + val add : key -> 'a value -> 'a t -> 'a t + val split : key -> 'a t -> 'a t * 'a value option * 'a t + val iter : (key -> 'a value -> unit) -> 'a t -> unit + val fold : (key -> 'a value -> 'acc -> 'acc) -> 'a t -> 'acc -> 'acc + val fold_on_nonequal_inter : (key -> 'a value -> 'a value -> 'acc -> 'acc) -> 'a t -> 'a t -> 'acc -> 'acc val fold_on_nonequal_union : - (key -> 'a option -> 'a option -> 'acc -> 'acc) -> + (key -> 'a value option -> 'a value option -> 'acc -> 'acc) -> 'a t -> 'a t -> 'acc -> 'acc - val filter : (key -> 'a -> bool) -> 'a t -> 'a t - val for_all : (key -> 'a -> bool) -> 'a t -> bool - val map : ('a -> 'a) -> 'a t -> 'a t - val map_no_share : ('a -> 'b) -> 'a t -> 'b t - val mapi : (key -> 'a -> 'a) -> 'a t -> 'a t - val mapi_no_share : (key -> 'a -> 'b) -> 'a t -> 'b t - val filter_map : (key -> 'a -> 'a option) -> 'a t -> 'a t - val filter_map_no_share : (key -> 'a -> 'b option) -> 'a t -> 'b t - val reflexive_same_domain_for_all2 : (key -> 'a -> 'a -> bool) -> 'a t -> 'a t -> bool - val nonreflexive_same_domain_for_all2 : (key -> 'a -> 'b -> bool) -> 'a t -> 'b t -> bool - val reflexive_subset_domain_for_all2 : (key -> 'a -> 'a -> bool) -> 'a t -> 'a t -> bool - val idempotent_union : (key -> 'a -> 'a -> 'a) -> 'a t -> 'a t -> 'a t - val idempotent_inter : (key -> 'a -> 'a -> 'a) -> 'a t -> 'a t -> 'a t - val nonidempotent_inter_no_share : (key -> 'a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t - val idempotent_inter_filter : (key -> 'a -> 'a -> 'a option) -> 'a t -> 'a t -> 'a t - val slow_merge : (key -> 'a option -> 'b option -> 'c option) -> 'a t -> 'b t -> 'c t + val filter : (key -> 'a value -> bool) -> 'a t -> 'a t + val for_all : (key -> 'a value -> bool) -> 'a t -> bool + val map : ('a value -> 'a value) -> 'a t -> 'a t + val map_no_share : ('a value -> 'b value) -> 'a t -> 'b t + val mapi : (key -> 'a value -> 'a value) -> 'a t -> 'a t + val mapi_no_share : (key -> 'a value -> 'b value) -> 'a t -> 'b t + val filter_map : (key -> 'a value -> 'a value option) -> 'a t -> 'a t + val filter_map_no_share : (key -> 'a value -> 'b value option) -> 'a t -> 'b t + val reflexive_same_domain_for_all2 : (key -> 'a value -> 'a value -> bool) -> 'a t -> 'a t -> bool + val nonreflexive_same_domain_for_all2 : (key -> 'a value -> 'b value -> bool) -> 'a t -> 'b t -> bool + val reflexive_subset_domain_for_all2 : (key -> 'a value -> 'a value -> bool) -> 'a t -> 'a t -> bool + val idempotent_union : (key -> 'a value -> 'a value -> 'a value) -> 'a t -> 'a t -> 'a t + val idempotent_inter : (key -> 'a value -> 'a value -> 'a value) -> 'a t -> 'a t -> 'a t + val nonidempotent_inter_no_share : (key -> 'a value -> 'b value -> 'c value) -> 'a t -> 'b t -> 'c t + val idempotent_inter_filter : (key -> 'a value -> 'a value -> 'a value option) -> 'a t -> 'a t -> 'a t + val slow_merge : (key -> 'a value option -> 'b value option -> 'c value option) -> 'a t -> 'b t -> 'c t val disjoint : 'a t -> 'a t -> bool module WithForeign(Map2 : BASE_MAP with type _ key = key):sig - type ('b,'c) polyfilter_map_foreign = { f: 'a. key -> ('a,'b) Map2.value -> 'c option } [@@unboxed] + type ('b,'c) polyfilter_map_foreign = { f: 'a. key -> ('a,'b) Map2.value -> 'c value option } [@@unboxed] val filter_map_no_share : ('b, 'c) polyfilter_map_foreign -> 'b Map2.t -> 'c t type ('value,'map2) polyinter_foreign = - { f: 'a. 'a Map2.key -> 'value -> ('a, 'map2) Map2.value -> 'value } [@@unboxed] + { f: 'a. 'a Map2.key -> 'value value -> ('a, 'map2) Map2.value -> 'value value } [@@unboxed] val nonidempotent_inter : ('a, 'b) polyinter_foreign -> 'a t -> 'b Map2.t -> 'a t - type ('map1,'map2) polyupdate_multiple = { f: 'a. key -> 'map1 option -> ('a,'map2) Map2.value -> 'map1 option } [@@unboxed] + type ('map1,'map2) polyupdate_multiple = { f: 'a. key -> 'map1 value option -> ('a,'map2) Map2.value -> 'map1 value option } [@@unboxed] val update_multiple_from_foreign : 'b Map2.t -> ('a,'b) polyupdate_multiple -> 'a t -> 'a t - type ('map1,'map2) polyupdate_multiple_inter = { f: 'a. key -> 'map1 -> ('a,'map2) Map2.value -> 'map1 option } [@@unboxed] - val update_multiple_from_inter_with_foreign : 'b Map2.t -> ('a,'b) polyupdate_multiple_inter -> 'a t -> 'a t + type ('map1,'map2) polyupdate_multiple_inter = { f: 'a. key -> 'map1 value -> ('a,'map2) Map2.value -> 'map1 value option } [@@unboxed] + val update_multiple_from_inter_with_foreign: 'b Map2.t -> ('a,'b) polyupdate_multiple_inter -> 'a t -> 'a t end val pretty : ?pp_sep:(Format.formatter -> unit -> unit) -> - (Format.formatter -> key -> 'a -> unit) -> + (Format.formatter -> key -> 'a value -> unit) -> Format.formatter -> 'a t -> unit - val to_seq : 'a t -> (key * 'a) Seq.t - val to_rev_seq : 'a t -> (key * 'a) Seq.t - val add_seq : (key * 'a) Seq.t -> 'a t -> 'a t - val of_seq : (key * 'a) Seq.t -> 'a t - - val of_list : (key * 'a) list -> 'a t - val to_list : 'a t -> (key * 'a) list + val to_seq : 'a t -> (key * 'a value) Seq.t + val to_rev_seq : 'a t -> (key * 'a value) Seq.t + val add_seq : (key * 'a value) Seq.t -> 'a t -> 'a t + val of_seq : (key * 'a value) Seq.t -> 'a t + val of_list : (key * 'a value) list -> 'a t + val to_list : 'a t -> (key * 'a value) list end +module type MAP = MAP_WITH_VALUE with type 'a value = 'a (** {2 Keys and Value} *) @@ -373,7 +379,9 @@ module type HETEROGENEOUS_KEY = sig val polyeq: 'a t -> 'b t -> ('a,'b) cmp end -module type VALUE = sig +module type VALUE = sig type 'a t end + +module type HETEROGENEOUS_VALUE = sig type ('key,'map) t end @@ -401,7 +409,7 @@ let mask i m = i land (lnot (2*m-1)) (** {1 Nodes} *) (** Simple node, with no hash consing. *) -module [@inline] SimpleNode(Key:sig type 'a t end)(Value:VALUE) = struct +module [@inline] SimpleNode(Key:sig type 'a t end)(Value:HETEROGENEOUS_VALUE) = struct type 'a key = 'a Key.t type ('key,'map) value = ('key,'map) Value.t @@ -422,7 +430,7 @@ module [@inline] SimpleNode(Key:sig type 'a t end)(Value:VALUE) = struct | _ -> Branch{prefix;branching_bit;tree0;tree1} end -module WeakNode(Key:sig type 'a t end)(Value:VALUE)(* :NODE *) = struct +module WeakNode(Key:sig type 'a t end)(Value:HETEROGENEOUS_VALUE)(* :NODE *) = struct type 'a key = 'a Key.t type ('key,'map) value = ('key,'map) Value.t @@ -465,7 +473,7 @@ end (** Add a unique id to nodes, e.g. so that they can be used as keys in maps or sets. *) -module NodeWithId(Key:sig type 'a t end)(Value:VALUE):NODE_WITH_ID +module NodeWithId(Key:sig type 'a t end)(Value:HETEROGENEOUS_VALUE):NODE_WITH_ID with type 'key key = 'key Key.t and type ('key,'map) value = ('key,'map) Value.t = struct @@ -487,12 +495,12 @@ module NodeWithId(Key:sig type 'a t end)(Value:VALUE):NODE_WITH_ID | NBranch{prefix;branching_bit;tree0;tree1;_} -> Branch{prefix;branching_bit;tree0;tree1} | NLeaf{key;value;_} -> Leaf{key;value} - let get_id = function + let to_int = function | NEmpty -> 0 | NBranch{id;_} -> id | NLeaf{id;_} -> id - let count = ref 0;; + let count = ref 0 let empty = NEmpty let is_empty x = x == NEmpty @@ -502,7 +510,6 @@ module NodeWithId(Key:sig type 'a t end)(Value:VALUE):NODE_WITH_ID | NEmpty, x -> x | x, NEmpty -> x | _ -> incr count; NBranch{prefix;branching_bit;tree0;tree1;id=(!count)} - end @@ -579,25 +586,227 @@ module WeakSetNode(Key:sig type 'a t end)(* :NODE *) = struct end -module MakeCustomHeterogeneous +let sdbm x y = y + (x lsl 16) + (x lsl 6) - x +(** Combine two numbers into a new hash *) + +module type HETEROGENEOUS_HASHED_VALUE = sig + include HETEROGENEOUS_VALUE + + val hash : ('a, 'b) t -> int + val polyeq : ('a, 'b) t -> ('a, 'c) t -> bool +end + +module type HASHED_VALUE = sig + type 'map t + + val hash : 'map t -> int + val polyeq : 'a t -> 'b t -> bool +end + +module HeterogeneousHashedValueFromHashedValue(Value: HASHED_VALUE) +: HETEROGENEOUS_HASHED_VALUE with type ('a, 'map) t = ('a, 'map Value.t) snd = struct + type ('a, 'map) t = ('a, 'map Value.t) snd + let hash (Snd x) = Value.hash x + let polyeq (Snd a) (Snd b) = Value.polyeq a b +end + +module HashconsedNode(Key:HETEROGENEOUS_KEY)(Value:HETEROGENEOUS_HASHED_VALUE)() +(* : HASH_CONSED_NODE + with type 'key key = 'key Key.t + and type ('key, 'map) value = ('key, 'map) Value.t *) += struct + + type 'a key = 'a Key.t + type ('key, 'map) value = ('key, 'map) Value.t + + type 'map view = + | Empty: 'map view + | Branch: { prefix:intkey; branching_bit:mask; tree0:'map t; tree1:'map t } -> 'map view + | Leaf: { key:'key key; value:('key,'map) value } -> 'map view + and 'map t = + | NEmpty: 'map t + | NBranch: { prefix:intkey; branching_bit:mask; tree0:'map t; tree1:'map t; id:int } -> 'map t + | NLeaf: { key:'key key; value:('key, 'map) Value.t; id:int } -> 'map t + + let view = function + | NEmpty -> Empty + | NBranch{prefix;branching_bit;tree0;tree1;_} -> Branch{prefix;branching_bit;tree0;tree1} + | NLeaf{key;value;_} -> Leaf{key;value} + + let to_int = function + | NEmpty -> 0 + | NBranch{ id; _ } -> id + | NLeaf{ id; _ } -> id + + let count = ref 1 (** Start at 1 as we increment in post *) + + type any_map = AnyMap : 'a t -> any_map [@@unboxed] + + module HashArg = struct + type t = any_map + let equal (AnyMap a) (AnyMap b) = match a, b with + | NEmpty, NEmpty -> true + | NLeaf{key=key1;value=value1;_}, NLeaf{key=key2;value=value2;_} -> + begin match Key.polyeq key1 key2 with + | Eq -> Value.polyeq value1 value2 + | Diff -> false + end + | NBranch{prefix=prefixa;branching_bit=branching_bita;tree0=tree0a;tree1=tree1a;_}, + NBranch{prefix=prefixb;branching_bit=branching_bitb;tree0=tree0b;tree1=tree1b;_} -> + prefixa == prefixb && branching_bita == branching_bitb && + to_int tree0a = to_int tree0b && to_int tree1a = to_int tree1b + | _ -> false + + let hash (AnyMap x) = match x with + | NEmpty -> 0 + | NLeaf{key; value; _} -> + let hash = sdbm (Key.to_int key) (Value.hash value) in + (hash lsl 1) lor 1 + (* All leaf hashes are odd *) + | NBranch{prefix; branching_bit; tree0; tree1; _} -> (* All branch hashes are even *) + (sdbm (prefix lor branching_bit) @@ sdbm (to_int tree0) (to_int tree1)) lsl 1 + end + + module WeakHash = Weak.Make(HashArg) + + let weakh = WeakHash.create 120 + + let empty = NEmpty + let is_empty x = x == NEmpty + + let try_find (tentative : 'a t) = + let AnyMap x = WeakHash.merge weakh (AnyMap tentative) in + let x : 'a t = Obj.magic x in + if x == tentative then incr count; + x + + let leaf key value = try_find (NLeaf{key;value;id= !count}) + + let branch ~prefix ~branching_bit ~tree0 ~tree1 = + match tree0,tree1 with + | NEmpty, x -> x + | x, NEmpty -> x + | _ -> try_find (NBranch{prefix;branching_bit;tree0;tree1;id=(!count)}) + + let equal x y = x == y + let compare x y = Int.compare (to_int x) (to_int y) +end + +module HashconsedSetNode(Key:HETEROGENEOUS_KEY)(): HASH_CONSED_NODE + with type 'key key = 'key Key.t + and type ('key,'map) value = unit += struct + + type 'a key = 'a Key.t + type ('key,'map) value = unit + + type map = + | NEmpty: map + | NBranch: { prefix:intkey; branching_bit:mask; tree0:map; tree1:map; id:int } -> map + | NLeaf: { key:'key key; id:int } -> map + type 'map view = + | Empty: 'map view + | Branch: { prefix:intkey; branching_bit:mask; tree0:'map t; tree1:'map t } -> 'map view + | Leaf: { key:'key key; value:unit } -> 'map view + and _ t = map + + let view = function + | NEmpty -> Empty + | NBranch{prefix;branching_bit;tree0;tree1;_} -> Branch{prefix;branching_bit;tree0;tree1} + | NLeaf{ key; _ } -> Leaf{ key; value=() } + + let to_int = function + | NEmpty -> 0 + | NBranch{ id; _ } -> id + | NLeaf{ id; _ } -> id + + let count = ref 1 (** Start at 1 as we increment in post *) + + module HashArg = struct + type t = map + let equal a b = match a, b with + | NEmpty, NEmpty -> true + | NLeaf{key=key1;_}, NLeaf{key=key2;_} -> + begin match Key.polyeq key1 key2 with + | Eq -> true + | Diff -> false + end + | NBranch{prefix=prefixa;branching_bit=branching_bita;tree0=tree0a;tree1=tree1a;_}, + NBranch{prefix=prefixb;branching_bit=branching_bitb;tree0=tree0b;tree1=tree1b;_} -> + prefixa == prefixb && branching_bita == branching_bitb && + tree0a == tree0b && tree1a == tree1b + | _ -> false + + let hash a = match a with + | NEmpty -> 0 + | NLeaf{key; _} -> ((Key.to_int key) lsl 1) lor 1 (* All leaf hashes are odd *) + | NBranch{prefix; branching_bit; tree0; tree1; _} -> (* All branch hashes are even *) + (sdbm (prefix lor branching_bit) @@ sdbm (to_int tree0) (to_int tree1)) lsl 1 + end + + module WeakHash = Weak.Make(HashArg) + + let weakh = WeakHash.create 120 + + let empty = NEmpty + let is_empty x = x == NEmpty + + let try_find tentative = + let x = WeakHash.merge weakh tentative in + if x == tentative then incr count; + x + + let leaf key () = try_find (NLeaf{ key; id = !count }) + + let branch ~prefix ~branching_bit ~tree0 ~tree1 = + match tree0,tree1 with + | NEmpty, x -> x + | x, NEmpty -> x + | _ -> try_find (NBranch{prefix;branching_bit;tree0;tree1;id=(!count)}) + + let equal x y = x == y + let compare x y = Int.compare (to_int x) (to_int y) +end + +(** {1 Keys and values} *) + +module HomogeneousValue = struct + type ('a,'map) t = 'map +end + +module WrappedHomogeneousValue = struct + type ('a, 'map) t = ('a, 'map) snd +end + +module HeterogeneousKeyFromKey(Key:KEY):(HETEROGENEOUS_KEY with type 'a t = Key.t) = struct + type _ t = Key.t + + (** The type-safe way to do it would be to define this type, to + guarantee that 'a is always bound to the same type, and Eq is + safe. But this requires a lot of conversion code, and identity + functions that may not be well detected. [polyeq] is unsafe in + that it allows arbitrary conversion of t1 by t2 in t1 t, but + this unsafety is not exported, and I don't think we can do + something wrong using it. *) + (* type 'a t = K: Key.t -> unit t [@@unboxed] *) + let polyeq: type a b. a t -> b t -> (a,b) cmp = + fun a b -> match a,b with + | a, b when (Key.to_int a) == (Key.to_int b) -> Obj.magic Eq + | _ -> Diff + let to_int = Key.to_int +end + + +(** {1 Functors} *) + +module MakeCustomHeterogeneousMap (Key:HETEROGENEOUS_KEY) - (Value:VALUE) + (Value:HETEROGENEOUS_VALUE) (NODE:NODE with type 'a key = 'a Key.t and type ('key,'map) value = ('key,'map) Value.t) : HETEROGENEOUS_MAP with type 'a key = 'a Key.t and type ('key,'map) value = ('key,'map) Value.t and type 'a t = 'a NODE.t = struct - - (* We provide two versions: with or without hash-consing. Hash-consing - allows faster implementations for the fold_on_diff* operations. - Benchmarks seems to indicate that hashconsing and the more complex - fold_on_diff are not very useful in practice (perhaps they would on - huge structures?) *) - - (* With hash-consing of interior nodes: slower node construction, but - faster comparison with fold_on_diff. *) - - (* module NODE = TNoHashCons;; *) include NODE type 'map key_value_pair = KeyValue: 'a Key.t * ('a,'map) value -> 'map key_value_pair @@ -1491,13 +1700,11 @@ module MakeCustomHeterogeneous let to_list m = List.of_seq (to_seq m) end - -(* TODO: We should make it a functor, so that we can simplify the - interface for set independently from how it is constructed. *) -module MakeHeterogeneousSet(Key:HETEROGENEOUS_KEY) : HETEROGENEOUS_SET - with type 'a elt = 'a Key.t = struct - module NODE = SetNode(Key) - module BaseMap = MakeCustomHeterogeneous(Key)(struct type ('a,'b) t = unit end)(NODE) +module MakeCustomHeterogeneousSet + (Key:HETEROGENEOUS_KEY) + (Node:NODE with type 'a key = 'a Key.t and type ('a, 'b) value = unit) +: HETEROGENEOUS_SET with type 'a elt = 'a Key.t and type 'a BaseMap.t = 'a Node.t = struct + module BaseMap = MakeCustomHeterogeneousMap(Key)(struct type ('a,'b) t = unit end)(Node) (* No need to differentiate the values. *) include BaseMap @@ -1563,47 +1770,25 @@ module MakeHeterogeneousSet(Key:HETEROGENEOUS_KEY) : HETEROGENEOUS_SET let to_list s = List.of_seq (to_seq s) end -module MakeHeterogeneousMap(Key:HETEROGENEOUS_KEY)(Value:VALUE) = - MakeCustomHeterogeneous(Key)(Value)(SimpleNode(Key)(Value)) - - - -module HomogeneousValue = struct - type ('a,'map) t = 'map -end - -module WrappedHomogeneousValue = struct - type ('a, 'map) t = ('a, 'map) snd -end - -module HeterogeneousKeyFromKey(Key:KEY):(HETEROGENEOUS_KEY with type 'a t = Key.t) = struct - type 'a t = Key.t +module MakeHeterogeneousMap(Key:HETEROGENEOUS_KEY)(Value:HETEROGENEOUS_VALUE) = + MakeCustomHeterogeneousMap(Key)(Value)(SimpleNode(Key)(Value)) - (** The type-safe way to do it would be to define this type, to - guarantee that 'a is always bound to the same type, and Eq is - safe. But this requires a lot of conversion code, and identity - functions that may not be well detected. [polyeq] is unsafe in - that it allows arbitrary conversion of t1 by t2 in t1 t, but - this unsafety is not exported, and I don't think we can do - something wrong using it. *) - (* type 'a t = K: Key.t -> unit t [@@unboxed] *) - let polyeq: type a b. a t -> b t -> (a,b) cmp = - fun a b -> match a,b with - | a, b when (Key.to_int a) == (Key.to_int b) -> Obj.magic Eq - | _ -> Diff - let to_int = Key.to_int -end +module MakeHeterogeneousSet(Key:HETEROGENEOUS_KEY) = + MakeCustomHeterogeneousSet(Key)(SetNode(Key)) -module MakeCustom +module MakeCustomMap (Key:KEY) - (NODE:NODE with type 'a key = Key.t and type ('key,'map) value = ('key,'map) snd) + (Value: VALUE) + (NODE:NODE with type 'a key = Key.t and type ('key,'map) value = ('key,'map Value.t) snd) = struct module NewKey(* :Key *) = HeterogeneousKeyFromKey(Key) - module BaseMap = MakeCustomHeterogeneous(NewKey)(WrappedHomogeneousValue)(NODE) + module BaseMap = MakeCustomHeterogeneousMap + (NewKey)(struct type ('key,'map) t = ('key,'map Value.t) snd end)(NODE) include BaseMap type key = Key.t + type 'a value = 'a Value.t let snd_opt = function | None -> None @@ -1635,65 +1820,65 @@ module MakeCustom | None -> None | Some(KeyValue(k,Snd v)) -> Some(k,v) - let filter (f: key -> 'a -> bool) m = BaseMap.filter {f = fun k (Snd v) -> f k v} m + let filter (f: key -> 'a value -> bool) m = BaseMap.filter {f = fun k (Snd v) -> f k v} m let map f a = BaseMap.map {f = fun (Snd v) -> Snd (f v)} a let map_no_share f a = BaseMap.map_no_share {f = fun (Snd v) -> Snd (f v)} a - let mapi (f : key -> 'a -> 'a) a = BaseMap.mapi {f = fun k (Snd v) -> Snd (f k v)} a - let mapi_no_share (f : key -> 'a -> 'b) a = BaseMap.mapi_no_share {f = fun k (Snd v) -> Snd (f k v)} a - let filter_map (f: key -> 'a -> 'a option) a = + let mapi (f : key -> 'a value -> 'a value) a = BaseMap.mapi {f = fun k (Snd v) -> Snd (f k v)} a + let mapi_no_share (f : key -> 'a value -> 'b value) a = BaseMap.mapi_no_share {f = fun k (Snd v) -> Snd (f k v)} a + let filter_map (f: key -> 'a value -> 'a value option) a = BaseMap.filter_map {f=fun k (Snd v) -> snd_opt (f k v) } a - let filter_map_no_share (f: key -> 'a -> 'b option) a = + let filter_map_no_share (f: key -> 'a value -> 'b value option) a = BaseMap.filter_map_no_share {f=fun k (Snd v) -> snd_opt (f k v) } a - let idempotent_union (f: key -> 'a -> 'a -> 'a) a b = + let idempotent_union (f: key -> 'a value -> 'a value -> 'a value) a b = BaseMap.idempotent_union {f=fun k (Snd v1) (Snd v2) -> Snd (f k v1 v2)} a b - let idempotent_inter (f: key -> 'a -> 'a -> 'a) a b = + let idempotent_inter (f: key -> 'a value -> 'a value -> 'a value) a b = BaseMap.idempotent_inter {f=fun k (Snd v1) (Snd v2) -> Snd (f k v1 v2)} a b - let nonidempotent_inter_no_share (f: key -> 'a -> 'b -> 'c) a b = + let nonidempotent_inter_no_share (f: key -> 'a value -> 'b value -> 'c value) a b = BaseMap.nonidempotent_inter_no_share {f=fun k (Snd v1) (Snd v2) -> Snd (f k v1 v2)} a b - let idempotent_inter_filter (f: key -> 'a -> 'a -> 'a option) a b = + let idempotent_inter_filter (f: key -> 'a value -> 'a value -> 'a value option) a b = BaseMap.idempotent_inter_filter {f=fun k (Snd v1) (Snd v2) -> snd_opt (f k v1 v2)} a b - let reflexive_same_domain_for_all2 (f: key -> 'a -> 'a -> bool) a b = + let reflexive_same_domain_for_all2 (f: key -> 'a value -> 'a value -> bool) a b = BaseMap.reflexive_same_domain_for_all2 {f=fun k (Snd v1) (Snd v2) -> f k v1 v2} a b - let nonreflexive_same_domain_for_all2 (f: key -> 'a -> 'b -> bool) a b = + let nonreflexive_same_domain_for_all2 (f: key -> 'a value -> 'b value -> bool) a b = BaseMap.nonreflexive_same_domain_for_all2 {f=fun k (Snd v1) (Snd v2) -> f k v1 v2} a b - let reflexive_subset_domain_for_all2 (f: key -> 'a -> 'a -> bool) a b = + let reflexive_subset_domain_for_all2 (f: key -> 'a value -> 'a value -> bool) a b = BaseMap.reflexive_subset_domain_for_all2 {f=fun k (Snd v1) (Snd v2) -> f k v1 v2} a b - let slow_merge (f : key -> 'a option -> 'b option -> 'c option) a b = BaseMap.slow_merge {f=fun k v1 v2 -> snd_opt (f k (opt_snd v1) (opt_snd v2))} a b - let iter (f: key -> 'a -> unit) a = BaseMap.iter {f=fun k (Snd v) -> f k v} a - let fold (f: key -> 'a -> 'acc) m acc = BaseMap.fold {f=fun k (Snd v) acc -> f k v acc} m acc - let fold_on_nonequal_inter (f: key -> 'a -> 'b -> 'acc) ma mb acc = + let slow_merge (f : key -> 'a value option -> 'b value option -> 'c value option) a b = BaseMap.slow_merge {f=fun k v1 v2 -> snd_opt (f k (opt_snd v1) (opt_snd v2))} a b + let iter (f: key -> 'a value -> unit) a = BaseMap.iter {f=fun k (Snd v) -> f k v} a + let fold (f: key -> 'a value -> 'acc -> 'acc) m acc = BaseMap.fold {f=fun k (Snd v) acc -> f k v acc} m acc + let fold_on_nonequal_inter (f: key -> 'a value -> 'a value -> 'acc -> 'acc) ma mb acc = let f k (Snd va) (Snd vb) acc = f k va vb acc in BaseMap.fold_on_nonequal_inter {f} ma mb acc let fold_on_nonequal_union - (f: key -> 'a option -> 'b option -> 'acc) ma mb acc = + (f: key -> 'a value option -> 'a value option -> 'acc -> 'acc) ma mb acc = let f k va vb acc = let va = Option.map (fun (Snd v) -> v) va in let vb = Option.map (fun (Snd v) -> v) vb in f k va vb acc in BaseMap.fold_on_nonequal_union {f} ma mb acc - let pretty ?pp_sep (f: Format.formatter -> key -> 'a -> unit) fmt m = + let pretty ?pp_sep (f: Format.formatter -> key -> 'a value -> unit) fmt m = BaseMap.pretty ?pp_sep {f=fun fmt k (Snd v) -> f fmt k v} fmt m - let for_all (f : key -> 'a -> bool) m = BaseMap.for_all {f = fun k (Snd v) -> f k v} m + let for_all (f : key -> 'a value -> bool) m = BaseMap.for_all {f = fun k (Snd v) -> f k v} m module WithForeign(Map2 : BASE_MAP with type _ key = key) = struct module BaseForeign = BaseMap.WithForeign(Map2) - type ('b,'c) polyfilter_map_foreign = { f: 'a. key -> ('a,'b) Map2.value -> 'c option } [@@unboxed] + type ('b,'c) polyfilter_map_foreign = { f: 'a. key -> ('a,'b) Map2.value -> 'c value option } [@@unboxed] let filter_map_no_share f m2 = BaseForeign.filter_map_no_share { f=fun k v-> snd_opt (f.f k v)} m2 type ('value,'map2) polyinter_foreign = - { f: 'a. 'a Map2.key -> 'value -> ('a, 'map2) Map2.value -> 'value } [@@unboxed] + { f: 'a. 'a Map2.key -> 'value value -> ('a, 'map2) Map2.value -> 'value value } [@@unboxed] let nonidempotent_inter f m1 m2 = BaseForeign.nonidempotent_inter {f = fun k (Snd v) v2 -> Snd (f.f k v v2)} m1 m2 - type ('map1,'map2) polyupdate_multiple = { f: 'a. key -> 'map1 option -> ('a,'map2) Map2.value -> 'map1 option } [@@unboxed] + type ('map1,'map2) polyupdate_multiple = { f: 'a. key -> 'map1 value option -> ('a,'map2) Map2.value -> 'map1 value option } [@@unboxed] let update_multiple_from_foreign m2 f m = BaseForeign.update_multiple_from_foreign m2 {f = fun k v1 v2 -> snd_opt (f.f k (opt_snd v1) v2)} m - type ('map1,'map2) polyupdate_multiple_inter = { f: 'a. key -> 'map1 -> ('a,'map2) Map2.value -> 'map1 option } [@@unboxed] + type ('map1,'map2) polyupdate_multiple_inter = { f: 'a. key -> 'map1 value -> ('a,'map2) Map2.value -> 'map1 value option } [@@unboxed] let update_multiple_from_inter_with_foreign m2 f m = BaseForeign.update_multiple_from_inter_with_foreign m2 {f = fun k (Snd v1) v2 -> snd_opt (f.f k v1 v2)} m end @@ -1707,15 +1892,31 @@ module MakeCustom let to_list s = List.of_seq (to_seq s) end +module Value : VALUE with type 'a t = 'a = struct type 'a t = 'a end + +module HashedValue : HASHED_VALUE with type 'a t = 'a = struct + include Value + let hash x = Hashtbl.hash x + let polyeq: type a b. a -> b -> bool = fun a b -> a == Obj.magic b +end +module HeterogeneousHashedValue : HETEROGENEOUS_HASHED_VALUE with type ('k, 'm) t = 'm = struct + include HomogeneousValue + let hash x = Hashtbl.hash x + let polyeq: type a b. a -> b -> bool = fun a b -> a == Obj.magic b +end + module MakeMap(Key: KEY) = struct module NKey = struct type 'a t = Key.t end - module NODE = SimpleNode(NKey)(WrappedHomogeneousValue) - include MakeCustom(Key)(NODE) + module Node = SimpleNode(NKey)(WrappedHomogeneousValue) + include MakeCustomMap(Key)(Value)(Node) end -module MakeSet(Key: KEY) : SET with type elt = Key.t = struct +module MakeCustomSet + (Key: KEY) + (Node:NODE with type 'a key = Key.t and type ('key,'map) value = unit) +: SET with type elt = Key.t and type 'a BaseMap.t = 'a Node.t = struct module HKey = HeterogeneousKeyFromKey(Key) - module S = MakeHeterogeneousSet(HKey) + module S = MakeCustomHeterogeneousSet(HKey)(Node) include S type key = Key.t type elt = key @@ -1745,3 +1946,41 @@ module MakeSet(Key: KEY) : SET with type elt = Key.t = struct let of_list l = of_seq (List.to_seq l) let to_list s = List.of_seq (to_seq s) end + +module MakeSet(Key: KEY) = MakeCustomSet(Key)(SetNode(HeterogeneousKeyFromKey(Key))) + +module MakeHashconsedHeterogeneousMap(Key:HETEROGENEOUS_KEY)(Value:HETEROGENEOUS_HASHED_VALUE)() = struct + module Node = HashconsedNode(Key)(Value)() + include MakeCustomHeterogeneousMap(Key)(Value)(Node) + + let equal = Node.equal + let compare = Node.compare + let to_int = Node.to_int +end + +module MakeHashconsedHeterogeneousSet(Key:HETEROGENEOUS_KEY)() = struct + module Node = HashconsedSetNode(Key)() + include MakeCustomHeterogeneousSet(Key)(Node) + + let equal = Node.equal + let compare = Node.compare + let to_int = Node.to_int +end + +module MakeHashconsedSet(Key : KEY)() = struct + module Node = HashconsedSetNode(HeterogeneousKeyFromKey(Key))() + include MakeCustomSet(Key)(Node) + let equal = Node.equal + let compare = Node.compare + let to_int = Node.to_int +end + +module MakeHashconsedMap(Key: KEY)(Value: HASHED_VALUE)() = struct + module HetValue = HeterogeneousHashedValueFromHashedValue(Value) + module Node = HashconsedNode(HeterogeneousKeyFromKey(Key))(HetValue)() + include MakeCustomMap(Key)(Value)(Node) + + let equal = Node.equal + let compare = Node.compare + let to_int = Node.to_int +end diff --git a/patriciaTree.mli b/patriciaTree.mli index f083651..fcc62b4 100644 --- a/patriciaTree.mli +++ b/patriciaTree.mli @@ -27,15 +27,33 @@ This is similar to OCaml's Map, except that: - - The required signature for keys is different, in that we require - each key to be mapped to a unique integer identifier. + {ul + {- The required signature for keys is different, in that we require + each key to be mapped to a unique integer identifier.} - - The implementation uses Patricia Tree, as described in Oksasaki + {- The implementation uses Patricia Tree, as described in Okasaki and Gill's 1998 paper {{: https://www.semanticscholar.org/paper/Fast-Mergeable-Integer-Maps-Okasaki-Gill/23003be706e5f586f23dd7fa5b2a410cc91b659d}{i Fast mergeable integer maps}}, i.e. it is a space-efficient prefix trie over the big-endian representation of the key's integer identifier. + Example 5-bit patricia tree containing five numbers: 0 [0b0000], 1 [0b0001], + 5 [0b0101] and 7 [0b0111] and -8 [0b1111]: + {v + Branch + (prefix=0b?___) + / \ + Branch Leaf(-8) + (prefix=0b0?__) 0b1111 + / \ + Branch Branch + (prefix=0b000?) (prefix=0b01?_) + | | | | + Leaf(0) Leaf(1) Leaf(5) Leaf(7) + 0b0000 0b0001 0b0101 0b0111 + v} + + The main benefit of Patricia Tree is that their representation is stable (contrary to maps, inserting nodes in any order will return the same shape), which allows different versions of a map @@ -43,18 +61,19 @@ maps to benefit from this sharing. The functions in this library attempt to maximally preserve sharing and benefit from sharing, allowing very important improvements in complexity and running - time when combining maps or sets is a frequent operation. + time when combining maps or sets is a frequent operation.} - - Finally, the implementation is more customizable, allowing - notably (key,value) pairs or different types to be in the same map, - or to choose the memory representation of the nodes of the tree. + {- Finally, the implementation is more customizable, allowing + notably (key,value) pairs or different types to be in the same map, + or to choose the memory representation of the nodes of the tree.} - - Some operations like {{!BASE_MAP.pop_unsigned_minimum}[pop_unsigned_minimum]} and + {- Some operations like {{!BASE_MAP.pop_unsigned_minimum}[pop_unsigned_minimum]} and {{!BASE_MAP.pop_unsigned_maximum}[pop_unsigned_maximum]} make our Set suitable as priority queue (but remember that each element in the queue must map to a distinct integer, and that using the {{!unsigned_lt}unsigned order} means elements with negative priority are seen as greater than elements with - positive ones). *) + positive ones).} + } *) (** Note on complexity: in the following, n represents the size of the map when there is one (and [|map1|] is the number of elements in @@ -62,14 +81,10 @@ tree, which is log(n) if we assume an even distribution of numbers in the map (e.g. random distribution, or integers chosen contiguously using a counter). The worst-case height is - O(max(n,64)) which is actually constant, but not really + O(min(n,64)) which is actually constant, but not really informative; log(n) corresponds to the real complexity in usual distributions. *) - -type intkey -type mask - val unsigned_lt : int -> int -> bool (** All integers comparisons in this library are done according to their {b unsigned representation}. This is the same as signed comparison for same @@ -92,8 +107,20 @@ val unsigned_lt : int -> int -> bool - bool : false ]} + Using this unsigned order helps avoid a bug described in + {{: https://www.cs.tufts.edu/comp/150FP/archive/jan-midtgaard/qc-patricia.pdf}{i QuickChecking Patricia Trees}} + by Jan Mitgaard. + @since 0.10.0 *) + +type intkey = private int +(** Private type used to represent prefix stored in nodes. + These are integers with all bits after branching bit (included) set to zero *) + +type mask = private int +(** Private type: integers with a single bit set. *) + (**/**) val highest_bit : int -> (int[@untagged]) @@ -160,10 +187,14 @@ module type NODE = sig (** Can happen only at the toplevel: there is no empty interior node. *) | Branch : { prefix : intkey; branching_bit : mask; tree0 : 'map t; tree1 : 'map t; } -> 'map view - (** Branching bit contains only one bit set; the corresponding - mask is (branching_bit - 1). The prefixes are normalized: the - bits below the branching bit are set to zero (i.e. prefix & - (branching_bit - 1) = 0). *) + (** Same constraints as {!branch}: + - [branching_bit] contains only one bit set; the corresponding mask is (branching_bit - 1). + - [prefix] is normalized: the bits below the [branching_bit] are set to zero + (i.e. [prefix & (branching_bit - 1) = 0]). + - All elements of [tree0] should have their [to_int] start by + [prefix] followed by 0 at position [branching_bit]). + - All elements of [tree1] should have their [to_int] start by + [prefix] followed by 0 at position [branching_bit]). *) | Leaf : { key : 'key key; value : ('key, 'map) value; } -> 'map view (** A key -> value mapping. *) @@ -174,10 +205,60 @@ module type NODE = sig (** Convert the map to a view. Should be constant time. *) end -(** Associate a unique number to each node. *) +(** Associate a unique number to each node, so they can be used as keys in sets or maps. *) module type NODE_WITH_ID = sig - include NODE - val get_id: 'a t -> int + include NODE (** @closed *) + + val to_int: 'a t -> int + (** Unique number for each node. + + This is not {{!hash_consed}hash-consing}. + Equal nodes created separately will have different + identifiers. On the flip side, nodes with equal identifiers will always be + physically equal. *) +end + +(** Hash-consed nodes also associate a unique number to each node, + Unlike {!NODE_WITH_ID}, they also check before instanciating the node whether + a similar node already exists. This results in slightly slower constructors + (they perform an extra hash-table lookup), but allows for constant time + equality and comparison. + + See {!hash_consed} for a details on strengths and limits of hash-consing. + + @since v0.10.0 *) +module type HASH_CONSED_NODE = sig + include NODE (** @closed *) + + val to_int : 'a t -> int + (** Returns a unique number for each map, the {{!hash_consed}hash-consed} identifier of the map. + Unlike {!NODE_WITH_ID.to_int}, hash-consing ensures that maps + which contain the same keys (compared by {!KEY.to_int}) and values (compared + by {!HASHED_VALUE.polyeq}) will always be physically equal + and have the same identifier. + + Maps with the same identifier are also physically equal: + [to_int m1 = to_int m2] implies [m1 == m2]. + + Note that when using physical equality as {!HASHED_VALUE.polyeq}, some + maps of different types [a t] and [b t] may be given the same identifier. + See the end of the documentation of {!HASHED_VALUE.polyeq} for details. *) + + val equal : 'a t -> 'a t -> bool + (** Constant time equality using the {{!hash_consed}hash-consed} nodes identifiers. + This is equivalent to physical equality. + Two nodes are equal if their trees contain the same bindings, + where keys are compare by {!KEY.to_int} and values are compared by + {!HASHED_VALUE.polyeq}. *) + + val compare : 'a t -> 'a t -> int + (** Constant time comparison using the {{!hash_consed}hash-consed} node identifiers. + This order is fully arbitrary, but it is total and can be used to sort nodes. + It is based on node ids which depend on the order in which the nodes where created + (older nodes having smaller ids). + + One useful property of this order is that + child nodes will always have a smaller identifier than their parents. *) end (** {1 Map signatures} *) @@ -187,12 +268,12 @@ end (** Base map signature: a generic ['b map] storing bindings of ['a key] to [('a,'b) values]. All maps and set are a variation of this type, - sometimes with a simplified interface: + sometimes with a simplified interface. - {!HETEROGENEOUS_MAP} is just a {!BASE_MAP} with a functor {!HETEROGENEOUS_MAP.WithForeign} - for building operations that operate on two maps of different base types. - - {!MAP} specializes the interface for non-generic keys ([key] instead of ['a key]) - - {!HETEROGENEOUS_SET} specializes {!BASE_MAP} for sets ([('a,'b) value = unit])n - removes value argument from most operations + for building operations that operate on two maps of different base types; + - {!MAP} specializes the interface for non-generic keys ([key] instead of ['a key]); + - {!HETEROGENEOUS_SET} specializes {!BASE_MAP} for sets ([('a,'b) value = unit]) and + removes the value argument from most operations; - {!SET} specializes {!HETEROGENEOUS_SET} further by making elements (keys) non-generic ([elt] instead of ['a elt]). *) module type BASE_MAP = sig @@ -206,12 +287,12 @@ module type BASE_MAP = sig val unsigned_min_binding : 'a t -> 'a key_value_pair (** [unsigned_min_binding m] is minimal binding [KeyValue(k,v)] of the map, - using the {{!unsigned_lt}unsigned order} on [Key.to_int]. + using the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. @raises Not_found if the map is empty *) val unsigned_max_binding : 'a t -> 'a key_value_pair (** [unsigned_max_binding m] is maximal binding [KeyValue(k,v)] of the map, - using the {{!unsigned_lt}unsigned order} on [Key.to_int]. + using the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. @raises Not_found if the map is empty *) val singleton : 'a key -> ('a, 'b) value -> 'b t @@ -241,13 +322,13 @@ module type BASE_MAP = sig val pop_unsigned_minimum: 'map t -> ('map key_value_pair * 'map t) option (** [pop_unsigned_minimum m] returns [None] if [is_empty m], or [Some(key,value,m')] where [(key,value) = unsigned_min_binding m] and [m' = remove m key]. - Uses the {{!unsigned_lt}unsigned order} on [Key.to_int]. + Uses the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. O(log(n)) complexity. *) val pop_unsigned_maximum: 'map t -> ('map key_value_pair * 'map t) option (** [pop_unsigned_maximum m] returns [None] if [is_empty m], or [Some(key,value,m')] where [(key,value) = unsigned_max_binding m] and [m' = remove m key]. - Uses the {{!unsigned_lt}unsigned order} on [Key.to_int]. + Uses the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. O(log(n)) complexity. *) val insert: 'a key -> (('a,'map) value option -> ('a,'map) value) -> 'map t -> 'map t @@ -278,18 +359,19 @@ module type BASE_MAP = sig - submap of [map] whose keys are smaller than [key] - value associated to [key] (if present) - submap of [map] whose keys are bigger than [key] - Where the order is given by the {{!unsigned_lt}unsigned order} on [Key.to_int]. *) + + Where the order is given by the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. *) type 'map polyiter = { f : 'a. 'a key -> ('a, 'map) value -> unit; } [@@unboxed] val iter : 'map polyiter -> 'map t -> unit (** [iter f m] calls [f.f] on all bindings of [m], - in the {{!unsigned_lt}unsigned order} on [Key.to_int] *) + in the {{!unsigned_lt}unsigned order} on {!KEY.to_int} *) type ('acc,'map) polyfold = { f: 'a. 'a key -> ('a,'map) value -> 'acc -> 'acc } [@@unboxed] val fold : ('acc,'map) polyfold -> 'map t -> 'acc -> 'acc (** [fold f m acc] returns [f.f key_n value_n (... (f.f key_1 value_1 acc))] where [(key_1, value_1) ... (key_n, value_n)] are the bindings of [m], in - the {{!unsigned_lt}unsigned order} on [Key.to_int]. *) + the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. *) type ('acc,'map) polyfold2 = { f: 'a. 'a key -> ('a,'map) value -> ('a,'map) value -> 'acc -> 'acc } [@@unboxed] val fold_on_nonequal_inter : ('acc,'map) polyfold2 -> 'map t -> 'map t -> 'acc -> 'acc @@ -313,10 +395,10 @@ module type BASE_MAP = sig val filter : 'map polypredicate -> 'map t -> 'map t (** [filter f m] returns the submap of [m] containing the bindings [k->v] such that [f.f k v = true]. - [f.f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int] *) + [f.f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) val for_all : 'map polypredicate -> 'map t -> bool - (** [for_all f m] checks that [f] holds on all bindings of [m]7 + (** [for_all f m] checks that [f] holds on all bindings of [m]. Short-circuiting. *) (** In the following, the *no_share function allows taking arguments @@ -329,14 +411,14 @@ module type BASE_MAP = sig val map : ('map,'map) polymap -> 'map t -> 'map t val map_no_share : ('map1,'map2) polymap -> 'map1 t -> 'map2 t (** [map f m] and [map_no_share f m] replace all bindings [(k,v)] by [(k, f.f v)]. - Bindings are examined in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + Bindings are examined in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) type ('map1,'map2) polymapi = { f : 'a. 'a key -> ('a, 'map1) value -> ('a, 'map2) value; } [@@unboxed] val mapi : ('map,'map) polymapi -> 'map t -> 'map t val mapi_no_share : ('map1,'map2) polymapi -> 'map1 t -> 'map2 t (** [mapi f m] and [mapi_no_share f m] replace all bindings [(k,v)] by [(k, f.f k v)]. - Bindings are examined in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + Bindings are examined in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) type ('map1,'map2) polyfilter_map = { f : 'a. 'a key -> ('a, 'map1) value -> ('a, 'map2) value option; } [@@unboxed] @@ -345,7 +427,7 @@ module type BASE_MAP = sig (** [filter_map m f] and [filter_map_no_share m f] remove the bindings [(k,v)] for which [f.f k v] is [None], and replaces the bindings [(k,v)] for which [f.f k v] is [Some v'] by [(k,v')]. - Bindings are examined in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + Bindings are examined in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) type 'map polypretty = { f: 'a. Format.formatter -> 'a key -> ('a, 'map) value -> unit } [@@unboxed] val pretty : @@ -353,8 +435,8 @@ module type BASE_MAP = sig Format.formatter -> 'map t -> unit (** Pretty-prints a map using the given formatter. [pp_sep] is called once between each binding, - it defaults to [Format.pp_print_cut]. - Bindings are printed in the {{!unsigned_lt}unsigned order} of [Key.to_int] *) + it defaults to {{: https://v2.ocaml.org/api/Format.html#VALpp_print_cut}[Format.pp_print_cut]}. + Bindings are printed in the {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) (** {3 Functions on pairs of maps} *) @@ -366,9 +448,10 @@ module type BASE_MAP = sig (** [reflexive_same_domain_for_all2 f m1 m2] is true if and only if - [m1] and [m2] have the same domain (set of keys) - for all bindings [(k, v1)] in [m1] and [(k, v2)] in [m2], [f.f k v1 v2] holds - @assumes [f.f] is reflexive, i.e. [f.f k v v = true] to skip calls to equal subtrees. - Calls [f.f] in ascending {{!unsigned_lt}unsigned order} of [Key.to_int]. - Exits early if the domains mismatch. + + {b Assumes} [f.f] is reflexive, i.e. [f.f k v v = true] to skip calls to equal subtrees. + Calls [f.f] in ascending {{!unsigned_lt}unsigned order} of {!KEY.to_int}. + Exits early if the domains mismatch or if [f.f] returns false. It is useful to implement equality on maps: {[ @@ -382,16 +465,17 @@ module type BASE_MAP = sig ('map1,'map2) polysame_domain_for_all2 -> 'map1 t -> 'map2 t -> bool (** [nonreflexive_same_domain_for_all2 f m1 m2] is the same as {!reflexive_same_domain_for_all2}, but doesn't assume [f.f] is reflexive. - It thus calls [f.f] on every binding, in ascending {{!unsigned_lt}unsigned order} of [Key.to_int]. - Exits early if the domains mismatch. *) + It thus calls [f.f] on every binding, in ascending {{!unsigned_lt}unsigned order} of {!KEY.to_int}. + Exits early if the domains mismatch or if [f.f] returns false. *) val reflexive_subset_domain_for_all2 : ('map,'map) polysame_domain_for_all2 -> 'map t -> 'map t -> bool (** [reflexive_subset_domain_for_all2 f m1 m2] is true if and only if - [m1]'s domain is a subset of [m2]'s. (all keys defined in [m1] are also defined in [m2]) - for all bindings [(k, v1)] in [m1] and [(k, v2)] in [m2], [f.f k v1 v2] holds - @assumes [f.f] is reflexive, i.e. [f.f k v v = true] to skip calls to equal subtrees. - Calls [f.f] in ascending {{!unsigned_lt}unsigned order} of [Key.to_int]. + + {b Assumes} [f.f] is reflexive, i.e. [f.f k v v = true] to skip calls to equal subtrees. + Calls [f.f] in ascending {{!unsigned_lt}unsigned order} of {!KEY.to_int}. Exits early if the domains mismatch. *) type ('map1, 'map2, 'map3) polyunion = { @@ -400,8 +484,9 @@ module type BASE_MAP = sig (** [idempotent_union f map1 map2] returns a map whose keys is the union of the keys of [map1] and [map2]. [f.f] is used to combine the values of keys mapped in both maps. - @assumes [f.f] idempotent (i.e. [f key value value == value]) - [f.f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int]. + + {b Assumes} [f.f] idempotent (i.e. [f key value value == value]) + [f.f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. [f.f] is never called on physically equal values. Preserves physical equality as much as possible. Complexity is O(log(n)*Delta) where Delta is the number of @@ -414,8 +499,9 @@ module type BASE_MAP = sig (** [idempotent_inter f map1 map2] returns a map whose keys is the intersection of the keys of [map1] and [map2]. [f.f] is used to combine the values a key is mapped in both maps. - @assumes [f.f] idempotent (i.e. [f key value value == value]) - [f.f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int]. + + {b Assumes} [f.f] idempotent (i.e. [f key value value == value]) + [f.f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. [f.f] is never called on physically equal values. Preserves physical equality as much as possible. Complexity is O(log(n)*Delta) where Delta is the number of @@ -437,7 +523,7 @@ module type BASE_MAP = sig type ('map1, 'map2, 'map3) polymerge = { f : 'a. 'a key -> ('a, 'map1) value option -> ('a, 'map2) value option -> ('a, 'map3) value option; } [@@unboxed] val slow_merge : ('map1, 'map2, 'map3) polymerge -> 'map1 t -> 'map2 t -> 'map3 t - (** This is the same as {{: https://ocaml.org/manual/5.1/api/Map.S.html#VALmerge}Stdlib.Map.S.merge} *) + (** This is the same as {{: https://ocaml.org/api/Map.S.html#VALmerge}Stdlib.Map.S.merge} *) val disjoint : 'a t -> 'a t -> bool (** [disjoint m1 m2] is [true] iff [m1] and [m2] have disjoint domains *) @@ -445,10 +531,10 @@ module type BASE_MAP = sig (** {3 Conversion functions} *) val to_seq : 'a t -> 'a key_value_pair Seq.t - (** [to_seq m] iterates the whole map, in increasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_seq m] iterates the whole map, in increasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) val to_rev_seq : 'a t -> 'a key_value_pair Seq.t - (** [to_rev_seq m] iterates the whole map, in decreasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_rev_seq m] iterates the whole map, in decreasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) val add_seq : 'a key_value_pair Seq.t -> 'a t -> 'a t (** [add_seq s m] adds all bindings of the sequence [s] to [m] in order. *) @@ -462,7 +548,7 @@ module type BASE_MAP = sig If a key is bound multiple times in [l], the latest binding is kept *) val to_list : 'a t -> 'a key_value_pair list - (** [to_list m] returns the bindings of [m] as a list, in increasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_list m] returns the bindings of [m] as a list, in increasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) end (** {2 Heterogeneous maps and sets} *) @@ -483,10 +569,10 @@ module type HETEROGENEOUS_MAP = sig - The type of some return values, like key-value pairs, must be concealed existentially, hence the {!KeyValue} constructor. *) - include BASE_MAP + include BASE_MAP (** @closed *) (** Operation with maps/set of different types. - [Map2] must use the same [Key.to_int] function. *) + [Map2] must use the same {!KEY.to_int} function. *) module WithForeign(Map2:BASE_MAP with type 'a key = 'a key):sig type ('map1,'map2) polyinter_foreign = { f: 'a. 'a key -> ('a,'map1) value -> ('a,'map2) Map2.value -> ('a,'map1) value } [@@unboxed] @@ -507,7 +593,7 @@ module type HETEROGENEOUS_MAP = sig i.e. [update_multiple_from_foreign m_from f m_to] calls [f.f] on every key of [m_from], says if the corresponding value also exists in [m_to], and adds or remove the element in [m_to] depending on the value of [f.f]. - [f.f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int]. + [f.f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. O(size(m_from) + size(m_to)) complexity. *) type ('map1,'map2) polyupdate_multiple_inter = { f: 'a. 'a key -> ('a,'map1) value -> ('a,'map2) Map2.value -> ('a,'map1) value option } [@@unboxed] @@ -545,8 +631,8 @@ module type HETEROGENEOUS_SET = sig type 'a key = 'a elt (** Alias for elements, for compatibility with other PatriciaTrees *) + (** Existential wrapper for set elements. *) type any_elt = Any : 'a elt -> any_elt - (** Existential wrapper for keys *) (** {3 Basic functions} *) @@ -626,36 +712,36 @@ module type HETEROGENEOUS_SET = sig type polyiter = { f: 'a. 'a elt -> unit; } [@@unboxed] val iter: polyiter -> t -> unit - (** [iter f set] calls [f.f] on all elements of [set], in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + (** [iter f set] calls [f.f] on all elements of [set], in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) type polypredicate = { f: 'a. 'a elt -> bool; } [@@unboxed] val filter: polypredicate -> t -> t (** [filter f set] is the subset of [set] that only contains the elements that - satisfy [f.f]. [f.f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + satisfy [f.f]. [f.f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) val for_all: polypredicate -> t -> bool (** [for_all f set] is [true] if [f.f] is [true] on all elements of [set]. - Short-circuits on first [false]. [f.f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + Short-circuits on first [false]. [f.f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) type 'acc polyfold = { f: 'a. 'a elt -> 'acc -> 'acc } [@@unboxed] val fold: 'acc polyfold -> t -> 'acc -> 'acc (** [fold f set acc] returns [f.f elt_n (... (f.f elt_1 acc) ...)], where [elt_1, ..., elt_n] are the elements of [set], in increasing {{!unsigned_lt}unsigned order} of - [Key.to_int] *) + {!KEY.to_int} *) type polypretty = { f: 'a. Format.formatter -> 'a elt -> unit; } [@@unboxed] val pretty : ?pp_sep:(Format.formatter -> unit -> unit) -> polypretty -> Format.formatter -> t -> unit (** Pretty prints the set, [pp_sep] is called once between each element, - it defaults to [Format.pp_print_cut] *) + it defaults to {{: https://v2.ocaml.org/api/Format.html#VALpp_print_cut}[Format.pp_print_cut]} *) (** {3 Conversion functions} *) val to_seq : t -> any_elt Seq.t - (** [to_seq st] iterates the whole set, in increasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_seq st] iterates the whole set, in increasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) val to_rev_seq : t -> any_elt Seq.t - (** [to_rev_seq st] iterates the whole set, in decreasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_rev_seq st] iterates the whole set, in decreasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) val add_seq : any_elt Seq.t -> t -> t (** [add_seq s st] adds all elements of the sequence [s] to [st] in order. *) @@ -667,12 +753,13 @@ module type HETEROGENEOUS_SET = sig (** [of_list l] creates a new set from the elements of [l]. *) val to_list : t -> any_elt list - (** [to_list s] returns the elements of [s] as a list, in increasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_list s] returns the elements of [s] as a list, in increasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) end (** {2 Homogeneous maps and sets} *) -(** Same as above, but simple interfaces for non-generic keys *) +(** Same as above, but simple interfaces for non-generic keys. These + are also close to the standard library's interface for sets and maps. *) (** Signature for sets implemented using Patricia trees. Most of this interface should be shared with {{: https://ocaml.org/api/Set.S.html}[Stdlib.Set.S]}. *) @@ -680,19 +767,19 @@ module type SET = sig type elt (** The type of elements of the set *) + type key = elt + (** Alias for the type of elements, for cross-compatibility with maps *) + (** Underlying basemap, for cross map/set operations *) module BaseMap : HETEROGENEOUS_MAP with type _ key = elt and type (_,_) value = unit - (** {3 Basic functions} *) - - type key = elt - (** Alias for the type of elements, for cross-compatibility with maps *) - type t = unit BaseMap.t (** The set type *) + (** {3 Basic functions} *) + val empty: t (** The empty set *) @@ -711,7 +798,7 @@ module type SET = sig (** [singleton elt] returns a set containing a single element: [elt] *) val cardinal: t -> int - (** the size of the set (number of elements), O(n) complexity. *) + (** [cardinal set] is the size of the set (number of elements), O(n) complexity. *) val is_singleton: t -> elt option (** [is_singleton set] is [Some (Any elt)] if [set] is [singleton elt] and [None] otherwise. *) @@ -721,52 +808,52 @@ module type SET = sig Returns a value physically equal to [set] if [elt] is not present. *) val unsigned_min_elt: t -> elt - (** The minimal element (according to the {{!unsigned_lt}unsigned order} on [Key.to_int]) if non empty. + (** The minimal element (according to the {{!unsigned_lt}unsigned order} on {!KEY.to_int}) if non empty. @raises Not_found *) val unsigned_max_elt: t -> elt - (** The maximal element (according to the {{!unsigned_lt}unsigned order} on [Key.to_int]) if non empty. + (** The maximal element (according to the {{!unsigned_lt}unsigned order} on {!KEY.to_int}) if non empty. @raises Not_found *) val pop_unsigned_minimum: t -> (elt * t) option (** [pop_unsigned_minimum s] is [Some (elt, s')] where [elt = unsigned_min_elt s] and [s' = remove elt s] if [s] is non empty. - Uses the {{!unsigned_lt}unsigned order} on [Key.to_int]. *) + Uses the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. *) val pop_unsigned_maximum: t -> (elt * t) option (** [pop_unsigned_maximum s] is [Some (elt, s')] where [elt = unsigned_max_elt s] and [s' = remove elt s] if [s] is non empty. - Uses the {{!unsigned_lt}unsigned order} on [Key.to_int]. *) + Uses the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. *) (** {3 Iterators} *) val iter: (elt -> unit) -> t -> unit - (** [iter f set] calls [f] on all elements of [set], in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + (** [iter f set] calls [f] on all elements of [set], in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) val filter: (elt -> bool) -> t -> t (** [filter f set] is the subset of [set] that only contains the elements that - satisfy [f]. [f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + satisfy [f]. [f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) val for_all: (elt -> bool) -> t -> bool (** [for_all f set] is [true] if [f] is [true] on all elements of [set]. - Short-circuits on first [false]. [f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) + Short-circuits on first [false]. [f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. *) val fold: (elt -> 'acc -> 'acc) -> t -> 'acc -> 'acc (** [fold f set acc] returns [f elt_n (... (f elt_1 acc) ...)], where [elt_1, ..., elt_n] are the elements of [set], in increasing {{!unsigned_lt}unsigned order} of - [Key.to_int] *) + {!KEY.to_int} *) val split: elt -> t -> t * bool * t (** [split elt set] returns [s_lt, present, s_gt] where [s_lt] contains all elements of [set] smaller than [elt], [s_gt] all those greater than [elt], and [present] is [true] if [elt] is in [set]. - Uses the {{!unsigned_lt}unsigned order} on [Key.to_int].*) + Uses the {{!unsigned_lt}unsigned order} on {!KEY.to_int}.*) val pretty : ?pp_sep:(Format.formatter -> unit -> unit) -> (Format.formatter -> elt -> unit) -> Format.formatter -> t -> unit (** Pretty prints the set, [pp_sep] is called once between each element, - it defaults to [Format.pp_print_cut] *) + it defaults to {{: https://v2.ocaml.org/api/Format.html#VALpp_print_cut}[Format.pp_print_cut]} *) (** {3 Functions on pairs of sets} *) @@ -790,10 +877,10 @@ module type SET = sig (** {3 Conversion functions} *) val to_seq : t -> elt Seq.t - (** [to_seq st] iterates the whole set, in increasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_seq st] iterates the whole set, in increasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) val to_rev_seq : t -> elt Seq.t - (** [to_rev_seq st] iterates the whole set, in decreasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_rev_seq st] iterates the whole set, in decreasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) val add_seq : elt Seq.t -> t -> t (** [add_seq s st] adds all elements of the sequence [s] to [st] in order. *) @@ -805,31 +892,44 @@ module type SET = sig (** [of_list l] creates a new set from the elements of [l]. *) val to_list : t -> elt list - (** [to_list s] returns the elements of [s] as a list, in increasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + (** [to_list s] returns the elements of [s] as a list, in increasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) end (** The typechecker struggles with forall quantification on values if they don't depend on the first parameter, this wrapping allows our code to pass typechecking by forbidding overly eager simplification. + Since the type is unboxed, it doesn't introduce any performance overhead. This is due to a bug in the typechecker, more info on {{: https://discuss.ocaml.org/t/weird-behaviors-with-first-order-polymorphism/13783} the OCaml discourse post}. *) type (_, 'b) snd = Snd of 'b [@@unboxed] -(** The signature for maps with a single type for keys and values. - Most of this interface should be shared with {{: https://ocaml.org/api/Map.S.html}[Stdlib.Map.S]}. *) -module type MAP = sig + +(** The signature for maps with a single type for keys and values, + a ['a map] binds [key] to ['a value]. + This is slightly more generic than {!MAP}, which just binds to ['a]. + It is used for maps that need to restrict their value type, namely {!hash_consed}. *) +module type MAP_WITH_VALUE = sig type key (** The type of keys. *) type 'a t - (** A map from keys to values of type 'a. *) + (** A map from [key] to values of type ['a value]. *) + + type 'a value + (** Type for values, this is a divergence from Stdlib's [Map], + but becomes equivalent to it when using {!MAP}, + which is just [MAP_WITH_VALUE with type 'a value = 'a]. + On the other hand, it allows defining maps with fixed values, which is useful + for hash-consing. + + @since v0.10.0 *) (** Underlying basemap, for cross map/set operations *) module BaseMap : HETEROGENEOUS_MAP - with type 'a t = 'a t + with type 'a t = 'a t and type _ key = key - and type ('a,'b) value = ('a,'b) snd + and type ('a,'b) value = ('a,'b value) snd (** {3 Basice functions} *) @@ -839,29 +939,29 @@ module type MAP = sig val is_empty : 'a t -> bool (** Test if a map is empty; O(1) complexity. *) - val unsigned_min_binding : 'a t -> (key * 'a) + val unsigned_min_binding : 'a t -> (key * 'a value) (** Returns the (key,value) where [Key.to_int key] is minimal (in the {{!unsigned_lt}unsigned representation} of integers); O(log n) complexity. @raises Not_found if the map is empty *) - val unsigned_max_binding : 'a t -> (key * 'a) + val unsigned_max_binding : 'a t -> (key * 'a value) (** Returns the (key,value) where [Key.to_int key] is maximal (in the {{!unsigned_lt}unsigned representation} of integers); O(log n) complexity. @raises Not_found if the map is empty *) - val singleton : key -> 'a -> 'a t + val singleton : key -> 'a value -> 'a t (** [singleton key value] creates a map with a single binding, O(1) complexity. *) val cardinal : 'a t -> int - (** The size of the map *) + (** The size of the map. O(n) complexity *) - val is_singleton : 'a t -> (key * 'a) option + val is_singleton : 'a t -> (key * 'a value) option (** [is_singleton m] is [Some (k,v)] iff [m] is [singleton k v] *) - val find : key -> 'a t -> 'a + val find : key -> 'a t -> 'a value (** Return an element in the map, or raise [Not_found], O(log(n)) complexity. *) - val find_opt : key -> 'a t -> 'a option + val find_opt : key -> 'a t -> 'a value option (** Return an element in the map, or [None], O(log(n)) complexity. *) val mem : key -> 'a t -> bool @@ -871,17 +971,17 @@ module type MAP = sig (** Returns a map with the element removed, O(log(n)) complexity. Returns a physically equal map if the element is absent. *) - val pop_unsigned_minimum : 'a t -> (key * 'a * 'a t) option + val pop_unsigned_minimum : 'a t -> (key * 'a value * 'a t) option (** [pop_unsigned_minimum m] returns [None] if [is_empty m], or [Some(key,value,m')] where [(key,value) = unsigned_min_binding m] and [m' = remove m key]. O(log(n)) complexity. - Uses the {{!unsigned_lt}unsigned order} on [Key.to_int]. *) + Uses the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. *) - val pop_unsigned_maximum : 'a t -> (key * 'a * 'a t) option + val pop_unsigned_maximum : 'a t -> (key * 'a value * 'a t) option (** [pop_unsigned_maximum m] returns [None] if [is_empty m], or [Some(key,value,m')] where [(key,value) = unsigned_max_binding m] and [m' = remove m key]. O(log(n)) complexity. - Uses the {{!unsigned_lt}unsigned order} on [Key.to_int]. *) + Uses the {{!unsigned_lt}unsigned order} on {!KEY.to_int}. *) - val insert : key -> ('a option -> 'a) -> 'a t -> 'a t + val insert : key -> ('a value option -> 'a value) -> 'a t -> 'a t (** [insert key f map] modifies or insert an element of the map; [f] takes [None] if the value was not previously bound, and [Some old] where [old] is the previously bound value otherwise. The function @@ -889,7 +989,7 @@ module type MAP = sig complexity. Preserves physical equality if the new value is physically equal to the old. *) - val update : key -> ('a option -> 'a option) -> 'a t -> 'a t + val update : key -> ('a value option -> 'a value option) -> 'a t -> 'a t (** [update key f map] modifies, insert, or remove an element from the map; [f] takes [None] if the value was not previously bound, and [Some old] where [old] is the previously bound value otherwise. The @@ -897,27 +997,27 @@ module type MAP = sig None if the element should be removed O(log(n)) complexity. Preserves physical equality if the new value is physically equal to the old. *) - val add : key -> 'a -> 'a t -> 'a t + val add : key -> 'a value -> 'a t -> 'a t (** Unconditionally adds a value in the map (independently from whether the old value existed). O(log(n)) complexity. Preserves physical equality if the new value is physically equal to the old. *) (** {3 Iterators} *) - val split : key -> 'a t -> 'a t * 'a option * 'a t + val split : key -> 'a t -> 'a t * 'a value option * 'a t (** [split key map] splits the map into: - submap of [map] whose keys are smaller than [key] - value associated to [key] (if present) - submap of [map] whose keys are bigger than [key] - Using the {{!unsigned_lt}unsigned order} is given by [Key.to_int]. *) + Using the {{!unsigned_lt}unsigned order} is given by {!KEY.to_int}. *) - val iter : (key -> 'a -> unit) -> 'a t -> unit + val iter : (key -> 'a value -> unit) -> 'a t -> unit (** Iterate on each (key,value) pair of the map, in increasing {{!unsigned_lt}unsigned order} of keys. *) - val fold : (key -> 'a -> 'acc -> 'acc) -> 'a t -> 'acc -> 'acc + val fold : (key -> 'a value -> 'acc -> 'acc) -> 'a t -> 'acc -> 'acc (** Fold on each (key,value) pair of the map, in increasing {{!unsigned_lt}unsigned order} of keys. *) - val fold_on_nonequal_inter : (key -> 'a -> 'a -> 'acc -> 'acc) -> + val fold_on_nonequal_inter : (key -> 'a value -> 'a value -> 'acc -> 'acc) -> 'a t -> 'a t -> 'acc -> 'acc (** [fold_on_nonequal_inter f m1 m2 acc] returns [f key_n value1_n value2n (... (f key_1 value1_1 value2_1 acc))] where @@ -925,7 +1025,7 @@ module type MAP = sig bindings that exist in both maps ([m1 ∩ m2]) whose values are physically different. Calls to [f] are performed in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) - val fold_on_nonequal_union: (key -> 'a option -> 'a option -> 'acc -> 'acc) -> + val fold_on_nonequal_union: (key -> 'a value option -> 'a value option -> 'acc -> 'acc) -> 'a t -> 'a t -> 'acc -> 'acc (** [fold_on_nonequal_union f m1 m2 acc] returns [f key_n value1_n value2n (... (f key_1 value1_1 value2_1 acc))] where @@ -934,11 +1034,11 @@ module type MAP = sig different. Calls to [f.f] are performed in the {{!unsigned_lt}unsigned order} of [Key.to_int]. *) - val filter : (key -> 'a -> bool) -> 'a t -> 'a t + val filter : (key -> 'a value -> bool) -> 'a t -> 'a t (** Returns the submap containing only the key->value pairs satisfying the given predicate. [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. *) - val for_all : (key -> 'a -> bool) -> 'a t -> bool + val for_all : (key -> 'a value -> bool) -> 'a t -> bool (** Returns true if the predicate holds on all map bindings. Short-circuiting. [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. *) @@ -948,7 +1048,7 @@ module type MAP = sig sharing the subtrees (using physical equality to detect sharing). *) - val map : ('a -> 'a) -> 'a t -> 'a t + val map : ('a value -> 'a value) -> 'a t -> 'a t (** [map f m] returns a map where the [value] bound to each [key] is replaced by [f value]. The subtrees for which the returned value is physically the same (i.e. [f key value == value] for @@ -956,12 +1056,12 @@ module type MAP = sig equal to the original subtree. O(n) complexity. [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. *) - val map_no_share : ('a -> 'b) -> 'a t -> 'b t + val map_no_share : ('a value -> 'b value) -> 'a t -> 'b t (** [map_no_share f m] returns a map where the [value] bound to each [key] is replaced by [f value]. O(n) complexity. [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. *) - val mapi : (key -> 'a -> 'a) -> 'a t -> 'a t + val mapi : (key -> 'a value -> 'a value) -> 'a t -> 'a t (** [mapi f m] returns a map where the [value] bound to each [key] is replaced by [f key value]. The subtrees for which the returned value is physically the same (i.e. [f key value == value] for @@ -969,12 +1069,12 @@ module type MAP = sig equal to the original subtree. O(n) complexity. [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. *) - val mapi_no_share : (key -> 'a -> 'b) -> 'a t -> 'b t + val mapi_no_share : (key -> 'a value -> 'b value) -> 'a t -> 'b t (** [mapi_no_share f m] returns a map where the [value] bound to each [key] is replaced by [f key value]. O(n) complexity. [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. *) - val filter_map : (key -> 'a -> 'a option) -> 'a t -> 'a t + val filter_map : (key -> 'a value -> 'a value option) -> 'a t -> 'a t (** [filter_map m f] returns a map where the [value] bound to each [key] is removed (if [f key value] returns [None]), or is replaced by [v] ((if [f key value] returns [Some v]). The @@ -984,7 +1084,7 @@ module type MAP = sig original subtree. O(n) complexity. [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. *) - val filter_map_no_share : (key -> 'a -> 'b option) -> 'a t -> 'b t + val filter_map_no_share : (key -> 'a value -> 'b value option) -> 'a t -> 'b t (** [filter_map m f] returns a map where the [value] bound to each [key] is removed (if [f key value] returns [None]), or is replaced by [v] ((if [f key value] returns [Some v]). O(n) @@ -1010,7 +1110,7 @@ module type MAP = sig this subtree with Empty; hence we provide union and inter operations. *) - val reflexive_same_domain_for_all2 : (key -> 'a -> 'a -> bool) -> 'a t -> 'a t -> bool + val reflexive_same_domain_for_all2 : (key -> 'a value -> 'a value -> bool) -> 'a t -> 'a t -> bool (** [reflexive_same_domain_for_all2 f map1 map2] returns true if [map1] and [map2] have the same keys, and [f key value1 value2] returns true for each mapping pair of keys. We assume that [f] @@ -1019,13 +1119,13 @@ module type MAP = sig complexity is O(log(n)*Delta) where Delta is the number of different keys between [map1] and [map2]. *) - val nonreflexive_same_domain_for_all2 : (key -> 'a -> 'b -> bool) -> 'a t -> 'b t -> bool + val nonreflexive_same_domain_for_all2 : (key -> 'a value -> 'b value -> bool) -> 'a t -> 'b t -> bool (** [nonreflexive_same_domain_for_all2 f map1 map2] returns true if map1 and map2 have the same keys, and [f key value1 value2] returns true for each mapping pair of keys. The complexity is O(min(|map1|,|map2|)). *) - val reflexive_subset_domain_for_all2 : (key -> 'a -> 'a -> bool) -> 'a t -> 'a t -> bool + val reflexive_subset_domain_for_all2 : (key -> 'a value -> 'a value -> bool) -> 'a t -> 'a t -> bool (** [reflexive_subset_domain_for_all2 f map1 map2] returns true if all the keys of [map1] also are in [map2], and [f key (find map1 key) (find map2 key)] returns [true] when both keys are present @@ -1035,7 +1135,7 @@ module type MAP = sig Delta is the number of different keys between [map1] and [map2]. *) - val idempotent_union : (key -> 'a -> 'a -> 'a) -> 'a t -> 'a t -> 'a t + val idempotent_union : (key -> 'a value -> 'a value -> 'a value) -> 'a t -> 'a t -> 'a t (** [idempotent_union f map1 map2] returns a map whose keys is the union of the keys of [map1] and [map2]. [f] is used to combine the values a key is mapped in both maps. We assume that [f] is @@ -1047,7 +1147,7 @@ module type MAP = sig [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. [f] is never called on physically equal values. *) - val idempotent_inter : (key -> 'a -> 'a -> 'a) -> 'a t -> 'a t -> 'a t + val idempotent_inter : (key -> 'a value -> 'a value -> 'a value) -> 'a t -> 'a t -> 'a t (** [idempotent_inter f map1 map2] returns a map whose keys is the intersection of the keys of [map1] and [map2]. [f] is used to combine the values a key is mapped in both maps. We assume that [f] is @@ -1059,7 +1159,7 @@ module type MAP = sig [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. [f] is never called on physically equal values. *) - val nonidempotent_inter_no_share : (key -> 'a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t + val nonidempotent_inter_no_share : (key -> 'a value -> 'b value -> 'c value) -> 'a t -> 'b t -> 'c t (** [nonidempotent_inter_no_share f map1 map2] returns a map whose keys is the intersection of the keys of [map1] and [map2]. [f] is used to combine the values a key is mapped in both maps. [f] does not @@ -1069,13 +1169,13 @@ module type MAP = sig [f] is called in increasing {{!unsigned_lt}unsigned order} of keys. [f] is called on every shared binding. *) - val idempotent_inter_filter : (key -> 'a -> 'a -> 'a option) -> 'a t -> 'a t -> 'a t + val idempotent_inter_filter : (key -> 'a value -> 'a value -> 'a value option) -> 'a t -> 'a t -> 'a t (** [idempotent_inter_filter f m1 m2] is like {!idempotent_inter} (assuming idempotence, using and preserving physically equal subtrees), but it also removes the key->value bindings for which [f] returns [None]. *) - val slow_merge : (key -> 'a option -> 'b option -> 'c option) -> 'a t -> 'b t -> 'c t + val slow_merge : (key -> 'a value option -> 'b value option -> 'c value option) -> 'a t -> 'b t -> 'c t (** [slow_merge f m1 m2] returns a map whose keys are a subset of the keys of [m1] and [m2]. The [f] function is used to combine keys, similarly to the [Map.merge] function. This funcion has @@ -1085,20 +1185,20 @@ module type MAP = sig val disjoint : 'a t -> 'a t -> bool (** Combination with other kinds of maps. - [Map2] must use the same [Key.to_int] function. *) + [Map2] must use the same {!KEY.to_int} function. *) module WithForeign(Map2 : BASE_MAP with type _ key = key):sig - type ('b,'c) polyfilter_map_foreign = { f: 'a. key -> ('a,'b) Map2.value -> 'c option } [@@unboxed] + type ('b,'c) polyfilter_map_foreign = { f: 'a. key -> ('a,'b) Map2.value -> 'c value option } [@@unboxed] val filter_map_no_share : ('b, 'c) polyfilter_map_foreign -> 'b Map2.t -> 'c t (** Like [filter_map_no_share], but takes another map. *) type ('value,'map2) polyinter_foreign = - { f: 'a. 'a Map2.key -> 'value -> ('a, 'map2) Map2.value -> 'value } [@@unboxed] + { f: 'a. 'a Map2.key -> 'value value-> ('a, 'map2) Map2.value -> 'value value } [@@unboxed] val nonidempotent_inter : ('a, 'b) polyinter_foreign -> 'a t -> 'b Map2.t -> 'a t (** Like [nonidempotent_inter], but takes another map as an argument. *) - type ('map1,'map2) polyupdate_multiple = { f: 'a. key -> 'map1 option -> ('a,'map2) Map2.value -> 'map1 option } [@@unboxed] + type ('map1,'map2) polyupdate_multiple = { f: 'a. key -> 'map1 value option -> ('a,'map2) Map2.value -> 'map1 value option } [@@unboxed] val update_multiple_from_foreign : 'b Map2.t -> ('a,'b) polyupdate_multiple -> 'a t -> 'a t (** This is equivalent to multiple calls to {!update} (but more efficient) [update_multiple_from_foreign m_from f m_to] is the same as calling @@ -1107,11 +1207,11 @@ module type MAP = sig i.e. [update_multiple_from_foreign m_from f m_to] calls [f.f] on every key of [m_from], says if the corresponding value also exists in [m_to], and adds or remove the element in [m_to] depending on the value of [f.f]. - [f.f] is called in the {{!unsigned_lt}unsigned order} of [Key.to_int]. + [f.f] is called in the {{!unsigned_lt}unsigned order} of {!KEY.to_int}. O(size(m_from) + size(m_to)) complexity. *) - type ('map1,'map2) polyupdate_multiple_inter = { f: 'a. key -> 'map1 -> ('a,'map2) Map2.value -> 'map1 option } [@@unboxed] + type ('map1,'map2) polyupdate_multiple_inter = { f: 'a. key -> 'map1 value -> ('a,'map2) Map2.value -> 'map1 value option } [@@unboxed] val update_multiple_from_inter_with_foreign: 'b Map2.t -> ('a,'b) polyupdate_multiple_inter -> 'a t -> 'a t (** [update_multiple_from_inter_with_foreign m_from f m_to] is the same as {!update_multiple_from_foreign}, except that instead of updating for all @@ -1121,49 +1221,61 @@ module type MAP = sig val pretty : ?pp_sep:(Format.formatter -> unit -> unit) -> - (Format.formatter -> key -> 'a -> unit) -> + (Format.formatter -> key -> 'a value -> unit) -> Format.formatter -> 'a t -> unit (** Pretty prints all bindings of the map. - [pp_sep] is called once between each binding pair and defaults to [Format.pp_print_cut]. *) + [pp_sep] is called once between each binding pair and defaults to {{: https://v2.ocaml.org/api/Format.html#VALpp_print_cut}[Format.pp_print_cut]}. *) (** {3 Conversion functions} *) - val to_seq : 'a t -> (key * 'a) Seq.t - (** [to_seq m] iterates the whole map, in increasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + val to_seq : 'a t -> (key * 'a value) Seq.t + (** [to_seq m] iterates the whole map, in increasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) - val to_rev_seq : 'a t -> (key * 'a) Seq.t - (** [to_rev_seq m] iterates the whole map, in decreasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + val to_rev_seq : 'a t -> (key * 'a value) Seq.t + (** [to_rev_seq m] iterates the whole map, in decreasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) - val add_seq : (key * 'a) Seq.t -> 'a t -> 'a t + val add_seq : (key * 'a value) Seq.t -> 'a t -> 'a t (** [add_seq s m] adds all bindings of the sequence [s] to [m] in order. *) - val of_seq : (key * 'a) Seq.t -> 'a t + val of_seq : (key * 'a value) Seq.t -> 'a t (** [of_seq s] creates a new map from the bindings of [s]. If a key is bound multiple times in [s], the latest binding is kept *) - val of_list : (key * 'a) list -> 'a t + val of_list : (key * 'a value) list -> 'a t (** [of_list l] creates a new map from the bindings of [l]. If a key is bound multiple times in [l], the latest binding is kept *) - val to_list : 'a t -> (key * 'a) list + val to_list : 'a t -> (key * 'a value) list (** [to_list m] returns the bindings of [m] as a list, - in increasing {{!unsigned_lt}unsigned order} of [Key.to_int] *) + in increasing {{!unsigned_lt}unsigned order} of {!KEY.to_int} *) end +(** The signature for maps with a single type for keys and values, + a ['a map] binds [key] to ['a]. + Most of this interface should be shared with {{: https://ocaml.org/api/Map.S.html}[Stdlib.Map.S]}. *) +module type MAP = MAP_WITH_VALUE with type 'a value = 'a + (** {1 Keys} *) (** Keys are the functor arguments used to build the maps. *) -(** The signature of keys when they are all of the same type. *) +(** The signature of homogeneous keys (non-generic, unparameterized keys). *) module type KEY = sig type t - (** The type of keys *) + (** The type of keys. + + {b It is recommended to use immutable keys.} + If keys are mutable, + any mutations to keys must preserve {!to_int}. Failing to do so will + break the patricia trees' invariants. *) (** A unique identifier for values of the type. Usually, we use a fresh counter that is increased to give a unique id to each object. Correctness of the operations requires that different values in a tree correspond to different integers. - Must be injective, and ideally fast. + {b Must be injective, and ideally fast.} + {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consing} keys is a good way to + generate such unique identifiers. Note that since Patricia Trees use {{!unsigned_lt}unsigned order}, negative keys are seen as bigger than positive keys. @@ -1180,7 +1292,12 @@ type (_, _) cmp = Eq : ('a, 'a) cmp | Diff : ('a, 'b) cmp (** The signature of heterogeneous keys. *) module type HETEROGENEOUS_KEY = sig type 'key t - (** The type of generic/heterogeneous keys *) + (** The type of generic/heterogeneous keys. + + {b It is recommended to use immutable keys.} + If keys are mutable, + any mutations to keys must preserve {!to_int}. Failing to do so will + break the patricia trees' invariants. *) val to_int : 'key t -> int @@ -1189,7 +1306,9 @@ module type HETEROGENEOUS_KEY = sig object. Correctness of the operations requires that different values in a tree correspond to different integers. - Must be injective, and ideally fast. + {b Must be injective, and ideally fast.} + {{: https://en.wikipedia.org/wiki/Hash_consing}hash-consing} keys is a good way to + generate such unique identifiers. Note that since Patricia Trees use {{!unsigned_lt}unsigned order}, negative keys are seen as bigger than positive keys. @@ -1198,69 +1317,510 @@ module type HETEROGENEOUS_KEY = sig val polyeq : 'a t -> 'b t -> ('a, 'b) cmp (** Polymorphic equality function used to compare our keys. - It should satisfy [(to_int a) = (to_int b) ==> polyeq a b = Eq] *) + It should satisfy [(to_int a) = (to_int b) ==> polyeq a b = Eq], and be + fast. *) end +(** {1 Values} *) + +(** Module type used for specifying custom homogeneous value types in {!MakeCustomMap}. + For most purposes, use the provided {!Value} implementation. + It sets ['a t = 'a], which is the desired effect (maps can map to any value). + This is the case in {!MakeMap}. + However, for maps like {!hash_consed}, it can be useful to restrict the type + of values in order to implement [hash] and [polyeq] functions on values. + See the {!HASHED_VALUE} module type for more details. + + @since 0.10.0 *) +module type VALUE = sig + type 'a t + (** The type of values. A ['map map] maps [key] to ['map value]. + Can be mutable if desired, unless it is being used in {!hash_consed}. *) +end -(** The moodule type of values, which can be heterogeneous. *) -module type VALUE = sig type ('key, 'map) t end +(** Default implementation of {!VALUE}, used in {!MakeMap}. + @since 0.10.0 *) +module Value : VALUE with type 'a t = 'a + +(** The module type of values, which can be heterogeneous. + This can be used to specify how the type of the value depends on that of the key. + If the value doesn't depend on the key type, you can use the provided default + implementations {!HomogeneousValue} and {!WrappedHomogeneousValue}. *) +module type HETEROGENEOUS_VALUE = sig + type ('key, 'map) t + (** The type of values. A ['map map] maps ['key key] to [('key, 'map) value]. + Can be mutable if desired, unless it is being used in {!hash_consed}. *) +end + +(** Default implementation of {!HETEROGENEOUS_VALUE}, to use when the type of the + value in a heterogeneous map does not depend on the type of the key, only on + the type of the map. *) +module HomogeneousValue : HETEROGENEOUS_VALUE with type ('a,'map) t = 'map + +(** Same as {!HomogeneousValue}, but uses a wrapper (unboxed) type instead of direct + equality. This avoids a problem in the typechecker with overly eager simplification of aliases. + More info on + {{: https://discuss.ocaml.org/t/weird-behaviors-with-first-order-polymorphism/13783} the OCaml discourse post}. *) +module WrappedHomogeneousValue : HETEROGENEOUS_VALUE with type ('a,'map) t = ('a,'map) snd + +(** {!VALUE} parameter for {!hash_consed}, as hash-consing requires hashing and comparing values. + + This is the parameter type for homogeneous maps, used in {!MakeHashconsedMap}. + A default implementation is provided in {!HashedValue}, using + {{: https://ocaml.org/api/Hashtbl.html#VALhash}[Hashtbl.hash]} + as [hash] function and physical equality as [polyeq]. + + @since 0.10.0 *) +module type HASHED_VALUE = sig + type 'a t + (** The type of values for a hash-consed maps. + + Unlike {!VALUE.t}, {b hash-consed values should be immutable}. + Or, if they do mutate, they must not change their {!hash} value, and + still be equal to the same values via {!polyeq} *) + + val hash : 'map t -> int + (** [hash v] should return an integer hash for the value [v]. + It is used for {{!hash_consed}hash-consing}. + + Hashing should be fast, avoid mapping too many values to the same integer + and compatible with {!polyeq} (equal values must have the same hash: + [polyeq v1 v2 = true ==> hash v1 = hash v2]). *) + + val polyeq : 'a t -> 'b t -> bool + (** Polymorphic equality on values. + + {b WARNING: if [polyeq a b] is true, then casting [b] to the type of [a] + (and [a] to the type of [b]) must be type-safe.} Eg. if [a : t1 t] and [b : t2 t] + yield [polyeq a b = true], then [let a' : t2 t = Obj.magic a] and + [let b' : t1 t = Obj.magic b] must be safe. + + Examples of safe implementations include: + {ul + {li Having a type ['a t] which doesn't depend on ['a], in which case casting + form ['a t] to ['b t] is always safe: + {[ + type _ t = foo + let cast : type a b. a t -> b t = fun x -> x + let polyeq : type a b. a t -> b t -> bool = fun x y -> x = y + ]}} + {li Using a GADT type and examining its constructors to only return [true] + when the constructors are equal: + {[ + type _ t = + | T_Int : int -> int t + | T_Bool : bool -> bool t + let polyeq : type a b. a t -> b t -> bool = fun x y -> + match x, y with + | T_Int i, T_Int j -> i = j (* Here type a = b = int, we can return true *) + | T_Bool i, T_Bool j -> i && j (* same here, but with a = b = bool *) + | _ -> false (* never return true on heterogeneous cases. *) + ]}} + {li Using physical equality: + {[ + let polyeq a b = a == Obj.magic b + ]} + While this contains an [Obj.magic], it is still type safe (OCaml just compares + the immediate values) and we can safely cast values from one type to the + other if they satisfy this (since they are already physically equal). + + This is the implementation used in {!HashedValue}. Note however that + using this function can lead to {b identifiers no longer being unique across + types}. They will still be unique and behave as expected within a certain type, + but since some values of different types can physically equal, we may have + identifer clashes: + {[ + let _ = 97 == Obj.magic 'a' (* This is true *) + + module HMap = MakeHashconsedMap(Int)(HashedValue) + + let m1 = HMap.singleton 5 97 (* int HMap.t *) + let m2 = HMap.singleton 5 'a' (* char HMap.t *) + let _ = HMap.to_int m1 = HMap.to_int m2 (* This is also true. *) + ]} + This can cause problems if you wish to use identifiers of different map + types together: + {[ + module MapOfMaps = MakeMap(struct + type t = Any : 'a HMap.t -> t + let to_int (Any x) = Node.to_int x + end) + + let m3 = MapOfMaps.of_list [ (m1, "foo"); (m2, "bar") ] + (* m3 has cardinal 1, the m1->foo binding has been overwritten. *) + ]} + This issue does not happen with the two previous variants, since they + both only return true on the same types.}} *) +end + +(** In order to build {!hash_consed}, we need to be able to hash and compare values. + + This is the heterogeneous version of {!HASHED_VALUE}, used to specify a value + for heterogeneous maps (in {!MakeHashconsedHeterogeneousMap}). + A default implementation is provided in {!HeterogeneousHashedValue}, using + {{: https://ocaml.org/api/Hashtbl.html#VALhash}[Hashtbl.hash]} + as [hash] function and physical equality as [polyeq]. + + @since 0.10.0 *) +module type HETEROGENEOUS_HASHED_VALUE = sig + type ('key, 'map) t + (** The type of values for a hash-consed maps. + + Unlike {!HETEROGENEOUS_VALUE.t}, {b hash-consed values should be immutable}. + Or, if they do mutate, they must not change their {!hash} value, and + still be equal to the same values via {!polyeq} *) + + val hash : ('key, 'map) t -> int + (** [hash v] should return an integer hash for the value [v]. + It is used for {{!hash_consed}hash-consing}. + + Hashing should be fast, avoid mapping too many values to the same integer + and compatible with {!polyeq} (equal values must have the same hash: + [polyeq v1 v2 = true ==> hash v1 = hash v2]). *) + + val polyeq : ('key, 'map_a) t -> ('key, 'map_b) t -> bool + (** Polymorphic equality on values. + + {b WARNING: if [polyeq a b] is true, then casting [b] to the type of [a] + (and [a] to the type of [b]) must be type-safe.} Eg. if [a : (k, t1) t] and [b : (k, t2) t] + yield [polyeq a b = true], then [let a' : (k,t2) t = Obj.magic a] and + [let b' : (k,t1) t = Obj.magic b] must be safe. + + Examples of safe implementations include: + {ul + {li Having a type [('key, 'map) t] which doesn't depend on ['map] (i can depend on ['key]), in which case casting + form [('key, 'a) t] to [('key, 'b) t] is always safe: + {[ + type ('k, _) t = 'k list + let cast : type a b. ('k, a) t -> ('k, b) t = fun x -> x + let polyeq : type a b. ('k, a) t -> ('k, b) t -> bool = fun x y -> x = y + ]}} + {li Using a GADT type and examining its constructors to only return [true] + when the constructors are equal: + {[ + type (_, _) t = + | T_Int : int -> (unit, int) t + | T_Bool : bool -> (unit, bool) t + let polyeq : type k a b. (k, a) t -> (k, b) t -> bool = fun x y -> + match x, y with + | T_Int i, T_Int j -> i = j (* Here type a = b = int, we can return true *) + | T_Bool i, T_Bool j -> i && j (* same here, but with a = b = bool *) + | _ -> false (* never return true on heterogeneous cases. *) + ]}} + {li Using physical equality: + {[ + let polyeq a b = a == Obj.magic b + ]} + While this contains an [Obj.magic], it is still type safe (OCaml just compares + the immediate values) and we can safely cast values from one type to the + other if they satisfy this (since they are already physically equal). + + This is the implementation used in {!HeterogeneousHashedValue}. Note however that + using this function can lead to {b identifiers no longer being unique across + types}. See {!HASHED_VALUE.polyeq} for more information on this.}} *) +end + +module HashedValue : HASHED_VALUE with type 'a t = 'a +(** Generic implementation of {!HASHED_VALUE}. + Uses {{: https://ocaml.org/api/Hashtbl.html#VALhash}[Hashtbl.hash]} for hashing + and physical equality for equality. + Note that this may lead to maps of different types having the same identifier + ({!MakeHashconsedMap.to_int}), see the documentation of {!HASHED_VALUE.polyeq} + for details on this. *) + +module HeterogeneousHashedValue : HETEROGENEOUS_HASHED_VALUE with type ('k, 'm) t = 'm +(** Generic implementation of {!HETEROGENEOUS_HASHED_VALUE}. + Uses {{: https://ocaml.org/api/Hashtbl.html#VALhash}[Hashtbl.hash]} for hashing + and physical equality for equality. + Note that this may lead to maps of different types having the same identifier + ({!MakeHashconsedHeterogeneousMap.to_int}), see the documentation of + {!HASHED_VALUE.polyeq} for details on this. *) -(** To use when the type of the value is the same (but the keys can still be heterogeneous). *) -module HomogeneousValue:VALUE with type ('a,'map) t = 'map -module WrappedHomogeneousValue:VALUE with type ('a,'map) t = ('a,'map) snd (** {1 Functors} *) +(** This section presents the functors which can be used to build patricia tree + maps and sets. *) (** {2 Homogeneous maps and sets} *) +(** These are homogeneous maps and set, their keys/elements are a single + non-generic type, just like the standard library's [Map] and [Set] modules. *) -module MakeMap(Key:KEY):MAP with type key = Key.t -module MakeSet(Key:KEY):SET with type elt = Key.t +module MakeMap(Key: KEY) : MAP with type key = Key.t +module MakeSet(Key: KEY) : SET with type elt = Key.t (** {2 Heterogeneous maps and sets} *) +(** Heterogeneous maps are ['map map], which store bindings of ['key key] + to [('key, 'map) value], where ['key key] is a GADT, as we must be able + to compare keys of different types together. + + Similarly, heterogeneous sets store sets of ['key key]. *) -module MakeHeterogeneousSet(Key:HETEROGENEOUS_KEY):HETEROGENEOUS_SET with type 'a elt = 'a Key.t -module MakeHeterogeneousMap(Key:HETEROGENEOUS_KEY)(Value:VALUE):HETEROGENEOUS_MAP +module MakeHeterogeneousSet(Key: HETEROGENEOUS_KEY) : HETEROGENEOUS_SET + with type 'a elt = 'a Key.t +module MakeHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: HETEROGENEOUS_VALUE) : HETEROGENEOUS_MAP with type 'a key = 'a Key.t and type ('k,'m) value = ('k,'m) Value.t -(** {2 Maps with custom representation of Nodes} *) +(** {2 Maps and sets with custom nodes} *) (** We can also customize the representation and creation of nodes, to - gain space or time. - - Possibitities include having weak key and/or values, hash-consing, - giving unique number to nodes or keeping them in sync with the - disk, lazy evaluation and/or caching, etc. *) - -(** Create a Homogeneous Map with a custom {!NODE}. *) -module MakeCustom - (Key:KEY) - (NODE:NODE with type 'a key = Key.t and type ('key,'map) value = ('key,'map) snd) - :MAP + gain space or time. + + Possibitities include having weak key and/or values, hash-consing, + giving unique number to nodes or keeping them in sync with the + disk, lazy evaluation and/or caching, adding size information for + constant time [cardinal] functions, etc. + + See {!node_impl} for the provided implementations of {!NODE}, or create your own. *) + +(** Create a homogeneous map with a custom {!NODE}. Also allows + customizing the map values *) +module MakeCustomMap + (Key: KEY) + (Value: VALUE) + (Node: NODE with type 'a key = Key.t and type ('key,'map) value = ('key, 'map Value.t) snd) + : MAP_WITH_VALUE with type key = Key.t - and type 'm t = 'm NODE.t - -(** Create an Heterogeneous map with a custom {!NODE}. *) -module MakeCustomHeterogeneous - (Key:HETEROGENEOUS_KEY) - (Value:VALUE) - (NODE:NODE with type 'a key = 'a Key.t and type ('key,'map) value = ('key,'map) Value.t) - :HETEROGENEOUS_MAP + and type 'm value = 'm Value.t + and type 'm t = 'm Node.t + + +(** Create a homogeneous set with a custom {!NODE}. + @since v0.10.0 *) +module MakeCustomSet + (Key: KEY) + (Node: NODE with type 'a key = Key.t and type ('key,'map) value = unit) + : SET + with type elt = Key.t + and type 'a BaseMap.t = 'a Node.t + +(** Create an heterogeneous map with a custom {!NODE}. *) +module MakeCustomHeterogeneousMap + (Key: HETEROGENEOUS_KEY) + (Value: HETEROGENEOUS_VALUE) + (Node: NODE with type 'a key = 'a Key.t and type ('key,'map) value = ('key,'map) Value.t) + : HETEROGENEOUS_MAP with type 'a key = 'a Key.t and type ('k,'m) value = ('k,'m) Value.t - and type 'm t = 'm NODE.t + and type 'm t = 'm Node.t + +(** Create an heterogeneous set with a custom {!NODE}. + @since v0.10.0 *) +module MakeCustomHeterogeneousSet + (Key: HETEROGENEOUS_KEY) + (NODE: NODE with type 'a key = 'a Key.t and type ('key,'map) value = unit) + : HETEROGENEOUS_SET + with type 'a elt = 'a Key.t + and type 'a BaseMap.t = 'a NODE.t + +(** {2:hash_consed Hash-consed maps and sets} *) +(** Hash-consed maps and sets uniquely number each of their nodes. + Upon creation, they check whether a similar node has been created before, + if so they return it, else they return a new node with a new number. + With this unique numbering: + - [equal] and [compare] become constant time operations; + - two maps with the same bindings (where keys compared by {!KEY.to_int} and + values by {!HASHED_VALUE.polyeq}) will always be physically equal; + - functions that benefit from sharing, like {!BASE_MAP.idempotent_union} and + {!BASE_MAP.idempotent_inter} will see improved performance; + - constructors are slightly slower, as they now require a hash-table lookup; + - memory usage is increased: nodes store their tags inside themselves, and + a global hash-table of all built nodes must be maintained; + - hash-consed maps assume their values are immutable; + - {b WARNING:} when using physical equality as {!HASHED_VALUE.polyeq}, some + {b maps of different types may be given the same identifier}. See the end of + the documentation of {!HASHED_VALUE.polyeq} for details. + Note that this is the case in the default implementations {!HashedValue} + and {!HeterogeneousHashedValue}. + + All hash-consing functors are {b generative}, since each functor call will + create a new hash-table to store the created nodes. Calling a functor + twice with same arguments will lead to two numbering systems for identifiers, + and thus the types should not be considered compatible. *) + +(** Hash-consed version of {!MAP}. See {!hash_consed} for the differences between + hash-consed and non hash-consed maps. + + This is a generative functor, as calling it creates a new hash-table to store + the created nodes, and a reference to store the next unallocated identifier. + Maps/sets from different hash-consing functors (even if these functors have + the same arguments) will have different (incompatible) numbering systems and + be stored in different hash-tables (thus they will never be physically equal). + + @since v0.10.0 *) +module MakeHashconsedMap(Key: KEY)(Value: HASHED_VALUE)() : sig + include MAP_WITH_VALUE with type key = Key.t and type 'a value = 'a Value.t (** @closed *) + + val to_int : 'a t -> int + (** Returns the {{!hash_consed}hash-consed} id of the map. + Unlike {!NODE_WITH_ID.to_int}, hash-consing ensures that maps + which contain the same keys (compared by {!KEY.to_int}) and values (compared + by {!HASHED_VALUE.polyeq}) will always be physically equal + and have the same identifier. + + Note that when using physical equality as {!HASHED_VALUE.polyeq}, some + maps of different types [a t] and [b t] may be given the same identifier. + See the end of the documentation of {!HASHED_VALUE.polyeq} for details. *) + + val equal : 'a t -> 'a t -> bool + (** Constant time equality using the {{!hash_consed}hash-consed} nodes identifiers. + This is equivalent to physical equality. + Two nodes are equal if their trees contain the same bindings, + where keys are compare by {!KEY.to_int} and values are compared by + {!HASHED_VALUE.polyeq}. *) + + val compare : 'a t -> 'a t -> int + (** Constant time comparison using the {{!hash_consed}hash-consed} node identifiers. + This order is fully arbitrary, but it is total and can be used to sort nodes. + It is based on node ids which depend on the order in which the nodes where created + (older nodes having smaller ids). + + One useful property of this order is that + child nodes will always have a smaller identifier than their parents. *) +end + +(** Hash-consed version of {!SET}. See {!hash_consed} for the differences between + hash-consed and non hash-consed sets. + + This is a generative functor, as calling it creates a new hash-table to store + the created nodes, and a reference to store the next unallocated identifier. + Maps/sets from different hash-consing functors (even if these functors have + the same arguments) will have different (incompatible) numbering systems and + be stored in different hash-tables (thus they will never be physically equal). + + @since v0.10.0 *) +module MakeHashconsedSet(Key: KEY)() : sig + include SET with type elt = Key.t (** @closed *) + + val to_int : t -> int + (** Returns the {{!hash_consed}hash-consed} id of the map. + Unlike {!NODE_WITH_ID.to_int}, hash-consing ensures that maps + which contain the same keys (compared by {!KEY.to_int}) and values (compared + by {!HASHED_VALUE.polyeq}) will always be physically equal + and have the same identifier. + + Note that when using physical equality as {!HASHED_VALUE.polyeq}, some + maps of different types [a t] and [b t] may be given the same identifier. + See the end of the documentation of {!HASHED_VALUE.polyeq} for details. *) + + val equal : t -> t -> bool + (** Constant time equality using the {{!hash_consed}hash-consed} nodes identifiers. + This is equivalent to physical equality. + Two nodes are equal if their trees contain the same bindings, + where keys are compare by {!KEY.to_int} and values are compared by + {!HASHED_VALUE.polyeq}. *) + + val compare : t -> t -> int + (** Constant time comparison using the {{!hash_consed}hash-consed} node identifiers. + This order is fully arbitrary, but it is total and can be used to sort nodes. + It is based on node ids which depend on the order in which the nodes where created + (older nodes having smaller ids). + + One useful property of this order is that + child nodes will always have a smaller identifier than their parents. *) +end + +(** Hash-consed version of {!HETEROGENEOUS_SET}. See {!hash_consed} for the differences between + hash-consed and non hash-consed sets. + + This is a generative functor, as calling it creates a new hash-table to store + the created nodes, and a reference to store the next unallocated identifier. + Maps/sets from different hash-consing functors (even if these functors have + the same arguments) will have different (incompatible) numbering systems and + be stored in different hash-tables (thus they will never be physically equal). + + @since v0.10.0 *) +module MakeHashconsedHeterogeneousSet(Key: HETEROGENEOUS_KEY)() : sig + include HETEROGENEOUS_SET with type 'a elt = 'a Key.t (** @closed *) + + val to_int : t -> int + (** Returns the {{!hash_consed}hash-consed} id of the map. + Unlike {!NODE_WITH_ID.to_int}, hash-consing ensures that maps + which contain the same keys (compared by {!KEY.to_int}) and values (compared + by {!HASHED_VALUE.polyeq}) will always be physically equal + and have the same identifier. + + Note that when using physical equality as {!HASHED_VALUE.polyeq}, some + maps of different types [a t] and [b t] may be given the same identifier. + See the end of the documentation of {!HASHED_VALUE.polyeq} for details. *) + + val equal : t -> t -> bool + (** Constant time equality using the {{!hash_consed}hash-consed} nodes identifiers. + This is equivalent to physical equality. + Two nodes are equal if their trees contain the same bindings, + where keys are compare by {!KEY.to_int} and values are compared by + {!HASHED_VALUE.polyeq}. *) + + val compare : t -> t -> int + (** Constant time comparison using the {{!hash_consed}hash-consed} node identifiers. + This order is fully arbitrary, but it is total and can be used to sort nodes. + It is based on node ids which depend on the order in which the nodes where created + (older nodes having smaller ids). + + One useful property of this order is that + child nodes will always have a smaller identifier than their parents. *) +end + +(** Hash-consed version of {!HETEROGENEOUS_MAP}. See {!hash_consed} for the differences between + hash-consed and non hash-consed maps. + + This is a generative functor, as calling it creates a new hash-table to store + the created nodes, and a reference to store the next unallocated identifier. + Maps/sets from different hash-consing functors (even if these functors have + the same arguments) will have different (incompatible) numbering systems and + be stored in different hash-tables (thus they will never be physically equal). + + @since v0.10.0 *) +module MakeHashconsedHeterogeneousMap(Key: HETEROGENEOUS_KEY)(Value: HETEROGENEOUS_HASHED_VALUE)() : sig + include HETEROGENEOUS_MAP + with type 'a key = 'a Key.t + and type ('k,'m) value = ('k, 'm) Value.t (** @closed *) + + val to_int : 'a t -> int + (** Returns the {{!hash_consed}hash-consed} id of the map. + Unlike {!NODE_WITH_ID.to_int}, hash-consing ensures that maps + which contain the same keys (compared by {!KEY.to_int}) and values (compared + by {!HASHED_VALUE.polyeq}) will always be physically equal + and have the same identifier. + + Note that when using physical equality as {!HASHED_VALUE.polyeq}, some + maps of different types [a t] and [b t] may be given the same identifier. + See the end of the documentation of {!HASHED_VALUE.polyeq} for details. *) + + val equal : 'a t -> 'a t -> bool + (** Constant time equality using the {{!hash_consed}hash-consed} nodes identifiers. + This is equivalent to physical equality. + Two nodes are equal if their trees contain the same bindings, + where keys are compare by {!KEY.to_int} and values are compared by + {!HASHED_VALUE.polyeq}. *) + + val compare : 'a t -> 'a t -> int + (** Constant time comparison using the {{!hash_consed}hash-consed} node identifiers. + This order is fully arbitrary, but it is total and can be used to sort nodes. + It is based on node ids which depend on the order in which the nodes where created + (older nodes having smaller ids). + + One useful property of this order is that + child nodes will always have a smaller identifier than their parents. *) +end -(** {1 Some implementations of NODE} *) +(** {1:node_impl Some implementations of NODE} *) +(** We provide a few different implementations of {!NODE}, they can be used with + the {!MakeCustomMap}, {!MakeCustomSet}, {!MakeCustomHeterogeneousMap} and + {!MakeCustomHeterogeneousSet} functors. *) -(** This module is such that ['map t = 'map view]. *) -module SimpleNode(Key : sig type 'k t end)(Value : VALUE):NODE +(** {2 Basic nodes} *) + +(** This module is such that ['map t = 'map view]. + This is the node used in {!MakeHeterogeneousMap} and {!MakeMap}. *) +module SimpleNode(Key: sig type 'k t end)(Value: HETEROGENEOUS_VALUE) : NODE with type 'a key = 'a Key.t and type ('key,'map) value = ('key,'map) Value.t (** Here, nodes also contain a unique id, e.g. so that they can be - used as keys of maps or hashtables. *) -module NodeWithId(Key : sig type 'k t end)(Value:VALUE):NODE_WITH_ID + used as keys of maps or hash-tables. *) +module NodeWithId(Key: sig type 'k t end)(Value: HETEROGENEOUS_VALUE) : NODE_WITH_ID with type 'a key = 'a Key.t and type ('key,'map) value = ('key,'map) Value.t @@ -1271,28 +1831,55 @@ module NodeWithId(Key : sig type 'k t end)(Value:VALUE):NODE_WITH_ID (** An optimized representation for sets, i.e. maps to unit: we do not store a reference to unit (note that you can further optimize when - you know the representation of the key). *) -module SetNode(Key : sig type 'k t end):NODE + you know the representation of the key). + This is the node used in {!MakeHeterogeneousSet} and {!MakeSet}. *) +module SetNode(Key: sig type 'k t end) : NODE with type 'a key = 'a Key.t and type ('key,'map) value = unit +(** {2 Weak nodes} *) (** NODE used to implement weak key hashes (the key-binding pair is an Ephemeron, the reference to the key is weak, and if the key is garbage collected, the binding disappears from the map *) -module WeakNode(Key : sig type 'k t end)(Value : VALUE):NODE +module WeakNode(Key: sig type 'k t end)(Value: HETEROGENEOUS_VALUE) : NODE with type 'a key = 'a Key.t and type ('key,'map) value = ('key,'map) Value.t (** Both a {!WeakNode} and a {!SetNode}, useful to implement Weak sets. *) -module WeakSetNode(Key : sig type 'k t end):NODE +module WeakSetNode(Key: sig type 'k t end) : NODE + with type 'a key = 'a Key.t + and type ('key,'map) value = unit + + +(** {2 Hashconsed nodes} *) + +(** Gives a unique number to each node like {!NodeWithId}, + but also performs hash-consing. So two maps with the same bindings will + always be physically equal. See {!hash_consed} for more details on this. + + This is a generative functor, as calling it creates a new hash-table to store + the created nodes, and a reference to store the next unallocated identifier. + Maps/sets from different hash-consing functors (even if these functors have + the same arguments) will have different (incompatible) numbering systems and + be stored in different hash-tables (thus they will never be physically equal). + + Using a single {!HashconsedNode} in multiple {!MakeCustomMap} functors will result in + all those maps being hash-consed together (stored in the same hash-table, + same numbering system). + + @since v0.10.0 *) +module HashconsedNode(Key: HETEROGENEOUS_KEY)(Value: HETEROGENEOUS_HASHED_VALUE)() : HASH_CONSED_NODE + with type 'a key = 'a Key.t + and type ('key,'map) value = ('key, 'map) Value.t + +(** Both a {!HashconsedNode} and a {!SetNode}. + @since v0.10.0 *) +module HashconsedSetNode(Key: HETEROGENEOUS_KEY)() : HASH_CONSED_NODE with type 'a key = 'a Key.t and type ('key,'map) value = unit (* TODO: Functor to make sets from maps. *) -(* TODO: consider the "shape" of a map, and use this to have functions - that filter a map, or update several elements. Maybe this just - amounts to providing a "view" function to nodes. *) (* TODO: A possibility of customizing the fixpoint in the recursive calls, so that we can cache operations or make lazy some of the operations. *) diff --git a/patriciaTreeTest.ml b/patriciaTreeTest.ml index 6d017a0..4fce639 100644 --- a/patriciaTreeTest.ml +++ b/patriciaTreeTest.ml @@ -82,7 +82,7 @@ let%test_module "TestHeterogeneous" = (module struct end - module Map = MakeCustomHeterogeneous(MyKey)(MyValue)(SimpleNode(MyKey)(MyValue)) + module Map = MakeCustomHeterogeneousMap(MyKey)(MyValue)(SimpleNode(MyKey)(MyValue)) open Map let _m1 = singleton (MyKey.Int 7) (MyValue.AString "seven") @@ -139,7 +139,7 @@ let%test_module "TestHeterogeneous" = (module struct | _ -> NBranch{prefix;branching_bit;tree0;tree1} end - module Map2 = MakeCustomHeterogeneous(MyKey)(MyValue)(SimpleNode(MyKey)(MyValue)) + module Map2 = MakeCustomHeterogeneousMap(MyKey)(MyValue)(SimpleNode(MyKey)(MyValue)) open Map2 let _m1 = singleton (MyKey.Int 7) (MyValue.AString "seven") @@ -179,119 +179,121 @@ let unsigned_compare x y = if unsigned_lt x y then -1 else if x = y then 0 else 1 -let%test_module _ = (module struct - - (* A model. *) - module IntMap = struct - module M = Map.Make(struct - type t = int - let compare = unsigned_compare - end) - include M - let subset_domain_for_all_2 m1 m2 f = - let exception False in - try - let res = M.merge (fun key v1 v2 -> match v1,v2 with - | None, None -> assert false - | Some _, None -> raise False - | None, Some _ -> None - | Some v1, Some v2 -> - if f key v1 v2 then None else raise False) m1 m2 in - assert (M.is_empty res); - true - with False -> false - - let same_domain_for_all_2 m1 m2 f = - let exception False in - try - let res = M.merge (fun key v1 v2 -> match v1,v2 with - | None, None -> assert false - | Some _, None -> raise False - | None, Some _ -> raise False - | Some v1, Some v2 -> - if f key v1 v2 then None else raise False) m1 m2 in - assert (M.is_empty res); - true - with False -> false - - let inter m1 m2 f = - M.merge (fun key a b -> - match a,b with - | None, _ | _, None -> None - | Some a, Some b -> Some (f key a b)) m1 m2 - - let update_multiple_from_foreign m1 m2 f = - M.merge (fun key a b -> - match a, b with - | a, None -> a - | a, Some b -> f key a b) m1 m2 - - let update_multiple_from_inter_with_foreign m1 m2 f = - M.fold (fun key value acc -> - match M.find key acc with - | exception Not_found -> acc - | v -> begin match f key v value with - | None -> M.remove key acc - | Some v' -> M.add key v' acc - end) m2 m1 - - let inter_filter m1 m2 f = - M.merge (fun key a b -> - match a,b with - | None, _ | _, None -> None - | Some a, Some b -> (f key a b)) m1 m2 - - let fold_on_nonequal_inter f m1 m2 acc = - let racc = ref acc in - ignore @@ M.merge (fun key a b -> - match a,b with - | None, _ | _, None -> None - | Some a, Some b -> - if a != b - then racc := f key a b !racc; - None) m1 m2; - !racc - - let fold_on_nonequal_union f ma mb acc = - let union = M.merge (fun _key a b -> - match a,b with - | None, None -> assert false - | Some a, Some b when a == b -> None - | None, Some _ | Some _, None | Some _, Some _ -> Some(a,b)) ma mb in - let elts = M.bindings union in - let elts = List.sort (fun (key1,_val1) (key2,_val2) -> unsigned_compare key1 key2) elts in - List.fold_left (fun acc (key,(val1,val2)) -> f key val1 val2 acc) acc elts - - let pop_unsigned_minimum m = - match M.min_binding m with - | exception Not_found -> None - | (key,value) -> Some(key,value,M.remove key m) - - let pop_unsigned_maximum m = - match M.max_binding m with - | exception Not_found -> None - | (key,value) -> Some(key,value,M.remove key m) - end - - (* An implementation. *) - module IntValue : sig - type ('a, 'b) t = int - val pretty : Format.formatter -> ('a, 'b) t -> unit - end = struct - type ('a,'b) t = int - let pretty fmt x = Format.pp_print_int fmt x - end - - module HIntKey : sig - type t = int - val to_int : t -> int - end = struct +module HIntKey : sig + type t = int + val to_int : t -> int +end = struct + type t = int + let to_int x = x +end + +(* A model. *) +module IntMap = struct + module M = Map.Make(struct type t = int - let to_int x = x - end - - (* module MyMap = Make(SimpleNode(IntKey)(IntValue))(IntKey)(IntValue);; *) - module MyMap = MakeMap(HIntKey) + let compare = unsigned_compare + end) + include M + let subset_domain_for_all_2 m1 m2 f = + let exception False in + try + let res = M.merge (fun key v1 v2 -> match v1,v2 with + | None, None -> assert false + | Some _, None -> raise False + | None, Some _ -> None + | Some v1, Some v2 -> + if f key v1 v2 then None else raise False) m1 m2 in + assert (M.is_empty res); + true + with False -> false + + let same_domain_for_all_2 m1 m2 f = + let exception False in + try + let res = M.merge (fun key v1 v2 -> match v1,v2 with + | None, None -> assert false + | Some _, None -> raise False + | None, Some _ -> raise False + | Some v1, Some v2 -> + if f key v1 v2 then None else raise False) m1 m2 in + assert (M.is_empty res); + true + with False -> false + + let inter m1 m2 f = + M.merge (fun key a b -> + match a,b with + | None, _ | _, None -> None + | Some a, Some b -> Some (f key a b)) m1 m2 + + let update_multiple_from_foreign m1 m2 f = + M.merge (fun key a b -> + match a, b with + | a, None -> a + | a, Some b -> f key a b) m1 m2 + + let update_multiple_from_inter_with_foreign m1 m2 f = + M.fold (fun key value acc -> + match M.find key acc with + | exception Not_found -> acc + | v -> begin match f key v value with + | None -> M.remove key acc + | Some v' -> M.add key v' acc + end) m2 m1 + + let inter_filter m1 m2 f = + M.merge (fun key a b -> + match a,b with + | None, _ | _, None -> None + | Some a, Some b -> (f key a b)) m1 m2 + + let fold_on_nonequal_inter f m1 m2 acc = + let racc = ref acc in + ignore @@ M.merge (fun key a b -> + match a,b with + | None, _ | _, None -> None + | Some a, Some b -> + if a != b + then racc := f key a b !racc; + None) m1 m2; + !racc + + let fold_on_nonequal_union f ma mb acc = + let union = M.merge (fun _key a b -> + match a,b with + | None, None -> assert false + | Some a, Some b when a == b -> None + | None, Some _ | Some _, None | Some _, Some _ -> Some(a,b)) ma mb in + let elts = M.bindings union in + let elts = List.sort (fun (key1,_val1) (key2,_val2) -> unsigned_compare key1 key2) elts in + List.fold_left (fun acc (key,(val1,val2)) -> f key val1 val2 acc) acc elts + + let pop_unsigned_minimum m = + match M.min_binding m with + | exception Not_found -> None + | (key,value) -> Some(key,value,M.remove key m) + + let pop_unsigned_maximum m = + match M.max_binding m with + | exception Not_found -> None + | (key,value) -> Some(key,value,M.remove key m) +end + +(* An implementation. *) +module IntValue : sig + type ('a, 'b) t = int + val pretty : Format.formatter -> ('a, 'b) t -> unit +end = struct + type ('a,'b) t = int + let pretty fmt x = Format.pp_print_int fmt x +end + + +module TestImpl(MyMap : MAP with type key = int)(Param : sig + val test_id : bool + val number_gen : int QCheck.arbitrary + (* val to_int : 'a MyMap.t -> int option *) +end) = struct (* Add a list of pair of ints to a map. *) let rec extend_map mymap alist = @@ -300,6 +302,12 @@ let%test_module _ = (module struct | (a,b)::rest -> extend_map (MyMap.add a b mymap) rest + let rec remove_map mymap alist = + match alist with + | [] -> mymap + | (a,_)::rest -> + remove_map (MyMap.remove a mymap) rest + let intmap_of_mymap m = MyMap.fold (fun key value acc -> IntMap.add key value acc) m IntMap.empty @@ -309,7 +317,7 @@ let%test_module _ = (module struct let third = extend_map first alist3 in (second,third) - let number_gen = QCheck.int + let number_gen = Param.number_gen let gen = QCheck.(triple (small_list (pair number_gen number_gen)) @@ -395,8 +403,10 @@ let%test_module _ = (module struct let chk_calls1 = check_increases () in let chk_calls2 = check_increases () in let f k x = if (x mod 3 == 0) then None else Some (x - k + 1) in - let res1 = intmap_of_mymap @@ MyMap.filter_map (fun k v -> chk_calls1 k; f k v) m1 in - let res2 = intmap_of_mymap @@ MyMap.filter_map_no_share (fun k v -> chk_calls2 k; f k v) m1 in + let res1 = intmap_of_mymap @@ MyMap.filter_map ( + fun k v -> chk_calls1 k; f k v) m1 in + let res2 = intmap_of_mymap @@ MyMap.filter_map_no_share ( + fun k v -> chk_calls2 k; f k v) m1 in let modelres = IntMap.filter_map f model1 in IntMap.equal (=) res1 modelres && IntMap.equal (=) res2 modelres) @@ -432,7 +442,7 @@ let%test_module _ = (module struct let modelres = IntMap.union (fun key a b -> Some (f key a b)) model1 model2 in (* dump_test model1 model2 myres modelres; *) IntMap.equal (=) modelres myres) - let () = QCheck.Test.check_exn test_idempotent_union;; + let () = QCheck.Test.check_exn test_idempotent_union let test_idempotent_inter = QCheck.Test.make ~count:1000 ~name:"idempotent_inter" @@ -468,7 +478,8 @@ let%test_module _ = (module struct let orig_f = sdbm3 in let chk_calls = check_increases () in let f : int -> int -> int -> int = fun key (a:int) b -> chk_calls key; orig_f key a b in - let myres = intmap_of_mymap @@ Foreign.nonidempotent_inter {f=fun k v (Snd v2) -> f k v v2 } m1 m2 in + let myres = intmap_of_mymap @@ Foreign.nonidempotent_inter {f= + fun k v (Snd v2) -> f k v v2 } m1 m2 in let modelres = IntMap.inter model1 model2 orig_f in (* dump_test model1 model2 myres modelres; *) IntMap.equal (=) modelres myres) @@ -486,7 +497,8 @@ let%test_module _ = (module struct in let chk_calls = check_increases () in let f = fun key a b -> chk_calls key; orig_f key a b in - let myres = intmap_of_mymap @@ Foreign.update_multiple_from_foreign m2 {f=fun k v (Snd v') -> f k v v' } m1 in + let myres = intmap_of_mymap @@ Foreign.update_multiple_from_foreign m2 {f= + fun k v (Snd v') -> f k v v' } m1 in let modelres = IntMap.update_multiple_from_foreign model1 model2 orig_f in (* dump_test model1 model2 myres modelres; *) IntMap.equal (=) modelres myres) @@ -501,7 +513,8 @@ let%test_module _ = (module struct in let chk_calls = check_increases () in let f key (a:int) b = chk_calls key; orig_f key a b in - let myres = intmap_of_mymap @@ Foreign.update_multiple_from_inter_with_foreign m2 {f=fun k v (Snd v') -> f k v v'} m1 in + let myres = intmap_of_mymap @@ Foreign.update_multiple_from_inter_with_foreign m2 {f= + fun k v (Snd v') -> f k v v' } m1 in let modelres = IntMap.update_multiple_from_inter_with_foreign model1 model2 orig_f in (* dump_test model1 model2 myres modelres; *) IntMap.equal (=) modelres myres) @@ -533,7 +546,9 @@ let%test_module _ = (module struct | Some a, Some b -> if ((a - b - key) == 0) then None else Some(a-b-key) | None, None -> assert false in - let myres = intmap_of_mymap @@ MyMap.slow_merge f m1 m2 in + let myres = intmap_of_mymap @@ MyMap.slow_merge (fun key a b -> + f key a b + ) m1 m2 in let modelres = IntMap.merge f model1 model2 in (* dump_test model1 model2 myres modelres; *) (* Printf.printf "res is %b\n%!" @@ IntMap.equal (=) modelres myres; *) @@ -597,13 +612,52 @@ let%test_module _ = (module struct | Branch{prefix; branching_bit; _} -> Format.printf "%x : %x@." (Obj.magic prefix) (Obj.magic branching_bit) | _ -> () ); *) - MyMap.to_list map = [(0,0)] && + MyMap.to_list map = [(0, 0)] && MyMap.to_list map2 = [(0,0); (min_int,5)] && MyMap.to_list map3 = [(0,0); (max_int,8); (min_int,5)] && MyMap.to_list map4 = [(0,0); (25,8); (min_int,5)] && MyMap.to_list map5 = MyMap.to_list map2 -end) + let test_id_unique = QCheck.Test.make ~count:1000 ~name:"unique_hashcons_id" + gen (fun (one,two,three) -> + (* Remove duplicates *) + let two = List.filter (fun (x, _) -> not (List.mem_assoc x one)) two in + let three = List.filter (fun (x, _) -> not (List.mem_assoc x one || List.mem_assoc x two)) three in + let m = extend_map MyMap.empty one in + let m1 = extend_map (extend_map m two) three in + m1 == extend_map (extend_map (extend_map MyMap.empty three) one) two && + m1 == extend_map (extend_map (extend_map MyMap.empty two) three) one && + m1 == extend_map (extend_map (extend_map MyMap.empty three) two) one && + m1 == extend_map (extend_map (extend_map MyMap.empty one) three) two && + m1 == extend_map m1 one && + m == remove_map (extend_map m two) two && + MyMap.empty == remove_map m one + ) + let () = if Param.test_id then QCheck.Test.check_exn test_id_unique +end + +module MyMap = MakeMap(HIntKey) +module MyHashedMap = MakeHashconsedMap(HIntKey)(HashedValue)() + +let%test_module "TestMap_SmallNat" = (module TestImpl(MyMap)(struct + let test_id = false + let number_gen = QCheck.small_nat +end)) + +let%test_module "TestMap_Int" = (module TestImpl(MyMap)(struct + let test_id = false + let number_gen = QCheck.int +end)) + +let%test_module "TestHashconsedMap_SmallNat" = (module TestImpl(MyHashedMap)(struct + let test_id = true + let number_gen = QCheck.small_nat +end)) + +let%test_module "TestHashconsedMap_Int" = (module TestImpl(MyHashedMap)(struct + let test_id = true + let number_gen = QCheck.int +end)) let%test_module "TestWeak" = (module struct @@ -612,8 +666,8 @@ let%test_module "TestWeak" = (module struct let to_int (Block x) = x end - module NODE = WeakNode(struct type 'a t = MyKey.t end)(WrappedHomogeneousValue) - module Map = MakeCustom(MyKey)(NODE) + module Node = WeakNode(struct type 'a t = MyKey.t end)(WrappedHomogeneousValue) + module Map = MakeCustomMap(MyKey)(Value)(Node) open Map let _m1 = singleton (MyKey.Block 7) "seven"