-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hashconsed maps nodes #1
Conversation
patriciaTree.ml
Outdated
| NBranch{prefix=prefixa;branching_bit=branching_bita;tree0=tree0a;tree1=tree1a;_}, | ||
NBranch{prefix=prefixb;branching_bit=branching_bitb;tree0=tree0b;tree1=tree1b;_} -> | ||
prefixa == prefixb && branching_bita == branching_bitb && | ||
get_id tree0a = get_id tree0b && get_id tree1a = get_id tree1b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use == instead of get_id here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not, because these values don't have the same type... This is also why I need an Obj.magic
in the leaf case.
However, the more I think about it, the more I believe we should get rid of arbitrary value maps for hash-consing. It makes little sense to hashcons maps from 'a key
to ('a, 'b) value
together with map from 'a key
to ('a, 'c) value
. So I'll probably change it to use a 'a value
type instead of the more generic ('a, 'b) value
type.
Otherwise we might have problems with different types having equal values.
So I've commited to the single value type for hash-consed version, this leads to a few interface changes from the previous version:
I've also added test:
|
I don't think we needed to change the existing interface. The following code should work on 0.9.0: module HashConsedNode(Key:sig type 'a t end)(Value:VALUE):NODE_WITH_ID
with type 'key key = 'key Key.t
and type ('key,'map) value = ('key,'map) Value.t
= struct
type 'a key = 'a Key.t
type ('key,'map) value = ('key,'map) Value.t
type 'map view =
| Empty: 'map view
| Branch: {prefix:intkey;branching_bit:mask;tree0:'map t;tree1:'map t} -> 'map view
| Leaf: {key:'key key; value:('key,'map) value} -> 'map view
and 'map t =
| NEmpty: 'map t
| NBranch: {prefix:intkey;branching_bit:mask;tree0:'map t;tree1:'map t;id:int} -> 'map t
| NLeaf: {key:'key key;value:('key,'map) value;id:int} -> 'map t
let view = function
| NEmpty -> Empty
| NBranch{prefix;branching_bit;tree0;tree1;_} -> Branch{prefix;branching_bit;tree0;tree1}
| NLeaf{key;value;_} -> Leaf{key;value}
let get_id = function
| NEmpty -> 0
| NBranch{id;_} -> id
| NLeaf{id;_} -> id
type any_map = Any_map: 'map t -> any_map [@@unboxed]
(* We can't get rid of Obj.magic because of the float hack, which prevents unboxing here. *)
(* type any_key = Any_key: 'a Key.t -> any_key [@@unboxed] *)
(* type any_value = Any_value: ('key,'map) Value.t -> any_value [@@unboxed] *)
module WeakHash = Weak.Make(struct
type t = any_map
let sdbm x y = y + (x lsl 16) + (x lsl 6) - x;;
let hash (Any_map x) = match x with
| NEmpty -> 0
| NLeaf{key=_;value=_;id=_} -> assert false (* XXX: We have
the int for the
key, but we may
want a hashing
function for
the value
too. *)
| NBranch{prefix;branching_bit;tree0;tree1;id=_} ->
let hash = (get_id tree0) in
let hash = sdbm (get_id tree1) hash in
let hash = sdbm prefix hash in
let hash = sdbm branching_bit hash in
hash
;;
let equal (Any_map a) (Any_map b) = match a,b with
| NEmpty,NEmpty -> true
| NLeaf{key=keya;value=valuea;id=_},NLeaf{key=keyb;value=valueb;id=_} ->
(Obj.magic keya) == keyb && (Obj.magic valuea) == valueb
| NBranch{prefix=pa;branching_bit=bba;tree0=tree0a;tree1=tree1a;id=_},
NBranch{prefix=pb;branching_bit=bbb;tree0=tree0b;tree1=tree1b;id=_} ->
pa == pb && bba == bbb && (Any_map tree0a) == (Any_map tree0b) && (Any_map tree1a) == (Any_map tree1b)
| _ -> false
end)
let count = ref 1;;
let hashtbl = WeakHash.create 17;;
let empty = NEmpty
let is_empty x = x == NEmpty
let leaf (type k m) (key: k key) (value: (k,m) value): m t =
let tentative = NLeaf{key;value;id= !count} in
let any_tentative = Any_map tentative in
let res = WeakHash.merge hashtbl any_tentative in
if(any_tentative == res) then incr count;
let Any_map x = res in
(* Obj.magic is fine, as the returned object has the same
content than the tentative one. *)
let x:m t = Obj.magic x in
x
;;
let branch: type m. prefix:int -> branching_bit:int -> tree0:m t ->tree1:m t-> m t =
fun ~prefix ~branching_bit ~tree0 ~tree1 ->
match tree0,tree1 with
| NEmpty, x -> x
| x, NEmpty -> x
| _ ->
let tentative = NBranch{prefix;branching_bit;tree0;tree1;id=(!count)} in
let any_tentative = Any_map tentative in
let res = WeakHash.merge hashtbl any_tentative in
if(any_tentative == res) then incr count;
let Any_map x = res in
(* Obj.magic is fine, as the returned object has the same
content than the tentative one. *)
let x:m t = Obj.magic x in
x
end```
Some Obj.magic seem unavoidable, but they are all correct. We can further improve on this code by using 2 Hashtbl (and using GADT to distinguish the leaf and branch cases, to avoid pattern matching). |
No, any Obj.magic on values is unsafe, since different types can have equal representations: # 97 == Obj.magic 'a';;
- : bool = true |
We need to clarify what safe means here.
An exemple of a f function that does not have 1. but has the problem described in 2. is The Another way to be convinced that this code is safe to to consider the following fragment: type any = Any: 'a -> any [@@unboxed]
let f a b = (Any a) == (Any b) The only reason why this code does not work is because of the float array hack; if it were removed, it should then work, and this code is equivalent to the It is true that values of different types can have equal representation; which is why we want to hashcons values of different types that have the same memory representation. |
Ok I see your point. I guess one final objection I'd have is that this can result in trees of different types (but same representation, like On the other hand, the only things we really lose with the new interface are:
So sure, it is a bit less user friendly, but I'll argue it also avoids confusion (different functor calls make it explicit the ids from different value types may clash). |
We explicitly require different keys in the map to be mapped to different integers. So the problem would be a user that succeeds in putting in the map two keys with the same representation but different types. I don't think it is possible to write polyeq in this case. Well, I have trouble finding an example that would be problematic.
I don't see the confusion (I don't say there is none), so all I can see for now is the more complicated interface. If there is a possible legitimate confusion (or worse, a type safety bug), and if the change of interface is the best way to avoid it, then changing the interface could be a good idea (I don't think the change is a good idea is an improvement if there are no confusion). So, I propose that you write a problematic example first, and then we can see if avoiding this problem requires an interface change. |
My example shows the problem happening with using the same key module Key = HeterogeneousKeyFromKey(struct
type t = int
let to_int x = x
end)
module Node = HashconsedNode(Key)
module HMap = MakeCustomMap(Key)(Node)
let m1 = HMap.singleton 5 97 (* int HMap.t *)
let m2 = HMap.singleton 5 'a'
(* char HMap.t, but with same id and physically equal to m1 *) Using this, you can easily arrive at surprising behaviors: module MapOfMaps = MakeMap(struct
type t = Any : 'a HMap.t -> t
let to_int (Any x) = Node.get_id x
end)
let m3 = MapOfMaps.of_list [ (m1, "foo"); (m2, "bar") ]
(* m3 has cardinal 1, the m1->foo binding has been overwritten.
Users might reasonably assume there shouldn't be a conflict here *) Furthermore, if users assume id's are unique, it becomes very easy to write unsafe code which looks safe: type _ value =
| OfInt : string -> int value
| OfChar : unit -> char value
module HetMapOfMaps = MakeHeterogeneousMap(struct
type 'a t = 'a HMap.t
let to_int = Node.get_id
let polyeq: type a b. a t -> b t -> (a, b) cmp = fun a b ->
if to_int a = to_int b
then Obj.magic Eq (* This looks safe, if one assumes id are unique *)
else Diff
end)(struct type ('a, 'b) t = 'a value end)
let m4 = HetMapOfMaps.of_list [ (m2, OfChar ()); (m1; OfInt "hello") ]
(* Same as m3, m4 only has a single binding *)
let error = HetMapOfMaps.get m2 m4
(* Typechecker thinks this is an [char value], with unit as only argument,
it is in fact a [int value] with "hello" as only argument *)
let not_unit = match error with OfChar x -> x
(* This should be unit, but it isn't. The match will either fail
or be optimized away (which allows creating a value of type unit which isn't ()... *) This is quite a subtle issue. It is admittedly very rare (you need trees with same keys and physically equal values for it to crop up), but that also makes it very hard to debug when it does appear. I've used these sorts of unique id's into obj.magic for poly equality before, since it is generally a safe pattern. We can write documentation explaining the issue, but we can never be sure users have seen that warning or will remember it when the time counts. Another option is to rename This cannot happen if we split Hashconsed maps by value as in the new interface (eg. require knowing the value type when instantiating a Hashconsed functor), since there is no single Also I should mention in case it isn't clear: the new interface only impacts the hashconsing functors, all other functors (non-hashconsed maps and sets) work just like before and don't exhibit any breaking changes. |
I've thought of another advantage of requiring explicit value types for hash-consed functors: it enables custom hash and equality functions on values. In the fully generic version, the only hash we can use on values is let l1 = 5 :: 4 :: []
let l2 = 5 :: 4 :: [] (* equal to l1, but not physically equal *)
let m1 = HMap.singleton 5 l1 (* int list HMap.t *)
let m2 = HMap.singleton 5 l2 (* int list HMap.t *)
let b = HMap.equal m1 m2 (* false: values aren't physically equal *) That seems like a pretty hefty limitation don't you think ? |
I agree with this one. Some comments:
|
@mlemerre this should be ready for merge now, let me know what you think of the interface and included documentation. If it suits you, we'll merge and release. |
Needs: