Validator cause hasher return different hash for the same key #31
Comments
Spoiler: this is actually a problem in theine as well. A key that contains strings will select an effectively random shard, so the hit ratio will be very low. |
If you just need a hasher for comparable types, I can recommend https://github.com/dolthub/maphash, which uses a set of dirty tricks to pull the hash function out of the standard map. |
@zhenzou as @maypok86 pointed out, this is not related to the validator. Theine's current hash function computes the hash from the key's in-memory bytes. A string is a pointer to its bytes, so two strings can have different hashes even when their values are the same. I copied this from https://github.com/tidwall/hashmap because the first version of Theine used it as the internal map. I think the best way to handle this is to add an optional Hash interface instead of using the maphash package, which hacks Go's runtime a lot. Let me investigate first and decide how to handle this properly. |
@Yiling-J Yes, I don't believe it is a validator issue either. The strange thing is that the issue disappears if I remove the validate step. As I understand it, I create a new string every time, so with the current version the hash should differ every time even with the validate step removed.
func escape(obj any) {
	if obj == nil {
		panic("nil")
	}
	val := reflect.ValueOf(obj)
	if val.Kind() != reflect.Struct {
		panic("not struct")
	}
} |
@Yiling-J Hi, we are using theine in our new project and this issue is blocking us. Could you provide a workaround?
type CacheKey interface {
	BuildCacheKey() string
} |
@zhenzou I pushed a quick fix to the master branch so you are not blocked. I still need to investigate further before releasing a new version. About your validator question, I think "123" is interned automatically by Go (in new interface:
type StringKey interface {
	StringKey() string
} |
@zhenzou Actually, your key struct might already have a StringKey method that does something else. So I'm thinking of adding an extra method to the builder instead. The options are:
builder := theine.NewBuilder[Foo, string](1000)
builder.StringKey(func(key Foo) string {
	return key.name
})
Which one do you think is better? |
@Yiling-J You are correct, adding a custom StringKey builder method instead of a StringKey interface is the better solution. |
@Yiling-J I mentioned maphash because every other solution raises the same question: "What will you do if the key contains multiple strings?" |
@zhenzou The main branch is updated, please use the new builder method: builder := theine.NewBuilder[Foo, int](10000)
builder.StringKey(func(k Foo) string { return k.Bar }) |
@maypok86 It seems the only special type allowed as a map key is string? Byte slices are not allowed in comparable, and pointers are compared by address. If so, it's also possible to do a recursive check with reflect when the hasher is initialized: if there is a string field and the user doesn't provide a custom StringKey func (or maybe a hash function), just panic. |
@Yiling-J Eh, if only it were that simple... Since Go 1.20, interfaces satisfy comparable, and this example will not pass:
package kek

import (
	"fmt"
	"github.com/dolthub/maphash"
	"github.com/zeebo/xxh3"
	"testing"
	"unsafe"
)

type Hasher[K comparable] struct {
	ksize int
	kstr  bool
}

func NewHasher[K comparable]() *Hasher[K] {
	h := &Hasher[K]{}
	var k K
	switch ((interface{})(k)).(type) {
	case string:
		h.kstr = true
	default:
		h.ksize = int(unsafe.Sizeof(k))
	}
	return h
}

func (h *Hasher[K]) Hash(key K) uint64 {
	var strKey string
	if h.kstr {
		strKey = *(*string)(unsafe.Pointer(&key))
	} else {
		strKey = *(*string)(unsafe.Pointer(&struct {
			data unsafe.Pointer
			len  int
		}{unsafe.Pointer(&key), h.ksize}))
	}
	return xxh3.HashString(strKey)
}

type LolKek struct {
	lol any
	kek any
}

func TestHasher(t *testing.T) {
	lol := 236
	kek := [34]byte{1, 2, 3}
	hasher := NewHasher[LolKek]()
	h1 := hasher.Hash(LolKek{
		lol: lol,
		kek: kek,
	})
	fmt.Println(h1)
	h2 := hasher.Hash(LolKek{
		lol: lol,
		kek: kek,
	})
	fmt.Println(h2)
	if h1 != h2 {
		t.Fatal("hashes should be the same")
	}
}

func TestMapHash(t *testing.T) {
	lol := 236
	kek := [34]byte{1, 2, 3}
	hasher := maphash.NewHasher[LolKek]()
	h1 := hasher.Hash(LolKek{
		lol: lol,
		kek: kek,
	})
	fmt.Println(h1)
	h2 := hasher.Hash(LolKek{
		lol: lol,
		kek: kek,
	})
	fmt.Println(h2)
	if h1 != h2 {
		t.Fatal("hashes should be the same")
	}
}
But that case is rarer than this one (LolKek is just a composite key):
type LolKek struct {
	LolID string
	KekID string
} |
If you're heading toward the route of needing I'd really prefer to see an API that doesn't force me to combine multiple I haven't looked at the internals of theine to know if this is feasible, but one way around this would be to say that a cache key must implement package main

import (
	"bytes"
	"encoding/binary"
	"hash/maphash"
)

type cacheKey struct {
	foo  uint64
	bar  []byte
	quux []byte
}

func (c cacheKey) Equals(other cacheKey) bool {
	return c.foo == other.foo && bytes.Equal(c.bar, other.bar) && bytes.Equal(c.quux, other.quux)
}

func (c cacheKey) Hash(sum *maphash.Hash) {
	var tmp [8]byte
	binary.LittleEndian.PutUint64(tmp[:], c.foo)
	sum.Write(tmp[:])
	sum.Write(c.bar)
	sum.Write(c.quux)
} |
@tv42 I think comparable is the right key type. If we consider a cache to be a hashmap, it's natural to use comparable as the key. Go knows exactly how to handle comparable, it just doesn't export an interface or useful functions for it. The two best options, to me, are that Go provides a generic hash function or that Go makes the hashmap key type implementable, like Rust. For now I'm still thinking of doing a recursive reflect check when the cache is initialized: if there is a string and the StringKey method is not implemented, just panic (any is valid in this check, though Go will panic when saving to a real map). Another option is to rewrite the cache to avoid the sharded map, because the hash value is only used to choose a shard. |
Golang: 1.21.4
And it works correctly if I remove the validate step.
It may not be a problem in theine itself, but could you give some suggestions for solving it? Validator is a widely used library, so many other users may face the same issue.