[stdlib] Switch to hasher based hashing #3701
base: nightly
This is a draft because there were some unexpected compiler issues; see the tests where code is commented out with the message: @JoeLoser I would appreciate some help from the team, as I think it should not have happened and might be a compiler issue.
21e6dab to 21fe6e9
21fe6e9 to a5aff82
!sync
I'm seeing some crashes internally as well in the interpreter due to
which is failing in the comp-time interpreter. Do you mind trying to find a minimal repro (I think
One other high-level comment: perhaps it's useful to split up the fnv1a changes to a separate PR and we can land those?
I introduced fnv1a mainly because ahash was crashing at compile time. I can create a separate PR for the fnv1a Hasher.
I figured, yeah. How bad is the load factor etc if we were to temporarily make
AFAIK the quality of fnv1a is good; it's just slow for long inputs because it works one byte at a time. So it should be fine to flip the default to fnv1a.
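To make the byte-at-a-time cost concrete, here is a minimal sketch of 64-bit FNV-1a. It is written in Python rather than Mojo so it can be run as-is, and it is not the PR's `Fnv1a` struct. The constants are the standard published FNV 64-bit offset basis and prime; the name `fnv1a_64` is made up for illustration.

```python
# Standard 64-bit FNV parameters.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF  # emulate wrap-around of an unsigned 64-bit integer


def fnv1a_64(data: bytes) -> int:
    """Hash `data` with 64-bit FNV-1a (illustrative helper, not a stdlib API)."""
    state = FNV64_OFFSET_BASIS
    for byte in data:
        state ^= byte                           # FNV-1a: XOR the byte in first...
        state = (state * FNV64_PRIME) & MASK64  # ...then multiply by the prime
    return state


print(hex(fnv1a_64(b"a")))  # 0xaf63dc4c8601ec8c
```

Each input byte costs one XOR and one 64-bit multiply, and every step depends on the previous state, so the loop cannot consume several bytes per iteration. That is the reason it is slow on long inputs even though the distribution is fine.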
840b2b9 to 75bc7f3
@JoeLoser I switched the default to Fnv1a. Is this enough, or should I still create another PR with just the Fnv1a algorithm addition?
Let's keep it as is and see if it still crashes internally or not and go from there. Does that work for you?
!sync
75bc7f3 to c21d14a
Hi @JoeLoser, it seems like a big rebase; lots of stuff has changed, so I might need to do some experiments first. It will take some time.
No worries, totally understand. Thank you! Let me know if we can help at all - still excited about this change!
Signed-off-by: Maxim Zaks <[email protected]>
c21d14a to 21a475d
Just finished the rebase. Sadly, the compiler bug is still there; see for example: https://github.com/modularml/mojo/actions/runs/12376410778/job/34543473009?pr=3701#step:6:56
I see, these definitely look like parameter inference bugs. Do you mind please filing some (hopefully minimal) reproducers as a GitHub issue referencing this PR, and I'll make sure they get routed/prioritized appropriately internally?
One other random/drive-by: I noticed our existing dunder hash functions return UInt instead of UInt64 (like the finish API returns). Do you have any interest in explicitly changing that in top-of-tree today to align those until we can get rid of it in favor of this PR (once the compiler bugs are shaken out)?
from memory import UnsafePointer


struct Fnv1a(Hasher):
Suggestion: Worth teasing this and its tests out to a separate PR since this one is blocked on compiler bugs?
@JoeLoser #3708 contains a minimal repro; is it not enough?
Oops, I missed that - sorry. Confirmed it still repros. I'll ask around internally to get it prioritized on the compiler team. Thanks!
This is a very large PR which does a complete switch to Hasher-based hash value computation.
It uses AHasher as the default hash algorithm.
This PR also introduces an Fnv1a hasher which can be used for compile-time hash value computation (the AHasher algorithm does not work at compile time at this point; it might be a compiler bug).
In the next PR we will introduce parametrisation for the Dict type, in order to allow users to inject a non-default hasher.
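A rough sketch of what hasher-based hashing and hasher injection could look like, in Python so it can be run directly. All names here (`Hasher`, `hash_into`, `hash_value`, `Fnv1aHasher`, `ByteCountHasher`, `Point`) are hypothetical and do not mirror the Mojo stdlib API. The conversation only confirms that there is a `Hasher` trait, that its `finish` API returns a UInt64, and that `Fnv1a` and `AHasher` implement it.

```python
from typing import Protocol


class Hasher(Protocol):
    """Hypothetical streaming interface: feed bytes, then read a 64-bit result."""

    def update(self, data: bytes) -> None: ...
    def finish(self) -> int: ...


class Fnv1aHasher:
    """Streaming 64-bit FNV-1a (illustrative, not the PR's Mojo struct)."""

    def __init__(self) -> None:
        self._state = 0xCBF29CE484222325  # FNV 64-bit offset basis

    def update(self, data: bytes) -> None:
        for byte in data:
            self._state ^= byte
            self._state = (self._state * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF

    def finish(self) -> int:
        return self._state


class ByteCountHasher:
    """Deliberately poor toy hasher, only here to show that hashers are swappable."""

    def __init__(self) -> None:
        self._count = 0

    def update(self, data: bytes) -> None:
        self._count += len(data)

    def finish(self) -> int:
        return self._count


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def hash_into(self, hasher: Hasher) -> None:
        # The type only says *what* to hash; the hasher decides *how*.
        hasher.update(self.x.to_bytes(8, "little", signed=True))
        hasher.update(self.y.to_bytes(8, "little", signed=True))


def hash_value(value, hasher_type=Fnv1aHasher) -> int:
    """Hash `value` with a fresh hasher; callers may inject a non-default type."""
    hasher = hasher_type()
    value.hash_into(hasher)
    return hasher.finish()


default_hash = hash_value(Point(1, 2))                   # uses Fnv1aHasher
injected_hash = hash_value(Point(1, 2), ByteCountHasher)  # uses the toy hasher
```

The point of the design is that `Point` never picks an algorithm. A container parametrised on the hasher type could pass that type through in the same way `hash_value` does here.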