Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Switch to hasher based hashing #3701

Draft
wants to merge 1 commit into
base: nightly
Choose a base branch
from

Conversation

mzaks
Copy link
Contributor

@mzaks mzaks commented Oct 22, 2024

This is a very large PR which does a complete switch to the Hasher based hash value computations.
It uses AHasher as default hash algorithm.
This PR also introduces a Fnv1a hasher which can be used for compile time hash value computation (AHasher algorithm does not to work at compile time at this point in time, it might be a compiler bug)

In the next PR we will introduce parametrisation for the Dict type, in order to allow users to inject non default hasher.

@mzaks
Copy link
Contributor Author

mzaks commented Oct 22, 2024

This is a draft because there were some unexpected compiler issues see tests where code is commented out with the message:
# TODO: this was working before, figure out what happened

@JoeLoser I would appreciate some help from the team as I think it should not have happened and might be a compiler issue.

@mzaks mzaks force-pushed the feature/complete-switch-to-hasher-based-hashing branch 6 times, most recently from 21e6dab to 21fe6e9 Compare October 24, 2024 12:10
@mzaks mzaks force-pushed the feature/complete-switch-to-hasher-based-hashing branch from 21fe6e9 to a5aff82 Compare October 29, 2024 08:19
@JoeLoser
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Oct 31, 2024
@JoeLoser
Copy link
Collaborator

JoeLoser commented Nov 1, 2024

I'm seeing some crashes internally as well in interpreter due to AHash not being able to be used at compile time. I recommend adding some unit tests to check that out and report compiler bugs. We'll need that functionality work to make it easy to land this PR/some derivative of it internally (as we use Dict at compile time internally in some places). I think the common theme is:

note: failed to fold operation pop.bitcast(#pop<simd 16622636600618719510355724550245202455> : !pop.scalar<ui128>)

which is failing in the comp-time interpreter. Do you mind trying to find a minimal repro (I think hash.update_with_simd or the like with a ui128 type of yours would do the trick)? The less coupled it is to a change (i.e. this PR) would be even better so compiler devs can work off of something in nightly/standalone repro.

Copy link
Collaborator

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other high-level comment: perhaps it's useful to split up the fnv1a changes to a separate PR and we can land those?

@mzaks
Copy link
Contributor Author

mzaks commented Nov 1, 2024

I introduced fnv1a mainly because of ahash crashing at compile time. I can create a separate PR for fnv1a Hasher.

@JoeLoser
Copy link
Collaborator

JoeLoser commented Nov 1, 2024

I introduced fnv1a mainly because of ahash crashing at compile time. I can create a separate PR for fnv1a Hasher.

I figured, yeah. How bad is the load factor etc if we were to temporarily make fnv1a the default hasher until we allow injecting hasher for Dict? Then we could flip AHash to be the default and I could inject fnva1 internally where it's currently crashing as a workaround.

@mzaks
Copy link
Contributor Author

mzaks commented Nov 1, 2024

I introduced fnv1a mainly because of ahash crashing at compile time. I can create a separate PR for fnv1a Hasher.

I figured, yeah. How bad is the load factor etc if we were to temporarily make fnv1a the default hasher until we allow injecting hasher for Dict? Then we could flip AHash to be the default and I could inject fnva1 internally where it's currently crashing as a workaround.

AFAIK the quality of fnv1a is good it's just slow for long inputs as it works one byte at a time. So it should be fine to flip the default to be fnv1a.

@mzaks mzaks force-pushed the feature/complete-switch-to-hasher-based-hashing branch 2 times, most recently from 840b2b9 to 75bc7f3 Compare November 1, 2024 14:04
@mzaks
Copy link
Contributor Author

mzaks commented Nov 1, 2024

@JoeLoser I switched the default to be Fnv1a. Is this enough or should I still create another PR with just Fnv1a algorithm addition?

@JoeLoser
Copy link
Collaborator

JoeLoser commented Nov 1, 2024

@JoeLoser I switched the default to be Fnv1a. Is this enough or should I still create another PR with just Fnv1a algorithm addition?

Let's keep it as is and see if it still crashes internally or not and go from there. Does that work for you?

@JoeLoser
Copy link
Collaborator

JoeLoser commented Nov 1, 2024

!sync

@mzaks mzaks force-pushed the feature/complete-switch-to-hasher-based-hashing branch from 75bc7f3 to c21d14a Compare November 11, 2024 13:55
@JoeLoser
Copy link
Collaborator

@mzaks do you mind rebasing and see if this works correctly yet? I can sync it internally to see if it breaks anything internally too after that. I know @lattner has fixed several compiler bugs in this space in the past few weeks.

@JoeLoser JoeLoser added the waiting for response Needs action/response from contributor before a PR can proceed label Dec 16, 2024
@mzaks
Copy link
Contributor Author

mzaks commented Dec 17, 2024

@mzaks do you mind rebasing and see if this works correctly yet? I can sync it internally to see if it breaks anything internally too after that. I know @lattner has fixed several compiler bugs in this space in the past few weeks.

Hi @JoeLoser it seems like a big rebase, lots of stuff have change I might need to do some experiments first. So it will take some time.

@JoeLoser
Copy link
Collaborator

@mzaks do you mind rebasing and see if this works correctly yet? I can sync it internally to see if it breaks anything internally too after that. I know @lattner has fixed several compiler bugs in this space in the past few weeks.

Hi @JoeLoser it seems like a big rebase, lots of stuff have change I might need to do some experiments first. So it will take some time.

No worries, totally understand. Thank you! Let me know if we can help at all - still excited about this change!

@mzaks mzaks force-pushed the feature/complete-switch-to-hasher-based-hashing branch from c21d14a to 21a475d Compare December 17, 2024 15:42
@mzaks
Copy link
Contributor Author

mzaks commented Dec 17, 2024

@mzaks do you mind rebasing and see if this works correctly yet? I can sync it internally to see if it breaks anything internally too after that. I know @lattner has fixed several compiler bugs in this space in the past few weeks.

Hi @JoeLoser it seems like a big rebase, lots of stuff have change I might need to do some experiments first. So it will take some time.

No worries, totally understand. Thank you! Let me know if we can help at all - still excited about this change!

Just finished the rebase, sadly the compiler bug is still there, see for example: https://github.com/modularml/mojo/actions/runs/12376410778/job/34543473009?pr=3701#step:6:56

@JoeLoser
Copy link
Collaborator

@mzaks do you mind rebasing and see if this works correctly yet? I can sync it internally to see if it breaks anything internally too after that. I know @lattner has fixed several compiler bugs in this space in the past few weeks.

Hi @JoeLoser it seems like a big rebase, lots of stuff have change I might need to do some experiments first. So it will take some time.

No worries, totally understand. Thank you! Let me know if we can help at all - still excited about this change!

Just finished the rebase, sadly the compiler bug is still there, see for example: https://github.com/modularml/mojo/actions/runs/12376410778/job/34543473009?pr=3701#step:6:56

I see, these definitely look like parameter inference bugs. Do you mind please filing some (hopefully minimal) reproducers as a GitHub issue and reference this PR, and I'll make sure they get routed/prioritized appropriately internally?

Copy link
Collaborator

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other random/drive-by: I noticed our existing dunder hash functions return UInt instead of UInt64 (like finish API returns). Do you have any interest in explicitly changing that in top-of-tree today to align those until we can get rid of it in favor of this PR (once the compiler bugs are shaken out)?

from memory import UnsafePointer


struct Fnv1a(Hasher):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion Worth teasing this and its tests out to a separate PR since this one is blocked on compiler bugs?

@soraros
Copy link
Contributor

soraros commented Dec 19, 2024

I see, these definitely look like parameter inference bugs. Do you mind please filing some (hopefully minimal) reproducers as a GitHub issue and reference this PR, and I'll make sure they get routed/prioritized appropriately internally?

@JoeLoser #3708 contains a minimal repro, is it not enough?

@JoeLoser
Copy link
Collaborator

I see, these definitely look like parameter inference bugs. Do you mind please filing some (hopefully minimal) reproducers as a GitHub issue and reference this PR, and I'll make sure they get routed/prioritized appropriately internally?

@JoeLoser #3708 contains a minimal repro, is it not enough?

Oops, I missed that - sorry. Confirmed it still repros. I'll ask around internally to get it prioritized on the compiler team. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. waiting for response Needs action/response from contributor before a PR can proceed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants