
Fix point hash to be well distributed. #76153

Merged
merged 1 commit into CleverRaven:master from pointing_fingers on Sep 6, 2024

Conversation

@akrieger (Member) commented Sep 2, 2024

Summary

Performance "Improve point hash functions to eliminate map overhead in nps los checks and elsewhere"

Purpose of change

Bad hash functions can have catastrophic effects on performance. #76009 indirectly revealed that the custom std::hash specialization for point was returning poorly distributed values, including only even values. This results in collisions in hash tables like the lru_cache used for potential LOS lookups, which causes 6-10x slowdowns in that function. Absolutely gross. Fix the hash so all tables keying off point don't suffer.

Describe the solution

Find a substantiated, good 'low bias' public domain 64-bit mixer function. Apply it as-is to the existing point hashers. Don't worry about truncation in 32-bit builds because who uses those anyway, and in theory any bits in the output are good bits.
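
As a minimal sketch, the shape of the fix looks roughly like this (assuming the splitmix64-style mixer discussed later in the thread; mix64 and point_hash_sketch are illustrative names, not the committed code):

#include <cstddef>
#include <cstdint>

// splitmix64-style finalizer: every output bit depends on every input bit.
inline uint64_t mix64( uint64_t x ) noexcept
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

struct point_hash_sketch {
    std::size_t operator()( int x, int y ) const noexcept {
        // Pack both coordinates into one 64-bit word, then mix.
        uint64_t packed = static_cast<uint64_t>( static_cast<uint32_t>( x ) ) << 32 |
                          static_cast<uint32_t>( y );
        // Truncating to 32 bits on 32-bit builds is accepted: any subset of a
        // well-mixed output's bits is itself well distributed.
        return static_cast<std::size_t>( mix64( packed ) );
    }
};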

Describe alternatives you've considered

Testing

Waited 30 in-game minutes in the save from #76009; the cost of has_potential_los has been almost entirely eliminated, from ~300ms to ~50ms, with the get call completely inlined.

[screenshot: profiler results showing the reduced has_potential_los cost]

Additional context

Bad hash functions are really bad.

I will look at tripoint later as it has a similar hash algorithm, but using 3 terms might mitigate the negative effects. I will need to find or implement a hash combiner to handle three ints; a sketch of one possibility follows.
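
For illustration, one plausible shape for that combiner, reusing the pack-and-mix idea from the sketch above (an assumption, not a committed design; mix64 is the mixer sketched earlier):

struct tripoint_hash_sketch {
    std::size_t operator()( int x, int y, int z ) const noexcept {
        uint64_t xy = static_cast<uint64_t>( static_cast<uint32_t>( x ) ) << 32 |
                      static_cast<uint32_t>( y );
        // Pre-mix z so its handful of meaningful bits (z-levels span only
        // about -10..10) spread across the whole word before the final mix.
        return static_cast<std::size_t>( mix64( xy ^ mix64( static_cast<uint32_t>( z ) ) ) );
    }
};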

@github-actions github-actions bot added [C++] Changes (can be) made in C++. Previously named `Code` Code: Performance Performance boosting code (CPU, memory, etc.) json-styled JSON lint passed, label assigned by github actions labels Sep 2, 2024
@GuardianDll (Member)

Is it possible to backport both of these to 0.H?

@akrieger (Member Author) commented Sep 2, 2024

Yep, easily.

@GuardianDll (Member)

I'll do it then.

@github-actions github-actions bot added the astyled astyled PR, label is assigned by github actions label Sep 2, 2024
@GuardianDll (Member) commented Sep 2, 2024

Hmm, point map::sees_cache_key from #76009 is not even present in 0.H.
I suddenly lost confidence in my ability to backport it.

GuardianDll added a commit that referenced this pull request Sep 2, 2024
@GuardianDll GuardianDll mentioned this pull request Sep 2, 2024
@akrieger (Member Author) commented Sep 2, 2024

Yeah we don't need to backport #76009, this is enough for the performance benefit.

@akrieger (Member Author) commented Sep 2, 2024

It's ironic that we apparently have a test for the hasher that is now broken.

@kevingranade (Member)

It's populating a vector with all the hash results for the coordinates from (-300,-300) to (300,300), uniquifying them, then asserting that there are more than 0.9 * total_values_generated unique values.
It's failing with 181201 (0x2c3d1) > 325080.9, i.e. it expects more than 325,080 distinct values, but there are only 181,201.
The quick fix is to just remove that test; it's not really indicative of anything, because having on average one collision from that set is fine, actually.
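
For reference, a reconstruction of what that test does, based on the description above (the function name, bounds, and use of std::hash<point> are assumptions, not the actual test source):

#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

bool hash_is_well_distributed()
{
    std::vector<std::size_t> results;
    for( int x = -300; x <= 300; ++x ) {
        for( int y = -300; y <= 300; ++y ) {
            results.push_back( std::hash<point> {}( point( x, y ) ) );
        }
    }
    const std::size_t total = results.size(); // 601 * 601 = 361201
    std::sort( results.begin(), results.end() );
    results.erase( std::unique( results.begin(), results.end() ), results.end() );
    // 0.9 * 361201 = 325080.9, the threshold quoted above.
    return results.size() > 0.9 * total;
}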

It might be interesting to do something like count the number of instances that collide with each other instead, but I don't have any really good grounding here.

The other complication is that a given std::unordered_map instance isn't necessarily going to even look at all the bits of the returned hash, so I'm not sure that evaluating the whole number returned from the hash is even meaningful?

@akrieger (Member Author) commented Sep 2, 2024

Thanks for that context.

I did some more thinking and I'm even more bothered by the point hasher. The output is functionally dominated by the y component, which actually touches all the bits of the hash. The x component touches at most the lower 32 bits (maybe 33 if it overflows), but especially in this map it only touches the lower 7-8 bits, so the hash is functionally mostly just mapping the y component.

@github-actions github-actions bot added the Code: Tests Measurement, self-control, statistics, balancing. label Sep 2, 2024
@CLIDragon (Contributor) commented Sep 3, 2024

Given that std::hash is just the identity function on most implementations, I suspect we can do quite a bit better in terms of entropy. At the very least, we know that the Z value will only vary between -10 and 10, and even considering future changes it should never need more than 8 bits.

Actually, -10 will use only the upper bits and none of the lower bits. I think that's probably a flaw in my implementation in #76009. Thinking about it again, we still get a difference in value as long as we include at least 9 bits, and I think I included 10; e.g. -3 will be 11111101 in the bottom 8 bits. My implementation is still flawed.

We could use something like the following to measure the quality of the hash function. I would at least like to approach optimality (badness -> 0) for the case of bubble coordinates (or, to future-proof, double the current bubble coordinates and 10 bits for z-levels).

https://artificial-mind.net/blog/2021/10/09/unordered-map-badness

template <class Map> 
double unordered_map_badness(Map const& map)
{
    auto const lambda = map.size() / double(map.bucket_count());

    auto cost = 0.;
    for (auto const& [k, _] : map)
        cost += map.bucket_size(map.bucket(k));
    cost /= map.size();

    return std::max(0., cost / (1 + lambda) - 1);
}

@CLIDragon (Contributor)

> The other complication is a given std::unordered_map instance isn't necessarily going to even look at all the bits of the returned hash, so I'm not sure that evaluating the whole number returned from the hash is even meaningful?

On VS, and for libc++ at least, the internal map from hash to bucket index is something like i = h & (bucket_count - 1), so for our use case of fewer than 4,294,967,296 entries the top 32 bits are never used, and for the common case of ~132×132×20 entries, only the lower ~19 bits are.

As a side note, do we use libc++? And if we do, are we using powers-of-2 or prime numbers mode?

@akrieger (Member Author) commented Sep 3, 2024

> Given that std::hash is just the identity function on most implementations

That is wild, considering (I thought) that an ideal hash should have negligible correlation between input bits and output bits. Changing one bit in the input 'should' change half the output bits on average.
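
A quick way to spot-check that avalanche property (a sketch, not anything in this PR; assumes C++20 std::popcount):

#include <bit>
#include <cstdint>

// Flip each input bit once and count how many output bits change.
// An ideal 64-bit mixer averages ~32; the identity function scores exactly 1.
template <typename Mixer>
double average_flipped_bits( Mixer mix, uint64_t sample )
{
    int total = 0;
    for( int bit = 0; bit < 64; ++bit ) {
        total += std::popcount( mix( sample ) ^ mix( sample ^ ( uint64_t{ 1 } << bit ) ) );
    }
    return total / 64.0;
}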

I just checked msvc and it's using an fnv hash for all trivial types including integers, but indeed godbolt shows gcc/clang (with their default settings, which I guess means linking libstdc++) do not.

We don't do any standard library shenanigans, but idk which we are linking against for releases. Probably not llvm except on mac builds.

@kevingranade (Member)

Oh hmm, I checked running the previous interleave-based value through std::hash and yep, it had exactly the same results as without hashing.
So basically we want a "real" hash function that scrambles the bits. Something with a simple reference implementation is fine, something in the std would be great, and something complex but fast from a header-only library is ok if it's better by enough to be worth it.

@kevingranade (Member)

I've been drafting a test that tries to characterize unordered map badness, but with much less refinement; I'll see about switching to that snippet.
Taking a look at some published fast hashers like xor-shift type stuff after concatenating x and y.

@akrieger (Member Author) commented Sep 3, 2024

I will grab an OSS FNV hash, which is what msvc already uses. This then solves tripoint as well.

@kevingranade (Member)

In practice, the current state of the PR that just concatenates x and y (and runs it through the hash function that may not do anything) is getting a 0.0 on that badness evaluation when I run the coordinates { -300, -300 } to { 300, 300 } through it.

I'm not against chasing down further improvements, but this is fine, actually, and I need to double check but I think it's actually collision-free in the positive quadrant.

struct new_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        return std::hash<uint64_t> {}( static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y ) );
    }
};

struct old_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        constexpr uint64_t a = 2862933555777941757;
        size_t result = k.y;
        result *= a;
        result += k.x;
        return result;
    }
};

struct xor_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        uint64_t x = static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y );
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        return x;
    }
};


template <class Set>
double unordered_set_badness( Set &test_set )
{
    for( int x = -MAX_COORDINATE; x <= MAX_COORDINATE; ++x ) {
        for( int y = -MAX_COORDINATE; y <= MAX_COORDINATE; ++y ) {
            test_set.emplace( x, y );
        }
    }
    size_t num_buckets = test_set.bucket_count();
    WARN( "buckets: " <<  num_buckets );
    double const lambda = test_set.size() / double( num_buckets );
    std::array<uint64_t, 3> histogram = {0};
    double cost = 0.0;
    for( size_t i = 0; i < num_buckets; ++i ) {
        int size = test_set.bucket_size( i );
        cost += size * size;
        if( test_set.bucket_size( i ) >= 3 ) {
            WARN( "Encountered bucket with " << test_set.bucket_size( i ) << " elements." );
        } else {
            histogram[test_set.bucket_size( i )] += 1;
        }
    }
    cost /= test_set.size();
    INFO( "histogram[1] counts the number of hash buckets with a single element." );
    INFO( "histogram[2] counts the number of hash buckets with 2 elements." );
    INFO( "A failure here means that there are more elements landing in shared buckets" );
    INFO( "(and experiencing worse performance) than there are elements alone in their bucket." );
    CHECK( histogram[1] > histogram[2] * 2 );
    return std::max( 0.0, cost / ( 1 + lambda ) - 1 );
}

TEST_CASE( "point_set_collision_check" )
{
    std::unordered_set<point, old_test_hash> old_set;
    CHECK( unordered_set_badness( old_set ) == 0.0 );
    std::unordered_set<point, new_test_hash> new_set;
    CHECK( unordered_set_badness( new_set ) == 0.0 );
    std::unordered_set<point, xor_test_hash> xor_set;
    CHECK( unordered_set_badness( xor_set ) == 0.0 );
}

@CLIDragon (Contributor) commented Sep 3, 2024

> In practice, the current state of the PR that just concatenates x and y (and runs it through the hash function that may not do anything) is getting a 0.0 on that badness evaluation when I run the coordinates { -300, -300 } to { 300, 300 } through it.

Are you building on MSVC or gcc/clang? And if building on clang with libc++, are you using powers of 2 or prime numbers?

@kevingranade (Member) commented Sep 3, 2024

I'm building on a kind of ancient gcc 9.4.0.
I also noticed that if I restrict the data inputs to the range that covers the map, (0, 0) to (132, 132), the current pure concatenation is not evaluating to 0, but 0.2345278512.
Just for the heck of it I dropped in an xor-shift and it's yielding a badness of 0.0204669294 in that same domain.
I dropped in an FNV implementation based on the reference at http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1a but it's in between concatenation and the xor-shift, with a badness score of 0.1352797591.

So the current lead is this xor-shift; it's also much faster than FNV since it's a few whole-integer operations instead of a bunch of individual octet operations.

struct xor_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        uint64_t x = static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y );
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        return x;
    }
};

For reference and review, this is my FNV

#define FNV_64_PRIME static_cast<uint64_t>(0x100000001b3ULL)
#define FNV1_64_INIT static_cast<uint64_t>(0xcbf29ce484222325ULL)

struct fnv_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        uint64_t hval = FNV1_64_INIT;
        hval ^= static_cast<uint64_t>( k.x & 0x000F );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( k.x & 0x00F0 );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( k.x & 0x0F00 );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( k.x & 0xF000 );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( k.y & 0x000F );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( k.y & 0x00F0 );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( k.y & 0x0F00 );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( k.y & 0xF000 );
        hval *= FNV_64_PRIME;

        return hval;
    }
};

EDIT: Oh, I'm dumb. Terrible bug in the FNV: I forgot to shift my input octets, so they're discarding entropy right and left.

@kevingranade (Member)

After sorting out the issues in the FNV algorithm around unpacking the members into octets, it's scoring a 0.0 on the badness algorithm with the map-scale test.
It's performing worse in the (-300,-300) to (300,300) test; I might have some error around negative number handling?

@CLIDragon (Contributor) commented Sep 3, 2024

std::unordered_set appears to be incapable of creating buckets with multiple entries. Changing the code to use std::unordered_map gives a more reasonable badness of 1.305571 with the new hash, 29.393343 with the old hash, and 213862.01 with the xor hash. Additionally, all elements of the xor hash are stored in buckets of size 601.

Then, switching from MSVC std::hash to using static_cast<size_t> (to mimic identity hash) gives 213862 badness, as should be expected.

@kevingranade (Member)

> std::unordered_set appears to be incapable of creating buckets with multiple entries.

That's library-specific then, because I'm seeing results with multiple entries per bucket. I'll switch to map and see how it goes though.

> Additionally, all elements of the xor hash are stored in buckets of size 601.

This is the same symptom that was appearing earlier with concatenation: sign extension of the y member was stomping over the data from the x member.

@akrieger (Member Author) commented Sep 3, 2024

msvc with fnv (and unordered_map) is a clear winner over the symmetric range (-300,-300) -> (300,300). 'new hash' and 'fnv hash' giving identical results is good for validating that the handrolled fnv matches spec.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(145): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  2106 (0x83a) > 4212 (0x1074)
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(152): FAILED:
  CHECK( unordered_set_badness(old_set) == 0.0 )
with expansion:
  3.1786902483 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(154): FAILED:
  CHECK( unordered_set_badness(new_set) == 0.0 )
with expansion:
  0.0092805273 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(145): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  0 > 0
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(156): FAILED:
  CHECK( unordered_set_badness(xor_set) == 0.0 )
with expansion:
  354.845287745 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(158): FAILED:
  CHECK( unordered_set_badness(fnv_set) == 0.0 )
with expansion:
  0.0092805273 == 0.0

For the positive quadrant, fnv gives perfect results and xor is worst again.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(145): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  2106 (0x83a) > 4212 (0x1074)
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(152): FAILED:
  CHECK( unordered_set_badness(old_set) == 0.0 )
with expansion:
  1.1155708737 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(145): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  0 > 0
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(156): FAILED:
  CHECK( unordered_set_badness(xor_set) == 0.0 )
with expansion:
  176.9768938933 == 0.0

I hacked up some braindead equivalents of the hashes for tripoint, and it does not suffer the downsides of the old hash as badly. The basic hash is still bad at 1.3-something, but the rest pass, even just expanding concathash to 'xor with hash of z'.

@CLIDragon (Contributor)

Try replacing the xor_test (given that it is clearly bad) with a simple multiply-shift. I get 0.069279 with this and 1.11878 with FNV across (-300,-300) -> (300,300) on MSVC 19.40.33811:

struct xor_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        uint64_t x = static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y );
        return x *  0xd989bcacc137dcd5ull >> 32u;
    }
};
Code
#include <iostream>
#include <unordered_map>
#include <array>

struct point {
    int x;
    int y;
    
    friend bool operator==(const point &p, const point & other) {
        return p.x == other.x && p.y == other.y;
    }
    
    point(int x, int y) : x (x), y(y) {};
};
  
const int MAX_COORDINATE = 300;

struct new_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        return std::hash<uint64_t> {}((static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y )));
    }
};


struct old_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        constexpr uint64_t a = 2862933555777941757;
        size_t result = k.y;
        result *= a;
        result += k.x;
        return result;
    }
};

struct xor_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        uint64_t x = static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y );
        return x *  0xd989bcacc137dcd5ull >> 32u;
    }
};

#define FNV_64_PRIME static_cast<uint64_t>(0x100000001b3ULL)
#define FNV1_64_INIT static_cast<uint64_t>(0xcbf29ce484222325ULL)

struct fnv_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        const uint32_t x = k.x;
        const uint32_t y = k.y;
        uint64_t hval = FNV1_64_INIT;
        hval ^= static_cast<uint64_t>( x & 0xFF );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( x >> 8 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( x >> 16 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( x >> 24 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( y & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( y >> 8 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( y >> 16 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( y >> 24 ) & 0xFFul );
        hval *= FNV_64_PRIME;

        return hval;
    }
};



template <class Map>
double unordered_set_badness( Map &test_map )
{
    new_test_hash hash;
    for( int x = -MAX_COORDINATE; x <= MAX_COORDINATE; ++x ) {
        for( int y = -MAX_COORDINATE; y <= MAX_COORDINATE; ++y ) {
            test_map.emplace( point(x, y), 1 );
        }
    }

    size_t num_buckets = test_map.bucket_count();
    double const lambda = test_map.size() / double( num_buckets );
    std::unordered_map<uint64_t, uint64_t> histogram;
    double cost = 0.0;
    for( auto it = test_map.begin(); it != test_map.end(); ++it ) {
        auto i = test_map.bucket(it->first);
        // printf("(x, y): (%d, %d) b: %x h: %x \n", it->first.x, it->first.y, i, hash(it->first));
        int size = test_map.bucket_size( i );
        cost += size * size;
        auto result = histogram.find(size);
        if (result != histogram.end()) {
            result->second += 1;
        } else {
            histogram.emplace(size, 1);
        }
    }

    printf("Bucket Size Distribution\n");
    for (auto const &x : histogram) {
        printf("Size %lld: %lld \n", x.first, x.second);
    }
    cost /= test_map.size();
    return std::max( 0.0, cost / ( 1 + lambda ) - 1 );
}


int main() {
    std::unordered_map<point, char, new_test_hash> new_map;
    std::unordered_map<point, char, fnv_test_hash> fnv_map;
    std::unordered_map<point, char, xor_test_hash> xor_map;


    printf("New Hash:\n");
    double bad = unordered_set_badness( new_map );
    printf("%f \n\n", bad);

    printf("FNV Hash:\n");
    double bad2 = unordered_set_badness( fnv_map );
    printf("%f \n\n", bad2);

    printf("XOR Hash:\n");
    double bad3 = unordered_set_badness( xor_map );
    printf("%f \n\n", bad3);

    return 0;
}

@akrieger (Member Author) commented Sep 3, 2024

I'm kind of surprised you're getting different results than me with the fnv hash. 64-bit, right?

@kevingranade (Member) commented Sep 3, 2024

Hey CLIDragon, I think I see why you're seeing higher numbers: you're cubing the bucket cost instead of squaring it.
In the article it does:

for each key:
    cost += bucket_size

in my test it does:

for each bucket:
    cost += bucket_size * bucket_size

in your test you do:

for each key:
    cost += bucket_size * bucket_size

@CLIDragon (Contributor)

> Hey CLIDragon I think I see why you're seeing higher numbers, you're cubing the bucket cost instead of squaring it.

Yep, that was the issue. I now get 0.009281 for FNV (matching akrieger), 0.039888 for concatenation then FNV (presumably due to poor handling of negatives), and 0.000000 for the multiply-shift approach after concatenation.

@kevingranade (Member)

I'm starting to get results from the test PR:
Windows build, windows-2019 container, MINGW64_NT-10.0-17763 fv-az1494-998 3.5.3-d8b21b8c.x86_64 2024-07-09 18:03 UTC x86_64 Msys

This matches what you both are reporting from windows.
old hash isn't THAT bad in the positive domain, but terrible otherwise.
The XOR-shift I added is terrible on this platform. I repeat, this looks like a bug: either x or y is stomping on the other, which is why we're getting 601 entries per bucket on the 601x601 test and 133 per bucket on the 132x132 test.
Concatenate and std::hash is great-to-perfect in both settings.
FNV is great-to-perfect in both settings.

point_set_collision_check
601x601 SET

  OLD HASH
  buckets: 524288
  Encountered bucket with 7 elements.
  2106 (0x83a) > 4212 (0x1074)
  CHECK( unordered_set_badness( old_set, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  3.1786902483 == 0.0

  NEW HASH
  buckets: 524288
  Encountered bucket with 4 elements.
  CHECK( unordered_set_badness( new_set, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.0092805273 == 0.0

  XOR HASH
  buckets: 524288
  Encountered bucket with 601 elements. (!!!)
  0 > 0 (!!!!!!!!)
  CHECK( unordered_set_badness( xor_set, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  354.845287745 == 0.0  (!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!)

  FNV HASH
  buckets: 524288
  Encountered bucket with 4 elements.
  CHECK( unordered_set_badness( fnv_set, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.0092805273 == 0.0

POSITIVE QUADRANT SET

  OLD HASH
  buckets: 32768
  5209 (0x1459) > 12480 (0x30c0)
  CHECK( unordered_set_badness( map_sized_old, 0, 11 * 12 ) == 0.0 )
  0.1076081501 == 0.0

  NEW HASH
  buckets: 32768

  XOR HASH
  buckets: 32768
  Encountered bucket with 133 elements.
  0 > 0
  CHECK( unordered_set_badness( map_sized_xor, 0, 11 * 12 ) == 0.0 )
  85.3734268783 == 0.0

  FNV HASH
  buckets: 32768

point_map_collision_check
601x601 MAP

  OLD HASH
  buckets: 524288
  Encountered bucket with 7 elements.
  2106 (0x83a) > 4212 (0x1074)
  CHECK( unordered_map_badness( old_map, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  3.1786902483 == 0.0

  NEW HASH
  buckets: 524288
  Encountered bucket with 4 elements.
  CHECK( unordered_map_badness( new_map, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.0092805273 == 0.0

  XOR HASH
  buckets: 524288
  Encountered bucket with 601 elements.
  0 > 0
  CHECK( unordered_map_badness( xor_map, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  354.845287745 == 0.0

  FNV HASH
  buckets: 524288
  Encountered bucket with 4 elements.
  CHECK( unordered_map_badness( fnv_map, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.0092805273 == 0.0

POSITIVE QUADRANT MAP

  OLD HASH
  buckets: 32768
  5209 (0x1459) > 12480 (0x30c0)
  CHECK( unordered_map_badness( map_sized_old, 0, 11 * 12 ) == 0.0 )
  0.1076081501 == 0.0

  NEW HASH
  buckets: 32768

  XOR HASH
  buckets: 32768
  Encountered bucket with 133 elements.
  0 > 0
  CHECK( unordered_map_badness( map_sized_xor, 0, 11 * 12 ) == 0.0 )
  85.3734268783 == 0.0

  FNV HASH
  buckets: 32768

On the linux build that has completed so far it matches my local results despite being on a different compiler.
Old is bad, always.
concatenate-and-std::hash is perfect on the large/negative set, bad on the positive set.
XOR is excellent everywhere.
FNV is good in the positive coordinates and bad in the large/negative coordinates.

point_set_collision_check

  OLD HASH
  buckets: 712697
  Encountered bucket with 3 elements.
  75307 (0x1262b) > 237954 (0x3a182)
  CHECK( unordered_set_badness( old_set, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.2770256725 == 0.0

  NEW HASH
  buckets: 712697

  XOR HASH
  buckets: 712697
  Encountered bucket with 3 elements.

  FNV HASH
  buckets: 712697
  Encountered bucket with 4 elements.
  CHECK( unordered_set_badness( fnv_set, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.2047147495 == 0.0

POSITIVE QUADRANT SET

  OLD HASH
  buckets: 20753
  Encountered bucket with 11 elements.
  594 (0x252) > 1188 (0x4a4)
  CHECK( unordered_set_badness( map_sized_old, 0, 11 * 12 ) == 0.0 )
  2.4355648745 == 0.0

  NEW HASH
  buckets: 20753
  Encountered bucket with 3 elements.
  1254 (0x4e6) > 10108 (0x277c)
  CHECK( unordered_set_badness( map_sized_new, 0, 11 * 12 ) == 0.0 )
  0.2345278512 == 0.0

  XOR HASH
  buckets: 20753
  Encountered bucket with 3 elements.
  CHECK( unordered_set_badness( map_sized_xor, 0, 11 * 12 ) == 0.0 )
  0.0204669294 == 0.0

  FNV HASH
  buckets: 20753
  Encountered bucket with 3 elements.

point_map_collision_check
601x601 MAP

  OLD HASH
  buckets: 712697
  Encountered bucket with 3 elements.
  75307 (0x1262b) > 237954 (0x3a182)
  CHECK( unordered_map_badness( old_map, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.2770256725 == 0.0

  NEW HASH
  buckets: 712697

  XOR HASH
  buckets: 712697
  Encountered bucket with 3 elements.

  FNV HASH
  buckets: 712697
  Encountered bucket with 4 elements.
  CHECK( unordered_map_badness( fnv_map, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.2047147495 == 0.0

POSITIVE QUADRANT MAP

  OLD HASH
  buckets: 20753
  Encountered bucket with 11 elements.
  594 (0x252) > 1188 (0x4a4)
  CHECK( unordered_map_badness( map_sized_old, 0, 11 * 12 ) == 0.0 )
  2.4355648745 == 0.0

  NEW HASH
  buckets: 20753
  Encountered bucket with 3 elements.
  1254 (0x4e6) > 10108 (0x277c)
  CHECK( unordered_map_badness( map_sized_new, 0, 11 * 12 ) == 0.0 )
  0.2345278512 == 0.0

  XOR HASH
  buckets: 20753
  Encountered bucket with 3 elements.
  CHECK( unordered_map_badness( map_sized_xor, 0, 11 * 12 ) == 0.0 )
  0.0204669294 == 0.0

  FNV HASH
  buckets: 20753
  Encountered bucket with 3 elements.

@kevingranade (Member) commented Sep 3, 2024

Now we have mac results.
OLD HASH IS FINE, ACTUALLY???
Mac hates new hash with the large/negative input set.
Mac is fine with anything, actually?
It looks like the mac std is actually doing a hash inside of unordered map/set?
This is bizarre though, because I would expect "just concatenate and feed it to std::hash()" to be a good option then?

point_set_collision_check
601x601 SET

  OLD HASH
  buckets: 411527

  NEW HASH
  buckets: 411527
  Encountered bucket with 6 elements.
  19400 (0x4bc8) > 38800 (0x9790)
  CHECK( unordered_set_badness( new_set, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.9750146534 == 0.0

  XOR HASH
  buckets: 411527
  Encountered bucket with 4 elements.

  FNV HASH
  buckets: 411527
  Encountered bucket with 3 elements.

POSITIVE QUADRANT SET
  OLD HASH
  buckets: 25717

  NEW HASH
  buckets: 25717
  8113 (0x1fb1) > 9576 (0x2568)

  XOR HASH
  buckets: 25717
  Encountered bucket with 3 elements.

  FNV HASH
  buckets: 25717

point_map_collision_check
601x601 MAP
  OLD HASH
  buckets: 411527

  NEW HASH
  buckets: 411527
  Encountered bucket with 6 elements.
  19400 (0x4bc8) > 38800 (0x9790)
  CHECK( unordered_map_badness( new_map, -MAX_COORDINATE, MAX_COORDINATE ) == 0.0 )
  0.9750146534 == 0.0

  XOR HASH
  buckets: 411527
  Encountered bucket with 4 elements.

  FNV HASH
  buckets: 411527
  Encountered bucket with 3 elements.

POSITIVE QUADRANT MAP
  OLD HASH
  buckets: 25717

  NEW HASH
  buckets: 25717
  8113 (0x1fb1) > 9576 (0x2568)

  XOR HASH
  buckets: 25717
  Encountered bucket with 3 elements.

  FNV HASH
  buckets: 25717

@CLIDragon (Contributor) commented Sep 4, 2024

My questions are:

1. Why does XORSHIFT perform so terribly on Windows?
2. What does Mac do differently that increases the performance of basically all hashes? (And why does the new hash perform worse?) Mac links against libc++.
3. Linux and Mac seem to return lower results than Windows. Why is that?

I suspect this is due to using a prime number bucket size instead of a power of 2 bucket size. See llvm/llvm-project@4cb38a8 for more information.
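
Roughly, the two reduction strategies look like this (a sketch of the idea only, not any library's actual internals):

#include <cstddef>
#include <cstdint>

// Power-of-two mode (MSVC-style masking): only the low log2(bucket_count)
// bits of the hash ever influence the bucket index.
inline std::size_t bucket_pow2( uint64_t hash, std::size_t bucket_count )
{
    return hash & ( bucket_count - 1 ); // bucket_count is a power of two
}

// Prime mode (libstdc++-style modulo): the division folds the high bits in
// too, so a hash with poor low-bit entropy still spreads across buckets.
inline std::size_t bucket_prime( uint64_t hash, std::size_t prime_bucket_count )
{
    return hash % prime_bucket_count;
}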

@kevingranade (Member)

Based on the symptoms, it's only incorporating variations from either the x or y component into the active hash (the part std::unordered_map is using). I'm not at all clear why; this is a very standard and simple hash/prng step.

I'm probably not going to have much testing time soon; the next thing I would check is whether it's rows or columns that are ending up in the same buckets, and looking at inputs vs outputs of the xor-shift to see if it's doing anything obviously weird.

Based on how it's performing, I'm suspicious it's doing something very weird like re-hashing the input hash.

Regarding bucket count growth, I agree with your theory; also, I'm not particularly concerned about it.

@CLIDragon (Contributor) commented Sep 4, 2024

> Based on the symptoms it's only incorporating variations from either the x or y component into the active hash (the part std::unordered_map is using). I'm not at all clear why, this is a very standard and simple hash/prng step.

A sample of output from the XOR hash is below (b is the bucket, h is the hash output):

(x, y): (0, -9) b: 11dc8 h: c5911dc8
(x, y): (1, -9) b: 11dc8 h: c7911dc8
(x, y): (2, -9) b: 11dc8 h: c1911dc8
...
(x, y): (290, -8) b: 6fe07 h: 460efe07
(x, y): (289, -8) b: 6fe07 h: 400efe07
(x, y): (288, -8) b: 6fe07 h: 420efe07

Altering the x value with a constant y only alters the top few bytes. Adding a simple shift at the end of the process means that those bits are part of the section considered by the map. As this discards the bottom 16 bits, I suspect this will cause a degenerate case around $2^{32}$ map elements, however that is so large as to be a non-issue.

However, there is a degenerate case with maps of fewer than ~40000 elements when including negative numbers, and fewer than ~10000 with only positive numbers.

struct xor_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        uint64_t x = static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y );
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        return x >> 16;  //Only this line has been changed. 16 seems to be optimal.
    }
};

@kevingranade (Member) commented Sep 4, 2024

How about we do a circular shift:
return x >> 16 | ( x & 0xFFFF ) << 48;

Or a chunky interleave (bytes of the low half to even byte slots, bytes of the high half to odd slots):

return ( x & 0xFF ) |
       ( x & 0xFF00 ) << 8 |
       ( x & 0xFF0000 ) << 16 |
       ( x & 0xFF000000 ) << 24 |
       ( x & 0xFF00000000 ) >> 24 |
       ( x & 0xFF0000000000 ) >> 16 |
       ( x & 0xFF000000000000 ) >> 8 |
       ( x & 0xFF00000000000000 );

Alternately, there are other constants for xor-shift that should better propagate the low-order x bits to the low-order bits of the hash.

If you have a candidate that sorts things out for you locally just paste it here and I can slap it into the test PR.

Additionally, the ideal might be a full interleave as outlined here https://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableObvious but that's potentially high impact for runtime.

This is what the previous hash was trying to do.
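
For reference, the 'obvious' version from that page, adapted to two 32-bit inputs (a sketch; the table-driven variants on the same page trade memory for fewer operations):

#include <cstdint>

// Full bit interleave (Morton code): bit i of x lands at output bit 2i,
// bit i of y at output bit 2i + 1. Correct but 32 loop iterations per hash,
// which is the runtime concern mentioned above.
inline uint64_t interleave_bits( uint32_t x, uint32_t y )
{
    uint64_t z = 0;
    for( int i = 0; i < 32; ++i ) {
        z |= static_cast<uint64_t>( ( x >> i ) & 1u ) << ( 2 * i );
        z |= static_cast<uint64_t>( ( y >> i ) & 1u ) << ( 2 * i + 1 );
    }
    return z;
}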

@akrieger (Member Author) commented Sep 4, 2024

I did more digging into the literature around hash prospector and found https://nullprogram.com/blog/2018/07/31/ where he states that for 64-bit hashes there are a couple he hasn't found better, and one is public domain: https://xoshiro.di.unimi.it/splitmix64.c.

uint64_t
splittable64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9U;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebU;
    x ^= x >> 31;
    return x;
}

and for 32-bit, from the original source:

uint32_t
lowbias32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x21f0aaad;
    x ^= x >> 15;
    x *= 0xd35a2d97;
    x ^= x >> 15;
    return x;
}

So I would say we should try those, and dispatch to the right one for 64- or 32-bit hashes (sketch below). imul is definitely Fast Enough on any silicon from the past decade.
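
That dispatch could be as simple as the following sketch (assuming the two mixers above; if constexpr needs C++17):

#include <cstddef>
#include <cstdint>

inline std::size_t mix_size_t( std::size_t x ) noexcept
{
    if constexpr( sizeof( std::size_t ) == 8 ) {
        // 64-bit size_t: use the splitmix64-style finalizer.
        return static_cast<std::size_t>( splittable64( x ) );
    } else {
        // 32-bit size_t: use the 32-bit low-bias mixer.
        return static_cast<std::size_t>( lowbias32( static_cast<uint32_t>( x ) ) );
    }
}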

@akrieger (Member Author) commented Sep 4, 2024

-------------------------------------------------------------------------------
point_set_collision_check
-------------------------------------------------------------------------------
C:\code\Cataclysm-DDA\tests\hash_test.cpp(133)
...............................................................................

C:\code\Cataclysm-DDA\tests\hash_test.cpp(137):
warning:
  Symmetric points

C:\code\Cataclysm-DDA\tests\hash_test.cpp(139):
warning:
  Old hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107):
warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(118):
warning:
  Encountered bucket with 7 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(129): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  2106 (0x83a) > 4212 (0x1074)
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(141): FAILED:
  CHECK( unordered_set_badness(old_set, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  3.1786902483 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(144): warning:
  New hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(118): warning:
  Encountered bucket with 4 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(146): FAILED:
  CHECK( unordered_set_badness(new_set, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  0.0092805273 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(149): warning:
  Xor hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(118): warning:
  Encountered bucket with 601 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(129): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  0 > 0
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(151): FAILED:
  CHECK( unordered_set_badness(xor_set, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  354.845287745 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(154): warning:
  Slosh hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(118): warning:
  Encountered bucket with 3 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(156): FAILED:
  CHECK( unordered_set_badness(slosh_set, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  0.0022056447 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(159): warning:
  Fnv hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(118): warning:
  Encountered bucket with 4 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(161): FAILED:
  CHECK( unordered_set_badness(fnv_set, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  0.0092805273 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(165): warning:
  Positive points

C:\code\Cataclysm-DDA\tests\hash_test.cpp(167): warning:
  Old hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 32768

C:\code\Cataclysm-DDA\tests\hash_test.cpp(129): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  5209 (0x1459) > 12480 (0x30c0)
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(169): FAILED:
  CHECK( unordered_set_badness(set_sized_old, 0, 11 * 12) == 0.0 )
with expansion:
  0.1076081501 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(172): warning:
  New hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 32768

C:\code\Cataclysm-DDA\tests\hash_test.cpp(177): warning:
  Xor hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 32768

C:\code\Cataclysm-DDA\tests\hash_test.cpp(118): warning:
  Encountered bucket with 133 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(129): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  0 > 0
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(179): FAILED:
  CHECK( unordered_set_badness(set_sized_xor, 0, 11 * 12) == 0.0 )
with expansion:
  85.3734268783 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(182): warning:
  Slosh hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 32768

C:\code\Cataclysm-DDA\tests\hash_test.cpp(118): warning:
  Encountered bucket with 3 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(184): FAILED:
  CHECK( unordered_set_badness(set_sized_slosh, 0, 11 * 12) == 0.0 )
with expansion:
  0.000184588 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(187): warning:
  Fnv hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(107): warning:
  buckets: 32768

-------------------------------------------------------------------------------
point_map_collision_check
-------------------------------------------------------------------------------
C:\code\Cataclysm-DDA\tests\hash_test.cpp(230)
...............................................................................

C:\code\Cataclysm-DDA\tests\hash_test.cpp(233): warning:
  Symmetric points

C:\code\Cataclysm-DDA\tests\hash_test.cpp(235): warning:
  Old hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(215): warning:
  Encountered bucket with 7 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(226): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  2106 (0x83a) > 4212 (0x1074)
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(237): FAILED:
  CHECK( unordered_map_badness(old_map, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  3.1786902483 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(240): warning:
  New hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(215): warning:
  Encountered bucket with 4 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(242): FAILED:
  CHECK( unordered_map_badness(new_map, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  0.0092805273 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(245): warning:
  Xor hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(215): warning:
  Encountered bucket with 601 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(226): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  0 > 0
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(247): FAILED:
  CHECK( unordered_map_badness(xor_map, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  354.845287745 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(250): warning:
  Slosh hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(215): warning:
  Encountered bucket with 3 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(252): FAILED:
  CHECK( unordered_map_badness(slosh_map, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  0.0022056447 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(255): warning:
  Fnv hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 524288

C:\code\Cataclysm-DDA\tests\hash_test.cpp(215): warning:
  Encountered bucket with 4 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(257): FAILED:
  CHECK( unordered_map_badness(fnv_map, -MAX_COORDINATE, MAX_COORDINATE) == 0.0 )
with expansion:
  0.0092805273 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(261): warning:
  Positive points

C:\code\Cataclysm-DDA\tests\hash_test.cpp(263): warning:
  Old hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 32768

C:\code\Cataclysm-DDA\tests\hash_test.cpp(226): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  5209 (0x1459) > 12480 (0x30c0)
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(265): FAILED:
  CHECK( unordered_map_badness(map_sized_old, 0, 11 * 12) == 0.0 )
with expansion:
  0.1076081501 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(268): warning:
  New hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 32768

C:\code\Cataclysm-DDA\tests\hash_test.cpp(273): warning:
  Xor hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 32768

C:\code\Cataclysm-DDA\tests\hash_test.cpp(215): warning:
  Encountered bucket with 133 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(226): FAILED:
  CHECK( histogram[1] > histogram[2] * 2 )
with expansion:
  0 > 0
with messages:
  histogram[1] counts the number of hash buckets with a single element.
  histogram[2] counts the number of hash buckets with 2 elements.
  A failure here means that there are more elements landing in shared buckets
  (and experiencing worse performance) than there are elements alone in their
  bucket.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(275): FAILED:
  CHECK( unordered_map_badness(map_sized_xor, 0, 11 * 12) == 0.0 )
with expansion:
  85.3734268783 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(278): warning:
  Slosh hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 32768

C:\code\Cataclysm-DDA\tests\hash_test.cpp(215): warning:
  Encountered bucket with 3 elements.

C:\code\Cataclysm-DDA\tests\hash_test.cpp(280): FAILED:
  CHECK( unordered_map_badness(map_sized_slosh, 0, 11 * 12) == 0.0 )
with expansion:
  0.000184588 == 0.0

C:\code\Cataclysm-DDA\tests\hash_test.cpp(283): warning:
  Fnv hash

C:\code\Cataclysm-DDA\tests\hash_test.cpp(204): warning:
  buckets: 32768

I call it 'slosh hash' in the test because they self-describe it as 'sloshing bits around' (xor to the right, mul to the left). It does better than fnv on msvc, and it's only 3 shifts, 2 muls, and 3 xors for point.

For 64-to-32 we can just xor the high bits over the low bits and keep the same base hash.
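
i.e. something like this sketch:

// Fold the high half over the low half instead of truncating, so the top
// 32 bits of the 64-bit mix still contribute on 32-bit builds.
inline uint32_t fold_to_32( uint64_t h ) noexcept
{
    return static_cast<uint32_t>( h ^ ( h >> 32 ) );
}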

I pushed that hash and some more messaging to #76162.

@kevingranade (Member) commented Sep 5, 2024

Oh see I tried to look up slosh hash and found
https://www.computer.org/csdl/proceedings-article/wacv/2024/189200c554/1W0dPcG9KKc

"SLoSH: Set Locality Sensitive Hashing via Sliced-Wasserstein Embeddings"

@kevingranade (Member)

Looks like it's "fine" (< 0.02 badness for all tests) on both linux and windows.
Just need to see it working on mac and I think we're set.

@akrieger (Member Author) commented Sep 5, 2024

Looks like on mac the worst result is the set, with badness 0.0012366973, which is approximately zero. The rest had badness zero.

@akrieger force-pushed the pointing_fingers branch 2 times, most recently from 8a1ac59 to bcac8ac on September 5, 2024
@akrieger (Member Author) commented Sep 5, 2024

I just realized I had copied the MIT-licensed hash and not the public domain one I had referenced in comments. They are supposed to perform similarly well, but I have resubmitted the hash tests with the actually public domain one.

@github-actions github-actions bot added the BasicBuildPassed This PR builds correctly, label assigned by github actions label Sep 5, 2024
@CLIDragon (Contributor) commented Sep 5, 2024

A comparison of the hash functions (on Windows) across different input ranges shows that SplitMix64 is the clear winner, though honestly the difference between all 4 is tiny. Personally, I advocate SplitMix64: it seems to be the best quality, it's public domain, and it's well-tested.

EDIT: Added the old hash for comparison.
[table: badness comparison of the hash functions across input ranges]

Hash Code

XOR Hash (actually has nothing to do with XOR, and is poorly named).

struct xor_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        uint64_t x = static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y );
        return x *  0xd989bcacc137dcd5ull >> 32u;
    }
};

Split Mix 64 (public domain, linked above by akrieger)

struct split_mix_64 {
    uint64_t operator()( const point &k ) const noexcept {
        uint64_t x = static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y );
        x ^= x >> 30;
        x *= 0xbf58476d1ce4e5b9U;
        x ^= x >> 27;
        x *= 0x94d049bb133111ebU;
        x ^= x >> 31;
        return x;
    }
};

Xoshiro Hash (MIT licensed)

struct xoshiro_hash {
    uint64_t operator()( const point &k ) const noexcept {
        uint64_t x = static_cast<uint64_t>( k.x ) << 32 | static_cast<uint32_t>( k.y );
        x ^= x >> 32;
        x *= 0xd6e8feb86659fd93U;
        x ^= x >> 32;
        x *= 0xd6e8feb86659fd93U;
        x ^= x >> 32;
        return x;
    }
};

FNV

#define FNV_64_PRIME static_cast<uint64_t>(0x100000001b3ULL)
#define FNV1_64_INIT static_cast<uint64_t>(0xcbf29ce484222325ULL)

struct fnv_test_hash {
    std::size_t operator()( const point &k ) const noexcept {
        const uint32_t x = k.x;
        const uint32_t y = k.y;
        uint64_t hval = FNV1_64_INIT;
        hval ^= static_cast<uint64_t>( x & 0xFF );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( x >> 8 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( x >> 16 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( x >> 24 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( y & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( y >> 8 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( y >> 16 ) & 0xFFul );
        hval *= FNV_64_PRIME;
        hval ^= static_cast<uint64_t>( ( y >> 24 ) & 0xFFul );
        hval *= FNV_64_PRIME;

        return hval;
    }
};

@akrieger (Member Author) commented Sep 5, 2024

Re-test confirms everything is just as 'fine' as before.

@Maleclypse Maleclypse merged commit 0d9a5dc into CleverRaven:master Sep 6, 2024
22 of 26 checks passed
@GuardianDll (Member)

Re: #76154, may I ask you to open a PR to backport it? I don't feel skillful enough to backport it properly.

@akrieger (Member Author) commented Sep 6, 2024

Sure, I can do it.
