further clarifications
wjakob committed Apr 18, 2024
1 parent b30e0e7 commit d8e652c
Showing 1 changed file with 39 additions and 8 deletions.
47 changes: 39 additions & 8 deletions README.md
@@ -275,14 +275,6 @@ int main() {
}
```

#### Element erasure

Besides the regular ``iterator erase(iterator)`` method also provided by STL
map and set types, ``robin_map`` and ``robin_set`` provide a ``void
erase_it(iterator)`` method that does _not_ return an iterator. Computing a
valid iterator following element erasure comes at a computational cost that can
be avoided with this method when the return value is not needed.

#### Serialization

The library provides an efficient way to serialize and deserialize a map or a set so that it can be saved to a file or sent over the network.
@@ -485,6 +477,45 @@ int main() {
}
```

#### Performance pitfalls

Two potential performance pitfalls involving `tsl::robin_map` and
`tsl::robin_set` are noteworthy:

1. *Bad hashes*. Hash functions that produce many collisions can lead to the
following surprising behavior: when the number of collisions exceeds a
certain threshold, the hash table will automatically expand to fix the
problem. However, in degenerate cases, this expansion might have _no effect_
on the collision count, causing a failure mode where a linear sequence of
insertions leads to exponential storage growth.

This case has mainly been observed when using the default power-of-two
growth strategy with the default STL `std::hash<T>` for arithmetic types
`T`, which is often an identity! See issue
[#39](https://github.com/Tessil/robin-map/issues/39) for an example. The
solution is simple: use a better hash function and/or `tsl::robin_pg_set` /
`tsl::robin_pg_map`.

2. *Element erasure and low load factors*. `tsl::robin_map` and
`tsl::robin_set` mirror the STL map/set API, which exposes an `iterator
erase(iterator)` method that removes an element at a certain position,
returning a valid iterator that points to the next element.

Constructing this new iterator object requires walking to the next nonempty
bucket in the table, which can be an expensive operation when the hash table
has a low *load factor* (i.e., when `capacity()` is much larger than
`size()`).

The `erase()` method furthermore never shrinks and rehashes the table, as
this is not permitted by the specification of this function. A linear
sequence of random removals without intermediate insertions can then lead to
a degenerate case with quadratic runtime cost.

In such cases, an iterator return value is often not even needed, so the
cost is entirely unnecessary. Both `tsl::robin_set` and `tsl::robin_map`
therefore provide an alternative erasure method `void erase_it(iterator)`
that does not return an iterator to avoid having to find the next element.
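
The first pitfall can be illustrated with a small sketch that uses only the standard library. It assumes that a power-of-two table picks a bucket as `hash & (bucket_count - 1)`, which is a simplification and not necessarily what the library does internally. The names `identity_hash`, `mixing_hash` and `distinct_buckets` are hypothetical, and the mixer is the splitmix64 finalizer, chosen here only as one example of a "better hash".

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Identity hash, mimicking what many standard libraries do for integer keys
// (an implementation detail, not something the C++ standard guarantees).
struct identity_hash {
    std::size_t operator()(std::uint64_t key) const noexcept {
        return static_cast<std::size_t>(key);
    }
};

// Hypothetical replacement: the splitmix64 finalizer, which spreads
// differences in any input bit across the low bits of the result.
struct mixing_hash {
    std::size_t operator()(std::uint64_t key) const noexcept {
        key ^= key >> 30;
        key *= 0xbf58476d1ce4e5b9ULL;
        key ^= key >> 27;
        key *= 0x94d049bb133111ebULL;
        key ^= key >> 31;
        return static_cast<std::size_t>(key);
    }
};

// Counts how many distinct buckets the keys 0, stride, 2*stride, ... occupy
// when the bucket index is `hash & (bucket_count - 1)`.
// Assumption: bucket_count is a power of two.
template <class Hash>
std::size_t distinct_buckets(std::size_t key_count, std::size_t stride,
                             std::size_t bucket_count) {
    Hash hash;
    std::set<std::size_t> used;
    for (std::size_t i = 0; i < key_count; ++i) {
        used.insert(hash(i * stride) & (bucket_count - 1));
    }
    return used.size();
}
```

Under these assumptions, `distinct_buckets<identity_hash>(256, 1024, 1024)` reports a single bucket for all 256 keys, and doubling the table to 2048 buckets only raises that to two, which is the "expansion has no effect" behavior described above. A functor such as `mixing_hash` could be supplied as the `Hash` template argument of `tsl::robin_map` or `tsl::robin_set`.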

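The cost of the second pitfall can be sketched with a toy model that tracks only which buckets are occupied. This is not the library's implementation: `toy_table` and `reverse_erase_cost` are hypothetical names, and erasing in reverse bucket order is a deliberately chosen worst case rather than the random removal order mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Toy model of an open-addressing table that only records bucket occupancy.
struct toy_table {
    std::vector<bool> occupied;
    std::size_t scanned = 0;  // empty buckets visited while looking for a successor

    // Models `iterator erase(iterator)`: clear the bucket, then walk forward
    // to the next occupied bucket so that a valid position can be returned.
    std::size_t erase(std::size_t pos) {
        occupied[pos] = false;
        std::size_t next = pos + 1;
        while (next < occupied.size() && !occupied[next]) {
            ++next;
            ++scanned;
        }
        return next;  // occupied.size() plays the role of end()
    }

    // Models `void erase_it(iterator)`: clear the bucket and return at once.
    void erase_it(std::size_t pos) { occupied[pos] = false; }
};

// Fills `n` buckets, erases them from last to first without any insertions,
// and returns how many empty buckets were scanned in total.
inline std::size_t reverse_erase_cost(std::size_t n, bool need_iterator) {
    toy_table table;
    table.occupied.assign(n, true);
    for (std::size_t i = n; i-- > 0;) {
        if (need_iterator) {
            table.erase(i);
        } else {
            table.erase_it(i);
        }
    }
    return table.scanned;
}
```

In this model, erasing the k-th element from the end scans k - 1 empty buckets, so `reverse_erase_cost(n, true)` grows as n(n - 1)/2, while the `erase_it` variant scans nothing. With the real containers, the equivalent change is to call `erase_it(it)` instead of `erase(it)` whenever the returned iterator is not used.
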
### License

The code is licensed under the MIT license, see the [LICENSE file](LICENSE) for details.
