Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe it would make sense for the self-insertion/emplacement to work (suggestion) #25

Open
Predelnik opened this issue Oct 1, 2019 · 5 comments

Comments

@Predelnik
Copy link

Currently the following code leads to crash/undefined behavior with tsl::robin_map:

int main ()
{
  tsl::robin_map<int, std::string> v;
  v[0] = "23";
  for (int i = 2; i < 3300; ++i)
    v[i] = v[0];
}

It happens because we pass the reference to our own element which gets destroyed during reallocation and since construction happens late it leads to the unfortunate results. I understand that this might be a design decision to treat it as UB however I have the following reason for suggesting to fix this:

  1. It works fine with node-based (std::) maps.

  2. It works with std::vector. I can't quote standard paragraph exactly but here's link to the statement from Microsoft of fixing this problem in their implementation https://devblogs.microsoft.com/cppblog/stl-fixes-in-vs-2017-rtm/ (Fixed aliasing bugs)

  3. It actually seems to be much more frequent operation for unordered map rather than for vector, simply copying a past value to the newly created field feels like natural operation in many cases.

  4. A fix seems to be the same as for vector since it is the structure robin map built upon so should not be that hard to implement.

@Tessil
Copy link
Owner

Tessil commented Oct 1, 2019

Hi,

Thank you for the suggestion.

It's effectively an undefined behaviour. As specified in the documentation, any operation modifying the hash map may invalidate the iterators and references to the objects in the hash map. The v[i] operation may thus invalidate the v[0] parameter that it gets by reference.

I didn't know that v.push_back(v[0]) with an std::vector was allowed when v.capacity() == v.size(), I thought it was an undefined behaviour.

I have to see how this can be done as there are two problems:

  • A rehash may occur invalidating all the references to the elements in the hash map
  • A element in the hash map may change of place (and thus invalidating any reference to the element) due to the robin hood hashing mechanism. The inserted element may "steal" the bucket of an element and we then have to move the element that was in the bucket further away.

I have to think a bit if it can be done and how this can be done efficiently and come back to you in a couple of days.

For the second point I can just check if the element in the bucket I'm stealing hasn't the same address as the one I want to insert.

@brenoguim
Copy link

You can see the fix in the link:

Fixed aliasing bugs. For example, the Standard permits v.emplace_back(v[0]), which we were mishandling at runtime, and v.push_back(v[0]), which we were guarding against with deficient code (asking “does this object live within our memory block?” doesn’t work in general). The fix involves performing our actions in a careful order, so we don’t invalidate whatever we’ve been given. Occasionally, to defend against aliasing, we must construct an element on the stack, which we do only when there’s no other choice (e.g. emplace(), with sufficient capacity, not at the end).

So, if you know invalidation is going to happen, construct the element in the stack first, do the invalidation, then move the object from the stack into place.

@Tessil
Copy link
Owner

Tessil commented Oct 6, 2019

Thank you, I checked a bit further the issue and it seems to be effectively possible without too much troubles.

I'll be a bit busy the coming week but I'll check to implement the fix when I can find some time.

@Tessil
Copy link
Owner

Tessil commented Oct 27, 2019

Hello,

Working a bit on the issue I realized it's not really possible for the following code to work with an open-adressing implementation:

int main ()
{
  tsl::robin_map<int, std::string> v;
  v[0] = "23";
  for (int i = 2; i < 3300; ++i)
    v[i] = v[0];
}

In the loop, the v[0] expression is executed first and returns a reference to an element inside the bucket array. Afterwards the v[i] expression is executed which may incur a rehash of the bucket array without being aware of v[0]. If it happens, the previous reference we got with v[0] is no more valid and the assignment operation will be undefined.

The best that can be done is to make the following code valid by being careful on rehash:

int main ()
{
  tsl::robin_map<int, std::string> v;
  v[0] = "23";
  for (int i = 2; i < 3300; ++i)
    v.try_emplace(i, v[0]);
}

@Predelnik
Copy link
Author

Sorry for late response but I do think that v.try_emplace(i, v[0]); is enough actually, since it will be sort of analogous to v.push_back(v[0]) for vector. While original case might still lead to surprising results for some people, I guess it's similar to possible troubles due to instability of references in case of robin_map versus standard maps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants