Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference : Make m_plugEdits key more unique #5506

Merged
merged 1 commit into from
Oct 25, 2023

Conversation

ericmehl
Copy link
Collaborator

This fixes a rare (about 1 in 1,000) test failure on Windows of GafferTest.ReferenceTest.testAddingChildPlugs. Though it works, there are a few things I'm not certain of :

  1. Is there maybe a better solution? I guess this isn't a terribly performance-critical section and a smart pointer for storage isn't all that much of a burden.
  2. Shouldn't this not be an issue because if a node is getting released and it's memory address is up for grabs, shouldn't it also have been removed from the m_plugEdits map?
  3. Is it worth the performance hit to test script.load() and assertExpectedChildren() in a loop for something like 1000 iterations? On my workstation this takes about 15 seconds.
  4. Does this deserve a mention in changes.md? If so, I assume it's a bug fix but what bug is being fixed? If it triggered in real use, it would attempt to set a value on a plug that shouldn't be set?

Thanks to @johnhaddon for pointing me in the direction of the pointer address reuse!

Checklist

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have tested my change(s) in the test suite, and added new test cases where necessary.
  • My code follows the Gaffer project's prevailing coding style and conventions.

@johnhaddon
Copy link
Member

Shouldn't this not be an issue because if a node is getting released and it's memory address is up for grabs, shouldn't it also have been removed from the m_plugEdits map?

This is a good question, and I think we need to answer it before considering the question of whether or not there might be a better solution. Could you remind me of the exact output from the test failure so I can try to wrap my head around it please?

@ericmehl
Copy link
Collaborator Author

The full error is

======================================================================
FAIL: testAddingChildPlugs (GafferTest.ReferenceTest.ReferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "K:\build\gaffer-1.3\python\GafferTest\ReferenceTest.py", line 1775, in testAddingChildPlugs
    assertExpectedChildren( script["reference"] )
  File "K:\build\gaffer-1.3\python\GafferTest\ReferenceTest.py", line 1754, in assertExpectedChildren
    self.assertFalse( reference.isChildEdit( reference["compoundData"]["m1"]["value"] ) )
AssertionError: True is not false

What I found while debugging is that https://github.com/ericmehl/gaffer/blob/cc7c02b15070c373fd16a17eac8d567bd9e9eaed/src/Gaffer/Reference.cpp#L330 is finding an entry for plug (m1.value) when it shouldn't.

But I also did a check to see if plug->getName() == "value" && plug->parent()->getName() == "m1" just above https://github.com/ericmehl/gaffer/blob/cc7c02b15070c373fd16a17eac8d567bd9e9eaed/src/Gaffer/Reference.cpp#L348 and that condition never triggered. So I was trying to figure out how m1.value plug was being inserted into m_plugEdits when we talked and you suggested the memory address reuse idea.

@johnhaddon
Copy link
Member

But I also did a check to see if plug->getName() == "value" && plug->parent()->getName() == "m1" and that condition never triggered

That makese sense - isChildEdit() is calling plugEdit() with the parent plug, not the child plug itself. So I'd expect plug->getName() to be "m1".

@johnhaddon
Copy link
Member

I think the fundamental issue here is that we're using childRemoved() to erase old entries from m_plugEdits, but that only works for children parented directly to the Reference. Storing smart pointers kindof makes that worse, because then not only do we have nonsense entries hanging around, they're also keeping zombie plugs alive unnecessarily. Not sure what the solution is yet...

@ericmehl
Copy link
Collaborator Author

Ah right, I did test against plug->getName() == "m1" as well after realizing that. I just double checked and still don't get that to trigger.

Just to be clear, I'm doing

if( plug->getName() == "m1" )
{
    std::cerr << "plug->getName() == \"m1\"\n";
}

immediately above https://github.com/ericmehl/gaffer/blob/cc7c02b15070c373fd16a17eac8d567bd9e9eaed/src/Gaffer/Reference.cpp#L348

That never outputs that text or triggers a breakpoint on the std::cerr line, it just triggers the test failure.

@johnhaddon
Copy link
Member

immediately above #L348

That's the line where we create a new entry if there's not one there already. The problem in this case is that there is an entry already, created earlier when there was a different plug (with a different name) at that address. I think you want to be putting that check inside the if( it != m_plugEdits.end() ) block higher up in that function.

@ericmehl
Copy link
Collaborator Author

Right, putting it in the if( it != m_plugEdits.end() ) was my original check, apologies for skipping over that in my discussion here. It happily finds m1, and only when the test is about to fail. Most runs it never finds m1 as expected.

The test farther down, above line 348, was my attempt to figure out what was adding the plug to the map. That's where I was scratching my head wondering how it ended up in the map without ever being added.

@johnhaddon
Copy link
Member

That's where I was scratching my head wondering how it ended up in the map without ever being added.

It won't necessarily have been something called "m1" that was added to the map - it'll have been some other plug that once had the address that "m1" now has, but was destroyed without being removed from m_plugEdits. As I noted above, we're only removing top-level plugs from m_plugEdits correctly.

@ericmehl
Copy link
Collaborator Author

I think the fundamental issue here is that we're using childRemoved() to erase old entries from m_plugEdits, but that only works for children parented directly to the Reference

Based on that I added

for( Plug::RecursiveIterator it( plug ); !it.done(); ++it )
{
	m_plugEdits.erase( it->get() );
}

to childRemoved() ( https://github.com/ericmehl/gaffer/blob/cc7c02b15070c373fd16a17eac8d567bd9e9eaed/src/Gaffer/Reference.cpp#L390-L399 ) and it successfully fixes the test failure.

It also worked with just Iterator instead of RecursiveIterator. I suppose that may be more efficient but isn't quite as correct as iterating over all descendants.

@johnhaddon
Copy link
Member

That RecursiveIterator approach sounds good - could you update the PR with it please?

Previously, we were only removing edit entries for the immediate
children of a Reference. This could leave stale edits in the edit
cache for plugs at the grandchild level and deeper. Usually that does
not cause a problem because `m_plugEdits` is keyed using the plug
pointer address. But on Windows in particular, the address of a
destroyed plug was occasionally (something like 1 in 1000) being reused
for a new plug, causing a false positive lookup in `m_plugEdits`.
@ericmehl
Copy link
Collaborator Author

Sure thing, that's pushed in commit 445a9d1.

@johnhaddon johnhaddon merged commit cacbae1 into GafferHQ:1.3_maintenance Oct 25, 2023
4 checks passed
@ericmehl ericmehl deleted the referenceTestFix branch October 27, 2023 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants