TaskCollaboration improvements for review #5493
Conversation
I've now taken a look through all this, and I haven't been able to spot anything wrong.
It sounds like you either need to back out the isolated task group stuff though, or do the slightly goofy but probably effective "only use the isolated task group 90% of the time" approach?
```
#  / | \
# 9 10 11
#  \ | /
#   12      (for width 3 and depth 3)
```
As per offline conversation, I had originally just thought this diagram was upside down, but you corrected me that the numbers just go in a different order in this test.
At a basic level, it might help to label the root computation somehow ... since I don't think there is any global convention for whether the root goes at the top or bottom.
And actually, this is a bit more confusing, because as far as I can tell, the root computation is implicit, not given an explicit key? I can't see where testCollaborationTaskDistribution creates a root named "0", or where testFanOutGatherPerformance creates a root named "12" - it seems like the dependency dict starts one level away? So it's a bit deceptive to label it as "12" in one case, and "0" in the other? Could it just be in the diagram?
> At a basic level, it might help to label the root computation somehow ... since I don't think there is any global convention for whether the root goes at the top or bottom.
I was just following the Gaffer convention, where pretty much all graphs flow top-to-bottom, but I've pushed a fixup containing a comment to make this explicit : 91ee00c.
> I can't see where testCollaborationTaskDistribution creates a root named "0", or where testFanOutGatherPerformance creates a root named "12" - it seems like the dependency dict starts one level away?
In all cases, the root process is passed as an argument to `runTestProcess()`.
```cpp
	return result;
}

void runTestProcess( const Plug *plug, int expectedResult, dict dependenciesDict )
```
As a name, `GafferTest.runTestProcess` is about the least specific name I can think of ... maybe not a problem, but if I saw this name in isolation, I would have no idea what it's used for.
Do you have any suggestions for an improvement? It definitely does run a `TestProcess`, so I guess the problem is that it could be running some non-specific "test process". Would the solution be to rename `TestProcess` to something else? We do have a load of `TestFoo` classes which are instances of `Foo` designed only for testing `Foos` though, so the name seems reasonable to me.
I guess the reason it feels odd to me is that, even aside from "Process" being a generic name, this feels like it's for testing a specific aspect of Processes, and it's running a network of Processes, not a single one.
Something like `runTestDependencyProcesses` would feel clearer to me, but I don't feel strongly if you're happy with this name.
```cpp
}
else
{
	result = Process::acquireCollaborativeResult<HashProcess>( cacheKey, p, plug, computeNode );
```
This logic looks correct, but for some reason it felt like I had to read it more times than I should have to follow it.
I think I would find this a fair bit clearer ( though maybe the duplicated line is bad? ) :

```cpp
if( forceMonitoring )
{
	return Process::acquireCollaborativeResult<HashProcess>( cacheKey, p, plug, computeNode );
}

std::optional<IECore::MurmurHash> cachedValue = g_cache.getIfCached( cacheKey );
if( cachedValue )
{
	return *cachedValue;
}
else
{
	return Process::acquireCollaborativeResult<HashProcess>( cacheKey, p, plug, computeNode );
}
```

( This might be totally subjective, feel free to ignore it ... it just stuck out to me as something I was tripping over. )
I do appreciate more you wanting to just rip out the `forceMonitoring` stuff now ... though I do still feel like it has enough potential usefulness that it might be premature to do that now.
Yeah, this does feel a bit fiddly to me too. I don't think your suggestion quite works as-is though, because we don't want to `return` but rather assign `result =`. So we'd require another level of nesting, with the cache lookup in an `else` block, which would reduce its clarity a bit.
I did do what I did fairly consciously, because I noticed a pre-existing bug where we were failing to perform any kind of TaskCollaboration when forcing monitoring. So I felt the slightly more awkward construction was worth it, to preclude the possibility of that sort of thing happening again.
So on balance, I'd like to keep this as it is.
> I do appreciate more you wanting to just rip out the forceMonitoring stuff now

It's not the `forceMonitoring` stuff that I want to rip out - that's essential. What I'm fairly keen to rip out is the `HashCacheMode` stuff, but I thought that should at least wait until a new major version.
Good points. It makes sense that this logic is actually a bit complex - it probably should give a bit more pause when reading it than most two-`if` blocks would.
```cpp
	Default,
	/// Deprecated synonym for Default. Will be removed in a future
	/// release.
	Legacy = Default
```
`Default` seems like a reasonable name for how things currently function ... but if, in the future, we have better support for collaborations joining processes that are already working, then it's a bit unclear whether `Default` would be the best name, rather than something more explicit like `Lightweight`?
It feels a bit weird to suggest that "unlikely to be required from multiple threads concurrently" would be true by default?
Yeah, I do quite like the idea of something like `Lightweight`, but for now I was trying to stick more closely to the existing names, which are about what the cache does, rather than being explicitly about what the process does. If I can get this alternative scheduler off the ground then I think that might be the time for a wholesale renaming of the options.

> It feels a bit weird to suggest that "unlikely to be required from multiple threads concurrently" would be true by default?

It is the assumption we make though, so I think it should be advertised?
If we're not trying to anticipate future changes, these names definitely look fine for now.
I've backed it out for now - I reverted it in 2c845a4 so that we still have the history for it if we want to revisit later. It might be worth revisiting with the 10% trick, but the next step for me is to do some performance testing on production scenes, and for that I think it'll be nice to know that I'm just measuring any overhead of the new
force-pushed from 2c845a4 to e0bb9d6
Rebased to fix merge conflict - everything else is the same.
A variety of tests of the current state of this PR run on a random selection of IE production shots hasn't raised any red flags, so it feels like we're on the path to getting this merged... 🚀
force-pushed from e0bb9d6 to 684a5b9
This will be used to replace the TaskMutex collaboration mechanism currently used via the LRUCaches in ValuePlug. The underlying collaboration mechanism is still a `task_arena/task_group` combo, but this arrangement has a few key benefits :

- Moving the collaboration out from the guts of the LRUCache gives us much more control over how it is performed, particularly for collaborating threads.
- We are now tracking interprocess dependencies completely ourselves, so we are no longer vulnerable to the deadlocks that TaskMutex couldn't prevent.
- We are no longer using `task_arena_observer`, which gives us much greater freedom to change the implementation in future.
All these tests would hang if ValuePlug wasn't now using `Process::acquireCollaborativeResult()`.
Our rationale is as follows :

- Standard has no right to exist - if a compute is slow then it should be multithreaded and TaskCollaboration should be used.
- TaskIsolation has no right to exist either. If a compute is heavy enough to use TBB then it's essential that we can collaborate on it. And since we're no longer using a `task_arena` to perform collaboration, we have less concern about potential overhead.
- Legacy is actually a good policy for lightweight computes. True, it can mean redundant identical computes being performed concurrently. But this is better than making one thread wait for the other, because it allows both threads to collaborate on an upstream TaskCollaboration.
- Changing the behaviour of the Standard policy would be a breaking change, so Default is introduced as the new "blessed" name for Legacy.
This also disables the test for `Standard` and `TaskIsolation` policies because they are now the same as `TaskCollaboration`. This does highlight the fact that we now have the possibility of new stalls in the UI, but only for cache policies that aren't ideal anyway. We need to focus our energies into fixing the shared cancellation for TaskCollaboration, and making as many processes as possible use TaskCollaboration.
This provides some substantial performance improvements :

```
- testFanOutPerformance (GafferTest.CollectTest.CollectTest) : was 0.83s now 0.44s (47% reduction)
- testLoop (GafferTest.CollectTest.CollectTest) : was 1.64s now 0.09s (95% reduction)
- testDeepTreePerformance (GafferTest.ProcessTest.ProcessTest) : was 15.86s now 0.04s (100% reduction)
```

Full disclosure : it also shows a roughly 30% increase in runtime for `ProcessTest.testCollaborationTransitionPerformance`, and I'm not sure why. But that's small beans compared to the orders of magnitude in `testLoop` and `testDeepTreePerformance`.

Note that `isolated_task_group` is actually a TBB preview feature, so technically could be taken away from us in the future. But it is still present in the latest TBB release (2021.10.0) and is still in the latest source code at https://github.com/oneapi-src/oneTBB/blob/master/include/oneapi/tbb/task_group.h. It is also [used in Embree](https://github.com/embree/embree/blob/master/man/man3/rtcCommitScene.4embree4#L34-L38) where the documentation cautions against the alternative of using a `task_arena` "due to its high runtime overhead". So I think we should be OK. If it ever disappears, alternatives include :

- Rolling our own, which should be doable provided that TBB continues to expose `isolate_within_arena()`, or might also be doable using public API by constructing a regular `task_group` from inside `this_task_arena::isolate()`.
- Using `tbb::collaborative_call_once()`, which is not available in our current TBB version but is available in future versions.
This was a poor man's deadlock-avoidance scheme which is no longer needed now that ValuePlug is using `Process::acquireCollaborativeResult()`.
Now that WorkerRead recursion has been removed, writability is determined entirely by AcquireMode, so we can replace conditionals with asserts.
This reverts commit 148291d.
force-pushed from 684a5b9 to a62f892
Rebased to squash fixups in and get Changes.md in order. Should be ready to merge.
Following on from #5489, this presents a solution to issue #4978, in a form ready for review. As discussed in the meeting today, the question of whether to use `task_arena` or `isolated_task_group` is still an open one, with each working better in some scenarios, and in truth, both being a poor fit for what we need. But that shouldn't prevent a thorough review of the rest of the collaboration framework, which is largely independent of the exact method used to launch tasks.