
TaskCollaboration improvements for review #5493 (Merged)

Conversation

johnhaddon (Member):

Following on from #5489, this presents a solution to issue #4978, in a form ready for review. As discussed in the meeting today, the question of whether to use task_arena or isolated_task_group is still an open one, with each working better in some scenarios, and in truth, both being a poor fit for what we need. But that shouldn't prevent a thorough review of the rest of the collaboration framework, which is largely independent of the exact method used to launch tasks.

danieldresser-ie (Contributor) left a comment:

I've now taken a look through all this, and I haven't been able to spot anything wrong.

It sounds like you either need to back out the isolated task group stuff though, or do the slightly goofy but probably effective "only use isolated task group 90% of the time" approach?

```
# / | \
# 9 10 11
# \ | /
# 12 (for width 3 and depth 3)
```
danieldresser-ie (Contributor):

As per offline conversation, I had originally just thought this diagram was upside down, but you corrected me that the numbers just go in a different order in this test.

At a basic level, it might help to label the root computation somehow ... since I don't think there is any global convention for whether the root goes at the top or bottom.

And actually, this is a bit more confusing, because as far as I can tell, the root computation is implicit, not given an explicit key? I can't see where testCollaborationTaskDistribution creates a root named "0", or where testFanOutGatherPerformance creates a root named "12" - it seems like the dependency dict starts one level away? So it's a bit deceptive to label it as "12" in one case, and "0" in the other? Could it just be in the diagram?

johnhaddon (Member, Author):

> At a basic level, it might help to label the root computation somehow ... since I don't think there is any global convention for whether the root goes at the top or bottom.

I was just following the Gaffer convention, where pretty much all graphs flow top-to-bottom, but I've pushed a fixup containing a comment to make this explicit : 91ee00c.

> I can't see where testCollaborationTaskDistribution creates a root named "0", or where testFanOutGatherPerformance creates a root named "12" - it seems like the dependency dict starts one level away?

In all cases, the root process is passed as an argument to runTestProcess().

```cpp
return result;
}

void runTestProcess( const Plug *plug, int expectedResult, dict dependenciesDict )
```
danieldresser-ie (Contributor):

As a name, GafferTest.runTestProcess is about the least specific name I can think of ... maybe not a problem, but if I saw this name in isolation, I would have no idea what it's used for.

johnhaddon (Member, Author), Oct 17, 2023:

Do you have any suggestions for an improvement? It definitely does run a TestProcess, so I guess the problem is that it could be running some non-specific "test process". Would the solution be to rename TestProcess to something else? We do have a load of TestFoo classes which are instances of Foo designed only for testing Foos though, so the name seems reasonable to me.

danieldresser-ie (Contributor):

I guess the reason it feels odd to me is that, even aside from "Process" being a generic name, this feels like it's for testing a specific aspect of Processes, and it's running a network of Processes, not a single one.

Something like runTestDependencyProcesses would feel clearer to me, but I don't feel strongly about it if you're happy with this name.

```cpp
}
else
{
	result = Process::acquireCollaborativeResult<HashProcess>( cacheKey, p, plug, computeNode );
```
danieldresser-ie (Contributor):

This logic looks correct, but for some reason it felt like I had to read it more times than I should have to follow it.

I think I would find this a fair bit clearer (though maybe the duplicated line is bad?):

```cpp
if( forceMonitoring )
{
    return Process::acquireCollaborativeResult<HashProcess>( cacheKey, p, plug, computeNode );
}

std::optional<IECore::MurmurHash> cachedValue = g_cache.getIfCached( cacheKey );
if( cachedValue )
{
    return *cachedValue;
}
else
{
    return Process::acquireCollaborativeResult<HashProcess>( cacheKey, p, plug, computeNode );
}
```

(This might be totally subjective, feel free to ignore it ... it just stuck out to me as something I was tripping over.)

I do appreciate your wanting to just rip out the forceMonitoring stuff now ... though I do still feel like it has enough potential usefulness that it might be premature to do that now.

johnhaddon (Member, Author):

Yeah, this does feel a bit fiddly to me too. I don't think your suggestion quite works as-is though, because we don't want to return but rather to assign to result. So we'd require another level of nesting, with the cache lookup in an else block, which would reduce its clarity a bit.
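For concreteness, that extra nesting would look something like this (just a sketch of the shape, reusing the names from your suggestion) :

```cpp
if( forceMonitoring )
{
    result = Process::acquireCollaborativeResult<HashProcess>( cacheKey, p, plug, computeNode );
}
else
{
    // The cache lookup now needs its own nesting inside the `else`.
    std::optional<IECore::MurmurHash> cachedValue = g_cache.getIfCached( cacheKey );
    if( cachedValue )
    {
        result = *cachedValue;
    }
    else
    {
        result = Process::acquireCollaborativeResult<HashProcess>( cacheKey, p, plug, computeNode );
    }
}
```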

I did do what I did fairly consciously because I noticed a pre-existing bug where we were failing to perform any kind of TaskCollaboration when forcing monitoring. So I felt the slightly more awkward construction was worth it to preclude the possibility of that sort of thing happening again.

So on balance, I'd like to keep this as it is.

johnhaddon (Member, Author):

> I do appreciate your wanting to just rip out the forceMonitoring stuff now

It's not the forceMonitoring stuff that I want to rip out - that's essential. What I'm fairly keen to rip out is the HashCacheMode stuff, but I thought that should at least wait until a new major version.

danieldresser-ie (Contributor):

Good points. It makes sense that this logic is actually a bit complex - it probably should actually give a bit more pause when reading it than most two-if blocks would.

```cpp
Default,
/// Deprecated synonym for Default. Will be removed in a future
/// release.
Legacy = Default
```
danieldresser-ie (Contributor):

Default seems like a reasonable name for how things currently function ... but if in the future we get better support for collaborations joining processes that are already working, then it's a bit unclear whether Default would be the best name, rather than something more explicit like "Lightweight"?

It feels a bit weird to suggest that "unlikely to be required from multiple threads concurrently" would be true by default?

johnhaddon (Member, Author):

Yeah, I do quite like the idea of something like Lightweight, but for now I was trying to stick more closely to the existing names, which are about what the cache does, rather than being explicitly about what the process does. If I can get this alternative scheduler off the ground then I think that might be the time for a wholesale renaming of the options.

> It feels a bit weird to suggest that "unlikely to be required from multiple threads concurrently" would be true by default?

It is the assumption we make though, so I think it should be advertised?

danieldresser-ie (Contributor):

If we're not trying to anticipate future changes, these names definitely look fine for now.

johnhaddon (Member, Author):

> It sounds like you either need to back out the isolated task group stuff though, or do the slightly goofy but probably effective "only use isolated task group 90% of the time" approach?

I've backed it out for now - I reverted it in 2c845a4 so that we still have the history for it if we want to revisit later. It might be worth revisiting with the 10% trick, but the next step for me is to do some performance testing on production scenes, and for that I think it'll be nice to know that I'm just measuring any overhead of the new acquireCollaborativeResult() stuff and not doing an apples-to-oranges comparison of task_arena to isolated_task_group.

johnhaddon force-pushed the improvedCollaborationPR branch from 2c845a4 to e0bb9d6 (October 23, 2023 08:18)
johnhaddon (Member, Author):

Rebased to fix merge conflict - everything else is the same.

murraystevenson (Contributor):

A variety of tests of the current state of this PR run on a random selection of IE production shots hasn't raised any red flags, so it feels like we're on the path to getting this merged... 🚀

johnhaddon force-pushed the improvedCollaborationPR branch from e0bb9d6 to 684a5b9 (October 25, 2023 17:11)

This will be used to replace the TaskMutex collaboration mechanism currently used via the LRUCaches in ValuePlug. The underlying collaboration mechanism is still a `task_arena/task_group` combo (sketched after the list below), but this arrangement has a few key benefits :

- Moving the collaboration out from the guts of the LRUCache gives us much more control over how it is performed, particularly for collaborating threads.
- We are now tracking interprocess dependencies completely ourselves, so we are no longer vulnerable to the deadlocks that TaskMutex couldn't prevent.
- We are no longer using `task_arena_observer`, which gives us much greater freedom to change implementation in future.
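
For reference, here is the rough general shape of that combo. This is only a minimal sketch in plain TBB, with illustrative names (`Collaboration`, `acquireResult`, `firstArrival`), not the actual implementation :

```cpp
#include <tbb/task_arena.h>
#include <tbb/task_group.h>

#include <functional>

// The first thread to want a result spawns the work in a shared arena;
// threads that subsequently want the same result join the arena and lend
// themselves to the tasks inside it instead of blocking.
struct Collaboration
{
	tbb::task_arena arena;
	tbb::task_group group;
};

void acquireResult( Collaboration &collaboration, const std::function<void ()> &work, bool firstArrival )
{
	if( firstArrival )
	{
		collaboration.arena.execute( [&]{ collaboration.group.run( work ); } );
	}
	// Every arrival waits inside the arena, so it can execute tasks
	// belonging to the collaboration rather than sitting idle.
	collaboration.arena.execute( [&]{ collaboration.group.wait(); } );
}
```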
All these tests would hang if ValuePlug wasn't now using `Process::acquireCollaborativeResult()`.

Our rationale is as follows :

- Standard has no right to exist - if a compute is slow then it should be multithreaded and TaskCollaboration should be used.
- TaskIsolation has no right to exist either. If a compute is heavy enough to use TBB then it's essential that we can collaborate on it. And since we're no longer using a `task_arena` to perform collaboration, we have less concern about potential overhead.
- Legacy is actually a good policy for lightweight computes. True, it can mean redundant identical computes being performed concurrently. But this is better than making one thread wait for the other because it allows both threads to collaborate on an upstream TaskCollaboration.
- Changing the behaviour of the Standard policy would be a breaking change, so Default is introduced as the new "blessed" name for Legacy.

This also disables the test for `Standard` and `TaskIsolation` policies because they are now the same as `TaskCollaboration`. This does highlight the fact that we now have the possibility of new stalls in the UI, but only for cache policies that aren't ideal anyway. We need to focus our energies into fixing the shared cancellation for TaskCollaboration, and making as many processes as possible use TaskCollaboration.

This provides some substantial performance improvements :

```
- testFanOutPerformance (GafferTest.CollectTest.CollectTest) : was 0.83s now 0.44s (47% reduction)
- testLoop (GafferTest.CollectTest.CollectTest) : was 1.64s now 0.09s (95% reduction)
- testDeepTreePerformance (GafferTest.ProcessTest.ProcessTest) : was 15.86s now 0.04s (100% reduction)
```

Full disclosure : it also shows a roughly 30% increase in runtime for `ProcessTest.testCollaborationTransitionPerformance`, and I'm not sure why. But that's small beans compared to the orders of magnitude in `testLoop` and `testDeepTreePerformance`.

Note that `isolated_task_group` is actually a TBB preview feature, so technically could be taken away from us in the future. But it is still present in the latest TBB release (2021.10.0) and is still in the latest source code at
https://github.com/oneapi-src/oneTBB/blob/master/include/oneapi/tbb/task_group.h. It is also [used in Embree](https://github.com/embree/embree/blob/master/man/man3/rtcCommitScene.4embree4#L34-L38) where the documentation cautions against the alternative of using a `task_arena` "due to its high runtime overhead". So I think we should be OK. If it ever disappears, alternatives include :

- Rolling our own, which should be doable provided that TBB continues to expose `isolate_within_arena()`, or might also be doable using public API by constructing a regular `task_group` from inside `this_task_arena::isolate()` (sketched below).
- Using `tbb::collaborative_call_once()`, which is not available in our current TBB version but is available in future versions (also sketched below).
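
Minimal sketches of those two alternatives, for the record. These are illustrative only, and the `collaborative_call_once` half assumes a newer oneTBB than we currently ship with :

```cpp
#include <tbb/collaborative_call_once.h> // Newer oneTBB releases only.
#include <tbb/task_arena.h>
#include <tbb/task_group.h>

#include <functional>

// Alternative 1 : isolation via public API, running a regular `task_group`
// from inside `this_task_arena::isolate()`, so that a thread waiting on the
// group can't be handed unrelated outer tasks to execute.
void isolatedRun( const std::function<void ()> &work )
{
	tbb::this_task_arena::isolate(
		[&] {
			tbb::task_group group;
			group.run( work );
			group.wait();
		}
	);
}

// Alternative 2 : `collaborative_call_once`, where threads requesting a
// result that is already being computed join in rather than blocking.
tbb::collaborative_once_flag g_onceFlag;

void computeOnce()
{
	tbb::collaborative_call_once( g_onceFlag, []{ /* expensive compute */ } );
}
```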

This was a poor man's deadlock-avoidance scheme which is no longer needed now that ValuePlug is using `Process::acquireCollaborativeResult()`.

Now that WorkerRead recursion has been removed, writability is determined entirely by AcquireMode, so we can replace conditionals with asserts.

johnhaddon force-pushed the improvedCollaborationPR branch from 684a5b9 to a62f892 (October 25, 2023 17:12)
johnhaddon (Member, Author):

Rebased to squash fixups in and get Changes.md in order. Should be ready to merge.

johnhaddon merged commit 1b156f8 into GafferHQ:1.3_maintenance on October 27, 2023. 4 checks passed.

johnhaddon deleted the improvedCollaborationPR branch on November 8, 2023 at 11:22.