Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RFC for creation and use of NUMA-constrained arenas #1559

Open
wants to merge 2 commits into
base: dev/vossmjp/rfc_numa_support
Choose a base branch
from

Conversation

aleksei-fedotov
Copy link
Contributor

Description

Add sub-RFC to #1535 for creation and use of NUMA-constrained arenas.

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information


Usually the users of oneTBB employ this technique to tie oneTBB worker threads
up within NUMA nodes and yet have all the parallelism of a platform utilized.
The pattern allows to find out how many NUMA nodes are on the system. With that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The pattern allows to find out how many NUMA nodes are on the system. With that
The pattern starts by finding the number of NUMA nodes on the system. With that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

Usually the users of oneTBB employ this technique to tie oneTBB worker threads
up within NUMA nodes and yet have all the parallelism of a platform utilized.
The pattern allows to find out how many NUMA nodes are on the system. With that
number user creates that many ~tbb::task_arena~ objects, constraining each to a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
number user creates that many ~tbb::task_arena~ objects, constraining each to a
number, user creates that many ~tbb::task_arena~ objects, constraining each to a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

The example above requires new class named ~tbb::constrained_task_arena~. On one
hand, it is a ~tbb::task_arena~ class that isolates the work execution from
other parallel stuff executed by oneTBB. On the other hand, it is a constrained
arena that represents an arena associated to a certain NUMA node and allows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be additional functions added to create arenas constrained to core type too? Or will this be exclusively for NUMA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to decide on this whether it is needed right away or can be added later in future RFCs. Initially, I wanted writing RFC that addresses the specific concern about verbose and error-prone API for creation of NUMA-constrained arenas.

in the previous bullet, but since it is a synchronization point, usually the
blocking call is used.

The proposal below addresses these issues.
Copy link
Contributor

@akukanov akukanov Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer it not to address all these issues :)

Specifically, I believe that [4] and [5] are really orthogonal to NUMA, and need to be resolved for task_arena in general rather than only for its derived class. And even for this subclass I am not sure that "hiding" a task_group inside is the right thing to do.

Keeping task_group and task_arena separate and merely simplifying the combined use of those with some "syntax sugar" allows creating independent "work queues" in the same arena. Also we can think if a similar approach with a flow graph instead of task_group might be useful. Of course all that can be implemented with the hidden task_group, but likely at the expense of some overhead.

And it seems that if task_group is kept separate and e.g. task_arena::wait(task_group&) is added, the need for a derived class diminishes if not disappears.

Copy link
Contributor

@akukanov akukanov Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example code would be in between of the "verbose" and "concise" variants showed in the document, like this:

std::vector<tbb::task_arena> numa_arenas =
    tbb::initialize_constrained_arenas(/*maybe some arguments*/);
std::vector<tbb::task_group> task_groups(numa_arenas.size());

for(unsigned j = 0; j < numa_arenas.size(); j++) {
    numa_arenas[j].enqueue( (){/*some parallel stuff*/}, task_groups[j] );
}

for(unsigned j = 0; j < numa_arenas.size(); j++) {
    numa_arenas[j].wait( task_groups[j] );
}

or, with modern C++ (C++23 is required for std::views::zip), like this:

std::vector<tbb::task_arena> numa_arenas =
    tbb::initialize_constrained_arenas(/*maybe some arguments*/);
std::vector<tbb::task_group> task_groups(numa_arenas.size());

for(auto& [arena, tg]: std::views::zip(numa_arenas, task_groups)) {
    arena.enqueue( (){/*some parallel stuff*/}, tg );
}

for(auto& [arena, tg]: std::views::zip(numa_arenas, task_groups)) {
    arena.wait( tg );
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that coupling of separate entities is rather a bad thing. Here we would like to improve usability of current interfaces without sacrificing their flexibility. Then the proposal boils down to:

  • Introduce the interface that would simplify creation of arenas, each bind to its own NUMA node.
  • Introduce the interface that would allow to avoid common mistakes related to loading with work of such arenas.

Copy link
Contributor

@akukanov akukanov Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections to these two goals.

But I believe the second goal can be achieved without introducing a new class, by extending the existing methods of task_arena to better integrate with task_groups. That would potentially be useful beyond NUMA scenarios and would give users more control with rather small increase in code complexity.

At the very least, it's an alternative to mention.

Updated: also regarding this:

Here we would like to improve usability of current interfaces without sacrificing their flexibility.

In fact, the proposal introduces a new arena interface with reduced flexibility :)

@aleksei-fedotov aleksei-fedotov force-pushed the dev/aleksei-fedotov/numa-constrained-arenas-rfc branch from c25e7b1 to 3a2f55b Compare November 14, 2024 09:45
Comment on lines +46 to +49
- [2] - Separate step for instantiation the same number of ~tbb::task_group~
objects, in which the actual work is going to be submitted. Note that user
also needs to make sure the size of ~arenas~ matches the size of
~task_groups~.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the second sentence sounds like a rephrase of the first one, without new information or argumentation. I mean, I see no difference between "the same number of tbb::task_group objects" and "the size of arenas matches the size of task_groups"

Comment on lines +51 to +52
nodes. Note that user needs to make sure the indices of ~tbb::task_arena~
objects match corresponding indices of NUMA nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point about the need to match the indices is kind of strange. A single loop that works with several arrays/vectors is a typical pattern, you just use the loop index consistently. Moreover, with modern C++ you can rewrite the loop to not have any indices at all, e.g.

std::vector<tbb::numa_node_id> numa_indexes = tbb::info::numa_nodes();
std::vector<tbb::task_arena> arenas;      // note that the size is not set
std::vector<tbb::task_group> task_groups; // same for task groups

for (auto idx: numa_indexes) {
    arenas.emplace_back( tbb::task_arena::constraints(idx) );
    task_groups.emplace_back();
    arenas.back().execute([&tg = task_groups.back()]{
        tg.run([]{/*some parallel stuff*/});
    });
}

If you meant something else, perhaps try explaining it better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants