Add NUMA support and optimizations #11871

Open
wants to merge 3 commits into
base: master

tlichwala

Summary:
This pull request adds support for Non-Uniform Memory Access (NUMA) to Apache Traffic Server, enhancing performance on NUMA systems.

Key Changes:

  • Added CMake options to enable NUMA support and debugging.
  • Introduced new configuration options for NUMA optimizations.
  • Enhanced thread and memory management to be NUMA-aware.
  • Added RamCacheContainer for cache duplication across NUMA nodes.
  • Integrated NUMA debugging utilities.

Performance testing has shown increased throughput, reduced CPU usage, improved latency, and decreased UPI bus load.

These changes aim to optimize memory access patterns and reduce latency on NUMA systems.

To build the application with NUMA support, use the following CMake command:
cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_MIMALLOC=ON -DENABLE_NUMA=ON -DENABLE_HWLOC=ON -Dmimalloc_DIR=""

 - Enable NUMA optimizations via CMake options
 - Introduce NUMA-aware thread assignment and memory allocation
 - Add NUMA debugging utilities
 - Enhance cache management for NUMA systems
@apache apache deleted a comment from ezelkow1 Nov 15, 2024
@zwoop
Contributor

zwoop commented Nov 15, 2024

Nice! The PR uses the old Debug() statements, which need to be changed to the new Dbg / DbgCtl features:

../src/iocore/net/Server.cc:247:3: error: use of undeclared identifier 'Debug'
  Debug("numa", "[Server::listen] Attempting to create socket with family: %d, type: %d, protocol: %d", addr.sa.sa_family,
  ^
../src/iocore/net/Server.cc:256:3: error: use of undeclared identifier 'Debug'
  Debug("numa", "[Server::listen] Attempting to set up fd for listen with non_blocking: %d, options: %d", non_blocking, opt);
  ^
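
A rough sketch of the migration, for reference. It follows the pattern already visible elsewhere in this diff (`Dbg(dbg_ctl_connection, ...)`): one control object per tag, passed to the macro in place of the tag string. So that the block compiles on its own, it uses simplified stand-ins: `ExampleDbgCtl`, `EXAMPLE_DBG`, and `log_listen_attempt` are hypothetical names and are not the ATS implementation.

```cpp
#include <cassert>
#include <cstdio>

// Simplified stand-in for a debug control object: a tag plus an enabled flag.
// The real ATS class resolves the tag against configuration; this one does not.
struct ExampleDbgCtl {
  explicit ExampleDbgCtl(const char *tag, bool enabled = false) : tag_(tag), enabled_(enabled) {}
  bool on() const { return enabled_; }
  const char *tag() const { return tag_; }

private:
  const char *tag_;
  bool enabled_;
};

// Stand-in for the Dbg macro: prints only when the control is on.
// This simplified version needs at least one argument after the format string.
#define EXAMPLE_DBG(ctl, fmt, ...)                                     \
  do {                                                                 \
    if ((ctl).on()) {                                                  \
      std::fprintf(stderr, "(%s) " fmt "\n", (ctl).tag(), __VA_ARGS__); \
    }                                                                  \
  } while (0)

// Before: Debug("numa", "[Server::listen] ...", family, type, protocol);
// After:  a control object is defined once for the tag and reused.
static ExampleDbgCtl dbg_ctl_numa{"numa", true};

void
log_listen_attempt(int family, int type, int protocol)
{
  EXAMPLE_DBG(dbg_ctl_numa, "[Server::listen] Attempting to create socket with family: %d, type: %d, protocol: %d", family, type,
              protocol);
}
```

Calling `log_listen_attempt(2, 1, 6)` writes one tagged line to stderr while the control is on and nothing otherwise.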

@zwoop zwoop added this to the 10.1.0 milestone Nov 15, 2024
Contributor

@cmcfarlen cmcfarlen left a comment

Thanks so much for this PR! It looks really interesting. Just a few comments to get CI happy, mainly migrating Debug -> Dbg. Looking forward to trying this out!

{
return details::splitter<Cript::string_view>(input, delim);
}

Contributor

You can remove this chunk, perhaps a merge conflict issue.

int my_thread_id = this_ethread()->id;
int my_numa_node = this_ethread()->get_numa_node();

Debug("numa_sequencer", "[NUMASequencer] Thread %d (NUMA node %d) entered run_sequential.", my_thread_id, my_numa_node);
Contributor

Debug has been removed, please use the Dbg macro instead.

if (opt.f_mptcp) {
  Dbg(dbg_ctl_connection, "Define socket with MPTCP");
  prot = IPPROTO_MPTCP;
}

// Create the socket
Debug("numa", "[Server::listen] Attempting to create socket with family: %d, type: %d, protocol: %d", addr.sa.sa_family,
Contributor

Were these Debug lines only for your own debugging? If so, could they be removed?

}

RamCache *
RamCacheContainer::get_cache(unsigned int my_node, unsigned int node)
Contributor

Suggested change
RamCacheContainer::get_cache(unsigned int my_node, unsigned int node)
RamCacheContainer::get_cache(unsigned int /* my_node ATS_UNUSED */, unsigned int node)
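
To illustrate the suggestion: commenting out the parameter name keeps the signature intact while silencing the unused-parameter warning. The sketch below assumes a container that holds one cache per NUMA node and looks it up by index; `ExampleRamCache` and `ExampleRamCacheContainer` are hypothetical stand-ins, not the classes in this PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a per-node RAM cache; it only records its node id.
struct ExampleRamCache {
  unsigned int node;
};

// Hypothetical container holding one cache per NUMA node.
class ExampleRamCacheContainer
{
public:
  explicit ExampleRamCacheContainer(unsigned int node_count)
  {
    assert(node_count > 0);
    caches_.reserve(node_count);
    for (unsigned int n = 0; n < node_count; ++n) {
      caches_.push_back(ExampleRamCache{n});
    }
  }

  // The first parameter stays in the signature for callers, but its name is
  // commented out because the body does not use it.
  ExampleRamCache *
  get_cache(unsigned int /* my_node */, unsigned int node)
  {
    return node < caches_.size() ? &caches_[node] : nullptr;
  }

private:
  std::vector<ExampleRamCache> caches_;
};
```

With a two-node container, `get_cache(0, 1)` returns the cache for node 1 and an out-of-range node returns `nullptr`.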


// returns true if consistent
static bool
check_pages_consistency(void *data, size_t size, const char *name = "")
Contributor

This function is unused unless NUMA_CONSISTENCY_CHECK is defined. Please add a preprocessor guard here.
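
One way the guard could look, as a sketch only. The function body here is a placeholder (the real consistency check is not shown in this excerpt), and `verify_buffer` is a hypothetical caller added so the example compiles with or without the macro.

```cpp
#include <cassert>
#include <cstddef>

#ifdef NUMA_CONSISTENCY_CHECK
// Compiled only when the check is enabled, so other builds never see an
// unused static function. Placeholder body; returns true if consistent.
static bool
check_pages_consistency(void *data, std::size_t size, const char *name = "")
{
  (void)name;
  return data != nullptr && size > 0;
}
#endif

// Hypothetical caller: runs the check when it is compiled in.
bool
verify_buffer(void *data, std::size_t size)
{
#ifdef NUMA_CONSISTENCY_CHECK
  return check_pages_consistency(data, size, "example");
#else
  (void)data;
  (void)size;
  return true; // check compiled out
#endif
}
```

Building with `-DNUMA_CONSISTENCY_CHECK` compiles the check in; without it, the helper and its call disappear together.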

}

static void
move_pages_to_current_numa_zone(void *data, size_t size)
Contributor

This function is unused.


// If all threads have been added (assuming their number is equal to eventProcessor.net_threads), sort the thread IDs and set
// ready_to_run to true
if (thread_ids.size() == eventProcessor.net_threads) {
Contributor

I get a signed/unsigned compare error for this line.
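
A minimal sketch of one way to avoid the warning, assuming the thread count is a signed `int` and the container's `size()` is unsigned. `all_threads_registered` is a hypothetical helper used only to make the comparison explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compares a container size with a signed count without mixing signedness.
// A negative count can never match, so it is rejected before the cast.
bool
all_threads_registered(const std::vector<int> &thread_ids, int expected_threads)
{
  if (expected_threads < 0) {
    return false;
  }
  return thread_ids.size() == static_cast<std::size_t>(expected_threads);
}
```

Checking the sign first matters because casting a negative `int` to `std::size_t` would wrap to a very large value.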

Contributor

@cmcfarlen cmcfarlen left a comment

The build is failing for systems without numa.h. I noted one place, but there could be other similar issues.

getcpu(&cpu, &node);
this->numa_node = node;
}
return this->numa_node;
Contributor

This should handle when numa.h is not available

Author

The getcpu() function is included in the numactl package. Adding a conditional directive should solve this issue.
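
A sketch of what such a conditional directive might look like. It assumes `getcpu()` is taken from glibc's `<sched.h>` (declared there on glibc 2.29 and later, as far as I know); if the build takes it from elsewhere, the same guard could key off a macro set by the build system instead. `EXAMPLE_HAVE_GETCPU` and `current_numa_node_or` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib> // on glibc this also makes __GLIBC_PREREQ available

#if defined(__linux__) && defined(__GLIBC__)
#if __GLIBC_PREREQ(2, 29)
#include <sched.h> // getcpu()
#define EXAMPLE_HAVE_GETCPU 1
#endif
#endif

// Returns the NUMA node of the calling thread, or `fallback` when getcpu()
// is unavailable at build time or fails at run time.
int
current_numa_node_or(int fallback)
{
#ifdef EXAMPLE_HAVE_GETCPU
  unsigned int cpu  = 0;
  unsigned int node = 0;
  if (getcpu(&cpu, &node) == 0) {
    return static_cast<int>(node);
  }
#endif
  return fallback;
}
```

On systems without the call, the function compiles to a plain `return fallback;`, so callers need no conditional code of their own.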
