Simplify core 2 #13
base: master
Conversation
…ation Details:
- Disable DpdkDriver in CMakeLists.txt temporarily
- Remove unused method Driver::Packet::getMaxPayloadSize()
- Remove param `driver` in handleXXXPacket
- Change Driver::Packet to become a POD struct
- Change Driver::Packet::{address,priority} into params in Driver::sendPacket
- Remove the opaque Driver::Address
- Use IP packets as the common interface between transport and driver
- Extend Homa packet headers to include L4 src/dst port numbers
- Use SocketAddress (i.e., ip + port) as opposed to Driver::Address to identify the src/dst address of a message
- Reenable DPDK in CMakeLists.txt
- Initialize an ARP table at driver startup using the content of /proc/net/arp
- Select the eth port via the symbolic name of the network interface (e.g., eno1d1); the current implementation uses ioctl to obtain the IP and MAC addresses of a network interface
- Add a system test for the DPDK driver: test/dpdk_test.cc
- Change IpAddress from a typedef to a POD type to provide some type safety
- Add a third argument to Driver::receivePackets() to hold the source addresses of ingress packets when the method returns
- Eliminate Driver::Packet (use Homa::PacketSpec instead)
- Move the L4 header fields sport/dport into the header prefix
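For illustration, the POD-style types described above might look like the following sketch; the field layouts are assumptions based on these commit notes, not the actual Homa definitions.

#include <cstdint>

// IpAddress as a POD struct rather than a raw typedef, so the compiler
// rejects accidental mixing with plain integers (the type-safety goal
// stated above).
struct IpAddress {
    uint32_t addr;  // IPv4 address (byte order assumed)
};

// SocketAddress (ip + port) identifies the src/dst of a message in place
// of the opaque Driver::Address.
struct SocketAddress {
    IpAddress ip;
    uint16_t port;  // L4 port carried in the Homa header prefix
};

// PacketSpec: a plain packet descriptor replacing the abstract
// Driver::Packet.
struct PacketSpec {
    void* payload;   // first byte of the packet buffer
    int32_t length;  // number of bytes in the payload
};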
Timeouts are now checked incrementally, with one bucket checked every poll iteration. Previously, timeout checking was more concentrated: it was triggered less frequently, but all buckets were checked once triggered. That design allowed a poll iteration to complete quickly when no timeout checking was triggered but significantly increased poll() execution times when checking did occur, causing large, periodic latency spikes when using the transport. The new design slightly increases the minimum poll() execution time but distributes the work of checking timeouts more evenly over time, reducing latency spikes.
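A minimal sketch of the incremental scheme, assuming a fixed array of timeout buckets and a rotating index (identifiers are illustrative, not the actual Homa code):

#include <array>
#include <cstddef>
#include <cstdint>

class TimeoutManager {
  public:
    // Each poll() checks exactly one bucket and advances the index, so
    // the cost of timeout checking is spread evenly across iterations
    // instead of arriving in bursts.
    void poll(uint64_t now)
    {
        checkBucket(buckets[nextBucket], now);
        nextBucket = (nextBucket + 1) % NUM_BUCKETS;
    }

  private:
    struct Bucket { /* per-bucket list of pending timeouts */ };

    void checkBucket(Bucket& bucket, uint64_t now)
    {
        /* expire every entry in the bucket whose deadline is <= now */
    }

    static const std::size_t NUM_BUCKETS = 128;  // assumed bucket count
    std::array<Bucket, NUM_BUCKETS> buckets;
    std::size_t nextBucket = 0;
};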
Homa will now ensure that an OutMessage is fully sent before it is cleaned up if an application releases the message after calling send(). Previously, released messages that were still in the process of being sent were cancelled and immediately cleaned up. Cancelling a message now requires an explicit call to cancel().
Make the Perf::Counters::Stat::add(T val) operation thread-unsafe. This significantly reduces the overhead of updating a Stat and is safe because add() is only called on thread-local Stat instances.
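Schematically, the trade-off looks like this (a sketch, not the actual Perf::Counters code):

#include <cstdint>

template <typename T>
struct Stat {
    // Not thread-safe by design; callers must only invoke this on a
    // thread-local instance, which avoids the cost of an atomic RMW.
    void add(T val)
    {
        count++;
        total += val;
    }

    uint64_t count = 0;
    T total = T();
};

// Each thread accumulates into its own instance:
thread_local Stat<uint64_t> cyclesStat;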
Previously, the number of messages completed per call to trySend() was limited by a std::array allocated on the stack, a design chosen to reduce allocation overhead. New Perf microbenchmarks show that reserving capacity in a std::vector is likely efficient enough, which lets trySend() seamlessly handle an unlimited number of message completions per call.
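A sketch of the reserve-then-grow pattern, with an assumed tuning constant:

#include <cstddef>
#include <vector>

struct Message;  // opaque; stands in for Homa's internal message type

void trySend()
{
    // Reserving capacity up front makes the common case allocation-free;
    // unlike a fixed-size std::array, the vector can grow if more
    // messages complete in a single call.
    static const std::size_t EXPECTED_COMPLETIONS = 16;  // assumed
    std::vector<Message*> completed;
    completed.reserve(EXPECTED_COMPLETIONS);
    // ... transmit packets; push_back() each message whose final packet
    // was just sent, then run completion handling after releasing the
    // send queue lock.
}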
The Sender will now skip pinging a message if the message still has granted but unsent packets (e.g. it is waiting on itself).
Will be used in a future integration with a non-polling interface.
Will be used by a future non-polling interface.
…sed on Collin's comments
The cleanup, especially of the interfaces, looks pretty good; I don't have any issues there. My main concern is with the locking.
Overall, it seems like this PR reduces the size/scope of the bucket and queue/scheduler mutex critical sections in most places. Unfortunately, doing this seems to have introduced potential race conditions and TOCTOU bugs. Perhaps I missed some critical factors that rule out these issues, but the locking did not make it obvious that the code is race-free. I would argue that using a locking style more like monitors, with larger critical sections, makes it easier to convince ourselves that the logic is thread-safe. What was the motivation for using fine-grained critical sections? Can we achieve your goals and still have obviously thread-safe code? Let's find a time to discuss.
@@ -179,7 +177,6 @@ Driver::Packet*
 DpdkDriver::Impl::allocPacket()
 {
     DpdkDriver::Impl::Packet* packet = nullptr;
-    SpinLock::Lock lock(packetLock);
     static const int MBUF_ALLOC_LIMIT = NB_MBUF - NB_MBUF_RESERVED;
     if (mbufsOutstanding < MBUF_ALLOC_LIMIT) {
mbufsOutstanding is now unprotected. Have you double-checked to make sure there aren't synchronization and TOCTOU issues? At the very least, it should probably be declared atomic.
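For reference, the atomic alternative suggested here might look like the sketch below. A plain atomic load in the if-condition would still leave a TOCTOU window between the check and the increment, so the sketch reserves a slot first; this illustrates the suggestion, not the fix the PR ultimately adopted.

#include <atomic>

std::atomic<int> mbufsOutstanding{0};

bool tryReserveMbuf(int limit)
{
    // fetch_add returns the previous value; if it was below the limit,
    // this thread holds a valid slot and may allocate an mbuf.
    if (mbufsOutstanding.fetch_add(1, std::memory_order_relaxed) < limit) {
        return true;
    }
    // Over the limit: back out the reservation.
    mbufsOutstanding.fetch_sub(1, std::memory_order_relaxed);
    return false;
}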
I decided to make the thread-safety of ObjectPool configurable via a template argument, which is generally useful anyway. Since I am lazy, I am going to revert all changes on DpdkDriver.
src/Timeout.h (outdated)
{ | ||
list.remove(&timeout->node); | ||
timeout->expirationCycleTime = | ||
PerfUtils::Cycles::rdtsc() + timeoutIntervalCycles; | ||
timeout->expirationCycleTime = now + timeoutIntervalCycles; |
Using a caller-provided now is problematic. There is no guarantee that the caller will provide monotonically increasing now values, which would break the timer list's ordering invariant.
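Schematically, assuming the list stays sorted because entries are appended with non-decreasing expiration times:

// Thread A reads its clock, then thread B passes a stale reading:
setTimeout(a, /*now=*/100);  // a expires at 100 + interval
setTimeout(b, /*now=*/90);   // b is appended after a but expires at
                             // 90 + interval, i.e., before a
// A checker that scans from the head and stops at the first unexpired
// entry can now skip over the already-expired b indefinitely.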
Got it; I'll revert the change. By the way, this method won't be called very often anyway once my changes to the timeout mechanism are merged.
src/Receiver.cc (outdated)
{ | ||
SpinLock::Lock lock_bucket(bucket->mutex); | ||
message = bucket->findMessage(id, lock_bucket); | ||
} |
The message might be invalid after the bucket mutex is released.
Extended the critical section to the entire method.
src/Receiver.cc (outdated)
// Things that must be true (sanity check)
assert(id == message->id);
assert(message->driver == driver);
// Copy the payload into the message buffer.
Why copy the data out while processing the packet instead of when the application calls receive()? I can think of potential pros and cons but I'm wondering what you are considering.
If you want to be able to release the packet buffer back to the driver when this method returns, you have to do the copy here. In general, it's going to be faster to do the copy while the packet data is still hot in cache; this also helps reduce the number of RX buffers in use in the driver.
src/Receiver.cc (outdated)
message->numResendTimeouts++;
if (message->numResendTimeouts >= MESSAGE_TIMEOUT_INTERVALS) {
    // Message timed out before being fully received; drop the message.
    messageAllocator.destroy(message);
Does this deadlock on ~Message()? The bucket->mutex is held here and also used in the destructor, correct?
Yeah, good catch. I now use a UniqueLock and release the lock before calling dropMessage (the alternative is to duplicate functionality from dropMessage, like the original code did). Yes, I also revived dropMessage because it feels like information leakage to detach a message from the transport inside its destructor.
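A sketch of the resulting shape, assuming SpinLock::UniqueLock supports an explicit unlock() (identifiers follow the diff excerpt above; this is an illustration, not the exact patch):

SpinLock::UniqueLock lock_bucket(bucket->mutex);
message->numResendTimeouts++;
if (message->numResendTimeouts >= MESSAGE_TIMEOUT_INTERVALS) {
    // Release the bucket mutex first; dropMessage() acquires it again,
    // so holding it here would self-deadlock on a non-reentrant SpinLock.
    lock_bucket.unlock();
    dropMessage(message);
}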
There are three advantages to the fine-grained critical section approach. First, the code is more explicit about what needs to be protected. Second, we can flip the locking order constraint to be queueMutex before bucket mutex and make the code in …
Looks good. Mostly small comments. The possible remaining discussion topics are:
- John's comments about the copying interface.
- Test coverage
- C++ version
- Next steps
@@ -99,38 +102,66 @@ class ObjectPool {
     template <typename... Args>
     T* construct(Args&&... args)
     {
-        void* backing = NULL;
+        void* backing = nullptr;
+        enterCS();
I like this idea of making the locking optional. These methods are small so it doesn't matter all that much, but I'm wondering if there is an easy RAII version of locking and unlocking; RAII guards tend to be less error-prone.
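One possible RAII shape, sketched under the assumption that the thread-safety template argument from the earlier comment selects the mutex type (not the PR's actual implementation):

#include <mutex>
#include <type_traits>
#include <utility>

// A no-op lock satisfying the BasicLockable interface, so std::lock_guard
// compiles down to nothing when thread-safety is disabled.
struct NoOpLock {
    void lock() {}
    void unlock() {}
};

template <typename T, bool ThreadSafe = true>
class ObjectPool {
  public:
    template <typename... Args>
    T* construct(Args&&... args)
    {
        // The guard releases the mutex on every exit path, including
        // exceptions, replacing the manual enterCS()/exitCS() calls.
        std::lock_guard<MutexType> guard(mutex);
        void* backing = nullptr;
        /* ... allocate backing and placement-new T(args...) ... */
        return static_cast<T*>(backing);  // placeholder
    }

  private:
    using MutexType =
        typename std::conditional<ThreadSafe, std::mutex, NoOpLock>::type;
    MutexType mutex;
};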
    // A new Message needs to be entered into the scheduler.
    SpinLock::Lock lock_scheduler(schedulerMutex);
    schedule(message, lock_scheduler);
} else if (message->scheduled) {
Line 148 already says if (message->scheduled). One of these lines is probably wrong. Does this patch include test coverage? If so, a unit test is likely missing for this case.
// Deliver the message to the user of the transport.
SpinLock::Lock lock_received_messages(receivedMessages.mutex);
receivedMessages.queue.push_back(&message->receivedMessageNode);
Can we safely access the message's members without the bucket lock? If so, it should be documented why.
@@ -785,30 +599,27 @@ Receiver::trySendGrants()
  * Reminder to hold the Receiver::schedulerMutex during this call.
  */
 void
-Receiver::schedule(Receiver::Message* message, const SpinLock::Lock& lock)
+Receiver::schedule(Receiver::Message* message,
+                   [[maybe_unused]] SpinLock::Lock& lock)
What version of C++/gcc are you assuming? [[maybe_unused]] is a C++17 attribute, but the project currently checks for C++11.
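A C++11-compatible sketch, reusing the signature from the diff above (schematic, not a drop-in patch):

void
Receiver::schedule(Receiver::Message* message, SpinLock::Lock& lock)
{
    (void)lock;  // the parameter exists only to prove the mutex is held
    // ...
}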
Made some more improvements atop #11.