-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
issue: 4260620 infinite loop latency spec #1104
base: master
Are you sure you want to change the base?
Conversation
@@ -469,8 +469,7 @@ int io_mux_call::call() | |||
|
|||
__log_funcall(""); | |||
|
|||
if (!m_b_sysvar_select_poll_os_force // TODO: evaluate/consider this logic | |||
&& (*m_p_num_all_offloaded_fds == 0)) | |||
if (*m_p_num_all_offloaded_fds == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if no offlaoded sockets are mapped then we poll the OS fd anyway, in this case
this PARAM documentation should also be changed, so if this PARAM is enabled
it will modify POLL_OS_RATIO and SKIP_OS to 1 to POLL the OS everytime we poll Offloaded soockets.
also seeing same issue in XLIO, need to fix there too, what you think?
"VMA_SELECT_POLL_OS_FORCE
This flag forces to poll the OS file descriptors while user thread calls
select(), poll() or epoll_wait() even when no offloaded sockets are mapped.
Enabling this flag causes VMA to set VMA_SELECT_POLL_OS_RATIO and
VMA_SELECT_SKIP_OS to 1. This will result in VMA_SELECT_POLL number of
times VMA will poll the OS file descriptors, along side with offloaded
sockets, if such sockets exists.
Note that setting VMA_SELECT_SKIP_OS and VMA_SELECT_POLL_OS_RATIO
directly will override the values these parameters gets while
VMA_SELECT_POLL_OS_FORCE is enabled."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is single place in the code for VMA/XLIO that uses m_b_sysvar_select_poll_os_force
.
In case it is removed here so what is a meaning of VMA_SELECT_POLL_OS_FORCE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you both - I'll remove it and open a Task to check the same bug in XLIO (probably the same bug reoccurs there as well if libxlio indeed inherited this logic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a second thought - this flag has the meaning of setting
"VMA_SELECT_POLL_OS_RATIO and
VMA_SELECT_SKIP_OS to 1."
I think the correct way is to keep it and just update the documentation. updated the commit. wdyt @igor-ivanov @BasharRadya ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BasharRadya - FYI
[XLIO] Task #4262473: [VMA inheritance] infinite loop when no offloaded sockets | Assignee: Gal N. | Status: New
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
igor please notice that poll_os_force meaning is to set
VMA_SELECT_POLL_OS_RATIO and VMA_SELECT_SKIP_OS to 1 as i mentioned in my first review (so it will override their default values).
you can claim that we can simply set this adjusted params to 1 instead of using different ENV PARAM to set them, but removing this OLD param is a bit problematic in POV of compatibility (if our many clients already use this param and expect it to exists in newer xlio versions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, igor please notice that poll_os_force meaning is to set VMA_SELECT_POLL_OS_RATIO and VMA_SELECT_SKIP_OS to 1 as i mentioned in my first review (so it will override their default values). you can claim that we can simply set this adjusted params to 1 instead of using different ENV PARAM to set them, but removing this OLD param is a bit problematic in POV of compatibility (if our many clients already use this param and expect it to exists in newer xlio versions).
Actually I think the same if VMA_SELECT_POLL_OS_FORCE
behavior can be controlled one to one by VMA_SELECT_POLL_OS_RATIO
and VMA_SELECT_SKIP_OS
, so it is better to set VMA_SELECT_POLL_OS_RATIO
and VMA_SELECT_SKIP_OS
to the right values internally during initialization if user explicitly passes VMA_SELECT_POLL_OS_FORCE
and use identical logic (probably handle_os_countdown()
)
In addition we might consider following
- remove
VMA_SELECT_POLL_OS_FORCE
from USER MANUAL and README keeping support just for compatibility only - describe behavior
VMA_SELECT_POLL_OS_FORCE
basing onVMA_SELECT_POLL_OS_RATIO
andVMA_SELECT_SKIP_OS
in README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - wdyt @igor-ivanov ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" but removing this OLD param is a bit problematic in POV of compatibility (if our many clients already use this param and expect it to exists in newer xlio versions)."?
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please notice to update related code if select_poll_os_force is removed
"if ((env_ptr = getenv(SYS_VAR_SELECT_POLL_OS_FORCE)) != NULL)
select_poll_os_force = (uint32_t)atoi(env_ptr);
if (select_poll_os_force) {
select_poll_os_ratio = 1;
select_skip_os_fd_check = 1;
}
"
dd7d5d3
to
6d89095
Compare
When there are no offloaded sockets, we mistakingly poll for offloaded sockets in case VMA_SELECT_POLL_OS_FORCE=1. This behavior makes us getting stuck in an infinite loop. This patch fixes the issue by polling OS sockets in the case there are no offloaded sockets regardless to VMA_SELECT_POLL_OS_FORCE value. Signed-off-by: Tomer Cabouly <[email protected]>
6d89095
to
e221d58
Compare
@tomerdbz xlio has the same fix Mellanox/libxlio@0377f77 |
Description
When there are no offloaded sockets,
we mistakenly poll for offloaded sockets in case VMA_SELECT_POLL_OS_FORCE=1.
This behavior makes us getting stuck in an infinite loop.
This patch fixes the issue by polling OS sockets
in the case there are no offloaded sockets
regardless to VMA_SELECT_POLL_OS_FORCE value.
What
Fix latency spec bug of getting stuck in an infinite loop when there are no offloaded sockets.
Why ?
Customer bug.
Solves https://redmine.mellanox.com/issues/4260620
Change type
What kind of change does this PR introduce?
Check list