-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: decrease ctrlr_loss_timeout_sec for base bdev #170
Conversation
WalkthroughThe changes modify several constants in the Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
app/cmd/basic/bdev_nvme.go (2)
Line range hint
150-170
: Misleading CLI interfaceThe command still accepts
ctrlr-loss-timeout-sec
,reconnect-delay-sec
, andfast-io-fail-timeout-sec
flags, but these values are not used in theBdevNvmeSetOptions
call. This creates a confusing interface where users might think they're configuring these timeouts when they're actually not.Consider either:
- Restoring the timeout parameters in the
BdevNvmeSetOptions
call, or- Removing these unused flags from the CLI interface
If option 1 is chosen, modify line 239 to:
- result, err := spdkCli.BdevNvmeSetOptions(int32(c.Int("transport-ack-timeout")), int32(c.Int("keep_alive_timeout_ms"))) + result, err := spdkCli.BdevNvmeSetOptions( + int32(c.Int("transport-ack-timeout")), + int32(c.Int("keep_alive_timeout_ms")), + int32(c.Int("ctrlr-loss-timeout-sec")), + int32(c.Int("reconnect-delay-sec")), + int32(c.Int("fast-io-fail-timeout-sec")), + )
239-239
: Summary: Implementation needs revisionThe current changes deviate significantly from the PR's objective of decreasing the
ctrlr_loss_timeout_sec
value. Instead of removing the timeout parameters completely, consider:
- Keeping the parameters but adjusting the default value of
ctrlr_loss_timeout_sec
- Documenting the rationale for any removed parameters
- Ensuring CLI flags accurately reflect the available options
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (3)
app/cmd/basic/bdev_nvme.go
(1 hunks)pkg/spdk/client/basic.go
(1 hunks)pkg/types/types.go
(1 hunks)
🔇 Additional comments (2)
app/cmd/basic/bdev_nvme.go (1)
Line range hint 239-244
: Document the new default values
Since the timeout parameters are no longer configurable through this interface, please document:
- What are the new default values?
- Where are these defaults set?
- How do these changes affect error handling and recovery?
Let's check where these defaults might be defined:
✅ Verification successful
Based on the gathered information, I can now provide a final response about the timeout parameters:
Default timeout values are defined in the codebase
The default values are defined in pkg/types/types.go
:
DefaultCtrlrLossTimeoutSec = 30
(controller loss timeout in seconds)DefaultTransportAckTimeout
is 14 (which results in 2^14 = 16384 msec = 16.384 sec)
The timeouts are set in the SPDK client (pkg/spdk/client/basic.go
) and have the following characteristics:
transportAckTimeout
: Time to wait for acknowledgment until retransmission for RDMA or connection close for TCP (Range 0-31 where 0 means use default)keepAliveTimeoutMs
: Keep alive timeout in milliseconds
These parameters can only be set before SPDK subsystems are initialized or any bdev NVMe has been created.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for default timeout definitions
rg "DefaultCtrlrLossTimeout|default.*timeout" -g '*.go'
# Look for related configuration files
fd -e yaml -e json -x grep -l "timeout"
Length of output: 558
Script:
#!/bin/bash
# Let's check the full context of the default timeout definitions
cat pkg/types/types.go | grep -A 10 -B 10 "DefaultCtrlr"
# Let's also check the implementation of BdevNvmeSetOptions
ast-grep --pattern 'func BdevNvmeSetOptions($$$) {
$$$
}'
Length of output: 877
Script:
#!/bin/bash
# Let's check the implementation details in the SPDK client
rg "BdevNvmeSetOptions" -A 10
# Let's also look for any documentation about these timeouts
fd README -x cat {}
Length of output: 53663
pkg/spdk/client/basic.go (1)
776-779
: Verify the impact of removing timeout parameters.
The removal of ctrlrLossTimeoutSec
, reconnectDelaySec
, and fastIOFailTimeoutSec
parameters from BdevNvmeSetOptions
could affect I/O error resiliency. While this change aligns with fixing the race condition mentioned in issue #9874, we should ensure:
- The default timeout values used by SPDK are appropriate for the use case
- The removal doesn't introduce new issues in error recovery scenarios
Let's verify the usage and impact:
Build AMD64 PR check failed.
|
Yes, environmental issue. Waiting for @yangchiu fix it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
go-spdk-helper
binary accidentally added?
Yes.. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #170 +/- ##
=======================================
Coverage 22.89% 22.89%
=======================================
Files 34 34
Lines 4813 4813
=======================================
Hits 1102 1102
Misses 3537 3537
Partials 174 174
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Longhorn 9874 Signed-off-by: Derek Su <[email protected]>
Signed-off-by: Derek Su <[email protected]>
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9874
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context