Don't fence clients with rid==0 #190

Open
wants to merge 1 commit into
base: main

aversecat
Contributor

Occasionally during the export-lookup-evict-race test we see the following failure dmesg output when a server fences a client that has no valid rid:

[  828.379546] sysfs: cannot create duplicate filename '/fs/scoutfs/f.b928e1.r.7b36c0/fence/0000000000000000'
...
[  828.379773] kobject_add_internal failed for 0000000000000000 with -EEXIST, don't try to register things with the same name in the same directory.
[  828.385946] scoutfs f.b928e1.r.7b36c0 error: client fence returned err -17, shutting down server

This fails the test. Fencing these clients is unwanted, and we definitely don't want to create duplicate sysfs entries for them either.

Don't fence clients like this; just return 0.
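
To make the failure mode concrete, here is a minimal userspace sketch, not the scoutfs code. It assumes the fence entry name is the rid printed as 16 hex digits (which matches the `0000000000000000` name in the log) and models the sysfs directory as a small table that rejects duplicate names. `model_sysfs_add` and `model_fence_start` are hypothetical names invented for this illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_FENCES 16

/* Hypothetical stand-in for the per-mount fence/ sysfs directory. */
static char fence_names[MAX_FENCES][17];
static int nr_fences;

/* Models a directory add that fails with -EEXIST on a duplicate name. */
static int model_sysfs_add(const char *name)
{
	int i;

	for (i = 0; i < nr_fences; i++) {
		if (strcmp(fence_names[i], name) == 0)
			return -EEXIST;
	}
	if (nr_fences == MAX_FENCES)
		return -ENOMEM;
	strcpy(fence_names[nr_fences++], name);
	return 0;
}

/*
 * Models starting a fence for a client.  With skip_zero_rid set, a
 * client without a valid rid is not fenced and 0 is returned, which is
 * the behaviour this change proposes.
 */
static int model_fence_start(uint64_t rid, int skip_zero_rid)
{
	char name[17];

	if (skip_zero_rid && !rid)
		return 0;

	/* Assumption: the entry is named after the rid as 16 hex digits. */
	snprintf(name, sizeof(name), "%016llx", (unsigned long long)rid);
	return model_sysfs_add(name);
}
```

Without the guard, a second client with rid 0 maps to the same name and the add fails with -EEXIST, which is the -17 the server reports before shutting down. With the guard, neither call reaches the directory add.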

Signed-off-by: Auke Kok <[email protected]>
@aversecat
Contributor Author

[  828.379546] sysfs: cannot create duplicate filename '/fs/scoutfs/f.b928e1.r.7b36c0/fence/0000000000000000'
[  828.379548] CPU: 1 PID: 14628 Comm: kworker/u4:4 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-513.9.1.el8_9.x86_64 #1
[  828.379550] Hardware name: Red Hat KVM, BIOS 1.16.0-4.module+el8.9.0+1408+7b966129 04/01/2014
[  828.379552] Workqueue: scoutfs_net_server scoutfs_net_reconn_free_worker [scoutfs]
[  828.379582] Call Trace:
[  828.379607]  dump_stack+0x41/0x60
[  828.379612]  sysfs_warn_dup.cold.4+0x17/0x2a
[  828.379617]  sysfs_create_dir_ns+0xb3/0xe0
[  828.379620]  kobject_add_internal+0xc2/0x290
[  828.379622]  kobject_init_and_add+0x71/0xa0
[  828.379624]  scoutfs_sysfs_create_attrs_parent+0x16c/0x1a0 [scoutfs]
[  828.379658]  ? scoutfs_fence_start+0x43/0x190 [scoutfs]
[  828.379688]  ? kmem_cache_alloc_trace+0x142/0x280
[  828.379691]  scoutfs_fence_start+0xb4/0x190 [scoutfs]
[  828.379711]  scoutfs_net_reconn_free_worker+0x1bd/0x270 [scoutfs]
[  828.379732]  process_one_work+0x1d3/0x390
[  828.379736]  ? process_one_work+0x390/0x390
[  828.379738]  worker_thread+0x30/0x390
[  828.379740]  ? process_one_work+0x390/0x390
[  828.379742]  kthread+0x134/0x150
[  828.379745]  ? set_kthread_struct+0x50/0x50
[  828.379747]  ret_from_fork+0x35/0x40
[  828.379773] kobject_add_internal failed for 0000000000000000 with -EEXIST, don't try to register things with the same name in the same directory.
[  828.385946] scoutfs f.b928e1.r.7b36c0 error: client fence returned err -17, shutting down server
export-lookup-evict-race unexpected messages in dmesg

For reference: the caller path is known.

@@ -238,6 +238,11 @@ int scoutfs_fence_start(struct super_block *sb, u64 rid, __be32 ipv4_addr, int r
struct pending_fence *fence;
int ret;

if (!rid) {
Collaborator


Sadly, I don't think this'll work. Depending on where the fence came from, we might not end up cleaning it up. See the _fence_next calls in the server that lead to _recov_finished. We'll need to know more about the precise source of the fencing request to avoid it.

Contributor Author


So, we know it comes from scoutfs_net_reconn_free_worker+0x1bd and not the other 2 callers. I'll look into that.

Alternatively, as a stopgap, we could skip creating the sysfs entries here and let all the fencing happen. The worst that could happen, I guess, is that we end up fencing the wrong clients that also have rid=0, so maybe not such a big deal?
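
A rough userspace sketch of that stopgap, under the same assumption that entries are named after the rid as 16 hex digits: every fence request is still tracked as pending, but a rid of 0 never claims a sysfs name, so it cannot collide. `struct model_fence`, `pending`, and `model_fence_start_stopgap` are hypothetical and do not mirror the real scoutfs structures.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 16

/* Hypothetical model of a pending fence request; not the scoutfs struct. */
struct model_fence {
	uint64_t rid;
	bool has_sysfs;		/* false when the sysfs entry was skipped */
	char name[17];
};

static struct model_fence pending[MAX_PENDING];
static int nr_pending;

/* Returns true if a pending fence already owns this sysfs name. */
static bool name_in_use(const char *name)
{
	for (int i = 0; i < nr_pending; i++) {
		if (pending[i].has_sysfs && strcmp(pending[i].name, name) == 0)
			return true;
	}
	return false;
}

/* Always records the fence; only claims a sysfs name for a valid rid. */
static int model_fence_start_stopgap(uint64_t rid)
{
	struct model_fence *f;

	if (nr_pending == MAX_PENDING)
		return -ENOMEM;

	f = &pending[nr_pending];
	f->rid = rid;
	f->has_sysfs = false;
	snprintf(f->name, sizeof(f->name), "%016llx", (unsigned long long)rid);

	if (rid) {
		if (name_in_use(f->name))
			return -EEXIST;
		f->has_sysfs = true;
	}

	nr_pending++;
	return 0;
}
```

In this model two rid=0 requests both succeed and both stay on the pending list, so whatever cleanup walks that list would still see them. It does not model how a fence request would be acted on without its sysfs entry, which is the open question for the real code.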
