
Support for mmap() writable mappings. #175

Open
wants to merge 12 commits into
base: main

Conversation

aversecat
Contributor

@aversecat aversecat commented Jun 11, 2024

Replaces #27, #39.

Contains mostly the original patches from andy, touched up to resolve conflicts, plus additional fixups and changes to avoid various deadlocks and debug kernel warnings about lock contention.

  • Does not pass xfstests generic/346: hard lockup in _mkwrite when doing update_inode
  • Occasionally fails offline-extent-waiting: when reverse staging, the first blocks of the file end up as zeros, not the expected content
  • Passes all other xfstests
  • Added a cross-node mmap consistency test; it doesn't work on el7
  • sparse warnings about ret not returning vm_fault_t
  • _walk_inodes page fault safe
  • _get_allocated_inos page fault safe
  • fsstress hard lockup in generic/013

@aversecat aversecat added the enhancement New feature or request label Jun 11, 2024

Collaborator

@zabbo zabbo left a comment

It's a WIP, we knew there'd be W to do :).

tests/tests/mmap.sh (outdated, resolved)
kmod/src/data.c (outdated, resolved)
tests/golden/xfstests (resolved)
kmod/src/data.c (outdated, resolved)
@aversecat aversecat changed the title from "mmap() tree **WIP**" to "mmap() tree." Aug 5, 2024
@aversecat
Contributor Author

one more small compat fix for el7.

@aversecat
Contributor Author

retest

@aversecat aversecat changed the title from "mmap() tree." to "Support for mmap() writable mappings." Nov 20, 2024
@aversecat
Contributor Author

the -debug- failures are:

  • ng-scoutfs-test-debug-el7:
12:56:06   lock-recover-invalidate          [ failed: output differs ]

The output has "bash: terminated" messages interspersed. This sometimes happens when the subprocess's stderr/stdout isn't properly discarded by redirection to /dev/null.

  • ng-scoutfs-test-debug-el8:
12:38:41   large-fragmented-free            [ failed: unexpected messages in dmesg ]
16:41:21   orphan-inodes                    [ failed: output differs ]

The dmesg output here is "blocked for more than 120 seconds" warnings and the subsequent stack dumps.

The "output differs" failure here is this intermittent test failure pattern:

 == orphaned inos in all mounts all deleted
+5264385 still exists
+5274624 still exists
+5284864 still exists
+5295104 still exists
+5305344 still exists
  • ng-scoutfs-test-debug-el94:
12:40:13   large-fragmented-free            [ failed: unexpected messages in dmesg ]
16:05:55   createmany-rename-large-dir      [ failed: unexpected messages in dmesg ]
13:45:10   srch-basic-functionality         [ failed: unexpected messages in dmesg ]

Multiple "watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/u4:20:18707]" messages here; the VM is clearly overrun by scheduled tasks.

@zabbo
Collaborator

zabbo commented Dec 3, 2024

the -debug- failures are:

Thanks for enumerating these -- yeah, so, the same repeat offenders :/.

Adds the required memory mapped ops struct and page fault handler
for reads.

Signed-off-by: Benjamin LaHaise <[email protected]>
Signed-off-by: Auke Kok <[email protected]>
@aversecat aversecat force-pushed the auke/mmap branch 2 times, most recently from 8b5ba17 to 282fa84 Compare December 3, 2024 23:09
@aversecat
Contributor Author

Hmmmm, not looking good in CI.

Both debug-el8 and debug-el94 appear(*) stuck on generic/346 which is holetest with mmap. Trying to reproduce myself now.

(*) they completed generic/343 but output is stuck there. No message indicating 346 actually started.

bcrl and others added 11 commits December 6, 2024 09:56
Add support for writable MAP_SHARED mmap()ings.  Avoid issues with late
writepage()s building transactions by doing the block_write_begin() work in
scoutfs_data_page_mkwrite().  Ensure the page is marked dirty and prepared
for write, then let the VM complete the write when the page is flushed or
invalidated.

Signed-off-by: Benjamin LaHaise <[email protected]>
Signed-off-by: Auke Kok <[email protected]>
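
For reference, the path described in the commit message above is reached from userspace by nothing more than a store into a writable MAP_SHARED mapping. The sketch below is a generic POSIX illustration and not code from this branch: the function name and the use of /tmp are assumptions, and it only checks that a mapped store can be read back with pread() once msync() has returned.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Store into a writable MAP_SHARED mapping, flush it, and read the bytes
 * back with pread(). Returns 0 if pread() sees the stored bytes, -1 on any
 * failure. Hypothetical helper; the /tmp location is an assumption.
 */
int shared_map_store_visible(void)
{
	char path[] = "/tmp/mmap-store-XXXXXX";
	const char msg[] = "written through mmap";
	char buf[sizeof(msg)];
	long page = sysconf(_SC_PAGESIZE);
	char *map;
	int ret = -1;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	/* Size the file first: touching a page wholly beyond EOF raises SIGBUS. */
	if (ftruncate(fd, page) < 0)
		goto out;

	map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* The first store to the page is what faults it in for write. */
	memcpy(map, msg, sizeof(msg));

	/* Ask for the dirty page to be written back before checking. */
	if (msync(map, page, MS_SYNC) == 0 &&
	    pread(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf) &&
	    memcmp(buf, msg, sizeof(msg)) == 0)
		ret = 0;

	munmap(map, page);
out:
	close(fd);
	unlink(path);
	return ret;
}
```

Pointing the temporary path at a scoutfs mount instead of /tmp would be the obvious variation when checking this branch by hand.
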
Two test programs are added. The run time is about 1min on my el7
instance.

The test script finishes up with a read/write mmap test on offline
extents to verify the data wait paths in those functions.

One program performs vfs read/write and mmap read/write calls on
the same file across 5 threads (mounts) repeatedly.  The goal
is to ensure there are no locking issues between the read and write paths.

The second test program performs consistency checking on a file that is
repeatedly written/read using memory maps and normal reads and writes,
and the content is verified after every operation.

Signed-off-by: Auke Kok <[email protected]>
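
A heavily reduced, single-process sketch of the kind of check the second test program describes: write through one path, verify through the other, every round. This is not the test program added by the commit; the function name, buffer size, and /tmp location are assumptions, and because it runs against one local file it does not exercise any cross-mount locking.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHECK_LEN 8192	/* illustrative size, not taken from the PR's tests */

/*
 * Alternate between writing through a shared mapping and writing with
 * pwrite(), verifying the whole file through the other path after every
 * write. Returns the number of mismatched rounds, or -1 on setup failure.
 */
int mmap_consistency_rounds(int rounds)
{
	char path[] = "/tmp/mmap-check-XXXXXX";
	unsigned char want[CHECK_LEN], got[CHECK_LEN];
	unsigned char *map;
	int bad = 0;
	int fd, i;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the open fd keeps the file alive */

	if (ftruncate(fd, CHECK_LEN) < 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, CHECK_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	for (i = 0; i < rounds; i++) {
		/* A different fill byte each round makes stale data detectable. */
		memset(want, 'a' + (i % 26), CHECK_LEN);

		if (i & 1) {
			/* odd rounds: write via the mapping, verify via pread() */
			memcpy(map, want, CHECK_LEN);
			if (pread(fd, got, CHECK_LEN, 0) != CHECK_LEN ||
			    memcmp(got, want, CHECK_LEN) != 0)
				bad++;
		} else {
			/* even rounds: write via pwrite(), verify via the mapping */
			if (pwrite(fd, want, CHECK_LEN, 0) != CHECK_LEN ||
			    memcmp(map, want, CHECK_LEN) != 0)
				bad++;
		}
	}

	munmap(map, CHECK_LEN);
	close(fd);
	return bad;
}
```
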
Now that all of these should be passing, we enable all mmap() tests in
xfstests, and update the golden output with the new tests.

Signed-off-by: Auke Kok <[email protected]>
We merely trace exit values and position, and ignore length.

Because vm_fault_t is __bitwise, sparse will loudly complain about
a plain cast to u32, so we must __force (on el8). ret will be 512 in
normal cases.

Signed-off-by: Auke Kok <[email protected]>
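
To illustrate the cast discussed in the commit message above, here is a userspace sketch with stand-ins for the sparse annotations. The macro definitions and the helper name are assumptions made so the snippet builds outside the kernel; the 0x200 value is assumed to mirror VM_FAULT_LOCKED from the kernel headers, which would match the "512 in normal cases" remark.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-ins for the kernel's sparse annotations: they expand to
 * attributes only when sparse (__CHECKER__) is parsing the file.
 */
#ifdef __CHECKER__
#define __bitwise	__attribute__((bitwise))
#define __force		__attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef unsigned int __bitwise vm_fault_t;

/* Value assumed to mirror VM_FAULT_LOCKED in the kernel headers. */
#define VM_FAULT_LOCKED	((__force vm_fault_t)0x000200)

/* Hypothetical helper: turn a fault result into a plain trace field value. */
static inline uint32_t fault_ret_for_trace(vm_fault_t ret)
{
	/* Without __force, sparse flags this cast from a restricted type. */
	return (__force uint32_t)ret;
}
```

With a regular compiler both annotations vanish, so the cast only matters to sparse; that is why the warning shows up in checker runs rather than in the build.
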
These 2 sections of compat for readdir are wholly obsolete and can be
hard dropped, which restores the method to look like current upstream
code.

This was added in ddd1a4e.

Signed-off-by: Auke Kok <[email protected]>
dir_emit() will copy_to_user, which can pagefault. If this happens while
cluster locked, we could deadlock.

We use a single page to stage dir_emit data, and iterate between
fetching dirents while locked, and emitting them while not locked.

Signed-off-by: Auke Kok <[email protected]>
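
The stage-then-emit loop from the commit message above can be sketched in userspace as follows. Everything here is a hypothetical stand-in: fake_lock only records whether the "cluster lock" is held, the emit callback plays the role of dir_emit(), and the staging buffer lives on the stack rather than in an allocated page. It is meant to show the shape of the loop, not the scoutfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGE_BYTES 4096	/* one page of staging space (assumed size) */

/* Hypothetical stand-in for a cluster lock: just tracks whether it is held. */
struct fake_lock { int held; };
static void fake_lock_acquire(struct fake_lock *l) { assert(!l->held); l->held = 1; }
static void fake_lock_release(struct fake_lock *l) { assert(l->held); l->held = 0; }

/* Hypothetical emit callback; returns 0 to stop, like a full user buffer. */
typedef int (*emit_fn)(const char *name, size_t len, void *arg);

/* Example callback state: accept at most max entries, counting them in seen. */
struct emit_count { size_t seen; size_t max; };

int count_emit(const char *name, size_t len, void *arg)
{
	struct emit_count *c = arg;

	(void)name;
	(void)len;
	if (c->seen == c->max)
		return 0;
	c->seen++;
	return 1;
}

/*
 * Walk names[pos..nr) in batches: copy as many NUL-terminated names as fit
 * into the staging buffer while "locked", then drop the lock and emit them.
 * Returns the position to resume from.
 */
size_t staged_readdir(struct fake_lock *lock, const char **names, size_t nr,
		      size_t pos, emit_fn emit, void *arg)
{
	char stage[STAGE_BYTES];

	while (pos < nr) {
		size_t used = 0, staged = 0, off, i;

		fake_lock_acquire(lock);
		while (pos + staged < nr) {
			size_t len = strlen(names[pos + staged]) + 1;

			if (used + len > sizeof(stage))
				break;
			memcpy(stage + used, names[pos + staged], len);
			used += len;
			staged++;
		}
		fake_lock_release(lock);

		if (staged == 0)
			break;	/* a single name larger than the stage */

		/* Emitting may "fault"; the lock must not be held here. */
		for (i = 0, off = 0; i < staged; i++) {
			size_t len = strlen(stage + off);

			assert(!lock->held);
			if (!emit(stage + off, len, arg))
				return pos + i;
			off += len + 1;
		}
		pos += staged;
	}
	return pos;
}
```

The returned position is what lets a caller pick up where an earlier call stopped, including when the emitter refused an entry partway through a staged batch.
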
Now that we support mmap writes, at any point in time we could
pagefault and lock for writes. That means - just like readdir -
we can no longer lock and copy_to_user, since it also may page fault
and thus deadlock.

We statically allocate 32 extent entries on the stack and use
these to shuffle out fiemap entries a batch at a time, locking and
unlocking around collecting and fiemap_fill_next_extent.

Signed-off-by: Auke Kok <[email protected]>
Similar to readdir and fiemap vfs methods, we can't copy to user while
holding cluster locks. The previous comment about it being safe no
longer applies, and this could deadlock.

Rewrite the loop to iterate and store entries in a page, then flush
the page contents while not holding a cluster lock.

Signed-off-by: Auke Kok <[email protected]>
Similar to fiemap, readdir and walk_inodes, this method's put_user
could page fault while cluster locked, potentially causing a deadlock.

Signed-off-by: Auke Kok <[email protected]>
While debugging a double unlock error we hit this condition and
debugging would have been a lot easier had we enforced this simple
constraint that we can't decrement the lock users count if it's
already 0.

Signed-off-by: Auke Kok <[email protected]>
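
The constraint from the commit message above, sketched in userspace. The struct and function names are hypothetical and this is not the scoutfs lock structure; in kernel code the check would more likely be a warning or BUG-style assertion than an error return.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock with a users count; not the scoutfs structure. */
struct counted_lock {
	unsigned int users;
};

void counted_lock_get(struct counted_lock *lck)
{
	lck->users++;
}

/*
 * Drop one user. Refuse, rather than wrap the unsigned count around, when
 * it is already zero, so a double unlock is reported at the call that
 * caused it. Returns 0 on success or -EINVAL on an unbalanced put.
 */
int counted_lock_put(struct counted_lock *lck)
{
	if (lck->users == 0)
		return -EINVAL;
	lck->users--;
	return 0;
}
```

Catching the unbalanced put at the decrement keeps the count from silently wrapping, which is what makes a later symptom so hard to trace back.
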
@aversecat
Contributor Author

aversecat commented Dec 9, 2024

-debug- failures are:

  • el8: hung task timeout in large-fragmented-free, and orphan-inodes failure
  • el94: hung task timeout in large-fragmented-free, and orphan-inodes failure
  • el95: hung task timeout in large-fragmented-free, and this one:
--- golden/archive-light-cycle	2024-12-06 20:05:22.572655167 +0000
+++ /root/scoutfs/tests/results/output/archive-light-cycle	2024-12-07 03:12:08.082647728 +0000
@@ -4,6 +4,8 @@
 == round 1: create
 == round 1: online
 == round 1: verify
+/mnt/test.3/test/archive-light-cycle/dir/3/2-2 /dev/fd/63 differ: char 277401601, line 67726
+script pid 165368 failed: rc 1
 == round 1: release
 == round 1: offline
 == round 1: stage
archive-light-cycle output differs
