Fix timerfd restore problem with double parsing #2030

hejingxian123 · 2022-12-26T09:53:53Z

We restore timerfd with the state of it_value and it_interval.
However, when it_value is zero, the timer will be restore without running.
Because it_value is changing with the timer running, we can restore the timerfd with double parsing the it_value.
If the timer is running, the it_value can be non-zero at least once.

Signed-off-by: Jingxian He [email protected]

Change made through this commit: - Include copy of flog as a seperate tree. - Modify the makefile to add and compile flog code. Signed-off-by: prakritigoyal19 <[email protected]>

CID 302713 (checkpoint-restore#1 of 1): Missing varargs init or cleanup (VARARGS) va_end was not called for argptr. Signed-off-by: Adrian Reber <[email protected]>

Separate commit for easier criu-dev <-> master transfer. Acked-by: Mike Rapoport <[email protected]> Signed-off-by: Adrian Reber <[email protected]>

It is mapped, not maped. Same applies for mmap I guess. Found by codespell, except it wants to change it to mapped, which will make it less specific. Signed-off-by: Kir Kolyshkin <[email protected]>

Brought to you by codespell -w (using codespell v2.1.0). [v2: use "make indent" on the result] Signed-off-by: Kir Kolyshkin <[email protected]>

It can be confusing to see error from post-dump action script and non zero return from criu though at the same time see "Dumping finished successfully" in log. I believe it is logical to consider post-dump action script as a part of "dump" process so fail in it means that the whole dump failed. Signed-off-by: Pavel Tikhomirov <[email protected]>

Signed-off-by: Radostin Stoyanov <[email protected]>

As private hugetlb mappings are not pre-mapped, the content of them is restored in the the restorer which cannot use page_read->read_pages. As a result, we cannot recursively read the content of pre-dumped image in the parent directory and use preadv to read the content from the last dumped image only. Therefore, it may freeze while restoring when the content of mapping is in pre-dumped image in parent directory. We need to skip pre-dumping on hugetlb mappings to resolve the issue. Suggested-by: Alexander Mikhalitsyn <[email protected]> Signed-off-by: Bui Quang Minh <[email protected]>

This reverts commit 37ea8c5. Signed-off-by: Bui Quang Minh <[email protected]>

Reported-by: Mr. Jenkins (ppc64le) Signed-off-by: Alexander Mikhalitsyn <[email protected]>

Name collision with an abandoned project named 'crit' in pypi causes pip to show crit (CRiu Image Tool) as outdated. This patch updates crit to use the same version and license as criu. Fixes checkpoint-restore#1878 Signed-off-by: Radostin Stoyanov <[email protected]>

But actually, 5a92f10 probably has to be reverted as a whole. PIPE_MAX_SIZE is the hard limit to avoid PAGE_ALLOC_COSTLY_ORDER allocations in the kernel. But F_SETPIPE_SZ rounds up a requested pipe size to a power-of-2 pages. It means that when we request PIPE_MAX_SIZE that isn't a power-of-2 number, we actually request a pipe size greater than PIPE_MAX_SIZE. Fixes: 5a92f10 ("page-pipe: Resize up to PIPE_MAX_SIZE") Signed-off-by: Andrei Vagin <[email protected]>

Signed-off-by: Andrei Vagin <[email protected]>

Due to side effects of F_SETPIPE_SZ, the actual pipe size can be greater than PIPE_MAX_SIZE. Signed-off-by: Andrei Vagin <[email protected]>

In this case, vmplice attaches pages without coping them. Signed-off-by: Andrei Vagin <[email protected]>

* handle unexpected errors of process_vm_readv * adjust riovs in analyze_iov * call handle_faulty_iov only if process_vm_readv returns EFAULT. Signed-off-by: Andrei Vagin <[email protected]>

Signed-off-by: Andrei Vagin <[email protected]>

@siris

When building packages for CRIU the source directory might have a name different than 'criu'. Fixes: checkpoint-restore#1877 Reported-by: @siris Signed-off-by: Radostin Stoyanov <[email protected]>

Building the criu packages for Ubuntu/Debian fails with: mkdir: cannot create directory '/var/lib/criu': Permission denied This patch updates PLUGINDIR with the value /usr/lib/criu Fixes: checkpoint-restore#1877 Signed-off-by: Radostin Stoyanov <[email protected]>

This allows us to only detect bad formating in PR changes but not all the CRIU codebase. Signed-off-by: Pavel Tikhomirov <[email protected]>

criu-ns script incorrectly compares the pidns fd with mntns fd. Also reversed the condition in is_my_namespace function to align it with the function name. Signed-off-by: Ashutosh Mehra <[email protected]>

Before this patch, if we had a unixsk with incomming scm packets (with fds) and with the sender side fd closed, we got an error: Error (criu/sk-unix.c:1125): unix: Can't find sender for 0x1e First part of the problem is that unix_note_scm_rights() expects to see a "queuer" which would send scm packets to the unixsk, and there is no as the sender side is closed. Second part of the problem is that we already have "fake" queuers feature so that it already creates a unix socket pair and leaves other end open for later queuing packets. But function add_fake_unix_queuers() is called after unix_note_scm_rights() thus there is no chance to find queuer at the point of failure. Third part is that when we look for a queuer in find_queuer_for() we actually look for a socket for which we are a queuer and not for the socket which is a queuer for us, which is opposite to the name. For cases where both ends are alive both are queuers for each other so this was not important, but for our closed sender case it breaks. So let's reorder add_fake_unix_queuers() before unix_note_scm_rights() and make find_queuer_for() actually do what it's name implies. This situation is started to reproduce on Virtuozzo start/stop tests with the unixsk belonging to systemd, we suppose that this state where the sender fd side is closed happens rarely only on systemd start/stop, so we don't see it in regular suspend resume of long-living containers. Signed-off-by: Pavel Tikhomirov <[email protected]>

Signed-off-by: Pavel Tikhomirov <[email protected]>

This helper restores master_id and shared_id of first mount in the sharing group. It first copies sharing from either external source or internal parent sharing group and makes master_id from shared_id. Next it creates new shared_id when needed. All other mounts except first are just copied from the first one. Signed-off-by: Pavel Tikhomirov <[email protected]>

…root It's a problem when while restoring sharing group we need to copy sharing between two mounts with non-intersecting roots, because kernel does not allow it. We have a case opencontainers/runc#3442, where runc adds different devtmpfs file-bindmounts to container and there is no fsroot mount in container for this devtmpfs, thus mount-v2 faces the above problem. Luckily for the case of external mounts which are in one sharing group and which have non-intersecting roots, these mounts likely only have external master with no sharing, so we can just copy sharing from external source and make it slave as a workaround. checkpoint-restore#1886 Signed-off-by: Pavel Tikhomirov <[email protected]>

Signed-off-by: Pavel Tikhomirov <[email protected]>

Signed-off-by: Radostin Stoyanov <[email protected]>

…roach Currently, the content of anonymous private hugetlb mapping is dumped in 2 different images: memfd approach and normal private mapping dumping. In memfd approach, we dump the content of the backing pseudo file (/anon_hugepage). This is incorrect and redundant since the mapping is private, the content of backing file may differ from the content of the mapping. With this commit, we remove the redundant memfd approach dump and only do the normal private mapping dump on anonymous hugetlb mapping. Run zdtm.py run -f h --keep-img always -t zdtm/static/maps09, du -h in the dumped image directory Before this commit 13M test/dump/zdtm/static/maps09/55/1 After this commit 8.5M test/dump/zdtm/static/maps09/55/1 The reduction in size is approximately 4MB which is the size of anonymous private hugetlb mapping in the test. Signed-off-by: Bui Quang Minh <[email protected]>

…nter Else we have a Segmentation fault in __move_mount_set_group() on xfree(source_mp) if resolve_mountpoint() returned statically allocated path. Signed-off-by: Pavel Tikhomirov <[email protected]>

This test has one external mount [criumntns] /zdtm_root_ext.tmp -> [testmntns] /mnt_root_ext.test, and it specifically gives '--external mnt[MNT]:.zdtm_root_ext.tmp' option on restore without '/' to make dirname on it return static '.' path (see glibc dirname() code) and reproduce a segfault in resolve_mountpoint(). Signed-off-by: Pavel Tikhomirov <[email protected]>

Check that CRIU can checkpoint/restore global properties in cgroup-v2 properly. Signed-off-by: Bui Quang Minh <[email protected]>

Currently, we assume all threads in process are in the same cgroup controllers. However, with threaded controllers, threads in a process may be in different controllers. So we need to dump cgroup controllers of every threads in process and fixup the procfs cgroup parsing to parse from self/task/<tid>/cgroup. Signed-off-by: Bui Quang Minh <[email protected]>

…lers As threads in a process may be in different threaded controllers, we need to move thoses threads to the correct controllers. Because the threads of a process are restored in later stage in restorer.c, we need to create a cgroupd service to help to move those threads into correct controllers when they are restored. We cannot use usernsd as the code in restorer does not know the address of outside function to pass to userns_call. However, this cgroupd service still reuses a lot of code from usernsd. The main logic is that restored threads receive the cg_set number they belong to before restorer stage in case their cg_set are different from main thread. When these threads are restored, they send the cg_set number and their thread ids through unix socket to cgroupd. cgroupd receives the cg_set number and thread ids and moves those threads into correct controllers. Thread ids are sent through SCM_CREDENTIALS of unix socket so they are translated into correct thread ids in the receiving end. Signed-off-by: Bui Quang Minh <[email protected]>

This test creates a process with 2 threads in different threaded controllers and check if CRIU restores these threads' cgroup controllers properly. Signed-off-by: Bui Quang Minh <[email protected]>

As cgroupv2_00, cgroupv2_01 need cpuset in cgroup-v2 hierarchy to check CRIU handle cgroup-v2 properly, umount cpuset in cgroup-v1 to make it move to cgroup-v2. Signed-off-by: Bui Quang Minh <[email protected]>

…lled Signed-off-by: Adrian Reber <[email protected]>

A previous commit added a cgroup cpuset unmounting to scripts/ci/Makefile. We are sometimes running in a container without the necessary privileges to unmount certain cgroups. This commit moves the cgroup unmounting to a place in run-ci-tests.sh which already requires privileged access and does not break unprivileged build-only CI runs. Signed-off-by: Adrian Reber <[email protected]>

…turned Some users on Raspberry Pi report that the kerndat checking for memfd_create(MFD_HUGETLB) support returns ENOSYS even when memfd_create syscall is available. We currently treat this error as unexpected and return error. This commit marks the memfd_create(MFD_HUGETLB) as unavailable when ENOSYS is returned. Signed-off-by: Bui Quang Minh <[email protected]>

Zombie tasks are dumped in dump_zombies() so it is redundant to handle them in dump_one_task(). Deprecate cg_set in task_core_entry as this field must be per thread now. Signed-off-by: Bui Quang Minh <[email protected]>

Signed-off-by: Mathias Gibbens <[email protected]>

This patch adds a missing definition for `__nmk_dir` in the Makefile for the amdgpu plugin. This definition is required, for example, when building the `test_topology_remap` target: make -C plugins/amdgpu/ test_topology_remap Signed-off-by: Radostin Stoyanov <[email protected]>

While building on a machine that has a HOL clang compiler, I ran into warnings regarding the changed line. It appears this warning is on by default because of anticipated changes to the C standard. Signed-off-by: Drew Wock <[email protected]>

The way ShellCheck is installed was changed in commit c056f99 (ci/gha/lint: install a recent shellcheck) to use the latest version v0.8.0 and remove some of the "shellcheck disable=..." annotations. Since then, Fedora 37 has been released and the ShellCheck package has been updated to v0.8.0. Signed-off-by: Radostin Stoyanov <[email protected]>

The python3 package in Alpine has recently been updated to install symbolic link for /usr/bin/python. https://git.alpinelinux.org/aports/commit/main/python3?id=d91da210b1614eb75517d59b7f348fee01699f35 This causes the following error in CI: Step 10/11 : RUN ln -s /usr/bin/python3 /usr/bin/python ---> Running in a5a94be9dc93 ln: failed to create symbolic link '/usr/bin/python': File exists The command '/bin/sh -c ln -s /usr/bin/python3 /usr/bin/python' returned a non-zero code: 1 Signed-off-by: Radostin Stoyanov <[email protected]>

This patch fixes applies the changes required by clang-format v15.0.5 for `make indent`. Signed-off-by: Radostin Stoyanov <[email protected]>

In order to reduce the frequency of using system call, based on https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/misc/create_inode.c#n519, I created a new algorithm of dumping chunk via fiemap.(copy_file_to_chunks_fiemap) Also, I added another BOOL_OPT for users to determine which algorithm they want to use. Moreover, for those filesystem not supporting fiemap, criu will fall back to the original algorithm(SEEK_HOLE/SEEK_DATA). v2: don't call copy_chunk_from_file on outstanding extent; rearange headers to workaround "redeclaration of ‘enum fsconfig_command’" problem Signed-off-by: Liang-Chun Chen <[email protected]>

ghost_multi_hole00 and ghost_multi_hole01 are tests which create a ghost file with a lot of holes, there are 4K data and 4K hole inside every 8K length. The only difference between them is ghost-fiemap option, 01 is a test for the fiemap dumping algorithm, and we want to test the behavior of EXTENT_MAX_COUNT part, so the file size should be 8M, thus there will be 1024 chunks in the ghost file. In some file system, such as xfs, we somehow can not easily create highly sparse file as in ext4 or btrfs, therefore we need `fallocate` to forcibly create holes. Signed-off-by: Liang-Chun Chen <[email protected]>

Signed-off-by: Shubham Verma <[email protected]>

SO_SNDBUFFORCE/SO_RCVBUFFORCE require root or CAP_NET_ADMIN. We can use SO_SNDBUF/SO_RCVBUF in some cases and avoid needing elevated privileges. This patch renames sk_setbufs() to sk_setbufs_ns() and makes sk_setbufs() a general helper that sets socket send and receive buffer sizes. The helper tries to use SO_SNDBUFFORCE/SO_RCVBUFFORCE first and falls back to SO_SNDBUF/SO_RCVBUF if we're in unprivileged mode. The existing sk_setbufs_ns() which takes a pid parameter and is intended to be called via userns_call() is rewritten to call sk_setbufs(). Existing code that sets buffer sizes via setsockopt() is modified to call sk_setbufs() instead. Signed-off-by: Younes Manton <[email protected]>

Restoring SO_MARK requires root or CAP_NET_ADMIN. If the value is 0 we will avoid dumping it so that we don't need to do a privileged call on restore. Signed-off-by: Younes Manton <[email protected]>

Signed-off-by: Andrei Vagin <[email protected]>

TestNG is vulnerable to Path Traversal Fixes https://github.com/checkpoint-restore/criu/security/dependabot/1. Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Radostin Stoyanov <[email protected]>

We restore timerfd with the state of it_value and it_interval. However, when it_value is zero, the timer will be restore without running. Because it_value is changing with the timer running, we can restore the timerfd with double parsing the it_value. If the timer is running, the it_value can be non-zero at least once. Signed-off-by: Jingxian He <[email protected]>

avagin · 2023-01-02T01:22:39Z

criu/timerfd.c

@@ -75,6 +75,12 @@ static int dump_one_timerfd(int lfd, u32 id, const struct fd_parms *p)
 	tfe.id = id;
 	tfe.flags = p->flags;
 	tfe.fown = (FownEntry *)&p->fown;
+
+	/* when it_value is zero, we need parse it again */


Could you write a more details comment here? Right now, it is unclear why we need to parse it again.

avagin · 2023-01-02T01:23:23Z

Can we introduce a test for this case?

github-actions · 2023-02-02T00:48:57Z

A friendly reminder that this PR had no activity for 30 days.

prakritigoyal19 and others added 30 commits May 12, 2022 17:55

Add flog to CRIU

6f8d543

Change made through this commit: - Include copy of flog as a seperate tree. - Modify the makefile to add and compile flog code. Signed-off-by: prakritigoyal19 <[email protected]>

flog: Missing varargs init or cleanup (VARARGS)

bf7b517

CID 302713 (checkpoint-restore#1 of 1): Missing varargs init or cleanup (VARARGS) va_end was not called for argptr. Signed-off-by: Adrian Reber <[email protected]>

Run 'make indent' on 'flog/'

09348a2

Separate commit for easier criu-dev <-> master transfer. Acked-by: Mike Rapoport <[email protected]> Signed-off-by: Adrian Reber <[email protected]>

flog: typo: mmaped -> mmapped

d355c36

It is mapped, not maped. Same applies for mmap I guess. Found by codespell, except it wants to change it to mapped, which will make it less specific. Signed-off-by: Kir Kolyshkin <[email protected]>

flog: fix some codespell warnings

6eafe4e

Brought to you by codespell -w (using codespell v2.1.0). [v2: use "make indent" on the result] Signed-off-by: Kir Kolyshkin <[email protected]>

ci: Fix unsafe repository error

03539d4

Signed-off-by: Radostin Stoyanov <[email protected]>

Revert "ci: skip new hugetlb maps09/maps10 tests for pre-dump"

2de7eea

This reverts commit 37ea8c5. Signed-off-by: Bui Quang Minh <[email protected]>

zdtm: skip zdtm/static/shm-hugetlb when hugetlb is not supported

a1fb2ec

Reported-by: Mr. Jenkins (ppc64le) Signed-off-by: Alexander Mikhalitsyn <[email protected]>

page-xfer: use negative values for error codes

d84e2e4

Signed-off-by: Andrei Vagin <[email protected]>

page-xfer: adjust a buffer to a pipe size

6d879d5

Due to side effects of F_SETPIPE_SZ, the actual pipe size can be greater than PIPE_MAX_SIZE. Signed-off-by: Andrei Vagin <[email protected]>

pre-dump: call vmsplice with SPLICE_F_GIFT

a6aae07

In this case, vmplice attaches pages without coping them. Signed-off-by: Andrei Vagin <[email protected]>

page-xfer: refactoring analyze_iov and fill_userbuf

574f396

* handle unexpected errors of process_vm_readv * adjust riovs in analyze_iov * call handle_faulty_iov only if process_vm_readv returns EFAULT. Signed-off-by: Andrei Vagin <[email protected]>

ci: test the read mode of pre-dump

45641ab

Signed-off-by: Andrei Vagin <[email protected]>

amdgpu/Makefile: Fix include path

cd0ed7e

When building packages for CRIU the source directory might have a name different than 'criu'. Fixes: checkpoint-restore#1877 Reported-by: @siris Signed-off-by: Radostin Stoyanov <[email protected]>

github: use git-clang-format instead of make indent

98eda32

This allows us to only detect bad formating in PR changes but not all the CRIU codebase. Signed-off-by: Pavel Tikhomirov <[email protected]>

Fix the check for mnt namespace in criu-ns

0db600d

criu-ns script incorrectly compares the pidns fd with mntns fd. Also reversed the condition in is_my_namespace function to align it with the function name. Signed-off-by: Ashutosh Mehra <[email protected]>

zdtm/scm: add scm09 test with closed sender fd

8a147da

Signed-off-by: Pavel Tikhomirov <[email protected]>

zdtm: test multiple ext bindmounts with no common root and same master

58a2d98

Signed-off-by: Pavel Tikhomirov <[email protected]>

amdgpu: Add gitignore

edb3b8f

Signed-off-by: Radostin Stoyanov <[email protected]>

util/mount-v2: fix resolve_mountpoint() to always return freeable poi…

dc160c0

…nter Else we have a Segmentation fault in __move_mount_set_group() on xfree(source_mp) if resolve_mountpoint() returned statically allocated path. Signed-off-by: Pavel Tikhomirov <[email protected]>

minhbq-99 and others added 23 commits November 1, 2022 21:35

zdtm: Add test to check global properties of cgroup-v2 are preserved

a8328c7

Check that CRIU can checkpoint/restore global properties in cgroup-v2 properly. Signed-off-by: Bui Quang Minh <[email protected]>

zdtm: Check threads are restored into correct threaded controllers

030c5ab

This test creates a process with 2 threads in different threaded controllers and check if CRIU restores these threads' cgroup controllers properly. Signed-off-by: Bui Quang Minh <[email protected]>

ci: Make cpuset move to cgroup-v2 hierarchy

d3ed3e9

As cgroupv2_00, cgroupv2_01 need cpuset in cgroup-v2 hierarchy to check CRIU handle cgroup-v2 properly, umount cpuset in cgroup-v1 to make it move to cgroup-v2. Signed-off-by: Bui Quang Minh <[email protected]>

ci: Do not fail if latest epel repository definition is already insta…

f47f5c0

…lled Signed-off-by: Adrian Reber <[email protected]>

cgroup: Remove redundant code that handles zombie tasks

52435d7

Zombie tasks are dumped in dump_zombies() so it is redundant to handle them in dump_one_task(). Deprecate cg_set in task_core_entry as this field must be per thread now. Signed-off-by: Bui Quang Minh <[email protected]>

Remove execute bit from source file

1c6517a

Signed-off-by: Mathias Gibbens <[email protected]>

ci: fix make indent

4f659d5

This patch fixes applies the changes required by clang-format v15.0.5 for `make indent`. Signed-off-by: Radostin Stoyanov <[email protected]>

Fix typo in comment

50db2be

Signed-off-by: Shubham Verma <[email protected]>

non-root: Don't dump socket option SO_MARK if 0

318ff08

Restoring SO_MARK requires root or CAP_NET_ADMIN. If the value is 0 we will avoid dumping it so that we don't need to do a privileged call on restore. Signed-off-by: Younes Manton <[email protected]>

sockets: tiny style fix

6e11e7f

Signed-off-by: Andrei Vagin <[email protected]>

test/javaTests: update org.testng:testng (Maven)

008c2b9

TestNG is vulnerable to Path Traversal Fixes https://github.com/checkpoint-restore/criu/security/dependabot/1. Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Radostin Stoyanov <[email protected]>

avagin reviewed Jan 2, 2023

View reviewed changes

github-actions bot added the stale-pr label Feb 2, 2023

avagin force-pushed the criu-dev branch 2 times, most recently from f392ea1 to 4d137b8 Compare June 12, 2023 06:33

avagin force-pushed the criu-dev branch from 266f150 to f5dec05 Compare October 31, 2024 06:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix timerfd restore problem with double parsing #2030

Fix timerfd restore problem with double parsing #2030

hejingxian123 commented Dec 26, 2022

avagin Jan 2, 2023

avagin commented Jan 2, 2023

github-actions bot commented Feb 2, 2023

Fix timerfd restore problem with double parsing #2030

Are you sure you want to change the base?

Fix timerfd restore problem with double parsing #2030

Conversation

hejingxian123 commented Dec 26, 2022

avagin Jan 2, 2023

Choose a reason for hiding this comment

avagin commented Jan 2, 2023

github-actions bot commented Feb 2, 2023