This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

scx: Don't wait for a work item before scheduling is restored in scx_ops_disable_workfn() #141

Merged 1 commit on Feb 9, 2024

Conversation

Collaborator

@htejun htejun commented Feb 9, 2024

…ops_disable_workfn()

When scx_ops_disable_workfn() is invoked to disable a BPF scheduler, it cannot depend on the scheduler working and thus can't depend on !RT tasks making forward progress, so it performs a series of non-blocking operations to restore the forward progress guarantee and then kicks out the BPF scheduler.

The watchdog code added cancel_delayed_work_sync() in scx_ops_disable_workfn() before the forward progress guarantee is restored. cancel_delayed_work_sync() implies flush_work() if the target work item is already executing, and that work item may not be able to run due to malfunctioning scheduling, making the system stuck and unrecoverable.

There's no need to shut down the watchdog timer early. Move cancel_delayed_work_sync() to later in the disable process where all the critical operaitons are complete and the kernel default scheduling is restored.

Signed-off-by: Tejun Heo <[email protected]>
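
The ordering problem described above can be illustrated with a small toy model. This is only a sketch in Python, not kernel code: the class, its fields, and the "stuck"/"cancelled" return values are hypothetical stand-ins, and the single restore_scheduling() call stands for the whole series of non-blocking steps that restore forward progress.

```python
class ToySystem:
    """Hypothetical stand-in for the state the disable path cares about."""

    def __init__(self, watchdog_running):
        # True once the kernel default scheduling is back in charge.
        self.scheduling_restored = False
        # True if the watchdog work item is executing when disable starts.
        self.watchdog_running = watchdog_running

    def restore_scheduling(self):
        # Stands in for the non-blocking steps that restore forward progress.
        self.scheduling_restored = True

    def cancel_watchdog_sync(self):
        # A synchronous cancel waits for an executing work item to finish.
        # That work item can only finish if ordinary tasks can run.
        if self.watchdog_running and not self.scheduling_restored:
            return "stuck"
        self.watchdog_running = False
        return "cancelled"


def disable_old_order(system):
    # Ordering before this change: cancel first, restore afterwards.
    result = system.cancel_watchdog_sync()
    if result == "stuck":
        return result  # the restore step is never reached
    system.restore_scheduling()
    return result


def disable_new_order(system):
    # Ordering after this change: restore first, cancel afterwards.
    system.restore_scheduling()
    return system.cancel_watchdog_sync()
```

With a watchdog work item in flight, the old ordering reports "stuck" in this model while the new ordering reports "cancelled"; when no work item is executing, both orderings complete.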
@htejun htejun requested review from arighi and Byte-Lab February 9, 2024 19:02
Collaborator

@arighi arighi left a comment

LGTM, tested & approved.

Minor nit in the commit message: operaitons -> operations

Thanks!

Collaborator

@Byte-Lab Byte-Lab left a comment

LGTM. We probably should have put it here in the first place to keep things simple. Thanks for fixing

@Byte-Lab Byte-Lab merged commit 7caf4d3 into sched_ext Feb 9, 2024
1 check passed
Collaborator

Byte-Lab commented Feb 9, 2024

> LGTM, tested & approved.
>
> Minor nit in the commit message: operaitons -> operations
>
> Thanks!

Argh, sorry Andrea, I merged before I saw your nit. Looks like we acked at exactly the same time.

Collaborator

arighi commented Feb 9, 2024

> > LGTM, tested & approved.
> > Minor nit in the commit message: operaitons -> operations
> > Thanks!
>
> Argh, sorry Andrea, I merged before I saw your nit. Looks like we acked at exactly the same time.

No worries, nobody will ever notice that :)

htejun pushed a commit that referenced this pull request Jun 17, 2024
This adds eBPF JIT support for 32-bit ARCv2 processors. The
implementation was validated by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

The test_bpf.ko module reports 2- to 10-fold improvements in the
execution time of its tests. For instance:

test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120  224  260 PASS

test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1  23 PASS

test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS
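
As a quick check of the "2- to 10-fold" figure, the ratios can be computed from the numbers quoted above. This is a minimal sketch: the speedup() helper is hypothetical, and for test #33 only the first of the three reported timings is used.

```python
# (test name, time with jited:0, time with jited:1), copied from the log above.
samples = [
    ("#33 tcpdump port 22", 704, 120),
    ("#141 ALU_DIV_X", 238, 23),
    ("#776 JMP32_JGE_K", 2034681, 1020022),
]


def speedup(interpreted, jited):
    """Return how many times faster the JIT'ed run is."""
    return interpreted / jited


for name, interpreted, jited in samples:
    print(f"{name}: {speedup(interpreted, jited):.1f}x")
```

The three ratios come out at roughly 5.9, 10.3 and 2.0.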

Deployment and structure
------------------------
The related code is added under "arch/arc/net":

- bpf_jit.h       -- The interface that a back-end translator must provide
- bpf_jit_core.c  -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic

The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entry
point of the whole process. Normally, the translation is done in one
pass, namely the "normal pass". If some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass.
This possible next (and last) pass is called the "extra pass".

1. Normal pass       # The necessary pass
     1a. Dry run       # Get the whole JIT length, epilogue offset, etc.
     1b. Emit phase    # Allocate memory and start emitting instructions
2. Extra pass        # Only needed if there are relocations to be fixed
     2a. Patch relocations
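
The pass structure listed above can be sketched with a toy model. This is an illustration only, written in Python rather than C, and it does not reflect the real arc_jit_data layout or any ARCv2 encoding: the instruction tuples, the function names, and the 0xff filler bytes are all hypothetical.

```python
# Toy model: each "instruction" is (size_in_bytes, target).
# target is None for ordinary instructions, or a symbol name for a call
# whose address may not be known yet. All names here are hypothetical.

def dry_run(program):
    """1a. Compute per-instruction offsets and the total length."""
    offsets, length = [], 0
    for size, _target in program:
        offsets.append(length)
        length += size
    return offsets, length


def emit(program, length, known_addrs):
    """1b. Allocate the buffer, emit, and record unresolved targets."""
    buf = bytearray(length)
    relocs = []
    pos = 0
    for size, target in program:
        if target is None:
            buf[pos:pos + size] = b"\xff" * size  # stand-in encoding
        elif target in known_addrs:
            buf[pos:pos + size] = known_addrs[target].to_bytes(size, "little")
        else:
            relocs.append((pos, size, target))  # left as zeros for now
        pos += size
    return buf, relocs


def normal_pass(program, known_addrs):
    """1. Dry run followed by the emit phase."""
    _offsets, length = dry_run(program)
    return emit(program, length, known_addrs)


def extra_pass(buf, relocs, known_addrs):
    """2a. Patch the relocations recorded by the normal pass."""
    for pos, size, target in relocs:
        buf[pos:pos + size] = known_addrs[target].to_bytes(size, "little")
    return buf
```

In this model the extra pass is only needed when normal_pass() returns a non-empty relocation list, which mirrors the "only needed if there are relocations to be fixed" note above.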

Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:

- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)

The result of "test_bpf" test suite on an HSDK board is:

hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf

  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

All the failing test cases are ones that were not JIT'ed.
By category, they break down as:

  .-----------.------------.-------------.
  | test type |   opcodes  | # of cases  |
  |-----------+------------+-------------|
  | atomic    | 0xC3, 0xDB |         149 |
  | div64     | 0x37, 0x3F |          22 |
  | mod64     | 0x97, 0x9F |          15 |
  `-----------^------------+-------------|
                           | (total) 186 |
                           `-------------'
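
The opcodes in the table can be decoded with the field layout from the kernel's BPF uapi headers (class in the low 3 bits; operation and source for ALU64; mode and size for STX). The sketch below is a partial decoder: its lookup tables are deliberately limited to the values needed for these six opcodes, and the decode() helper is a hypothetical name. It also re-adds the per-category counts.

```python
# Subsets of the BPF opcode field values, enough for the table above.
ALU_OPS = {0x30: "BPF_DIV", 0x90: "BPF_MOD"}
SOURCES = {0x00: "BPF_K", 0x08: "BPF_X"}
SIZES = {0x00: "BPF_W", 0x18: "BPF_DW"}
MODES = {0xC0: "BPF_ATOMIC"}


def decode(opcode):
    """Split an opcode byte into (class, operation-or-mode, source-or-size)."""
    cls = opcode & 0x07
    if cls == 0x07:  # BPF_ALU64
        return ("BPF_ALU64", ALU_OPS[opcode & 0xF0], SOURCES[opcode & 0x08])
    if cls == 0x03:  # BPF_STX
        return ("BPF_STX", MODES[opcode & 0xE0], SIZES[opcode & 0x18])
    raise ValueError(f"opcode {opcode:#x} is outside this partial table")


# Failure counts per category, copied from the table above.
failures = {"atomic": 149, "div64": 22, "mod64": 15}

for op in (0xC3, 0xDB, 0x37, 0x3F, 0x97, 0x9F):
    print(f"{op:#04x}: {' | '.join(decode(op))}")
print("total failures:", sum(failures.values()))
```

The counts add up to the 186 failures reported in the summary line.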

Setup: build config
-------------------
The following configs must be set to have a working JIT test:

  CONFIG_BPF_JIT=y
  CONFIG_BPF_JIT_ALWAYS_ON=y
  CONFIG_TEST_BPF=m

The following options are not necessary for the tests module,
but are good to have:

  CONFIG_DEBUG_INFO=y             # prerequisite for below
  CONFIG_DEBUG_INFO_BTF=y         # so bpftool can generate vmlinux.h

  CONFIG_FTRACE=y                 #
  CONFIG_BPF_SYSCALL=y            # all these options lead to
  CONFIG_KPROBE_EVENTS=y          # having CONFIG_BPF_EVENTS=y
  CONFIG_PERF_EVENTS=y            #

Some BPF programs provide data through /sys/kernel/debug:

  CONFIG_DEBUG_FS=y

  arc# mount -t debugfs debugfs /sys/kernel/debug
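
A small helper can confirm that a kernel configuration carries the three required options before building. This is a sketch only: parse_config() and missing_options() are hypothetical helpers, and the parser assumes the usual NAME=value lines with "#" comments.

```python
# The three options listed above as required for a working JIT test.
REQUIRED = {
    "CONFIG_BPF_JIT": "y",
    "CONFIG_BPF_JIT_ALWAYS_ON": "y",
    "CONFIG_TEST_BPF": "m",
}


def parse_config(text):
    """Return a dict of NAME -> value, skipping blanks and comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def missing_options(text, required=REQUIRED):
    """List required options that are absent or set to a different value."""
    options = parse_config(text)
    return sorted(
        name for name, value in required.items() if options.get(name) != value
    )
```

For example, calling missing_options() on the contents of the build tree's .config file returns an empty list when all three options are set as listed.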

Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. Support
for ARCv2 [1] was added in that version.

[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7

Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:

pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats

Or else, the build will fail:

$ make V=1
  ...
  BTF     .btf.vmlinux.bin.o
pahole -J --btf_gen_floats                    \
       -j --lang_exclude=rust                 \
       --skip_encoding_btf_inconsistent_proto \
       --btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
  ...
  BTFIDS  vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available

This is because the ARC toolchains generate "complex float" DIE
entries in libgcc and, at the moment, pahole can't handle such
entries.
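
The workaround above amounts to commenting out one line. A minimal sketch of doing that on the Makefile text follows; the comment_out_gen_floats() helper is hypothetical, and it assumes that --btf_gen_floats is the only flag on the matching line, as in the line quoted above.

```python
def comment_out_gen_floats(makefile_text):
    """Prefix any active line that adds --btf_gen_floats with '# '."""
    out = []
    for line in makefile_text.splitlines(keepends=True):
        # Skip lines that are already comments so the edit is repeatable.
        if "--btf_gen_floats" in line and not line.lstrip().startswith("#"):
            line = "# " + line
        out.append(line)
    return "".join(out)
```

Applying the function twice gives the same result as applying it once, so it is safe to re-run on an already patched file.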

Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
      ...
      test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
      test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]
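
When the run is captured to a log, the summary line can be pulled out with a regular expression. This sketch assumes the line format shown above; parse_summary() and the dictionary keys are hypothetical names, and the two numbers inside the brackets are returned as reported, without interpretation.

```python
import re

# Matches the summary line in the format shown in the log above.
SUMMARY_RE = re.compile(
    r"test_bpf: Summary: (\d+) PASSED, (\d+) FAILED, \[(\d+)/(\d+) JIT'ed\]"
)


def parse_summary(log_text):
    """Return the summary counts as a dict, or None if no summary is found."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    passed, failed, jited, jited_of = map(int, match.groups())
    return {
        "passed": passed,
        "failed": failed,
        "jited": jited,
        "jited_of": jited_of,
    }
```

Returning None for a missing summary lets a caller distinguish "the module never finished" from "the module finished with failures".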

Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support

Signed-off-by: Shahab Vahedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>