Commit Graph

42188 Commits

Author SHA1 Message Date
Xu Kuohai
04d8243b1f selftests/bpf: Add verifier tests for bpf lsm
Add verifier tests to check bpf lsm return values and disabled hooks.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240719110059.797546-10-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 13:09:45 -07:00
Xu Kuohai
d463dd9c9a selftests/bpf: Add test for lsm tail call
Add test for lsm tail call to ensure tail call can only be used between
bpf lsm progs attached to the same hook.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240719110059.797546-9-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 13:09:41 -07:00
Xu Kuohai
2b23b6c0f0 selftests/bpf: Add return value checks for failed tests
The return ranges of some bpf lsm test progs can not be deduced by
the verifier accurately. To avoid erroneous rejections, add explicit
return value checks for these progs.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240719110059.797546-8-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 13:09:37 -07:00
Xu Kuohai
4dc7556490 selftests/bpf: Avoid load failure for token_lsm.c
The compiler optimized the two bpf progs in token_lsm.c to make return
value from the bool variable in the "return -1" path, causing an
unexpected rejection:

0: R1=ctx() R10=fp0
; int BPF_PROG(bpf_token_capable, struct bpf_token *token, int cap) @ bpf_lsm.c:17
0: (b7) r6 = 0                        ; R6_w=0
; if (my_pid == 0 || my_pid != (bpf_get_current_pid_tgid() >> 32)) @ bpf_lsm.c:19
1: (18) r1 = 0xffffc9000102a000       ; R1_w=map_value(map=bpf_lsm.bss,ks=4,vs=5)
3: (61) r7 = *(u32 *)(r1 +0)          ; R1_w=map_value(map=bpf_lsm.bss,ks=4,vs=5) R7_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
4: (15) if r7 == 0x0 goto pc+11       ; R7_w=scalar(smin=umin=umin32=1,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
5: (67) r7 <<= 32                     ; R7_w=scalar(smax=0x7fffffff00000000,umax=0xffffffff00000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xffffffff00000000))
6: (c7) r7 s>>= 32                    ; R7_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
7: (85) call bpf_get_current_pid_tgid#14      ; R0=scalar()
8: (77) r0 >>= 32                     ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
9: (5d) if r0 != r7 goto pc+6         ; R0_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0; 0x7fffffff)) R7=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0; 0x7fffffff))
; if (reject_capable) @ bpf_lsm.c:21
10: (18) r1 = 0xffffc9000102a004      ; R1_w=map_value(map=bpf_lsm.bss,ks=4,vs=5,off=4)
12: (71) r6 = *(u8 *)(r1 +0)          ; R1_w=map_value(map=bpf_lsm.bss,ks=4,vs=5,off=4) R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
;  @ bpf_lsm.c:0
13: (87) r6 = -r6                     ; R6_w=scalar()
14: (67) r6 <<= 56                    ; R6_w=scalar(smax=0x7f00000000000000,umax=0xff00000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xff00000000000000))
15: (c7) r6 s>>= 56                   ; R6_w=scalar(smin=smin32=-128,smax=smax32=127)
; int BPF_PROG(bpf_token_capable, struct bpf_token *token, int cap) @ bpf_lsm.c:17
16: (bf) r0 = r6                      ; R0_w=scalar(id=1,smin=smin32=-128,smax=smax32=127) R6_w=scalar(id=1,smin=smin32=-128,smax=smax32=127)
17: (95) exit
At program exit the register R0 has smin=-128 smax=127 should have been in [-4095, 0]

To avoid this failure, change the variable type from bool to int.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240719110059.797546-7-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 13:09:34 -07:00
Martin KaFai Lau
4009c95fed selftests/bpf: Ensure the unsupported struct_ops prog cannot be loaded
There is an existing "bpf_tcp_ca/unsupp_cong_op" test to ensure
the unsupported tcp-cc "get_info" struct_ops prog cannot be loaded.

This patch adds a new test in the bpf_testmod such that the
unsupported ops test does not depend on other kernel subsystem
where its supporting ops may be changed in the future.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240722183049.2254692-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 13:09:10 -07:00
Martin KaFai Lau
e44b4fc40c selftests/bpf: Fix the missing tramp_1 to tramp_40 ops in cfi_stubs
The tramp_1 to tramp_40 ops is not set in the cfi_stubs in the
bpf_testmod_ops. It fails the struct_ops_multi_pages test after
retiring the unsupported_ops in the earlier patch.

This patch initializes them in a loop during the bpf_testmod_init().

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240722183049.2254692-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 13:08:56 -07:00
Tao Chen
0d7c06125c bpftool: Add document for net attach/detach on tcx subcommand
This commit adds sample output for net attach/detach on
tcx subcommand.

Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240721144252.96264-1-chen.dylane@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 12:54:08 -07:00
Tao Chen
4f88dde0e1 bpftool: Add bash-completion for tcx subcommand
This commit adds bash-completion for attaching tcx program on interface.

Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240721144238.96246-1-chen.dylane@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 12:54:04 -07:00
Tao Chen
3b9d4fee8a bpftool: Add net attach/detach command to tcx prog
Now, attach/detach tcx prog supported in libbpf, so we can add new
command 'bpftool attach/detach tcx' to attach tcx prog with bpftool
for user.

 # bpftool prog load tc_prog.bpf.o /sys/fs/bpf/tc_prog
 # bpftool prog show
	...
	192: sched_cls  name tc_prog  tag 187aeb611ad00cfc  gpl
	loaded_at 2024-07-11T15:58:16+0800  uid 0
	xlated 152B  jited 97B  memlock 4096B  map_ids 100,99,97
	btf_id 260
 # bpftool net attach tcx_ingress name tc_prog dev lo
 # bpftool net
	...
	tc:
	lo(1) tcx/ingress tc_prog prog_id 29

 # bpftool net detach tcx_ingress dev lo
 # bpftool net
	...
	tc:
 # bpftool net attach tcx_ingress name tc_prog dev lo
 # bpftool net
	tc:
	lo(1) tcx/ingress tc_prog prog_id 29

Test environment: ubuntu_22_04, 6.7.0-060700-generic

Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240721144221.96228-1-chen.dylane@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 12:54:00 -07:00
Tao Chen
b7264f87f7 bpftool: Refactor xdp attach/detach type judgment
This commit no logical changed, just increases code readability and
facilitates TCX prog expansion, which will be implemented in the next
patch.

Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240721143353.95980-2-chen.dylane@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 12:53:50 -07:00
Leon Hwang
b83b936f3e selftests/bpf: Add testcases for tailcall hierarchy fixing
Add some test cases to confirm the tailcall hierarchy issue has been fixed.

On x64, the selftests result is:

cd tools/testing/selftests/bpf && ./test_progs -t tailcalls
327/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
327/19  tailcalls/tailcall_bpf2bpf_hierarchy_fentry:OK
327/20  tailcalls/tailcall_bpf2bpf_hierarchy_fexit:OK
327/21  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_fexit:OK
327/22  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_entry:OK
327/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
327/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
327     tailcalls:OK
Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED

On arm64, the selftests result is:

cd tools/testing/selftests/bpf && ./test_progs -t tailcalls
327/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
327/19  tailcalls/tailcall_bpf2bpf_hierarchy_fentry:OK
327/20  tailcalls/tailcall_bpf2bpf_hierarchy_fexit:OK
327/21  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_fexit:OK
327/22  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_entry:OK
327/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
327/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
327     tailcalls:OK
Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20240714123902.32305-4-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 12:53:42 -07:00
Eduard Zingerman
cfbf25481d selftests/bpf: Update comments find_equal_scalars->sync_linked_regs
find_equal_scalars() is renamed to sync_linked_regs(),
this commit updates existing references in the selftests comments.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240718202357.1746514-5-eddyz87@gmail.com
2024-07-29 12:53:24 -07:00
Eduard Zingerman
bebc17b1c0 selftests/bpf: Tests for per-insn sync_linked_regs() precision tracking
Add a few test cases to verify precision tracking for scalars gaining
range because of sync_linked_regs():
- check what happens when more than 6 registers might gain range in
  sync_linked_regs();
- check if precision is propagated correctly when operand of
  conditional jump gained range in sync_linked_regs() and one of
  linked registers is marked precise;
- check if precision is propagated correctly when operand of
  conditional jump gained range in sync_linked_regs() and a
  other-linked operand of the conditional jump is marked precise;
- add a minimized reproducer for precision tracking bug reported in [0];
- Check that mark_chain_precision() for one of the conditional jump
  operands does not trigger equal scalars precision propagation.

[0] https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@mail.gmail.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240718202357.1746514-4-eddyz87@gmail.com
2024-07-29 12:53:17 -07:00
Eduard Zingerman
842edb5507 bpf: Remove mark_precise_scalar_ids()
Function mark_precise_scalar_ids() is superseded by
bt_sync_linked_regs() and equal scalars tracking in jump history.
mark_precise_scalar_ids() propagates precision over registers sharing
same ID on parent/child state boundaries, while jump history records
allow bt_sync_linked_regs() to propagate same information with
instruction level granularity, which is strictly more precise.

This commit removes mark_precise_scalar_ids() and updates test cases
in progs/verifier_scalar_ids to reflect new verifier behavior.

The tests are updated in the following manner:
- mark_precise_scalar_ids() propagated precision regardless of
  presence of conditional jumps, while new jump history based logic
  only kicks in when conditional jumps are present.
  Hence test cases are augmented with conditional jumps to still
  trigger precision propagation.
- As equal scalars tracking no longer relies on parent/child state
  boundaries some test cases are no longer interesting,
  such test cases are removed, namely:
  - precision_same_state and precision_cross_state are superseded by
    linked_regs_bpf_k;
  - precision_same_state_broken_link and equal_scalars_broken_link
    are superseded by linked_regs_broken_link.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240718202357.1746514-3-eddyz87@gmail.com
2024-07-29 12:53:14 -07:00
Eduard Zingerman
4bf79f9be4 bpf: Track equal scalars history on per-instruction level
Use bpf_verifier_state->jmp_history to track which registers were
updated by find_equal_scalars() (renamed to collect_linked_regs())
when conditional jump was verified. Use recorded information in
backtrack_insn() to propagate precision.

E.g. for the following program:

            while verifying instructions
  1: r1 = r0              |
  2: if r1 < 8  goto ...  | push r0,r1 as linked registers in jmp_history
  3: if r0 > 16 goto ...  | push r0,r1 as linked registers in jmp_history
  4: r2 = r10             |
  5: r2 += r0             v mark_chain_precision(r0)

            while doing mark_chain_precision(r0)
  5: r2 += r0             | mark r0 precise
  4: r2 = r10             |
  3: if r0 > 16 goto ...  | mark r0,r1 as precise
  2: if r1 < 8  goto ...  | mark r0,r1 as precise
  1: r1 = r0              v

Technically, do this as follows:
- Use 10 bits to identify each register that gains range because of
  sync_linked_regs():
  - 3 bits for frame number;
  - 6 bits for register or stack slot number;
  - 1 bit to indicate if register is spilled.
- Use u64 as a vector of 6 such records + 4 bits for vector length.
- Augment struct bpf_jmp_history_entry with a field 'linked_regs'
  representing such vector.
- When doing check_cond_jmp_op() remember up to 6 registers that
  gain range because of sync_linked_regs() in such a vector.
- Don't propagate range information and reset IDs for registers that
  don't fit in 6-value vector.
- Push a pair {instruction index, linked registers vector}
  to bpf_verifier_state->jmp_history.
- When doing backtrack_insn() check if any of recorded linked
  registers is currently marked precise, if so mark all linked
  registers as precise.

This also requires fixes for two test_verifier tests:
- precise: test 1
- precise: test 2

Both tests contain the following instruction sequence:

19: (bf) r2 = r9                      ; R2=scalar(id=3) R9=scalar(id=3)
20: (a5) if r2 < 0x8 goto pc+1        ; R2=scalar(id=3,umin=8)
21: (95) exit
22: (07) r2 += 1                      ; R2_w=scalar(id=3+1,...)
23: (bf) r1 = r10                     ; R1_w=fp0 R10=fp0
24: (07) r1 += -8                     ; R1_w=fp-8
25: (b7) r3 = 0                       ; R3_w=0
26: (85) call bpf_probe_read_kernel#113

The call to bpf_probe_read_kernel() at (26) forces r2 to be precise.
Previously, this forced all registers with same id to become precise
immediately when mark_chain_precision() is called.
After this change, the precision is propagated to registers sharing
same id only when 'if' instruction is backtracked.
Hence verification log for both tests is changed:
regs=r2,r9 -> regs=r2 for instructions 25..20.

Fixes: 904e6ddf41 ("bpf: Use scalar ids in mark_chain_precision()")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240718202357.1746514-2-eddyz87@gmail.com

Closes: https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@mail.gmail.com/
2024-07-29 12:53:10 -07:00
Ihor Solodrai
844f7315e7 selftests/bpf: Use auto-dependencies for test objects
Make use of -M compiler options when building .test.o objects to
generate .d files and avoid re-building all tests every time.

Previously, if a single test bpf program under selftests/bpf/progs/*.c
has changed, make would rebuild all the *.bpf.o, *.skel.h and *.test.o
objects, which is a lot of unnecessary work.

A typical dependency chain is:
progs/x.c -> x.bpf.o -> x.skel.h -> x.test.o -> trunner_binary

However for many tests it's not a 1:1 mapping by name, and so far
%.test.o have been simply dependent on all %.skel.h files, and
%.skel.h files on all %.bpf.o objects.

Avoid full rebuilds by instructing the compiler (via -MMD) to
produce *.d files with real dependencies, and appropriately including
them. Exploit make feature that rebuilds included makefiles if they
were changed by setting %.test.d as prerequisite for %.test.o files.

A couple of examples of compilation time speedup (after the first
clean build):

$ touch progs/verifier_and.c && time make -j8
Before: real	0m16.651s
After:  real	0m2.245s
$ touch progs/read_vsyscall.c && time make -j8
Before: real	0m15.743s
After:  real	0m1.575s

A drawback of this change is that now there is an overhead due to make
processing lots of .d files, which potentially may slow down unrelated
targets. However a time to make all from scratch hasn't changed
significantly:

$ make clean && time make -j8
Before: real	1m31.148s
After:  real	1m30.309s

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/VJihUTnvtwEgv_mOnpfy7EgD9D2MPNoHO-MlANeLIzLJPGhDeyOuGKIYyKgk0O6KPjfM-MuhtvPwZcngN8WFqbTnTRyCSMc2aMZ1ODm1T_g=@pm.me
2024-07-29 12:53:07 -07:00
Geliang Tang
c70b2d9027 selftests/bpf: Add connect_to_addr_str helper
Similar to connect_to_addr() helper for connecting to a server with the
given sockaddr_storage type address, this patch adds a new helper named
connect_to_addr_str() for connecting to a server with the given string
type address "addr_str", together with its "family" and "port" as other
parameters of connect_to_addr_str().

In connect_to_addr_str(), the parameters "family", "addr_str" and "port"
are used to create a sockaddr_storage type address "addr" by invoking
make_sockaddr(). Then pass this "addr" together with "addrlen", "type"
and "opts" to connect_to_addr().

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/647e82170831558dbde132a7a3d86df660dba2c4.1721282219.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 12:52:51 -07:00
Geliang Tang
e1ee5a48b5 selftests/bpf: Drop must_fail from network_helper_opts
The struct member "must_fail" of network_helper_opts() is only used in
cgroup_v1v2 tests, it makes sense to drop it from network_helper_opts.

Return value (fd) of connect_to_fd_opts() and the expect errno (EPERM)
can be checked in cgroup_v1v2.c directly, no need to check them in
connect_fd_to_addr() in network_helpers.c.

This also makes connect_fd_to_addr() function useless. It can be replaced
by connect().

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/3faf336019a9a48e2e8951f4cdebf19e3ac6e441.1721282219.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 12:52:45 -07:00
Geliang Tang
a63507f3b1 selftests/bpf: Drop type of connect_to_fd_opts
The "type" parameter of connect_to_fd_opts() is redundant of "server_fd".
Since the "type" can be obtained inside by invoking getsockopt(SO_TYPE),
without passing it in as a parameter.

This patch drops the "type" parameter of connect_to_fd_opts() and updates
its callers.

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/50d8ce7ab7ab0c0f4d211fc7cc4ebe3d3f63424c.1721282219.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29 12:52:24 -07:00
Linus Torvalds
1722389b0d A lot of networking people were at a conference last week, busy
catching COVID, so relatively short PR. Including fixes from bpf
 and netfilter.
 
 Current release - regressions:
 
  - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
 
 Current release - new code bugs:
 
  - l2tp: protect session IDR and tunnel session list with one lock,
    make sure the state is coherent to avoid a warning
 
  - eth: bnxt_en: update xdp_rxq_info in queue restart logic
 
  - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
 
 Previous releases - regressions:
 
  - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
    the field reuses previously un-validated pad
 
 Previous releases - always broken:
 
  - tap/tun: drop short frames to prevent crashes later in the stack
 
  - eth: ice: add a per-VF limit on number of FDIR filters
 
  - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaibxAACgkQMUZtbf5S
 IruuIRAAu96TiN/urPwmKznyb/Sk8x7p8iUzn6OvPS/TUlFUkURQtOh6M9uvbpN4
 x/L//EWkMR0hY4SkBegoiXfb1GS0PjBdWTWUiROm5X9nVHqp5KRZAxWXhjFiS1BO
 BIYOT+JfCl7mQiPs90Mys/cEtYOggMBsCZQVIGw/iYoJLFREqxFSONwa0dG+tGMX
 jn9WNu4yCVDhJ/jtl2MaTsCNtYUaBUgYrKHJBfNGfJ2Lz/7rH9yFui2WSMlmOd/U
 QGeCb1DWURlShlCqY37wNinbFsxWkI5JN00ukTtwFAXLIaqc+zgHcIjrDjTJwK43
 F4tKbJT3+bmehMU/h3Uo3c7DhXl7n9zDGiDtbCxnkykp0sFGJpjhDrWydo51c+YB
 qW5HaNrII2LiDicOVN8L29ylvKp7AEkClxgivEhZVGGk2f/szJRXfp9u3WBn5kAx
 3paH55YN0DEsKbYbb1ZENEI1Vnc/4ff4PxZJCUNKwzcS8wCn1awqwcriK9TjS/cp
 fjilNFT4J3/uFrodHWTkx0jJT6UJFT0aF03qPLUH/J5kG+EVukOf1jBPInNdf1si
 1j47SpblHUe86HiHphFMt32KZ210lJzWxh8uGma57Y2sB9makdLiK4etrFjkiMJJ
 Z8A3kGp3KpFjbuK4tHY25rp+5oxLNNOBNpay29lQrWtCL/NDcaQ=
 =9OsH
 -----END PGP SIGNATURE-----

Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  A lot of networking people were at a conference last week, busy
  catching COVID, so relatively short PR.

  Current release - regressions:

   - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP

  Current release - new code bugs:

   - l2tp: protect session IDR and tunnel session list with one lock,
     make sure the state is coherent to avoid a warning

   - eth: bnxt_en: update xdp_rxq_info in queue restart logic

   - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field

  Previous releases - regressions:

   - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
     the field reuses previously un-validated pad

  Previous releases - always broken:

   - tap/tun: drop short frames to prevent crashes later in the stack

   - eth: ice: add a per-VF limit on number of FDIR filters

   - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"

* tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  tun: add missing verification for short frame
  tap: add missing verification for short frame
  mISDN: Fix a use after free in hfcmulti_tx()
  gve: Fix an edge case for TSO skb validity check
  bnxt_en: update xdp_rxq_info in queue restart logic
  tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
  selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
  xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
  bpf: Fix a segment issue when downgrading gso_size
  net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
  MAINTAINERS: make Breno the netconsole maintainer
  MAINTAINERS: Update bonding entry
  net: nexthop: Initialize all fields in dumped nexthops
  net: stmmac: Correct byte order of perfect_match
  selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
  tipc: Return non-zero value from tipc_udp_addr2str() on error
  netfilter: nft_set_pipapo_avx2: disable softinterrupts
  ice: Fix recipe read procedure
  ice: Add a per-VF limit on number of FDIR filters
  net: bonding: correctly annotate RCU in bond_should_notify_peers()
  ...
2024-07-25 13:32:25 -07:00
Jakub Kicinski
f7578df913 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZqIl1AAKCRDbK58LschI
 g/MdAP9oyZV9/IZ6Y6Z1fWfio0SB+yJGugcwbFjWcEtNrzsqJQEAwipQnemAI4NC
 HBMfK2a/w7vhAFMXrP/SbkB/gUJJ7QE=
 =vovf
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-07-25

We've added 14 non-merge commits during the last 8 day(s) which contain
a total of 19 files changed, 177 insertions(+), 70 deletions(-).

The main changes are:

1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
   BPF sockhash. Also add test coverage for this case, from Michal Luczaj.

2) Fix a segmentation issue when downgrading gso_size in the BPF helper
   bpf_skb_adjust_room(), from Fred Li.

3) Fix a compiler warning in resolve_btfids due to a missing type cast,
   from Liwei Song.

4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
   boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.

5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
   from Stanislav Fomichev.

6) Fix function prototype BTF dumping in libbpf for prototypes that have
   no input arguments, from Andrii Nakryiko.

7) Fix stacktrace symbol resolution in perf script for BPF programs
   containing subprograms, from Hou Tao.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
  xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
  bpf: Fix a segment issue when downgrading gso_size
  tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
  bpf, events: Use prog to emit ksymbol event for main program
  selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
  selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
  selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
  af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
  bpftool: Fix typo in usage help
  libbpf: Fix no-args func prototype BTF dumping syntax
  MAINTAINERS: Update powerpc BPF JIT maintainers
  MAINTAINERS: Update email address of Naveen
  selftests/bpf: fexit_sleep: Fix stack allocation for arm64
====================

Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25 07:40:25 -07:00
Stanislav Fomichev
9b9969c40b selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
This flag is now required to use tx_metadata_len.

Fixes: 40808a237d ("selftests/bpf: Add TX side to xdp_metadata")
Reported-by: Julian Schindel <mail@arctic-alpaca.de>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240713015253.121248-3-sdf@fomichev.me
2024-07-25 11:57:33 +02:00
Linus Torvalds
7a3fad30fd Random number generator updates for Linux 6.11-rc1.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmaarzgACgkQSfxwEqXe
 A66ZWBAAlhXx8bve0uKlDRK8fffWHgruho/fOY4lZJ137AKwA9JCtmOyqdfL4Dmk
 VxFe7pEQJlQhcA/6kH54uO7SBXwfKlKZJth6SYnaCRMUIbFifHjjIQ0QqldjEKi0
 rP90Hu4FVsbwQC7u9i9lQj9n2P36zb6pn83BzpZQ/2PtoVCSCrdSJUe0Rxa3H3GN
 0+nNkDSXQt5otCByLaeE3x7KJgXLWL9+G2eFSFLTZ8rSVfMx1CdOIAG37WlLGdWm
 BaFYPDKMyBTVvVJBNgAe9YSqtrsZ5nlmLz+Z9wAe/hTL7RlL03kWUu34/Udcpull
 zzMDH0WMntiGK3eFQ2gOYSWqypvAjwHgn3BzqNmjUb69+89mZsdU1slcvnxWsUwU
 D3vphrscaqarF629tfsXti3jc5PoXwUTjROZVcCyeFPBhyAZgzK8xUvPpJO+RT+K
 EuUABob9cpA6FCpW/QeolDmMDhXlNT8QgsZu1juokZac2xP3Ly3REyEvT7HLbU2W
 ZJjbEqm1ppp3RmGELUOJbyhwsLrnbt+OMDO7iEWoG8aSFK4diBK/ZM6WvLMkr8Oi
 7ioXGIsYkCy3c47wpZKTrAapOPJp5keqNAiHSEbXw8mozp6429QAEZxNOcczgHKC
 Ea2JzRkctqutcIT+Slw/uUe//i1iSsIHXbE81fp5udcQTJcUByo=
 =P8aI
 -----END PGP SIGNATURE-----

Merge tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator updates from Jason Donenfeld:
 "This adds getrandom() support to the vDSO.

  First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which
  lets the kernel zero out pages anytime under memory pressure, which
  enables allocating memory that never gets swapped to disk but also
  doesn't count as being mlocked.

  Then, the vDSO implementation of getrandom() is introduced in a
  generic manner and hooked into random.c.

  Next, this is implemented on x86. (Also, though it's not ready for
  this pull, somebody has begun an arm64 implementation already)

  Finally, two vDSO selftests are added.

  There are also two housekeeping cleanup commits"

* tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  MAINTAINERS: add random.h headers to RNG subsection
  random: note that RNDGETPOOL was removed in 2.6.9-rc2
  selftests/vDSO: add tests for vgetrandom
  x86: vdso: Wire up getrandom() vDSO implementation
  random: introduce generic vDSO getrandom() implementation
  mm: add MAP_DROPPABLE for designating always lazily freeable mappings
2024-07-24 10:29:50 -07:00
Linus Torvalds
d1e9a63dcd vfs-6.11-rc1.fixes.2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZqDFUwAKCRCRxhvAZXjc
 omD6APwJKlepwDYlu5XZptI6/1kmai6SqaYnifTX1+ELR/rQQAD/Z37aho42v2JZ
 NYr+KFj02vj7ryKA5OWuSD8cw+6GlwQ=
 =dfob
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.11-rc1.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "VFS:

   - The new 64bit mount ids start after the old mount id, i.e., at the
     first non-32 bit value. However, we started counting one id too
     late and thus lost 4294967296 as the first valid id. Fix that.

   - Update a few comments on some vfs_*() creation helpers.

   - Move copying of the xattr name out from the locks required to start
     a filesystem write.

   - Extend the filelock lock UAF fix to the compat code as well.

   - Now that we added the ability to look up an inode under RCU it's
     possible that lockless hash lookup can find and lock an inode after
     it gets I_FREEING set. It then waits until inode teardown in
     evict() is finished.

     The flag however is still set after evict() has woken up all
     waiters. If the inode lock is taken late enough on the waiting side
     after hash removal and wakeup happened the waiting thread will
     never be woken.

     Before RCU based lookup this was synchronized via the
     inode_hash_lock. But since unhashing requires the inode lock as
     well we can check whether the inode is unhashed while holding inode
     lock even without holding inode_hash_lock.

  pidfd:

   - The nsproxy structure contains nearly all of the namespaces
     associated with a task. When a namespace type isn't supported
     nsproxy might contain a NULL pointer or always point to the initial
     namespace type. The logic isn't consistent. So when deriving
     namespace fds we need to ensure that the namespace type is
     supported.

     First, so that we don't risk dereferncing NULL pointers. The
     correct bigger fix would be to change all namespaces to always set
     a valid namespace pointer in struct nsproxy independent of whether
     or not it is compiled in. But that requires quite a few changes.

     Second, so that we don't allow deriving namespace fds when the
     namespace type doesn't exist and thus when they couldn't also be
     derived via /proc/self/ns/.

   - Add missing selftests for the new pidfd ioctls to derive namespace
     fds. This simply extends the already existing testsuite.

  netfs:

   - Fix debug logging and fix kconfig variable name so it actually
     works.

   - Fix writeback that goes both to the server and cache. The streams
     are only activated once a subreq is added. When a server write
     happens the subreq doesn't need to have finished by the time the
     cache write is started. If the server write has already finished by
     the time the cache write is about to start the cache write will
     operate on a folio that might already have been reused. Fix this by
     preactivating the cache write.

   - Limit cachefiles subreq size for cache writes to MAX_RW_COUNT"

* tag 'vfs-6.11-rc1.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  inode: clarify what's locked
  vfs: Fix potential circular locking through setxattr() and removexattr()
  filelock: Fix fcntl/close race recovery compat path
  fs: use all available ids
  cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT
  netfs: Fix writeback that needs to go to both server and cache
  pidfs: add selftests for new namespace ioctls
  pidfs: handle kernels without namespaces cleanly
  pidfs: when time ns disabled add check for ioctl
  vfs: correct the comments of vfs_*() helpers
  vfs: handle __wait_on_freeing_inode() and evict() race
  netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG
  netfs: Revert "netfs: Switch debug logging to pr_debug()"
2024-07-24 09:42:51 -07:00
Hangbin Liu
863ff546fb selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
If the testing kernel doesn't support setting fdb_max_learned or show
fdb_n_learned, just skip it. Or we will get errors like

./bridge_fdb_learning_limit.sh: line 218: [: null: integer expression expected
./bridge_fdb_learning_limit.sh: line 225: [: null: integer expression expected

Fixes: 6f84090333 ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-24 12:50:28 +01:00
Christian Brauner
1bb8dce5df
pidfs: add selftests for new namespace ioctls
Add selftests to verify that deriving namespace file descriptors from
pidfd file descriptors works correctly.

Link: https://lore.kernel.org/r/20240722-work-pidfs-69dbea91edab@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24 10:53:13 +02:00
Linus Torvalds
786c8248db perf tools fixes for v6.11
Two fixes about building perf and other tools:
 
 * Fix breakage in tracing tools due to pkg-config for libtrace{event,fs}
 
 * Fix build of perf when libunwind is used
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZqA1SAAKCRCMstVUGiXM
 g3TfAQDXLi+XcSDE/u5JcDN3H6+bXvavDn2k8Gsd6vWZQc5LEQD3X1E+GbtWTQsE
 ruk5ZT3voy8qBPgmrUg72NJwmRxYAQ==
 =0RKR
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
 "Two fixes for building perf and other tools:

   - Fix breakage in tracing tools due to pkg-config for
     libtrace{event,fs}

   - Fix build of perf when libunwind is used"

* tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf dso: Fix build when libunwind is enabled
  tools/latency: Use pkg-config in lib_setup of Makefile.config
  tools/rtla: Use pkg-config in lib_setup of Makefile.config
  tools/verification: Use pkg-config in lib_setup of Makefile.config
  tools: Make pkg-config dependency checks usable by other tools
  perf build: Warn if libtracefs is not found
2024-07-23 18:15:51 -07:00
Linus Torvalds
ca83c61cb3 Kbuild updates for v6.11
- Remove tristate choice support from Kconfig
 
  - Stop using the PROVIDE() directive in the linker script
 
  - Reduce the number of links for the combination of CONFIG_DEBUG_INFO_BTF
    and CONFIG_KALLSYMS
 
  - Enable the warning for symbol reference to .exit.* sections by default
 
  - Fix warnings in RPM package builds
 
  - Improve scripts/make_fit.py to generate a FIT image with separate base
    DTB and overlays
 
  - Improve choice value calculation in Kconfig
 
  - Fix conditional prompt behavior in choice in Kconfig
 
  - Remove support for the uncommon EMAIL environment variable in Debian
    package builds
 
  - Remove support for the uncommon "name <email>" form for the DEBEMAIL
    environment variable
 
  - Raise the minimum supported GNU Make version to 4.0
 
  - Remove stale code for the absolute kallsyms
 
  - Move header files commonly used for host programs to scripts/include/
 
  - Introduce the pacman-pkg target to generate a pacman package used in
    Arch Linux
 
  - Clean up Kconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmagBLUVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGmoUQAJ8pnURs0g+Rcyk6bdY/qtXBYkS+
 nXpIK1ssFgRRgAQdeszYtvBqLFzb0wRCSie87G1AriD/JkVVTjCCY1For1y+vs0u
 a7HfxitHhZpPyZW/T+WMQ3LViNccpkx+DFAcoRH8xOY/XPEJKVUby332jOIXMuyg
 +NKIELQJVsLhcDofTUGb5VfIQektw219n5c4jKjXdNk4ZtE24xCRM5X528ZebwWJ
 RZhMvJ968PyIH1IRXvNt6dsKBxoGIwPP8IO6yW9hzHaNsBqt7MGSChSel7r1VKpk
 iwCNApJvEiVBe5wvTSVOVro7/8p/AZ70CQAqnMJV+dNnRqtGqW7NvL6XAjZRJgJJ
 Uxe5NSrXgQd3FtqfcbXLetBgp9zGVt328nHm1HXHR5rFsvoOiTvO7hHPbhA+OoWJ
 fs+jHzEXdAMRgsNrczPWU5Svq6MgGe4v8HBf0m8N1Uy65t/O+z9ti2QAw7kIFlbu
 /VSFNjw4CHmNxGhnH0khCMsy85FwVIt9Ux+2d6IEc0gP8S1Qa1HgHGAoVI4U51eS
 9dxEPVJNPOugaIVHheuS3wimEO6wzaJcQHn4IXaasMA7P6Yo4G/jiGoy4cb9qPTM
 Hb+GaOltUy7vDoG4D2LSym8zR8rdKwbIf/5psdZrq/IWVKq5p+p7KWs3aOykSoM7
 o6Hb532Ioalhm8je
 =BYu7
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Remove tristate choice support from Kconfig

 - Stop using the PROVIDE() directive in the linker script

 - Reduce the number of links for the combination of CONFIG_KALLSYMS and
   CONFIG_DEBUG_INFO_BTF

 - Enable the warning for symbol reference to .exit.* sections by
   default

 - Fix warnings in RPM package builds

 - Improve scripts/make_fit.py to generate a FIT image with separate
   base DTB and overlays

 - Improve choice value calculation in Kconfig

 - Fix conditional prompt behavior in choice in Kconfig

 - Remove support for the uncommon EMAIL environment variable in Debian
   package builds

 - Remove support for the uncommon "name <email>" form for the DEBEMAIL
   environment variable

 - Raise the minimum supported GNU Make version to 4.0

 - Remove stale code for the absolute kallsyms

 - Move header files commonly used for host programs to scripts/include/

 - Introduce the pacman-pkg target to generate a pacman package used in
   Arch Linux

 - Clean up Kconfig

* tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits)
  kbuild: doc: gcc to CC change
  kallsyms: change sym_entry::percpu_absolute to bool type
  kallsyms: unify seq and start_pos fields of struct sym_entry
  kallsyms: add more original symbol type/name in comment lines
  kallsyms: use \t instead of a tab in printf()
  kallsyms: avoid repeated calculation of array size for markers
  kbuild: add script and target to generate pacman package
  modpost: use generic macros for hash table implementation
  kbuild: move some helper headers from scripts/kconfig/ to scripts/include/
  Makefile: add comment to discourage tools/* addition for kernel builds
  kbuild: clean up scripts/remove-stale-files
  kconfig: recursive checks drop file/lineno
  kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec
  kallsyms: get rid of code for absolute kallsyms
  kbuild: Create INSTALL_PATH directory if it does not exist
  kbuild: Abort make on install failures
  kconfig: remove 'e1' and 'e2' macros from expression deduplication
  kconfig: remove SYMBOL_CHOICEVAL flag
  kconfig: add const qualifiers to several function arguments
  kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups()
  ...
2024-07-23 14:32:21 -07:00
Linus Torvalds
d2d721e2eb Livepatching changes for 6.11
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmafyQEACgkQUqAMR0iA
 lPIb5Q/+Puck/TtP9W7uyFMDi6AQMm2/Ppja12OV0cpzKwvZci81mvANEJDC2gGa
 Wlg4++WVqeA6UfxsPLqDWDWI46dSHWfbMR5d9c5ZZBWh8byNLznBvgQdV04m0KaT
 567gXwRaVc3Rx4q0WkoWYn12FFvZV5IZPJbUuM4hKS1WmDrW5tPKteBoxGVujU5N
 j9BB3zMHH+cDrgB2+tasUGkEXRLQtesjVrPNQp4lJA535JwnpFk3961+y4cnaAaQ
 EqrGy/uUVk6Hl/4Q/JgnYRREmDytMMmvxJJx3FIZm6i5xLOY0y1rJ1YF0z7fL+Eu
 +WJi6Uncwo1mFUz5MvKAg2tIUilbOp3awQwxPdPFXoziF0A1mS+S6qHlz6Y8Kk4N
 kqgmfIXZeb/K+UCWX6freqqvAg/sJ0JlNIcaGOGZi7+KRwoi8US3eahIAEUxHuis
 DRnwtF9qI6YRyjfwy3INDc9Atgnhe6O/mcUpAe1TiSHgbgKT8kQgIQUWb1BwurVE
 Ey1Hpci+gpJyj1MW0f1jE08lqQ1O7EdFTJyxJvgjs1IeGA2Zgc0U2GMemSidh6qZ
 pqQnEwOfvCdVgR8W4R0Gu8IZYsyCYEawNVHixk2V+GQSdvZsPfJNvnhkqYH/IzUQ
 yXKBCZVVBwTQW6zkGI6fjeXgjVH2cEL29vDO4ChQsSH4qEZb/GA=
 =k4I2
 -----END PGP SIGNATURE-----

Merge tag 'livepatching-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching update from Petr Mladek:

 - show patch->replace flag in sysfs

 - add or improve few selftests

* tag 'livepatching-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  livepatch: Replace snprintf() with sysfs_emit()
  selftests/livepatch: Add selftests for "replace" sysfs attribute
  livepatch: Add "replace" sysfs attribute
  selftests: livepatch: Test atomic replace against multiple modules
  selftests/livepatch: define max test-syscall processes
2024-07-23 11:11:51 -07:00
Petr Mladek
ea5377ec49 Merge branch 'for-6.11/sysfs-patch-replace' into for-linus 2024-07-23 17:13:10 +02:00
Liwei Song
13c9b702e6 tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
Add a type cast for set8->pairs to fix below compile warning:

main.c: In function 'sets_patch':
main.c:699:50: warning: comparison of distinct pointer types lacks a cast
  699 |        BUILD_BUG_ON(set8->pairs != &set8->pairs[0].id);
      |                                 ^~

Fixes: 9707ac4fe2 ("tools/resolve_btfids: Refactor set sorting with types from btf_ids.h")
Signed-off-by: Liwei Song <liwei.song.lsong@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240722083305.4009723-1-liwei.song.lsong@gmail.com
2024-07-22 16:35:30 +02:00
Linus Torvalds
527eff227d - In the series "treewide: Refactor heap related implementation",
Kuan-Wei Chiu has significantly reworked the min_heap library code and
   has taught bcachefs to use the new more generic implementation.
 
 - Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
   reworks the cpumask and nodemask headers to make things generally more
   rational.
 
 - Kuan-Wei Chiu has sent along some maintenance work against our sorting
   library code in the series "lib/sort: Optimizations and cleanups".
 
 - More library maintainance work from Christophe Jaillet in the series
   "Remove usage of the deprecated ida_simple_xx() API".
 
 - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
   series "nilfs2: eliminate the call to inode_attach_wb()".
 
 - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB
   command error".
 
 - Plus the usual shower of singleton patches all over the place.  Please
   see the relevant changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2GvwAKCRDdBJ7gKXxA
 jlf/AP48xP5ilIHbtpAKm2z+MvGuTxJQ5VSC0UXFacuCbc93lAEA+Yo+vOVRmh6j
 fQF2nVKyKLYfSz7yqmCyAaHWohIYLgg=
 =Stxz
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - In the series "treewide: Refactor heap related implementation",
   Kuan-Wei Chiu has significantly reworked the min_heap library code
   and has taught bcachefs to use the new more generic implementation.

 - Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
   reworks the cpumask and nodemask headers to make things generally
   more rational.

 - Kuan-Wei Chiu has sent along some maintenance work against our
   sorting library code in the series "lib/sort: Optimizations and
   cleanups".

 - More library maintainance work from Christophe Jaillet in the series
   "Remove usage of the deprecated ida_simple_xx() API".

 - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the
   series "nilfs2: eliminate the call to inode_attach_wb()".

 - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix
   GDB command error".

 - Plus the usual shower of singleton patches all over the place. Please
   see the relevant changelogs for details.

* tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits)
  ia64: scrub ia64 from poison.h
  watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
  tsacct: replace strncpy() with strscpy()
  lib/bch.c: use swap() to improve code
  test_bpf: convert comma to semicolon
  init/modpost: conditionally check section mismatch to __meminit*
  init: remove unused __MEMINIT* macros
  nilfs2: Constify struct kobj_type
  nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro
  math: rational: add missing MODULE_DESCRIPTION() macro
  lib/zlib: add missing MODULE_DESCRIPTION() macro
  fs: ufs: add MODULE_DESCRIPTION()
  lib/rbtree.c: fix the example typo
  ocfs2: add bounds checking to ocfs2_check_dir_entry()
  fs: add kernel-doc comments to ocfs2_prepare_orphan_dir()
  coredump: simplify zap_process()
  selftests/fpu: add missing MODULE_DESCRIPTION() macro
  compiler.h: simplify data_race() macro
  build-id: require program headers to be right after ELF header
  resource: add missing MODULE_DESCRIPTION()
  ...
2024-07-21 17:56:22 -07:00
Linus Torvalds
fbc90c042c - 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression
   (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff).
   Yu has a fix which I'll send along later via the hotfixes branch.
 
 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.
 
 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that.  This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches.  My bad.
 
 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"
 
 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability of
   cgroup writeback"
 
 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache index".
 
 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of the
   zeropage in MAP_SHARED mappings.  I don't see any runtime effects here -
   more a cleanup/understandability/maintainablity thing.
 
 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of
   higher addresses, for aarch64.  The (poorly named) series is
   "Restructure va_high_addr_switch".
 
 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".
 
 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the
   series "Enhance soft hwpoison handling and injection".
 
 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything.  Some landed in this pull.
 
 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has
   simplified migration's use of hardware-offload memory copying.
 
 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".
 
 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code.  This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.
 
 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.
 
 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP.  By default this
   is a no-op - users must opt in vis sysfs controls.  Dramatic
   improvements in pagefault latency are realized.
 
 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".
 
 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".
 
 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".
 
 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".
 
 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances.  A 10x speedup is seen in a microbenchmark.
 
   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.
 
 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".
 
 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.
 
 - Is anyone reading this stuff?  If so, email me!
 
 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.
 
 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".
 
 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.
 
 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".
 
 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE".  It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.
 
 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.
 
 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large folio
   userspace copying.
 
 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers.  From SeongJae Park.
 
 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.
 
 - David Hildenbrand sends along more cleanups, this time against the
   migration code.  The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".
 
 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code.  He addresses this in the series "mm: Fix various
   readahead quirks".
 
 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's self
   testing code.
 
 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code.  The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this.  The series is marked cc:stable.
 
 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.
 
 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion.  The series (which also makes the memcg-v1 code
   Kconfigurable) are
 
   "mm: memcg: separate legacy cgroup v1 code and put under config
   option" and
   "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1"
 
 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.
 
 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of excessive
   correctable memory errors.  In order to permit userspace to monitor and
   handle this situation.
 
 - Kefeng Wang's series "mm: migrate: support poison recover from migrate
   folio" teaches the kernel to appropriately handle migration from
   poisoned source folios rather than simply panicing.
 
 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.
 
 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory utilization.
 
 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare
   refcount increments.  So these paes can first be moved aside if they
   reside in the movable zone or a CMA block.
 
 - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps
   for much faster reading of vma information.  The series is "query VMAs
   from /proc/<pid>/maps".
 
 - In the series "mm: introduce per-order mTHP split counters" Lance Yang
   improves the kernel's presentation of developer information related to
   multisize THP splitting.
 
 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)".  This permits
   userspace to use all available huge page sizes.
 
 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and not
   very useful feature from slab fault injection.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA
 joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3
 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8=
 =z0Lf
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.

 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that. This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches. My
   bad.

 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"

 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability
   of cgroup writeback"

 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache
   index".

 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of
   the zeropage in MAP_SHARED mappings. I don't see any runtime effects
   here - more a cleanup/understandability/maintainablity thing.

 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
   of higher addresses, for aarch64. The (poorly named) series is
   "Restructure va_high_addr_switch".

 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".

 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
   the series "Enhance soft hwpoison handling and injection".

 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything. Some landed in this pull.

 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
   has simplified migration's use of hardware-offload memory copying.

 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".

 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code. This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.

 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.

 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP. By default this
   is a no-op - users must opt in vis sysfs controls. Dramatic
   improvements in pagefault latency are realized.

 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".

 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".

 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".

 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".

 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances. A 10x speedup is seen in a microbenchmark.

   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.

 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".

 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.

 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.

 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".

 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.

 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".

 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.

 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.

 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large
   folio userspace copying.

 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers. From SeongJae Park.

 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.

 - David Hildenbrand sends along more cleanups, this time against the
   migration code. The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".

 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code. He addresses this in the series "mm: Fix various
   readahead quirks".

 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's
   self testing code.

 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code. The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this. The series is marked cc:stable.

 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.

 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion. The series (which also makes the memcg-v1 code
   Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
   under config option" and "mm: memcg: put cgroup v1-specific memcg
   data under CONFIG_MEMCG_V1"

 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.

 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of
   excessive correctable memory errors. In order to permit userspace to
   monitor and handle this situation.

 - Kefeng Wang's series "mm: migrate: support poison recover from
   migrate folio" teaches the kernel to appropriately handle migration
   from poisoned source folios rather than simply panicing.

 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.

 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory
   utilization.

 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than
   bare refcount increments. So these paes can first be moved aside if
   they reside in the movable zone or a CMA block.

 - Andrii Nakryiko has added a binary ioctl()-based API to
   /proc/pid/maps for much faster reading of vma information. The series
   is "query VMAs from /proc/<pid>/maps".

 - In the series "mm: introduce per-order mTHP split counters" Lance
   Yang improves the kernel's presentation of developer information
   related to multisize THP splitting.

 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)". This permits
   userspace to use all available huge page sizes.

 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and
   not very useful feature from slab fault injection.

* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
  mm/mglru: fix ineffective protection calculation
  mm/zswap: fix a white space issue
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix possible recursive locking detected warning
  mm/gup: clear the LRU flag of a page before adding to LRU batch
  mm/numa_balancing: teach mpol_to_str about the balancing mode
  mm: memcg1: convert charge move flags to unsigned long long
  alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
  lib: reuse page_ext_data() to obtain codetag_ref
  lib: add missing newline character in the warning message
  mm/mglru: fix overshooting shrinker memory
  mm/mglru: fix div-by-zero in vmpressure_calc_level()
  mm/kmemleak: replace strncpy() with strscpy()
  mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
  mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
  mm: ignore data-race in __swap_writepage
  hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
  mm: shmem: rename mTHP shmem counters
  mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
  mm/migrate: putback split folios when numa hint migration fails
  ...
2024-07-21 17:15:46 -07:00
Linus Torvalds
2c9b351240 ARM:
* Initial infrastructure for shadow stage-2 MMUs, as part of nested
   virtualization enablement
 
 * Support for userspace changes to the guest CTR_EL0 value, enabling
   (in part) migration of VMs between heterogenous hardware
 
 * Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of
   the protocol
 
 * FPSIMD/SVE support for nested, including merged trap configuration
   and exception routing
 
 * New command-line parameter to control the WFx trap behavior under KVM
 
 * Introduce kCFI hardening in the EL2 hypervisor
 
 * Fixes + cleanups for handling presence/absence of FEAT_TCRX
 
 * Miscellaneous fixes + documentation updates
 
 LoongArch:
 
 * Add paravirt steal time support.
 
 * Add support for KVM_DIRTY_LOG_INITIALLY_SET.
 
 * Add perf kvm-stat support for loongarch.
 
 RISC-V:
 
 * Redirect AMO load/store access fault traps to guest
 
 * perf kvm stat support
 
 * Use guest files for IMSIC virtualization, when available
 
 ONE_REG support for the Zimop, Zcmop, Zca, Zcf, Zcd, Zcb and Zawrs ISA
 extensions is coming through the RISC-V tree.
 
 s390:
 
 * Assortment of tiny fixes which are not time critical
 
 x86:
 
 * Fixes for Xen emulation.
 
 * Add a global struct to consolidate tracking of host values, e.g. EFER
 
 * Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC
   bus frequency, because TDX.
 
 * Print the name of the APICv/AVIC inhibits in the relevant tracepoint.
 
 * Clean up KVM's handling of vendor specific emulation to consistently act on
   "compatible with Intel/AMD", versus checking for a specific vendor.
 
 * Drop MTRR virtualization, and instead always honor guest PAT on CPUs
   that support self-snoop.
 
 * Update to the newfangled Intel CPU FMS infrastructure.
 
 * Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads
   '0' and writes from userspace are ignored.
 
 * Misc cleanups
 
 x86 - MMU:
 
 * Small cleanups, renames and refactoring extracted from the upcoming
   Intel TDX support.
 
 * Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't
   hold leafs SPTEs.
 
 * Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager
   page splitting, to avoid stalling vCPUs when splitting huge pages.
 
 * Bug the VM instead of simply warning if KVM tries to split a SPTE that is
   non-present or not-huge.  KVM is guaranteed to end up in a broken state
   because the callers fully expect a valid SPTE, it's all but dangerous
   to let more MMU changes happen afterwards.
 
 x86 - AMD:
 
 * Make per-CPU save_area allocations NUMA-aware.
 
 * Force sev_es_host_save_area() to be inlined to avoid calling into an
   instrumentable function from noinstr code.
 
 * Base support for running SEV-SNP guests.  API-wise, this includes
   a new KVM_X86_SNP_VM type, encrypting/measure the initial image into
   guest memory, and finalizing it before launching it.  Internally,
   there are some gmem/mmu hooks needed to prepare gmem-allocated pages
   before mapping them into guest private memory ranges.
 
   This includes basic support for attestation guest requests, enough to
   say that KVM supports the GHCB 2.0 specification.
 
   There is no support yet for loading into the firmware those signing
   keys to be used for attestation requests, and therefore no need yet
   for the host to provide certificate data for those keys.  To support
   fetching certificate data from userspace, a new KVM exit type will be
   needed to handle fetching the certificate from userspace. An attempt to
   define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle
   this was introduced in v1 of this patchset, but is still being discussed
   by community, so for now this patchset only implements a stub version
   of SNP Extended Guest Requests that does not provide certificate data.
 
 x86 - Intel:
 
 * Remove an unnecessary EPT TLB flush when enabling hardware.
 
 * Fix a series of bugs that cause KVM to fail to detect nested pending posted
   interrupts as valid wake eents for a vCPU executing HLT in L2 (with
   HLT-exiting disable by L1).
 
 * KVM: x86: Suppress MMIO that is triggered during task switch emulation
 
   Explicitly suppress userspace emulated MMIO exits that are triggered when
   emulating a task switch as KVM doesn't support userspace MMIO during
   complex (multi-step) emulation.  Silently ignoring the exit request can
   result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to
   userspace for some other reason prior to purging mmio_needed.
 
   See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write exits if
   emulator detects exception") for more details on KVM's limitations with
   respect to emulated MMIO during complex emulator flows.
 
 Generic:
 
 * Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE,
   because the special casing needed by these pages is not due to just
   unmovability (and in fact they are only unmovable because the CPU cannot
   access them).
 
 * New ioctl to populate the KVM page tables in advance, which is useful to
   mitigate KVM page faults during guest boot or after live migration.
   The code will also be used by TDX, but (probably) not through the ioctl.
 
 * Enable halt poll shrinking by default, as Intel found it to be a clear win.
 
 * Setup empty IRQ routing when creating a VM to avoid having to synchronize
   SRCU when creating a split IRQCHIP on x86.
 
 * Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag
   that arch code can use for hooking both sched_in() and sched_out().
 
 * Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
   truncating a bogus value from userspace, e.g. to help userspace detect bugs.
 
 * Mark a vCPU as preempted if and only if it's scheduled out while in the
   KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest
   memory when retrieving guest state during live migration blackout.
 
 Selftests:
 
 * Remove dead code in the memslot modification stress test.
 
 * Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs.
 
 * Print the guest pseudo-RNG seed only when it changes, to avoid spamming the
   log for tests that create lots of VMs.
 
 * Make the PMU counters test less flaky when counting LLC cache misses by
   doing CLFLUSH{OPT} in every loop iteration.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaZQB0UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNkZwf/bv2jiENaLFNGPe/VqTKMQ6PHQLMG
 +sNHx6fJPP35gTM8Jqf0/7/ummZXcSuC1mWrzYbecZm7Oeg3vwNXHZ4LquwwX6Dv
 8dKcUzLbWDAC4WA3SKhi8C8RV2v6E7ohy69NtAJmFWTc7H95dtIQm6cduV2osTC3
 OEuHe1i8d9umk6couL9Qhm8hk3i9v2KgCsrfyNrQgLtS3hu7q6yOTR8nT0iH6sJR
 KE5A8prBQgLmF34CuvYDw4Hu6E4j+0QmIqodovg2884W1gZQ9LmcVqYPaRZGsG8S
 iDdbkualLKwiR1TpRr3HJGKWSFdc7RblbsnHRvHIZgFsMQiimh4HrBSCyQ==
 =zepX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Initial infrastructure for shadow stage-2 MMUs, as part of nested
     virtualization enablement

   - Support for userspace changes to the guest CTR_EL0 value, enabling
     (in part) migration of VMs between heterogenous hardware

   - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
     of the protocol

   - FPSIMD/SVE support for nested, including merged trap configuration
     and exception routing

   - New command-line parameter to control the WFx trap behavior under
     KVM

   - Introduce kCFI hardening in the EL2 hypervisor

   - Fixes + cleanups for handling presence/absence of FEAT_TCRX

   - Miscellaneous fixes + documentation updates

  LoongArch:

   - Add paravirt steal time support

   - Add support for KVM_DIRTY_LOG_INITIALLY_SET

   - Add perf kvm-stat support for loongarch

  RISC-V:

   - Redirect AMO load/store access fault traps to guest

   - perf kvm stat support

   - Use guest files for IMSIC virtualization, when available

  s390:

   - Assortment of tiny fixes which are not time critical

  x86:

   - Fixes for Xen emulation

   - Add a global struct to consolidate tracking of host values, e.g.
     EFER

   - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
     effective APIC bus frequency, because TDX

   - Print the name of the APICv/AVIC inhibits in the relevant
     tracepoint

   - Clean up KVM's handling of vendor specific emulation to
     consistently act on "compatible with Intel/AMD", versus checking
     for a specific vendor

   - Drop MTRR virtualization, and instead always honor guest PAT on
     CPUs that support self-snoop

   - Update to the newfangled Intel CPU FMS infrastructure

   - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
     it reads '0' and writes from userspace are ignored

   - Misc cleanups

  x86 - MMU:

   - Small cleanups, renames and refactoring extracted from the upcoming
     Intel TDX support

   - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
     that can't hold leafs SPTEs

   - Unconditionally drop mmu_lock when allocating TDP MMU page tables
     for eager page splitting, to avoid stalling vCPUs when splitting
     huge pages

   - Bug the VM instead of simply warning if KVM tries to split a SPTE
     that is non-present or not-huge. KVM is guaranteed to end up in a
     broken state because the callers fully expect a valid SPTE, it's
     all but dangerous to let more MMU changes happen afterwards

  x86 - AMD:

   - Make per-CPU save_area allocations NUMA-aware

   - Force sev_es_host_save_area() to be inlined to avoid calling into
     an instrumentable function from noinstr code

   - Base support for running SEV-SNP guests. API-wise, this includes a
     new KVM_X86_SNP_VM type, encrypting/measure the initial image into
     guest memory, and finalizing it before launching it. Internally,
     there are some gmem/mmu hooks needed to prepare gmem-allocated
     pages before mapping them into guest private memory ranges

     This includes basic support for attestation guest requests, enough
     to say that KVM supports the GHCB 2.0 specification

     There is no support yet for loading into the firmware those signing
     keys to be used for attestation requests, and therefore no need yet
     for the host to provide certificate data for those keys.

     To support fetching certificate data from userspace, a new KVM exit
     type will be needed to handle fetching the certificate from
     userspace.

     An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
     exit type to handle this was introduced in v1 of this patchset, but
     is still being discussed by community, so for now this patchset
     only implements a stub version of SNP Extended Guest Requests that
     does not provide certificate data

  x86 - Intel:

   - Remove an unnecessary EPT TLB flush when enabling hardware

   - Fix a series of bugs that cause KVM to fail to detect nested
     pending posted interrupts as valid wake eents for a vCPU executing
     HLT in L2 (with HLT-exiting disable by L1)

   - KVM: x86: Suppress MMIO that is triggered during task switch
     emulation

     Explicitly suppress userspace emulated MMIO exits that are
     triggered when emulating a task switch as KVM doesn't support
     userspace MMIO during complex (multi-step) emulation

     Silently ignoring the exit request can result in the
     WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
     for some other reason prior to purging mmio_needed

     See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write
     exits if emulator detects exception") for more details on KVM's
     limitations with respect to emulated MMIO during complex emulator
     flows

  Generic:

   - Rename the AS_UNMOVABLE flag that was introduced for KVM to
     AS_INACCESSIBLE, because the special casing needed by these pages
     is not due to just unmovability (and in fact they are only
     unmovable because the CPU cannot access them)

   - New ioctl to populate the KVM page tables in advance, which is
     useful to mitigate KVM page faults during guest boot or after live
     migration. The code will also be used by TDX, but (probably) not
     through the ioctl

   - Enable halt poll shrinking by default, as Intel found it to be a
     clear win

   - Setup empty IRQ routing when creating a VM to avoid having to
     synchronize SRCU when creating a split IRQCHIP on x86

   - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
     a flag that arch code can use for hooking both sched_in() and
     sched_out()

   - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
     truncating a bogus value from userspace, e.g. to help userspace
     detect bugs

   - Mark a vCPU as preempted if and only if it's scheduled out while in
     the KVM_RUN loop, e.g. to avoid marking it preempted and thus
     writing guest memory when retrieving guest state during live
     migration blackout

  Selftests:

   - Remove dead code in the memslot modification stress test

   - Treat "branch instructions retired" as supported on all AMD Family
     17h+ CPUs

   - Print the guest pseudo-RNG seed only when it changes, to avoid
     spamming the log for tests that create lots of VMs

   - Make the PMU counters test less flaky when counting LLC cache
     misses by doing CLFLUSH{OPT} in every loop iteration"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  crypto: ccp: Add the SNP_VLEK_LOAD command
  KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
  KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
  KVM: x86: Replace static_call_cond() with static_call()
  KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  x86/sev: Move sev_guest.h into common SEV header
  KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
  KVM: x86: Suppress MMIO that is triggered during task switch emulation
  KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
  KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
  KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
  KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
  KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
  KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
  KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
  KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
  KVM: Document KVM_PRE_FAULT_MEMORY ioctl
  mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
  perf kvm: Add kvm-stat for loongarch64
  LoongArch: KVM: Add PV steal time support in guest side
  ...
2024-07-20 12:41:03 -07:00
Linus Torvalds
13a7871541 6.11 updates for libnvdimm
One small cleanup to use sizeof(*pointer)
 
 A series of patches to add MODULE_DESCRIPTIONS() to eliminate make W=1
 warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQSgX9xt+GwmrJEQ+euebuN7TNx1MQUCZpqNARQcaXJhLndlaW55
 QGludGVsLmNvbQAKCRCebuN7TNx1McgTAQDx5VvRC3htc7UM/i6524si2kurfIOd
 uuB+AHV53PfrkAD/ad0DfzW22kWR/QzyXtVLguNYoNKN+ipOHnJ0Atzgxgw=
 =HPWt
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Ira Weiny:

 - One small cleanup to use sizeof(*pointer)

 - Add MODULE_DESCRIPTIONS() to eliminate make W=1 warnings

* tag 'libnvdimm-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  testing: nvdimm: Add MODULE_DESCRIPTION() macros
  testing: nvdimm: iomap: add MODULE_DESCRIPTION()
  dax: add missing MODULE_DESCRIPTION() macros
  nvdimm: add missing MODULE_DESCRIPTION() macros
  ACPI: NFIT: add missing MODULE_DESCRIPTION() macro
  nvdimm/btt: use sizeof(*pointer) instead of sizeof(type)
2024-07-20 11:26:02 -07:00
Linus Torvalds
f557af081d RISC-V Patches for the 6.11 Merge Window, Part 1
* Support for various new ISA extensions:
     * The Zve32[xf] and Zve64[xfd] sub-extensios of the vector
       extension.
     * Zimop and Zcmop for may-be-operations.
     * The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension.
     * Zawrs,
 * riscv,cpu-intc is now dtschema.
 * A handful of performance improvements and cleanups to text patching.
 * Support for memory hot{,un}plug
 * The highest user-allocatable virtual address is now visible in
   hwprobe.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmabIGETHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiQe8D/9QPCaOnoP5OCZbwjkRBwaVxyknNyD0
 l+YNXk7Jk3B/oaOv3d7Bz+uWt1SG4j4jkfyuGJ81StZykp4/R7T823TZrPhog9VX
 IJm580MtvE49I2i1qJ+ZQti9wpiM+80lFnyMCzY6S7rrM9m62tKgUpARZcWoA55P
 iUo5bku99TYCcU2k1pnPrNSPQvVpECpv7tG0PwKpQd5DiYjbPp+aw5cQWN+izdOB
 6raOZ0buzP7McszvO/gcJs+kuHwrp0JSRvNxc2pwYZ0lx00p3hSV8UdtIMlI9Qm/
 z3gkQGHwc6UVMPHo1x0Gr5ShUTCI/iSwy4/7aY4NNXF6Sj99b8alt9GcbYqNAE7V
 k7sibCR7dhL4ods/GFMmzR7cQYlwlwtO+/ILak7rXhNvA32Xy1WUABguhP9ElTmw
 1ZS2hnRv6wc7MA2V7HBamf5mPXM6HQyC3oKy3njzDSJdiGIG7aa+TOfRAD+L/1Du
 QjIrKp6XcPIsZNjh8H3nMDVJ0VvDNnS4d4LbfNQc23VPzf57kFUqbli1pS0hBjFT
 ELEItH9dgSx+T5Qebdy/QMC3RG8Yc1IUdw6VQ7Jny/uCCEZNq+VZ+bXxspMmswCp
 sUIyDplJTJfRt3G2OxK0b95x6oj8jbaJOQfv6PBF71dDBsChg8eXFVJ2NDrX4Bvr
 h2MPK7vGBtFz8w==
 =+ICi
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for various new ISA extensions:
     * The Zve32[xf] and Zve64[xfd] sub-extensios of the vector
       extension
     * Zimop and Zcmop for may-be-operations
     * The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension
     * Zawrs

 - riscv,cpu-intc is now dtschema

 - A handful of performance improvements and cleanups to text patching

 - Support for memory hot{,un}plug

 - The highest user-allocatable virtual address is now visible in
   hwprobe

* tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (58 commits)
  riscv: lib: relax assembly constraints in hweight
  riscv: set trap vector earlier
  KVM: riscv: selftests: Add Zawrs extension to get-reg-list test
  KVM: riscv: Support guest wrs.nto
  riscv: hwprobe: export Zawrs ISA extension
  riscv: Add Zawrs support for spinlocks
  dt-bindings: riscv: Add Zawrs ISA extension description
  riscv: Provide a definition for 'pause'
  riscv: hwprobe: export highest virtual userspace address
  riscv: Improve sbi_ecall() code generation by reordering arguments
  riscv: Add tracepoints for SBI calls and returns
  riscv: Optimize crc32 with Zbc extension
  riscv: Enable DAX VMEMMAP optimization
  riscv: mm: Add support for ZONE_DEVICE
  virtio-mem: Enable virtio-mem for RISC-V
  riscv: Enable memory hotplugging for RISC-V
  riscv: mm: Take memory hotplug read-lock during kernel page table dump
  riscv: mm: Add memory hotplugging support
  riscv: mm: Add pfn_to_kaddr() implementation
  riscv: mm: Refactor create_linear_mapping_range() for memory hot add
  ...
2024-07-20 09:11:27 -07:00
Jann Horn
64e166099b kallsyms: get rid of code for absolute kallsyms
Commit cf8e865810 ("arch: Remove Itanium (IA-64) architecture")
removed the last use of the absolute kallsyms.

Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20240221202655.2423854-1-jannh@google.com/
[masahiroy@kernel.org: rebase the code and reword the commit description]
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-20 16:33:21 +09:00
Linus Torvalds
3c3ff7be97 powerpc updates for 6.11
- Remove support for 40x CPUs & platforms.
 
  - Add support to the 64-bit BPF JIT for cpu v4 instructions.
 
  - Fix PCI hotplug driver crash on powernv.
 
  - Fix doorbell emulation for KVM on PAPR guests (nestedv2).
 
  - Fix KVM nested guest handling of some less used SPRs.
 
  - Online NUMA nodes with no CPU/memory if they have a PCI device attached.
 
  - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels.
 
  - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE.
 
 Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian King,
 Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra, Gautam Menghani,
 Haren Myneni, Hari Bathini, Jeff Johnson, Krishna Kumar, Krzysztof Kozlowski,
 Nathan Lynch, Nicholas Piggin, Nick Bowler, Nilay Shroff, Rob Herring (Arm),
 Shawn Anastasio, Shivaprasad G Bhat, Sourabh Jain, Srikar Dronamraju, Timothy
 Pearson, Uwe Kleine-König, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmaaUNITHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDA+D/4o7OZ+SY0plTlMKSy3hW/SRXVj/byA
 CCKdizNY+3Rf/+K7KhuLOUPXhZOemLPE0xfKS3ND4mIEKCswzzXqmi6kjPH0qd8q
 qUhkHbt/LNpNJzZOYYw+usaklMTMdZtAl/jD9WEvGwgu2EYHgrujRIq04kEI1b0e
 OPiRnXOZcfevRBepQmYZKHvFlCRRa5vvsQcvLfY64yFqD0AsKTHgIi/48Dn33pb2
 hqHYyV1tZA3uT86Z1TgF1OG83VOSDsgc19Sb2xn14O9aJJ7lD2TOgVa4P4FfBlXA
 TXYYGQwK31ymGVWGcGfebVdC1ECeTem9n28vlk5I0NO9xNgPok/Ov4DAiZ+u1G0E
 3CXRDx9Uz2yPcGBJI2dpxfp2iw83Ad2DtBzAdukMD36xnC7xfrQz+W9SQfbcPJ8e
 I5SMAstWuLNgrX7YkjAOnXh1N41kht/mdV6KHdcMxPc7jOtAD65gUOZcgwYLeXlT
 Av17Ax0PMbiQ1BpFe2KNr/0T9Ba5k5rN7oDSKncDAq4uX8LcZKHj4bSHT9KroT1C
 q+GERspoCYp2VDMO742Jm7KTmQDHsS5y4Q+iSdOR8cQBXF613FaryDxSoJZhg2pf
 C2zIVED13RGcjIFcWlv73iA6QpBsphM+WWFz7mjULyJhxFQwm6BYt+Wy6jFu84oH
 sOgvPH8YyaK2uA==
 =eHVd
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Remove support for 40x CPUs & platforms

 - Add support to the 64-bit BPF JIT for cpu v4 instructions

 - Fix PCI hotplug driver crash on powernv

 - Fix doorbell emulation for KVM on PAPR guests (nestedv2)

 - Fix KVM nested guest handling of some less used SPRs

 - Online NUMA nodes with no CPU/memory if they have a PCI device
   attached

 - Reduce memory overhead of enabling kfence on 64-bit Radix MMU kernels

 - Reimplement the iommu table_group_ops for pseries for VFIO SPAPR TCE

Thanks to: Anjali K, Artem Savkov, Athira Rajeev, Breno Leitao, Brian
King, Celeste Liu, Christophe Leroy, Esben Haabendal, Gaurav Batra,
Gautam Menghani, Haren Myneni, Hari Bathini, Jeff Johnson, Krishna
Kumar, Krzysztof Kozlowski, Nathan Lynch, Nicholas Piggin, Nick Bowler,
Nilay Shroff, Rob Herring (Arm), Shawn Anastasio, Shivaprasad G Bhat,
Sourabh Jain, Srikar Dronamraju, Timothy Pearson, Uwe Kleine-König, and
Vaibhav Jain.

* tag 'powerpc-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (57 commits)
  Documentation/powerpc: Mention 40x is removed
  powerpc: Remove 40x leftovers
  macintosh/therm_windtunnel: fix module unload.
  powerpc: Check only single values are passed to CPU/MMU feature checks
  powerpc/xmon: Fix disassembly CPU feature checks
  powerpc: Drop clang workaround for builtin constant checks
  powerpc64/bpf: jit support for signed division and modulo
  powerpc64/bpf: jit support for sign extended mov
  powerpc64/bpf: jit support for sign extended load
  powerpc64/bpf: jit support for unconditional byte swap
  powerpc64/bpf: jit support for 32bit offset jmp instruction
  powerpc/pci: Hotplug driver bridge support
  pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv
  powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC
  powerpc: add missing MODULE_DESCRIPTION() macros
  macintosh/mac_hid: add MODULE_DESCRIPTION()
  KVM: PPC: add missing MODULE_DESCRIPTION() macros
  powerpc/kexec: Use of_property_read_reg()
  powerpc/64s/radix/kfence: map __kfence_pool at page granularity
  powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API
  ...
2024-07-19 21:00:33 -07:00
Linus Torvalds
04d17331ca USB/Thunderbolt updates for 6.11-rc1
Here is the big set of USB and Thunderbolt changes for 6.11-rc1.
 Nothing earth-shattering in here, just constant forward progress in
 adding support for new hardware and better debugging functionalities for
 thunderbolt devices and the subsystem.  Included in here are:
   - thunderbolt debugging update and driver additions
   - xhci driver updates
   - typec driver updates
   - kselftest device driver changes (acked by the relevant maintainers,
     depended on other changes in this tree.)
   - cdns3 driver updates
   - gadget driver updates
   - MODULE_DESCRIPTION() additions
   - dwc3 driver updates and fixes
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZppaNA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylXZwCgrEtIAQw0x6EF7w/iTWVS5UJj9AEAoLCj5UwO
 WX978uThyUctuYYKbw+8
 =Cm7j
 -----END PGP SIGNATURE-----

Merge tag 'usb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt updates from Greg KH:
 "Here is the big set of USB and Thunderbolt changes for 6.11-rc1.

  Nothing earth-shattering in here, just constant forward progress in
  adding support for new hardware and better debugging functionalities
  for thunderbolt devices and the subsystem. Included in here are:

   - thunderbolt debugging update and driver additions

   - xhci driver updates

   - typec driver updates

   - kselftest device driver changes (acked by the relevant maintainers,
     depended on other changes in this tree.)

   - cdns3 driver updates

   - gadget driver updates

   - MODULE_DESCRIPTION() additions

   - dwc3 driver updates and fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'usb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (112 commits)
  kselftest: devices: Add test to detect device error logs
  kselftest: Move ksft helper module to common directory
  kselftest: devices: Move discoverable devices test to subdirectory
  usb: gadget: f_uac2: fix non-newline-terminated function name
  USB: uas: Implement the new shutdown callback
  USB: core: add 'shutdown' callback to usb_driver
  usb: typec: Drop explicit initialization of struct i2c_device_id::driver_data to 0
  usb: dwc3: enable CCI support for AMD-xilinx DWC3 controller
  usb: dwc2: add support for other Lantiq SoCs
  usb: gadget: Use u16 types for 16-bit fields
  usb: gadget: midi2: Fix incorrect default MIDI2 protocol setup
  usb: dwc3: core: Check all ports when set phy suspend
  usb: typec: tcpci: add support to set connector orientation
  dt-bindings: usb: Convert fsl-usb to yaml
  usb: typec: ucsi: reorder operations in ucsi_run_command()
  usb: typec: ucsi: extract common code for command handling
  usb: typec: ucsi: inline ucsi_read_message_in
  usb: typec: ucsi: rework command execution functions
  usb: typec: ucsi: split read operation
  usb: typec: ucsi: simplify command sending API
  ...
2024-07-19 15:37:48 -07:00
Linus Torvalds
d7e78951a8 Notably this includes fixes for a s390 build breakage.
Including fixes from netfilter.
 
 Current release - new code bugs:
 
   - eth: fbnic: fix s390 build.
 
   - eth: airoha: fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue()
 
 Previous releases - regressions:
 
   - flow_dissector: use DEBUG_NET_WARN_ON_ONCE
 
   - ipv4: fix incorrect TOS in route get reply
 
   - dsa: fix chip-wide frame size config in some drivers
 
 Previous releases - always broken:
 
   - netfilter: nf_set_pipapo: fix initial map fill
 
   - eth: gve: fix XDP TX completion handling when counters overflow
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmaafj4SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkSsIP/jLODokNb/RcIuOYZVlgn2C5icMeZtqv
 6BIliZeE1EcnMWqOnheHaJVklq17ChAYbj/GNN8zQeOQhPA2eFCzPPLl/UpF9/ik
 wuvk+QQ4EM3T/SwWLKmht9UJioVi66szl3vZ+ByJ4BgGiJWORW1dK/AYzyYrsplk
 BMpQTs/Q/ekzWJzBtX+1Cz8izX1gl+dXhMezSdg9cW4KaBu1Hqwny954HF6keUQn
 h7bbAWiOAb5bhqUzgCdgF4gXp+uaWEzG1a1kY1G86NjXC5H0G03+vl37RbY/1kYh
 lUa/3nfH2V7Sy+MYTvjQe66obVeQeOh/PhRMlbkEVphpKs8XtHfsP433iOMZZn2D
 Z2Yb4yICnpNwbPmK1fyFvOrY7zGp2yZ2dSDEhowGjcyoQPO6RPnBfvArl66phIOm
 DWfEq79dYa0eEnCM174BLZbsjcHeeYQX2b8lagUslC0Oel+ijUG26XeMqs5W3at9
 QVcMV+aPE7pibpAq4M+qqkldv49QmXRsQai0BMa0+qEPyDKLe2nnf6L+TrDfN7Zf
 tpiiZZZGpcctZ1cOfyuebB37z/jtHJQ94IfLCD9RzpHlpOtCFycO18adrohhI58Q
 TaZf7U1CqC/UREckwE5F9ypQhdiQHEH84rgvKT3zFFRYftWTQmBCAUs3JzBPbYwO
 RO2GDFM/htOu
 =Q1xF
 -----END PGP SIGNATURE-----

Merge tag 'net-6.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Notably this includes fixes for a s390 build breakage.

  Current release - new code bugs:

   - eth: fbnic: fix s390 build

   - eth: airoha: fix NULL pointer dereference in
     airoha_qdma_cleanup_rx_queue()

  Previous releases - regressions:

   - flow_dissector: use DEBUG_NET_WARN_ON_ONCE

   - ipv4: fix incorrect TOS in route get reply

   - dsa: fix chip-wide frame size config in some drivers

  Previous releases - always broken:

   - netfilter: nf_set_pipapo: fix initial map fill

   - eth: gve: fix XDP TX completion handling when counters overflow"

* tag 'net-6.11-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net:
  eth: fbnic: don't build the driver when skb has more than 21 frags
  net: dsa: b53: Limit chip-wide jumbo frame config to CPU ports
  net: dsa: mv88e6xxx: Limit chip-wide frame size config to CPU ports
  net: airoha: Fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue()
  net: wwan: t7xx: add support for Dell DW5933e
  ipv4: Fix incorrect TOS in fibmatch route get reply
  ipv4: Fix incorrect TOS in route get reply
  net: flow_dissector: use DEBUG_NET_WARN_ON_ONCE
  driver core: auxiliary bus: Fix documentation of auxiliary_device
  net: airoha: fix error branch in airoha_dev_xmit and airoha_set_gdm_ports
  gve: Fix XDP TX completion handling when counters overflow
  ipvs: properly dereference pe in ip_vs_add_service
  selftests: netfilter: add test case for recent mismatch bug
  netfilter: nf_set_pipapo: fix initial map fill
  netfilter: ctnetlink: use helper function to calculate expect ID
  eth: fbnic: fix s390 build.
2024-07-19 14:58:12 -07:00
Linus Torvalds
12cc3d5389 sound updates for 6.11-rc1
Lots of changes in this cycle, but mostly for cleanups and
 refactoring.  Significant amount of changes are about DT schema
 conversions for ASoC at this time while we see other usual
 suspects, too.  Some highlights below:
 
 Core:
 - Re-introduction of PCM sync ID support API
 - MIDI2 time-base extension in ALSA sequencer API
 
 ASoC:
 - Syncing of features between simple-audio-card and the two
   audio-graph cards
 - Support for specifying the order of operations for components
   within cards to allow quirking for unusual systems
 - Lots of DT schema conversions
 - Continued SOF/Intel updates for topology, SoundWire, IPC3/4
 - New support for Asahi Kasei AK4619, Cirrus Logic CS530x, Everest
   Semiconductors ES8311, NXP i.MX95 and LPC32xx, Qualcomm LPASS
   v2.5 and WCD937x, Realtek RT1318 and RT1320 and Texas
   Instruments PCM5242
 
 HD-audio:
 - More quirks, Intel PantherLake support, senarytech codec support
 - Refactoring of Cirrus codec component-binding
 
 Others:
 - ALSA control kselftest improvements, and fixes for input value
   checks in various drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmaZNdoOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE/PWw//XYFQ2v+bc0x62LI1rIEt1/mSz6R1moHf85fK
 CjDOvHoGlZEkXuTmycK8b522/9tslHyE+8P97TZAy/6ph/yT44JgwQaadAvTZdWK
 eKrchogf+v6DaQar8+nmXp8409HBcfJdrSJth2xR5OhY741/kGBF1/YCBHZaIQan
 T87ag0tu1PVWQuLhdRlghkNYds+oaSX6wMaLRzVYI2TFYfHZOWYfVYd/NACb8KtO
 z66TqybOxOpq4xCi+umNaGn2TxdDvo427JgioAKzcGLodowRKmqNV+mXddfrhBEE
 Fwq4o8YGxgX+oaNn4aLQdrrREc1tuwQj0Kwpt/rkh4ESTgugcElq5hJCgPY8U3Ej
 5+ih7ZeIojKnfjNivHuath7tXe1inqPEK3RBt3qMoUldIxNhJ8WfIF0RNzW/QRY2
 g4JAI/4lswqPz6vYKULatDk+ZEW6PiV72kwW+4Vt7NxZnn9VFzP27qHuwkUHP5HM
 0q4/NKrv+MFPedOLEeEm/1dmE7NRT4tRJuIV+RwMJ0cyP4l2jSCwyDpxfkFqGitc
 wB0AXK3YLwISlKjziCox1cAex8F2XhjCdpOyOV6hTc3Dv/DySMHysv+4Uf4/kvst
 3GrqdkMHy4cEUYj/Sj+VunfColsX2KnQAN+e4Sonn+5nPsw7ypGkpM1Kf+wTQuNK
 EoxpzGo=
 =hn0h
 -----END PGP SIGNATURE-----

Merge tag 'sound-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "Lots of changes in this cycle, but mostly for cleanups and
  refactoring.

  Significant amount of changes are about DT schema conversions for ASoC
  at this time while we see other usual suspects, too.

  Some highlights below:

  Core:
   - Re-introduction of PCM sync ID support API
   - MIDI2 time-base extension in ALSA sequencer API

  ASoC:
   - Syncing of features between simple-audio-card and the two
     audio-graph cards
   - Support for specifying the order of operations for components
     within cards to allow quirking for unusual systems
   - Lots of DT schema conversions
   - Continued SOF/Intel updates for topology, SoundWire, IPC3/4
   - New support for Asahi Kasei AK4619, Cirrus Logic CS530x, Everest
     Semiconductors ES8311, NXP i.MX95 and LPC32xx, Qualcomm LPASS v2.5
     and WCD937x, Realtek RT1318 and RT1320 and Texas Instruments
     PCM5242

  HD-audio:
   - More quirks, Intel PantherLake support, senarytech codec support
   - Refactoring of Cirrus codec component-binding

  Others:
   - ALSA control kselftest improvements, and fixes for input value
     checks in various drivers"

* tag 'sound-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (349 commits)
  kselftest/alsa: Log the PCM ID in pcm-test
  kselftest/alsa: Use card name rather than number in test names
  ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book Pro 360
  ALSA: hda/tas2781: Add new quirk for Lenovo Hera2 Laptop
  ALSA: seq: ump: Skip useless ports for static blocks
  ALSA: pcm_dmaengine: Don't synchronize DMA channel when DMA is paused
  ALSA: usb: Use BIT() for bit values
  ALSA: usb: Fix UBSAN warning in parse_audio_unit()
  ALSA: hda/realtek: Enable headset mic on Positivo SU C1400
  ASoC: tas2781: Add new Kontrol to set tas2563 digital Volume
  ASoC: codecs: wcd937x: Remove separate handling for vdd-buck supply
  ASoC: codecs: wcd937x: Remove the string compare in MIC BIAS widget settings
  ASoC: codecs: wcd937x-sdw: Fix Unbalanced pm_runtime_enable
  ASoC: dt-bindings: cirrus,cs42xx8: Convert to dtschema
  ASoC: cs530x: Remove bclk from private structure
  ASoC: cs530x: Calculate proper bclk rate using TDM
  ASoC: dt-bindings: cirrus,cs4270: Convert to dtschema
  firmware: cs_dsp: Rename fw_ver to wmfw_ver
  firmware: cs_dsp: Clarify wmfw format version log message
  firmware: cs_dsp: Make wmfw and bin filename arguments const char *
  ...
2024-07-19 12:39:34 -07:00
Linus Torvalds
f4f92db439 virtio: features, fixes, cleanups
Several new features here:
 
 - Virtio find vqs API has been reworked
   (required to fix the scalability issue we have with
    adminq, which I hope to merge later in the cycle)
 
 - vDPA driver for Marvell OCTEON
 
 - virtio fs performance improvement
 
 - mlx5 migration speedups
 
 Fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmaXjQQPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpnIsH/jVNqAQbe/vaBQdNMdnsA+P9A9unLbYRxYCQ
 tN73mQRIXKtnZHBRAEbMGq52HPYg8HlN2HJSgyNo6I6t8VD+PiOco7m+3GpmqEcW
 aXPOPl0BAbVoDgyutxRuuodP8Z61lBx0mG6iOxpzTXOPGlpQqtPCFHO8YnodqnPf
 tMix/5uAqgZKV2siCbw5DtzwEc0gDHU8qsD0/nyoS5nBDF9yh/ardr5P/qiyFDQH
 atCNYTOhIFU83pLAaw0fpCGbkt7gxf+5RpWVx3wkYww+/MwvYhsveRvQyaGbBz3n
 WDtET3SOtVTta98OAGIKCq/2z8f6mYXBP7vXapBgnJG3vwS/poQ=
 =LYua
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "Several new features here:

   - Virtio find vqs API has been reworked (required to fix the
     scalability issue we have with adminq, which I hope to merge later
     in the cycle)

   - vDPA driver for Marvell OCTEON

   - virtio fs performance improvement

   - mlx5 migration speedups

  Fixes, cleanups all over the place"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits)
  virtio: rename virtio_find_vqs_info() to virtio_find_vqs()
  virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers
  virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()
  virtio_balloon: convert to use virtio_find_vqs_info()
  virtiofs: convert to use virtio_find_vqs_info()
  scsi: virtio_scsi: convert to use virtio_find_vqs_info()
  virtio_net: convert to use virtio_find_vqs_info()
  virtio_crypto: convert to use virtio_find_vqs_info()
  virtio_console: convert to use virtio_find_vqs_info()
  virtio_blk: convert to use virtio_find_vqs_info()
  virtio: rename find_vqs_info() op to find_vqs()
  virtio: remove the original find_vqs() op
  virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly
  virtio: convert find_vqs() op implementations to find_vqs_info()
  virtio_pci: convert vp_*find_vqs() ops to find_vqs_info()
  virtio: introduce virtio_queue_info struct and find_vqs_info() config op
  virtio: make virtio_find_single_vq() call virtio_find_vqs()
  virtio: make virtio_find_vqs() call virtio_find_vqs_ctx()
  caif_virtio: use virtio_find_single_vq() for single virtqueue finding
  vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
  ...
2024-07-19 11:57:55 -07:00
Jason A. Donenfeld
4920a2590e selftests/vDSO: add tests for vgetrandom
This adds two tests for vgetrandom. The first one, vdso_test_chacha,
simply checks that the assembly implementation of chacha20 matches that
of libsodium, a basic sanity check that should catch most errors. The
second, vdso_test_getrandom, is a full "libc-like" implementation of the
userspace side of vgetrandom() support. It's meant to be used also as
example code for libcs that might be integrating this.

Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19 20:22:12 +02:00
Jason A. Donenfeld
9651fcedf7 mm: add MAP_DROPPABLE for designating always lazily freeable mappings
The vDSO getrandom() implementation works with a buffer allocated with a
new system call that has certain requirements:

- It shouldn't be written to core dumps.
  * Easy: VM_DONTDUMP.
- It should be zeroed on fork.
  * Easy: VM_WIPEONFORK.

- It shouldn't be written to swap.
  * Uh-oh: mlock is rlimited.
  * Uh-oh: mlock isn't inherited by forks.

- It shouldn't reserve actual memory, but it also shouldn't crash when
  page faulting in memory if none is available
  * Uh-oh: VM_NORESERVE means segfaults.

It turns out that the vDSO getrandom() function has three really nice
characteristics that we can exploit to solve this problem:

1) Due to being wiped during fork(), the vDSO code is already robust to
   having the contents of the pages it reads zeroed out midway through
   the function's execution.

2) In the absolute worst case of whatever contingency we're coding for,
   we have the option to fallback to the getrandom() syscall, and
   everything is fine.

3) The buffers the function uses are only ever useful for a maximum of
   60 seconds -- a sort of cache, rather than a long term allocation.

These characteristics mean that we can introduce VM_DROPPABLE, which
has the following semantics:

a) It never is written out to swap.
b) Under memory pressure, mm can just drop the pages (so that they're
   zero when read back again).
c) It is inherited by fork.
d) It doesn't count against the mlock budget, since nothing is locked.
e) If there's not enough memory to service a page fault, it's not fatal,
   and no signal is sent.

This way, allocations used by vDSO getrandom() can use:

    VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE

And there will be no problem with OOMing, crashing on overcommitment,
using memory when not in use, not wiping on fork(), coredumps, or
writing out to swap.

In order to let vDSO getrandom() use this, expose these via mmap(2) as
MAP_DROPPABLE.

Note that this involves removing the MADV_FREE special case from
sort_folio(), which according to Yu Zhao is unnecessary and will simply
result in an extra call to shrink_folio_list() in the worst case. The
chunk removed reenables the swapbacked flag, which we don't want for
VM_DROPPABLE, and we can't conditionalize it here because there isn't a
vma reference available.

Finally, the provided self test ensures that this is working as desired.

Cc: linux-mm@kvack.org
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19 20:22:12 +02:00
Linus Torvalds
ef7c8f2b1f iommufd for 6.11 merge window
Major changes:
 
 - The iova_bitmap logic for efficiently reporting dirty pages back to
   userspace has a few more tricky corner case bugs that have been resolved
   and backed with new tests. The revised version has simpler logic.
 
 - Shared branch with iommu for handle support when doing domain
   attach. Handles allow the domain owner to include additional private data
   on a per-device basis.
 
 - IO Page Fault Reporting to userspace via iommufd. Page faults can be
   generated on fault capable HWPTs when a translation is not present.
   Routing them to userspace would allow a VMM to be able to virtualize them
   into an emulated vIOMMU. This is the next step to fully enabling vSVA
   support.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZpfo4AAKCRCFwuHvBreF
 YTO6APwMLxeWmHbE1H+7ZPuXP7B1aDuwRLczZOo3i816pIj+bQD+OywEA/NcljK6
 6NLeqyUe7tECtVrFPSiRT9lWVuzZSQs=
 =rnN/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:

 - The iova_bitmap logic for efficiently reporting dirty pages back to
   userspace has a few more tricky corner case bugs that have been
   resolved and backed with new tests.

   The revised version has simpler logic.

 - Shared branch with iommu for handle support when doing domain attach.

   Handles allow the domain owner to include additional private data on
   a per-device basis.

 - IO Page Fault Reporting to userspace via iommufd. Page faults can be
   generated on fault capable HWPTs when a translation is not present.

   Routing them to userspace would allow a VMM to be able to virtualize
   them into an emulated vIOMMU. This is the next step to fully enabling
   vSVA support.

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (26 commits)
  iommufd: Put constants for all the uAPI enums
  iommufd: Fix error pointer checking
  iommufd: Add check on user response code
  iommufd: Remove IOMMUFD_PAGE_RESP_FAILURE
  iommufd: Require drivers to supply the cache_invalidate_user ops
  iommufd/selftest: Add coverage for IOPF test
  iommufd/selftest: Add IOPF support for mock device
  iommufd: Associate fault object with iommufd_hw_pgtable
  iommufd: Fault-capable hwpt attach/detach/replace
  iommufd: Add iommufd fault object
  iommufd: Add fault and response message definitions
  iommu: Extend domain attach group with handle support
  iommu: Add attach handle to struct iopf_group
  iommu: Remove sva handle list
  iommu: Introduce domain attachment handle
  iommufd/iova_bitmap: Remove iterator logic
  iommufd/iova_bitmap: Dynamic pinning on iova_bitmap_set()
  iommufd/iova_bitmap: Consolidate iova_bitmap_set exit conditionals
  iommufd/iova_bitmap: Move initial pinning to iova_bitmap_for_each()
  iommufd/iova_bitmap: Cache mapped length in iova_bitmap_map struct
  ...
2024-07-19 09:42:29 -07:00
Linus Torvalds
76d9b92e68 slab updates for 6.11
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmaXl0kACgkQu+CwddJF
 iJrOlgf+N/G7BmgoW2CBF7mKsvCYs+pX3xeBuxPtsuq4FD386nsPFMN8gWAYLG3q
 ZU1z1S+0M8LhTg6/G9jMYLHt2Y7WhYbhFTjTHmULJkuhMDTUP9CRYy4XZ+hdPtHF
 30ezSdJQF9x/XxCSaaRVK1s+SMVHFg5xAOHKpfkNSamcMz9g+ZkYyPBr10/VoKd0
 JqwhW7r6hrlvWAiqY3QKCOvohIWglgvBUnNjUGMh1cUkOE2aYLYHklhRwICKgA6z
 p/2BUXiAEWUtgBkUrizwm/pdhJXLs0pOeYarVZP1v83tQMxyrc6XLNnqhvxP3DPW
 31thF5Rf9I8WaWTczXhxsAwFjqO3KQ==
 =4uf9
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:
 "The most prominent change this time is the kmem_buckets based
  hardening of kmalloc() allocations from Kees Cook.

  We have also extended the kmalloc() alignment guarantees for
  non-power-of-two sizes in a way that benefits rust.

  The rest are various cleanups and non-critical fixups.

   - Dedicated bucket allocator (Kees Cook)

     This series [1] enhances the probabilistic defense against heap
     spraying/grooming of CONFIG_RANDOM_KMALLOC_CACHES from last year.

     kmalloc() users that are known to be useful for exploits can get
     completely separate set of kmalloc caches that can't be shared with
     other users. The first converted users are alloc_msg() and
     memdup_user().

     The hardening is enabled by CONFIG_SLAB_BUCKETS.

   - Extended kmalloc() alignment guarantees (Vlastimil Babka)

     For years now we have guaranteed natural alignment for power-of-two
     allocations, but nothing was defined for other sizes (in practice,
     we have two such buckets, kmalloc-96 and kmalloc-192).

     To avoid unnecessary padding in the rust layer due to its alignment
     rules, extend the guarantee so that the alignment is at least the
     largest power-of-two divisor of the requested size.

     This fits what rust needs, is a superset of the existing
     power-of-two guarantee, and does not in practice change the layout
     (and thus does not add overhead due to padding) of the kmalloc-96
     and kmalloc-192 caches, unless slab debugging is enabled for them.

   - Cleanups and non-critical fixups (Chengming Zhou, Suren
     Baghdasaryan, Matthew Willcox, Alex Shi, and Vlastimil Babka)

     Various tweaks related to the new alloc profiling code, folio
     conversion, debugging and more leftovers after SLAB"

Link: https://lore.kernel.org/all/20240701190152.it.631-kees@kernel.org/ [1]

* tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/memcg: alignment memcg_data define condition
  mm, slab: move prepare_slab_obj_exts_hook under CONFIG_MEM_ALLOC_PROFILING
  mm, slab: move allocation tagging code in the alloc path into a hook
  mm/util: Use dedicated slab buckets for memdup_user()
  ipc, msg: Use dedicated slab buckets for alloc_msg()
  mm/slab: Introduce kmem_buckets_create() and family
  mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
  mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
  mm/slab: Introduce kmem_buckets typedef
  slab, rust: extend kmalloc() alignment guarantees to remove Rust padding
  slab: delete useless RED_INACTIVE and RED_ACTIVE
  slab: don't put freepointer outside of object if only orig_size
  slab: make check_object() more consistent
  mm: Reduce the number of slab->folio casts
  mm, slab: don't wrap internal functions with alloc_hooks()
2024-07-18 15:08:12 -07:00
Linus Torvalds
b2fc97c186 memblock: updates for 6.11-rc1
* reserve_mem command line parameter to allow creation of named memory
   reservation at boot time.
   The driving use-case is to improve the ability of pstore to retain
   ramoops data across reboots.
 * cleaunps and small improvements in memblock and mm_init
 * new tests cases in memblock test suite
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmaXfoIQHHJwcHRAa2Vy
 bmVsLm9yZwAKCRA5A4Ymyw79kU5mCAC23vIrB8FRlORczMYj+V3VFss3OjKT92lS
 fHGwq2oxHW+rdDpHXFObHU0D3k8d2l5jyrENRAAyA02qR0L6Pv8Na6pGxQua1eic
 VIdw0PFQMsizD1AIj84Y6skkyyF/tvZHpmX0B12D5+Ur65DC/Z867Cm/lE33/fHv
 /1+QB0JlG7W+FzxVstYyebY5/DVkH+bC7/A57FE2oB4BRXjEd8v9tTHBS4kRSvrE
 zE2KFxeGajN749LHztIpIprPKehu8Gc3oLrYLNJO+uLFVCV8ey3OqVj0RXMG2wLl
 hmVYqhbZM/Uz59D/P8pULD49f1Thjv/5A/MvUZ3SxM6zpWlsincf
 =xrZd
 -----END PGP SIGNATURE-----

Merge tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock updates from Mike Rapoport:

 - 'reserve_mem' command line parameter to allow creation of named
   memory reservation at boot time.

   The driving use-case is to improve the ability of pstore to retain
   ramoops data across reboots.

 - cleanups and small improvements in memblock and mm_init

 - new tests cases in memblock test suite

* tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock tests: fix implicit declaration of function 'numa_valid_node'
  memblock: Move late alloc warning down to phys alloc
  pstore/ramoops: Add ramoops.mem_name= command line option
  mm/memblock: Add "reserve_mem" to reserved named memory at boot up
  mm/mm_init.c: don't initialize page->lru again
  mm/mm_init.c: not always search next deferred_init_pfn from very beginning
  mm/mm_init.c: use deferred_init_mem_pfn_range_in_zone() to decide loop condition
  mm/mm_init.c: get the highest zone directly
  mm/mm_init.c: move nr_initialised reset down a bit
  mm/memblock: fix a typo in description of for_each_mem_region()
  mm/mm_init.c: use memblock_region_memory_base_pfn() to get startpfn
  mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap
  mm/memblock: return true directly on finding overlap region
  memblock tests: add memblock_overlaps_region_checks
  mm/memblock: fix comment for memblock_isolate_range()
  memblock tests: add memblock_reserve_many_may_conflict_check()
  memblock tests: add memblock_reserve_all_locations_check()
  mm/memblock: remove empty dummy entry
2024-07-18 14:48:11 -07:00
Linus Torvalds
68b5973045 perf tools changes for v6.11
Build
 -----
 * Build each directory as a library so that depedency check for the
   python extension module can be automatic.  But it also introduces
   some trivial merge conflicts with other trees that touched perf tools
   codes.  Basically it changes perf-y to perf-util-y or similar and you
   can find the resolution in the perf-next tree here.
 
   - https://lore.kernel.org/r/Zn8HeRRX3JV2IcxQ@sirena.org.uk
   - https://lore.kernel.org/r/20240709100536.238f4d12@canb.auug.org.au
 
 * Use pkg-config to check libtraceevent and libtracefs.
 
 perf sched
 ----------
 * Add --task-name and --fuzzy-name options for `perf sched map`.  It's
   to focus on selected tasks only by removing unrelated tasks in the
   output.  It matches the task comm with the given string and the
   --fuzzy-name option allows the partial matching.
 
     $ sudo perf sched record -a sleep 1
 
     $ sudo perf sched map --task-name kworker --fuzzy-name
        .   .   .   .   -  *A0  .   .    481065.315131 secs A0 => kworker/5:2-i91:438521
        .   .   .   .   -  *-   .   .    481065.315160 secs
       *B0  .   .   .   -   .   .   .    481065.316435 secs B0 => kworker/0:0-i91:437860
       *-   .   .   .   .   .   .   .    481065.316441 secs
        .   .   .   .   .  *A0  .   .    481065.318703 secs
        .   .   .   .   .  *-   .   .    481065.318717 secs
        .   .  *C0  .   .   .   .   .    481065.320544 secs C0 => kworker/u16:30-:430186
        .   .  *-   .   .   .   .   .    481065.320555 secs
        .   .  *D0  .   .   .   .   .    481065.328524 secs D0 => kworker/2:0-kdm:429654
       *B0  .   D0  .   -   .   .   .    481065.328527 secs
       *-   .   D0  .   -   .   .   .    481065.328535 secs
        .   .  *-   .   .   .   .   .    481065.328535 secs
 
 * Fix -r/--repeat option of perf sched replay.  The documentation said
   -1 will work as infinity but it didn't accept the value.  Update the
   code and document to use 0 instead.
 
 * Fix perf sched timehist to account the delay time for preempted tasks.
 
 Perf event filtering
 --------------------
 * perf top gained filtering support on regular events using BPF like
   perf record.  Previously it was able to use it for tracepoints only.
 
 * The BPF filter now supports filtering by UID/GID.  This should be
   preferred than -u <UID> option as it's racy to scan /proc to check
   tasks for the user and fails to open an event for the task if it's
   already gone.
 
     $ sudo perf top -e cycles --filter "uid == $(id -u)"
 
 perf report
 -----------
 * Skip dummy events in the group output by default.  The --skip-empty
   option controls display of empty events without samples.  But perf
   report can force display all events in a group.  In this case, auto-
   added a dummy event (for a system-wide record) ends up in the output.
   Now it can skip those empty events even in the group display mode.
   To preserve the old behavior, run this:
 
     $ perf report --group --no-skip-empty
 
 perf stat
 ---------
 * Choose the most disaggregate option when multiple aggregation options
   are given.  It used to pick the last option in the command line but
   it can be confusing and not consistent.  Now it'll choose the smallest
   unit.
 
   For example, it'd aggregate the result per-core when the user gave
   both --per-socket and --per-core options at the same time.
 
 Internals
 ---------
 * Fix `perf bench` when some CPUs are offline.
 
 * Fix handling of JIT symbol mappings to accept "/tmp/perf-${PID}.map
   patterns only so that it can not be confused by other /tmp/perf-*
   files.
 
 * Many improvements and fixes for `perf test`.
 
 Others
 ------
 * Support some new instructions for Intel-PT.
 * Fix syscall ID mapping in perf trace.
 * Document AMD IBS PMU usages.
 * Change `perf lock info` to show map and thread info by default.
 
 Vendor JSON events
 ------------------
 * Update Intel events and metrics
 * Add i.MX9[35] DDR metrics
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZpbe+QAKCRCMstVUGiXM
 g4X6AQCXJ+eCuBy/9IvDpo86KjemPQk/brA0X8Ufi7JCd+Vj4QD7BvDobClewn+v
 W3MhNQwIlFPJ8u3Act+tJZfUJjVF4wU=
 =ZkPv
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.11-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Namhyung Kim:
 "Build:

   - Build each directory as a library so that depedency check for the
     python extension module can be automatic

   - Use pkg-config to check libtraceevent and libtracefs

  perf sched:

   - Add --task-name and --fuzzy-name options for `perf sched map`

     It focuses on selected tasks only by removing unrelated tasks in
     the output. It matches the task comm with the given string and the
     --fuzzy-name option allows the partial matching:

       $ sudo perf sched record -a sleep 1

       $ sudo perf sched map --task-name kworker --fuzzy-name
          .   .   .   .   -  *A0  .   .    481065.315131 secs A0 => kworker/5:2-i91:438521
          .   .   .   .   -  *-   .   .    481065.315160 secs
         *B0  .   .   .   -   .   .   .    481065.316435 secs B0 => kworker/0:0-i91:437860
         *-   .   .   .   .   .   .   .    481065.316441 secs
          .   .   .   .   .  *A0  .   .    481065.318703 secs
          .   .   .   .   .  *-   .   .    481065.318717 secs
          .   .  *C0  .   .   .   .   .    481065.320544 secs C0 => kworker/u16:30-:430186
          .   .  *-   .   .   .   .   .    481065.320555 secs
          .   .  *D0  .   .   .   .   .    481065.328524 secs D0 => kworker/2:0-kdm:429654
         *B0  .   D0  .   -   .   .   .    481065.328527 secs
         *-   .   D0  .   -   .   .   .    481065.328535 secs
          .   .  *-   .   .   .   .   .    481065.328535 secs

   - Fix -r/--repeat option of perf sched replay

     The documentation said -1 will work as infinity but it didn't
     accept the value. Update the code and document to use 0 instead

   - Fix perf sched timehist to account the delay time for preempted
     tasks

  Perf event filtering:

   - perf top gained filtering support on regular events using BPF like
     perf record. Previously it was able to use it for tracepoints only

   - The BPF filter now supports filtering by UID/GID. This should be
     preferred than -u <UID> option as it's racy to scan /proc to check
     tasks for the user and fails to open an event for the task if it's
     already gone

       $ sudo perf top -e cycles --filter "uid == $(id -u)"

  perf report:

   - Skip dummy events in the group output by default. The --skip-empty
     option controls display of empty events without samples. But perf
     report can force display all events in a group

     In this case, auto-added a dummy event (for a system-wide record)
     ends up in the output. Now it can skip those empty events even in
     the group display mode

     To preserve the old behavior, run this:

       $ perf report --group --no-skip-empty

  perf stat:

   - Choose the most disaggregate option when multiple aggregation
     options are given. It used to pick the last option in the command
     line but it can be confusing and not consistent. Now it'll choose
     the smallest unit

     For example, it'd aggregate the result per-core when the user gave
     both --per-socket and --per-core options at the same time

  Internals:

   - Fix `perf bench` when some CPUs are offline

   - Fix handling of JIT symbol mappings to accept "/tmp/perf-${PID}.map
     patterns only so that it can not be confused by other /tmp/perf-*
     files

   - Many improvements and fixes for `perf test`

  Others:

   - Support some new instructions for Intel-PT

   - Fix syscall ID mapping in perf trace

   - Document AMD IBS PMU usages

   - Change `perf lock info` to show map and thread info by default

  Vendor JSON events:

   - Update Intel events and metrics

   - Add i.MX9[35] DDR metrics"

* tag 'perf-tools-for-v6.11-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (125 commits)
  perf trace: Fix iteration of syscall ids in syscalltbl->entries
  perf dso: Fix address sanitizer build
  perf mem: Warn if memory events are not supported on all CPUs
  perf arm-spe: Support multiple Arm SPE PMUs
  perf build x86: Fix SC2034 error in syscalltbl.sh
  perf record: Fix memset out-of-range error
  perf sched map: Add --fuzzy-name option for fuzzy matching in task names
  perf sched map: Add support for multiple task names using CSV
  perf sched map: Add task-name option to filter the output map
  perf build: Conditionally add feature check flags for libtrace{event,fs}
  perf install: Don't propagate subdir to Documentation submake
  perf vendor events arm64:: Add i.MX95 DDR Performance Monitor metrics
  perf vendor events arm64:: Add i.MX93 DDR Performance Monitor metrics
  perf dsos: When adding a dso into sorted dsos maintain the sort order
  perf comm str: Avoid sort during insert
  perf report: Calling available function for stats printing
  perf intel-pt: Fix exclude_guest setting
  perf intel-pt: Fix aux_watermark calculation for 64-bit size
  perf sched replay: Fix -r/--repeat command line option for infinity
  perf: pmus: Remove unneeded semicolon
  ...
2024-07-18 14:16:35 -07:00
Linus Torvalds
1777e471e1 tracing/tools: Trivial updates for 6.11
- Use pretty formatting only on interactive tty in rtla/osnoise
 
 - Better reporting when histogram is empty in rtla/osnoise
 
 - Use the correct library name for "libtracefs" in feature detection
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZpbYbRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qpWSAQD69Ea2bOCRuqCW6mjH4iPgWA3Re4Xt
 rnUI3mWT7+FSNwD/T1ElUQ8/mfJptawZl1KyACNovtE1xMn365yHsFA2Mgg=
 =fnCa
 -----END PGP SIGNATURE-----

Merge tag 'trace-tools-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing tools updates from Steven Rostedt:
 "Trivial updates for 6.11:

   - Use pretty formatting only on interactive tty in rtla/osnoise

   - Better reporting when histogram is empty in rtla/osnoise

   - Use the correct library name for "libtracefs" in feature detection"

* tag 'trace-tools-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tools: build: use correct lib name for libtracefs feature detection
  rtla/osnoise: Better report when histogram is empty
  rtla/osnoise: Use pretty formatting only on interactive tty
2024-07-18 14:01:37 -07:00
Linus Torvalds
70045bfc4c ftrace: Rewrite of function graph tracer
Up until now, the function graph tracer could only have a single user
 attached to it. If another user tried to attach to the function graph
 tracer while one was already attached, it would fail. Allowing function
 graph tracer to have more than one user has been asked for since 2009, but
 it required a rewrite to the logic to pull it off so it never happened.
 Until now!
 
 There's three systems that trace the return of a function. That is
 kretprobes, function graph tracer, and BPF. kretprobes and function graph
 tracing both do it similarly. The difference is that kretprobes uses a
 shadow stack per callback and function graph tracer creates a shadow stack
 for all tasks. The function graph tracer method makes it possible to trace
 the return of all functions. As kretprobes now needs that feature too,
 allowing it to use function graph tracer was needed. BPF also wants to
 trace the return of many probes and its method doesn't scale either.
 Having it use function graph tracer would improve that.
 
 By allowing function graph tracer to have multiple users allows both
 kretprobes and BPF to use function graph tracer in these cases. This will
 allow kretprobes code to be removed in the future as it's version will no
 longer be needed. Note, function graph tracer is only limited to 16
 simultaneous users, due to shadow stack size and allocated slots.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZpbWlxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgtvAP9jxmgEiEhz4Bpe1vRKVSMYK6ozXHTT
 7MFKRMeQqQ8zeAEA2sD5Zrt9l7zKzg0DFpaDLgc3/yh14afIDxzTlIvkmQ8=
 =umuf
 -----END PGP SIGNATURE-----

Merge tag 'ftrace-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace updates from Steven Rostedt:
 "Rewrite of function graph tracer to allow multiple users

  Up until now, the function graph tracer could only have a single user
  attached to it. If another user tried to attach to the function graph
  tracer while one was already attached, it would fail. Allowing
  function graph tracer to have more than one user has been asked for
  since 2009, but it required a rewrite to the logic to pull it off so
  it never happened. Until now!

  There's three systems that trace the return of a function. That is
  kretprobes, function graph tracer, and BPF. kretprobes and function
  graph tracing both do it similarly. The difference is that kretprobes
  uses a shadow stack per callback and function graph tracer creates a
  shadow stack for all tasks. The function graph tracer method makes it
  possible to trace the return of all functions. As kretprobes now needs
  that feature too, allowing it to use function graph tracer was needed.
  BPF also wants to trace the return of many probes and its method
  doesn't scale either. Having it use function graph tracer would
  improve that.

  By allowing function graph tracer to have multiple users allows both
  kretprobes and BPF to use function graph tracer in these cases. This
  will allow kretprobes code to be removed in the future as it's version
  will no longer be needed.

  Note, function graph tracer is only limited to 16 simultaneous users,
  due to shadow stack size and allocated slots"

* tag 'ftrace-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (49 commits)
  fgraph: Use str_plural() in test_graph_storage_single()
  function_graph: Add READ_ONCE() when accessing fgraph_array[]
  ftrace: Add missing kerneldoc parameters to unregister_ftrace_direct()
  function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove it
  function_graph: Fix up ftrace_graph_ret_addr()
  function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACE
  function_graph: Rename BYTE_NUMBER to CHAR_NUMBER in selftests
  fgraph: Remove some unused functions
  ftrace: Hide one more entry in stack trace when ftrace_pid is enabled
  function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled
  function_graph: Make fgraph_do_direct static key static
  ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
  ftrace: Assign RCU list variable with rcu_assign_ptr()
  ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
  ftrace: Declare function_trace_op in header to quiet sparse warning
  ftrace: Add comments to ftrace_hash_move() and friends
  ftrace: Convert "inc" parameter to bool in ftrace_hash_rec_update_modify()
  ftrace: Add comments to ftrace_hash_rec_disable/enable()
  ftrace: Remove "filter_hash" parameter from __ftrace_hash_rec_update()
  ftrace: Rename dup_hash() and comment it
  ...
2024-07-18 13:36:33 -07:00