mirror of
https://github.com/torvalds/linux.git
synced 2026-05-15 01:43:11 +02:00
00a5acdbf3
46548 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
00a5acdbf3 |
bpf: Fix configuration-dependent BTF function references
These BTF functions are not available unconditionally, only reference them when they are available. Avoid the following build warnings: BTF .tmp_vmlinux1.btf.o btf_encoder__tag_kfunc: failed to find kfunc 'bpf_send_signal_task' in BTF btf_encoder__tag_kfuncs: failed to tag kfunc 'bpf_send_signal_task' NM .tmp_vmlinux1.syms KSYMS .tmp_vmlinux1.kallsyms.S AS .tmp_vmlinux1.kallsyms.o LD .tmp_vmlinux2 NM .tmp_vmlinux2.syms KSYMS .tmp_vmlinux2.kallsyms.S AS .tmp_vmlinux2.kallsyms.o LD vmlinux BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol prog_test_ref_kfunc WARN: resolve_btfids: unresolved symbol bpf_crypto_ctx WARN: resolve_btfids: unresolved symbol bpf_send_signal_task WARN: resolve_btfids: unresolved symbol bpf_modify_return_test_tp WARN: resolve_btfids: unresolved symbol bpf_dynptr_from_xdp WARN: resolve_btfids: unresolved symbol bpf_dynptr_from_skb Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213-bpf-cond-ids-v1-1-881849997219@weissschuh.net |
||
|
|
4d3ae294f9 |
bpf: Add fd_array_cnt attribute for prog_load
The fd_array attribute of the BPF_PROG_LOAD syscall may contain a set of file descriptors: maps or btfs. This field was introduced as a sparse array. Introduce a new attribute, fd_array_cnt, which, if present, indicates that the fd_array is a continuous array of the corresponding length. If fd_array_cnt is non-zero, then every map in the fd_array will be bound to the program, as if it was used by the program. This functionality is similar to the BPF_PROG_BIND_MAP syscall, but such maps can be used by the verifier during the program load. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213130934.1087929-5-aspsk@isovalent.com |
||
|
|
76145f7255 |
bpf: Refactor check_pseudo_btf_id
Introduce a helper to add btfs to the env->used_maps array. Use it to simplify the check_pseudo_btf_id() function. This new helper will also be re-used in a consequent patch. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213130934.1087929-4-aspsk@isovalent.com |
||
|
|
928f3221cb |
bpf: Move map/prog compatibility checks
Move some inlined map/prog compatibility checks from the resolve_pseudo_ldimm64() function to the dedicated check_map_prog_compatibility() function. Call the latter function from the add_used_map_from_fd() function directly. This simplifies code and optimizes logic a bit, as before these changes the check_map_prog_compatibility() function was executed on every map usage, which doesn't make sense, as it doesn't include any per-instruction checks, only map type vs. prog type. (This patch also simplifies a consequent patch which will call the add_used_map_from_fd() function from another code path.) Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213130934.1087929-3-aspsk@isovalent.com |
||
|
|
4e885fab71 |
bpf: Add a __btf_get_by_fd helper
Add a new helper to get a pointer to a struct btf from a file descriptor. This helper doesn't increase a refcnt. Add a comment explaining this and pointing to a corresponding function which does take a reference. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213130934.1087929-2-aspsk@isovalent.com |
||
|
|
442bc81bd3 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Cross-merge bpf fixes after downstream PR. Trivial conflict: tools/testing/selftests/bpf/prog_tests/verifier.c Adjacent changes in: Auto-merging kernel/bpf/verifier.c Auto-merging samples/bpf/Makefile Auto-merging tools/testing/selftests/bpf/.gitignore Auto-merging tools/testing/selftests/bpf/Makefile Auto-merging tools/testing/selftests/bpf/prog_tests/verifier.c Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
eadaac4dd2 |
- Fix a /proc/interrupts formatting regression
- Have the BCM2836 interrupt controller enter power management states properly - Other fixlets -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdVeI0ACgkQEsHwGGHe VUpE9A/9Gs2K9ImUlQu5giKEM2dhu0U/oEN/KNKayd6mjSR4MWEovvMeE16M2AUA uALQVxCviJKy6vl5J2aYw5QIhktfGwrdmr4zGK61pfhGWD8dClaCxJpbfUvtx1Bu 1K9U8PpSh1grScbcjDFBOIIzHsmQgpHFTJxVwktLXEktJsWJyzhxqvdiduMXvDlF T5WOIr/A5MwGtoP0kySQA49k0ymgUkXf1UgAl7nTksLEI50SMo3Wt7vpQ+qLk6xD 3RdywlARaVOLY5GlRsFtUTZP6o06/8aDrEmrUHnOiip9u7pKzMiJR4aKzlScCdTN BRKEegLxgJqA0uvQAfYP7kCP4l5fFAAVHmcenZCuXDlcexJsnLxYJChxJKUK7CAt wCMZifysp13aF3gyT6BfNKEPZOEqFDykxUvzT/F1d2t0Z7yq1GHIXcDHC8eG0p2H 3f/YOSi+5KgDTi+xzT1hOPn3HcTADhF7wrj2oppOmJa/FyJVrRVeL+DP+uNPD+ux tMlVknBp3nlYQ4Ll773GZmvdaamkUfw9U5eZRi04CrvVOcanq/KjKIL24bq9ODS0 i6dgPfZPwYI+nuenGv7VEpgo92/oynqGxuShESk1c1S/rCL+D2kMiNaRvR9cVsU3 +Gvr2v9B0KqLy79gw9Fhdqzbhnzhkcpa72B4JNJwEtHdO+t4iwI= =Uyoi -----END PGP SIGNATURE----- Merge tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix a /proc/interrupts formatting regression - Have the BCM2836 interrupt controller enter power management states properly - Other fixlets * tag 'irq_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/stm32mp-exti: CONFIG_STM32MP_EXTI should not default to y when compile-testing genirq/proc: Add missing space separator back irqchip/bcm2836: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND irqchip/gic-v3: Fix irq_complete_ack() comment |
||
|
|
c25ca0c2e4 |
- Handle the case where clocksources with small counter width can, in
conjunction with overly long idle sleeps, falsely trigger the negative motion detection of clocksources -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdVb44ACgkQEsHwGGHe VUr/bA/+L5vH39rL4IGQ05LZ/TaZC5iSadsWeW26UccbV+5jw4q2hmVterH3TXDp ujsvPmRLPN/tjCdwMNKDRmVTc5nBoO+OHBErf+o3mqZMI8XdXz7bOEvNhjnlIPnT ILu/c8h2g3bHpFBYHsJjyhrIkfvfspG/yeD6V7SXI1r0StqRfoo6QmbbZwYpdrn3 +ORs8TdW6GEp7gJhdCzSXxzbfXnvtRsBZvsBLxoIter9Kqd+pFpVxj7CqoHWEiBM NQHN/2DG3uczoVVtOB7VK9edAYlpe9mzokB4wRClXo21D7JFze0m2TJGJ3hf9eRZ RbzZea0CQNa11NUlxoRUrN+jG/CHjnptNFycRJIEtb6YgKyoizJ/x8CBOWI3a8nU NTGBIwXAeYcYrrsP5f3bmDRcks9OO+E2quZiGJZorq1zDxzxnPs8ALmtwCB64UhD ro0VAT1d7JuMdnmFUKSwf35nLydnUiBqRC2cb03jMZAze+YmFCgMh5xjtzTPw+WE QDIR9Eu6ebSM80ldXGISHYn1wHxQVFtJ2cN9gmV+Lnaekys6huDQkARCoWfLhdYc CiqZSvlMPis+VAhkglwnHlxc/mGpfFTZXh47oXzNbXw2J2bWhhbyzl9X8UiSqw+A UOfq/JRis2j6NsmKkGvTU/hGtEPOzXw2EStAuXm+OVg5TDKwr74= =mARf -----END PGP SIGNATURE----- Merge tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Handle the case where clocksources with small counter width can, in conjunction with overly long idle sleeps, falsely trigger the negative motion detection of clocksources * tag 'timers_urgent_for_v6.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Make negative motion detection more robust |
||
|
|
553c89ec31 |
24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM.
The usual bunch of singletons - please see the relevant changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ1U/QwAKCRDdBJ7gKXxA jnE7AQC0eyNNvaL5pLCIxN/Vmr8YeuWP1dldgI29TjrH/JKjSQEAihZNqVZYjoIT Gf7Y+IKnc4LbfAXcTe+MfJFeDexM5AU= =U5LQ -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "24 hotfixes. 17 are cc:stable. 15 are MM and 9 are non-MM. The usual bunch of singletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2024-12-07-22-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits) iio: magnetometer: yas530: use signed integer type for clamp limits sched/numa: fix memory leak due to the overwritten vma->numab_state mm/damon: fix order of arguments in damos_before_apply tracepoint lib: stackinit: hide never-taken branch from compiler mm/filemap: don't call folio_test_locked() without a reference in next_uptodate_folio() scatterlist: fix incorrect func name in kernel-doc mm: correct typo in MMAP_STATE() macro mm: respect mmap hint address when aligning for THP mm: memcg: declare do_memsw_account inline mm/codetag: swap tags when migrate pages ocfs2: update seq_file index in ocfs2_dlm_seq_next stackdepot: fix stack_depot_save_flags() in NMI context mm: open-code page_folio() in dump_page() mm: open-code PageTail in folio_flags() and const_folio_flags() mm: fix vrealloc()'s KASAN poisoning logic Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()" selftests/damon: add _damon_sysfs.py to TEST_FILES selftest: hugetlb_dio: fix test naming ocfs2: free inode when ocfs2_get_init_inode() fails nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry() ... |
||
|
|
b5f217084a |
BPF fixes:
- Fix several issues for BPF LPM trie map which were found by syzbot and during addition of new test cases (Hou Tao) - Fix a missing process_iter_arg register type check in the BPF verifier (Kumar Kartikeya Dwivedi, Tao Lyu) - Fix several correctness gaps in the BPF verifier when interacting with the BPF stack without CAP_PERFMON (Kumar Kartikeya Dwivedi, Eduard Zingerman, Tao Lyu) - Fix OOB BPF map writes when deleting elements for the case of xsk map as well as devmap (Maciej Fijalkowski) - Fix xsk sockets to always clear DMA mapping information when unmapping the pool (Larysa Zaremba) - Fix sk_mem_uncharge logic in tcp_bpf_sendmsg to only uncharge after sent bytes have been finalized (Zijian Zhang) - Fix BPF sockmap with vsocks which was missing a queue check in poll and sockmap cleanup on close (Michal Luczaj) - Fix tools infra to override makefile ARCH variable if defined but empty, which addresses cross-building tools. (Björn Töpel) - Fix two resolve_btfids build warnings on unresolved bpf_lsm symbols (Thomas Weißschuh) - Fix a NULL pointer dereference in bpftool (Amir Mohammadi) - Fix BPF selftests to check for CONFIG_PREEMPTION instead of CONFIG_PREEMPT (Sebastian Andrzej Siewior) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> -----BEGIN PGP SIGNATURE----- iIsEABYKADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZ1N8bhUcZGFuaWVsQGlv Z2VhcmJveC5uZXQACgkQ2yufC7HISIO6ZAD+ITpujJgxvFGC0R7E9o3XJ7V1SpmR SlW0lGpj6vOHTUAA/2MRoZurJSTbdT3fbWiCUgU1rMcwkoErkyxUaPuBci0D =kgXL -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Daniel Borkmann:: - Fix several issues for BPF LPM trie map which were found by syzbot and during addition of new test cases (Hou Tao) - Fix a missing process_iter_arg register type check in the BPF verifier (Kumar Kartikeya Dwivedi, Tao Lyu) - Fix several correctness gaps in the BPF verifier when interacting with the BPF stack without CAP_PERFMON (Kumar Kartikeya Dwivedi, Eduard Zingerman, Tao Lyu) - Fix OOB BPF map writes when deleting elements for the case of xsk map as well as devmap (Maciej Fijalkowski) - Fix xsk sockets to always clear DMA mapping information when unmapping the pool (Larysa Zaremba) - Fix sk_mem_uncharge logic in tcp_bpf_sendmsg to only uncharge after sent bytes have been finalized (Zijian Zhang) - Fix BPF sockmap with vsocks which was missing a queue check in poll and sockmap cleanup on close (Michal Luczaj) - Fix tools infra to override makefile ARCH variable if defined but empty, which addresses cross-building tools. (Björn Töpel) - Fix two resolve_btfids build warnings on unresolved bpf_lsm symbols (Thomas Weißschuh) - Fix a NULL pointer dereference in bpftool (Amir Mohammadi) - Fix BPF selftests to check for CONFIG_PREEMPTION instead of CONFIG_PREEMPT (Sebastian Andrzej Siewior) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (31 commits) selftests/bpf: Add more test cases for LPM trie selftests/bpf: Move test_lpm_map.c to map_tests bpf: Use raw_spinlock_t for LPM trie bpf: Switch to bpf mem allocator for LPM trie bpf: Fix exact match conditions in trie_get_next_key() bpf: Handle in-place update for full LPM trie correctly bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem bpf: Remove unnecessary check when updating LPM trie selftests/bpf: Add test for narrow spill into 64-bit spilled scalar selftests/bpf: Add test for reading from STACK_INVALID slots selftests/bpf: Introduce __caps_unpriv annotation for tests bpf: Fix narrow scalar spill onto 64-bit spilled scalar slots bpf: Don't mark STACK_INVALID as STACK_MISC in mark_stack_slot_misc samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS bpf: Zero index arg error string for dynptr and iter selftests/bpf: Add tests for iter arg check bpf: Ensure reg is PTR_TO_STACK in process_iter_arg tools: Override makefile ARCH variable if defined, but empty selftests/bpf: Add apply_bytes test to test_txmsg_redir_wait_sndmem in test_sockmap ... |
||
|
|
6a5c63d43c |
bpf: Use raw_spinlock_t for LPM trie
After switching from kmalloc() to the bpf memory allocator, there will be no blocking operation during the update of LPM trie. Therefore, change trie->lock from spinlock_t to raw_spinlock_t to make LPM trie usable in atomic context, even on RT kernels. The max value of prefixlen is 2048. Therefore, update or deletion operations will find the target after at most 2048 comparisons. Constructing a test case which updates an element after 2048 comparisons under a 8 CPU VM, and the average time and the maximal time for such update operation is about 210us and 900us. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241206110622.1161752-8-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
3d8dc43eb2 |
bpf: Switch to bpf mem allocator for LPM trie
Multiple syzbot warnings have been reported. These warnings are mainly about the lock order between trie->lock and kmalloc()'s internal lock. See report [1] as an example: ====================================================== WARNING: possible circular locking dependency detected 6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted ------------------------------------------------------ syz.3.2069/15008 is trying to acquire lock: ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ... but task is already holding lock: ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ... which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&trie->lock){-.-.}-{2:2}: __raw_spin_lock_irqsave _raw_spin_lock_irqsave+0x3a/0x60 trie_delete_elem+0xb0/0x820 ___bpf_prog_run+0x3e51/0xabd0 __bpf_prog_run32+0xc1/0x100 bpf_dispatcher_nop_func ...... bpf_trace_run2+0x231/0x590 __bpf_trace_contention_end+0xca/0x110 trace_contention_end.constprop.0+0xea/0x170 __pv_queued_spin_lock_slowpath+0x28e/0xcc0 pv_queued_spin_lock_slowpath queued_spin_lock_slowpath queued_spin_lock do_raw_spin_lock+0x210/0x2c0 __raw_spin_lock_irqsave _raw_spin_lock_irqsave+0x42/0x60 __put_partials+0xc3/0x170 qlink_free qlist_free_all+0x4e/0x140 kasan_quarantine_reduce+0x192/0x1e0 __kasan_slab_alloc+0x69/0x90 kasan_slab_alloc slab_post_alloc_hook slab_alloc_node kmem_cache_alloc_node_noprof+0x153/0x310 __alloc_skb+0x2b1/0x380 ...... -> #0 (&n->list_lock){-.-.}-{2:2}: check_prev_add check_prevs_add validate_chain __lock_acquire+0x2478/0x3b30 lock_acquire lock_acquire+0x1b1/0x560 __raw_spin_lock_irqsave _raw_spin_lock_irqsave+0x3a/0x60 get_partial_node.part.0+0x20/0x350 get_partial_node get_partial ___slab_alloc+0x65b/0x1870 __slab_alloc.constprop.0+0x56/0xb0 __slab_alloc_node slab_alloc_node __do_kmalloc_node __kmalloc_node_noprof+0x35c/0x440 kmalloc_node_noprof bpf_map_kmalloc_node+0x98/0x4a0 lpm_trie_node_alloc trie_update_elem+0x1ef/0xe00 bpf_map_update_value+0x2c1/0x6c0 map_update_elem+0x623/0x910 __sys_bpf+0x90c/0x49a0 ... other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trie->lock); lock(&n->list_lock); lock(&trie->lock); lock(&n->list_lock); *** DEADLOCK *** [1]: https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7 A bpf program attached to trace_contention_end() triggers after acquiring &n->list_lock. The program invokes trie_delete_elem(), which then acquires trie->lock. However, it is possible that another process is invoking trie_update_elem(). trie_update_elem() will acquire trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke get_partial_node() and try to acquire &n->list_lock (not necessarily the same lock object). Therefore, lockdep warns about the circular locking dependency. Invoking kmalloc() before acquiring trie->lock could fix the warning. However, since BPF programs call be invoked from any context (e.g., through kprobe/tracepoint/fentry), there may still be lock ordering problems for internal locks in kmalloc() or trie->lock itself. To eliminate these potential lock ordering problems with kmalloc()'s internal locks, replacing kmalloc()/kfree()/kfree_rcu() with equivalent BPF memory allocator APIs that can be invoked in any context. The lock ordering problems with trie->lock (e.g., reentrance) will be handled separately. Three aspects of this change require explanation: 1. Intermediate and leaf nodes are allocated from the same allocator. Since the value size of LPM trie is usually small, using a single alocator reduces the memory overhead of the BPF memory allocator. 2. Leaf nodes are allocated before disabling IRQs. This handles cases where leaf_size is large (e.g., > 4KB - 8) and updates require intermediate node allocation. If leaf nodes were allocated in IRQ-disabled region, the free objects in BPF memory allocator would not be refilled timely and the intermediate node allocation may fail. 3. Paired migrate_{disable|enable}() calls for node alloc and free. The BPF memory allocator uses per-CPU struct internally, these paired calls are necessary to guarantee correctness. Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241206110622.1161752-7-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
27abc7b3fa |
bpf: Fix exact match conditions in trie_get_next_key()
trie_get_next_key() uses node->prefixlen == key->prefixlen to identify
an exact match, However, it is incorrect because when the target key
doesn't fully match the found node (e.g., node->prefixlen != matchlen),
these two nodes may also have the same prefixlen. It will return
expected result when the passed key exist in the trie. However when a
recently-deleted key or nonexistent key is passed to
trie_get_next_key(), it may skip keys and return incorrect result.
Fix it by using node->prefixlen == matchlen to identify exact matches.
When the condition is true after the search, it also implies
node->prefixlen equals key->prefixlen, otherwise, the search would
return NULL instead.
Fixes:
|
||
|
|
532d6b36b2 |
bpf: Handle in-place update for full LPM trie correctly
When a LPM trie is full, in-place updates of existing elements
incorrectly return -ENOSPC.
Fix this by deferring the check of trie->n_entries. For new insertions,
n_entries must not exceed max_entries. However, in-place updates are
allowed even when the trie is full.
Fixes:
|
||
|
|
eae6a075e9 |
bpf: Handle BPF_EXIST and BPF_NOEXIST for LPM trie
Add the currently missing handling for the BPF_EXIST and BPF_NOEXIST
flags. These flags can be specified by users and are relevant since LPM
trie supports exact matches during update.
Fixes:
|
||
|
|
3d5611b4d7 |
bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
There is no need to call kfree(im_node) when updating element fails, because im_node must be NULL. Remove the unnecessary kfree() for im_node. Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241206110622.1161752-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
156c977c53 |
bpf: Remove unnecessary check when updating LPM trie
When "node->prefixlen == matchlen" is true, it means that the node is fully matched. If "node->prefixlen == key->prefixlen" is false, it means the prefix length of key is greater than the prefix length of node, otherwise, matchlen will not be equal with node->prefixlen. However, it also implies that the prefix length of node must be less than max_prefixlen. Therefore, "node->prefixlen == trie->max_prefixlen" will always be false when the check of "node->prefixlen == key->prefixlen" returns false. Remove this unnecessary comparison. Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241206110622.1161752-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
5f1b64e9a9 |
sched/numa: fix memory leak due to the overwritten vma->numab_state
[Problem Description]
When running the hackbench program of LTP, the following memory leak is
reported by kmemleak.
# /opt/ltp/testcases/bin/hackbench 20 thread 1000
Running with 20*40 (== 800) tasks.
# dmesg | grep kmemleak
...
kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff888cd8ca2c40 (size 64):
comm "hackbench", pid 17142, jiffies 4299780315
hex dump (first 32 bytes):
ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00 .tI.....L.I.....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc bff18fd4):
[<ffffffff81419a89>] __kmalloc_cache_noprof+0x2f9/0x3f0
[<ffffffff8113f715>] task_numa_work+0x725/0xa00
[<ffffffff8110f878>] task_work_run+0x58/0x90
[<ffffffff81ddd9f8>] syscall_exit_to_user_mode+0x1c8/0x1e0
[<ffffffff81dd78d5>] do_syscall_64+0x85/0x150
[<ffffffff81e0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
This issue can be consistently reproduced on three different servers:
* a 448-core server
* a 256-core server
* a 192-core server
[Root Cause]
Since multiple threads are created by the hackbench program (along with
the command argument 'thread'), a shared vma might be accessed by two or
more cores simultaneously. When two or more cores observe that
vma->numab_state is NULL at the same time, vma->numab_state will be
overwritten.
Although current code ensures that only one thread scans the VMAs in a
single 'numa_scan_period', there might be a chance for another thread
to enter in the next 'numa_scan_period' while we have not gotten till
numab_state allocation [1].
Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000`
cannot the reproduce the issue. It is verified with 200+ test runs.
[Solution]
Use the cmpxchg atomic operation to ensure that only one thread executes
the vma->numab_state assignment.
[1] https://lore.kernel.org/lkml/1794be3c-358c-4cdc-a43d-a1f841d91ef7@amd.com/
Link: https://lkml.kernel.org/r/20241113102146.2384-1-ahuang12@lenovo.com
Fixes:
|
||
|
|
b8f52214c6 |
audit/stable-6.13 PR 20241205
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmdSMGQUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNAZg//aDeY3r9tIBdE8FKXdhLFUuggkBAL BY4LR67FGFTklelO6oy+mGPxuby06BkwL3VLo086req55vL19pUOceNnBtWA6d1G ncZz6qnmFquEXTO5aDXrCIf9fG9zSSdd/D+sDjeUpq/35YJlXnEuyUMY88nm/sAQ LKSYOhiooTNRkE6MK60Wd9c6geAi8ER3dxO7l0agD3FGRrK3TOQkK2R/WsOcZTkE HtVI/s1EL+ao0s/UiY+xGuRhOgujJ0Gtokqc29m5F6a8I6SiXtXpG6okZK/7KCv0 ta/87U3VGqXeTCf2aWKWkcDZQLZLajggJthWx0vb3OsA6ppON3FCkKqtSShqhphY 7QQZV+CWXhHabwDhm4E5UrKu5JDb9wa1duyDnuVXLoPYFAfgMcukDrkrc9PWiKCr rbfV/tuwQO3XRKp643sXRcnn6lyM9KPyG77kmxyOD8duYdwX/B5Szf0o8DROwXz0 /2htu2ZH9UPfwrQhf/MduKuN7Izz+vHFzMnMZZ2l0CVQXVgYq86kcQ4BS7C+ruXw PUD+7K1l1Q4tK0mk43K8I5zUqIf8wHERgYCiGkCwJew+q4rea7LNmE4vXvEnLl4W sMMYwtFynnNuEedMY3rhk66iz9+epZyou9W3VMoNKiwMdm5DCsPcuSX88zYUDsrP TFwMQjfqgpzOtFM= =RX05 -----END PGP SIGNATURE----- Merge tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit build problem workaround from Paul Moore: "A minor audit patch that shuffles some code slightly to workaround a GCC bug affecting a number of people. The GCC folks have been able to reproduce the problem and are discussing solutions (see the bug report link in the commit), but since the workaround is trivial let's do that in the kernel so we can unblock people who are hitting this" * tag 'audit-pr-20241205' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: workaround a GCC bug triggered by task comm changes |
||
|
|
9d6a414ad3 |
tracing fixes for v6.13:
- Fix trace histogram sort function cmp_entries_dup() The sort function cmp_entries_dup() returns either 1 or 0, and not -1 if parameter "a" is less than "b" by memcmp(). - Fix archs that call trace_hardirqs_off() without RCU watching Both x86 and arm64 no longer call any tracepoints with RCU not watching. It was assumed that it was safe to get rid of trace_*_rcuidle() version of the tracepoint calls. This was needed to get rid of the SRCU protection and be able to implement features like faultable traceponits and add rust tracepoints. Unfortunately, there were a few architectures that still relied on that logic. There's only one file that has tracepoints that are called without RCU watching. Add macro logic around the tracepoints for architectures that do not have CONFIG_ARCH_WANTS_NO_INSTR defined will check if the code is in the idle path (the only place RCU isn't watching), and enable RCU around calling the tracepoint, but only do it if the tracepoint is enabled. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ1G5gxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qnsXAQCzFHRbTrrmSmvKRHdWxUxhlYjKALHA v6DCySLgdNtv0QD8D5hHeGzhVXUhECG0mUcduZ7wvaym+yAQWU5V9gUcRwU= =E8i1 -----END PGP SIGNATURE----- Merge tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix trace histogram sort function cmp_entries_dup() The sort function cmp_entries_dup() returns either 1 or 0, and not -1 if parameter "a" is less than "b" by memcmp(). - Fix archs that call trace_hardirqs_off() without RCU watching Both x86 and arm64 no longer call any tracepoints with RCU not watching. It was assumed that it was safe to get rid of trace_*_rcuidle() version of the tracepoint calls. This was needed to get rid of the SRCU protection and be able to implement features like faultable traceponits and add rust tracepoints. Unfortunately, there were a few architectures that still relied on that logic. There's only one file that has tracepoints that are called without RCU watching. Add macro logic around the tracepoints for architectures that do not have CONFIG_ARCH_WANTS_NO_INSTR defined will check if the code is in the idle path (the only place RCU isn't watching), and enable RCU around calling the tracepoint, but only do it if the tracepoint is enabled. * tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix archs that still call tracepoints without RCU watching tracing: Fix cmp_entries_dup() to respect sort() comparison rules |
||
|
|
76031d9536 |
clocksource: Make negative motion detection more robust
Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.
It turns out that the calculated maximal idle time, which limits idle
sleeps to prevent clocksource wrap arounds, is close to the point where the
negative motion detection triggers.
max_idle_ns: 597268854 ns
negative motion tripping point: 671088640 ns
If the idle wakeup is delayed beyond that point, the clocksource
advances far enough to trigger the negative motion detection. This
prevents the clock to advance and in the worst case the system stalls
completely if the consecutive sleeps based on the stale clock are
delayed as well.
Cure this by calculating a more robust cut-off value for negative motion,
which covers 87.5% of the actual clocksource counter width. Compare the
delta against this value to catch negative motion. This is specifically for
clock sources with a small counter width as their wrap around time is close
to the half counter width. For clock sources with wide counters this is not
a problem because the maximum idle time is far from the half counter width
due to the math overflow protection constraints.
For the case at hand this results in a tripping point of 1174405120ns.
Note, that this cannot prevent issues when the delay exceeds the 87.5%
margin, but that's not different from the previous unchecked version which
allowed arbitrary time jumps.
Systems with small counter width are prone to invalid results, but this
problem is unlikely to be seen on real hardware. If such a system
completely stalls for more than half a second, then there are other more
urgent problems than the counter wrapping around.
Fixes:
|
||
|
|
dc1b157b82 |
tracing: Fix archs that still call tracepoints without RCU watching
Tracepoints require having RCU "watching" as it uses RCU to do updates to
the tracepoints. There are some cases that would call a tracepoint when
RCU was not "watching". This was usually in the idle path where RCU has
"shutdown". For the few locations that had tracepoints without RCU
watching, there was an trace_*_rcuidle() variant that could be used. This
used SRCU for protection.
There are tracepoints that trace when interrupts and preemption are
enabled and disabled. In some architectures, these tracepoints are called
in a path where RCU is not watching. When x86 and arm64 removed these
locations, it was incorrectly assumed that it would be safe to remove the
trace_*_rcuidle() variant and also remove the SRCU logic, as it made the
code more complex and harder to implement new tracepoint features (like
faultable tracepoints and tracepoints in rust).
Instead of bringing back the trace_*_rcuidle(), as it will not be trivial
to do as new code has already been added depending on its removal, add a
workaround to the one file that still requires it (trace_preemptirq.c). If
the architecture does not define CONFIG_ARCH_WANTS_NO_INSTR, then check if
the code is in the idle path, and if so, call ct_irq_enter/exit() which
will enable RCU around the tracepoint.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20241204100414.4d3e06d0@gandalf.local.home
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes:
|
||
|
|
d9381508ea |
audit: workaround a GCC bug triggered by task comm changes
A build failure has been reported with the following details:
In file included from include/linux/string.h:390,
from include/linux/bitmap.h:13,
from include/linux/cpumask.h:12,
from include/linux/smp.h:13,
from include/linux/lockdep.h:14,
from include/linux/spinlock.h:63,
from include/linux/wait.h:9,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from kernel/auditsc.c:37:
In function 'sized_strscpy',
inlined from '__audit_ptrace' at kernel/auditsc.c:2732:2:
>> include/linux/fortify-string.h:293:17:
error: call to '__write_overflow' declared with attribute error:
detected write beyond size of object (1st parameter)
293 | __write_overflow();
| ^~~~~~~~~~~~~~~~~~
In function 'sized_strscpy',
inlined from 'audit_signal_info_syscall' at kernel/auditsc.c:2759:3:
>> include/linux/fortify-string.h:293:17:
error: call to '__write_overflow' declared with attribute error:
detected write beyond size of object (1st parameter)
293 | __write_overflow();
| ^~~~~~~~~~~~~~~~~~
The issue appears to be a GCC bug, though the root cause remains
unclear at this time. For now, let's implement a workaround.
A bug report has also been filed with GCC [0].
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117912 [0]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410171420.1V00ICVG-lkp@intel.com/
Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20241128182435.57a1ea6f@gandalf.local.home/
Reported-by: Zhuo, Qiuxu <qiuxu.zhuo@intel.com>
Closes: https://lore.kernel.org/all/CY8PR11MB71348E568DBDA576F17DAFF389362@CY8PR11MB7134.namprd11.prod.outlook.com/
Originally-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/linux-hardening/202410171059.C2C395030@keescook/
Signed-off-by: Yafang shao <laoar.shao@gmail.com>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Yafang Shao <laoar.shao@gmail.com>
[PM: subject tweak, description line wrapping]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
||
|
|
b0e66977dc |
bpf: Fix narrow scalar spill onto 64-bit spilled scalar slots
When CAP_PERFMON and CAP_SYS_ADMIN (allow_ptr_leaks) are disabled, the
verifier aims to reject partial overwrite on an 8-byte stack slot that
contains a spilled pointer.
However, in such a scenario, it rejects all partial stack overwrites as
long as the targeted stack slot is a spilled register, because it does
not check if the stack slot is a spilled pointer.
Incomplete checks will result in the rejection of valid programs, which
spill narrower scalar values onto scalar slots, as shown below.
0: R1=ctx() R10=fp0
; asm volatile ( @ repro.bpf.c:679
0: (7a) *(u64 *)(r10 -8) = 1 ; R10=fp0 fp-8_w=1
1: (62) *(u32 *)(r10 -8) = 1
attempt to corrupt spilled pointer on stack
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0.
Fix this by expanding the check to not consider spilled scalar registers
when rejecting the write into the stack.
Previous discussion on this patch is at link [0].
[0]: https://lore.kernel.org/bpf/20240403202409.2615469-1-tao.lyu@epfl.ch
Fixes:
|
||
|
|
69772f509e |
bpf: Don't mark STACK_INVALID as STACK_MISC in mark_stack_slot_misc
Inside mark_stack_slot_misc, we should not upgrade STACK_INVALID to
STACK_MISC when allow_ptr_leaks is false, since invalid contents
shouldn't be read unless the program has the relevant capabilities.
The relaxation only makes sense when env->allow_ptr_leaks is true.
However, such conversion in privileged mode becomes unnecessary, as
invalid slots can be read without being upgraded to STACK_MISC.
Currently, the condition is inverted (i.e. checking for true instead of
false), simply remove it to restore correct behavior.
Fixes:
|
||
|
|
cbd8730aea |
bpf: Improve verifier log for resource leak on exit
The verifier log when leaking resources on BPF_EXIT may be a bit confusing, as it's a problem only when finally existing from the main prog, not from any of the subprogs. Hence, update the verifier error string and the corresponding selftests matching on it. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204030400.208005-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
c8e2ee1f3d |
bpf: Introduce support for bpf_local_irq_{save,restore}
Teach the verifier about IRQ-disabled sections through the introduction of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable them, and bpf_local_irq_restore, to restore IRQ state and enable them back again. For the purposes of tracking the saved IRQ state, the verifier is taught about a new special object on the stack of type STACK_IRQ_FLAG. This is a 8 byte value which saves the IRQ flags which are to be passed back to the IRQ restore kfunc. Renumber the enums for REF_TYPE_* to simplify the check in find_lock_state, filtering out non-lock types as they grow will become cumbersome and is unecessary. To track a dynamic number of IRQ-disabled regions and their associated saved states, a new resource type RES_TYPE_IRQ is introduced, which its state management functions: acquire_irq_state and release_irq_state, taking advantage of the refactoring and clean ups made in earlier commits. One notable requirement of the kernel's IRQ save and restore API is that they cannot happen out of order. For this purpose, when releasing reference we keep track of the prev_id we saw with REF_TYPE_IRQ. Since reference states are inserted in increasing order of the index, this is used to remember the ordering of acquisitions of IRQ saved states, so that we maintain a logical stack in acquisition order of resource identities, and can enforce LIFO ordering when restoring IRQ state. The top of the stack is maintained using bpf_verifier_state's active_irq_id. To maintain the stack property when releasing reference states, we need to modify release_reference_state to instead shift the remaining array left using memmove instead of swapping deleted element with last that might break the ordering. A selftest to test this subtle behavior is added in late patches. The logic to detect initialized and unitialized irq flag slots, marking and unmarking is similar to how it's done for iterators. No additional checks are needed in refsafe for REF_TYPE_IRQ, apart from the usual check_id satisfiability check on the ref[i].id. We have to perform the same check_ids check on state->active_irq_id as well. To ensure we don't get assigned REF_TYPE_PTR by default after acquire_reference_state, if someone forgets to assign the type, let's also renumber the enum ref_state_type. This way any unassigned types get caught by refsafe's default switch statement, don't assume REF_TYPE_PTR by default. The kfuncs themselves are plain wrappers over local_irq_save and local_irq_restore macros. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204030400.208005-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
b79f5f54e1 |
bpf: Refactor mark_{dynptr,iter}_read
There is possibility of sharing code between mark_dynptr_read and mark_iter_read for updating liveness information of their stack slots. Consolidate common logic into mark_stack_slot_obj_read function in preparation for the next patch which needs the same logic for its own stack slots. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204030400.208005-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
769b0f1c82 |
bpf: Refactor {acquire,release}_reference_state
In preparation for introducing support for more reference types which have to add and remove reference state, refactor the acquire_reference_state and release_reference_state functions to share common logic. The acquire_reference_state function simply handles growing the acquired refs and returning the pointer to the new uninitialized element, which can be filled in by the caller. The release_reference_state function simply erases a reference state entry in the acquired_refs array and shrinks it. The callers are responsible for finding the suitable element by matching on various fields of the reference state and requesting deletion through this function. It is not supposed to be called directly. Existing callers of release_reference_state were using it to find and remove state for a given ref_obj_id without scrubbing the associated registers in the verifier state. Introduce release_reference_nomark to provide this functionality and convert callers. We now use this new release_reference_nomark function within release_reference as well. It needs to operate on a verifier state instead of taking verifier env as mark_ptr_or_null_regs requires operating on verifier state of the two branches of a NULL condition check, therefore env->cur_state cannot be used directly. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204030400.208005-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
1995edc5f9 |
bpf: Consolidate locks and reference state in verifier state
Currently, state for RCU read locks and preemption is in bpf_verifier_state, while locks and pointer reference state remains in bpf_func_state. There is no particular reason to keep the latter in bpf_func_state. Additionally, it is copied into a new frame's state and copied back to the caller frame's state everytime the verifier processes a pseudo call instruction. This is a bit wasteful, given this state is global for a given verification state / path. Move all resource and reference related state in bpf_verifier_state structure in this patch, in preparation for introducing new reference state types in the future. Since we switch print_verifier_state and friends to print using vstate, we now need to explicitly pass in the verifier state from the caller along with the bpf_func_state, so modify the prototype and callers to do so. To ensure func state matches the verifier state when we're printing data, take in frame number instead of bpf_func_state pointer instead and avoid inconsistencies induced by the caller. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241204030400.208005-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
e63fbd5f68 |
tracing: Fix cmp_entries_dup() to respect sort() comparison rules
The cmp_entries_dup() function used as the comparator for sort()
violated the symmetry and transitivity properties required by the
sorting algorithm. Specifically, it returned 1 whenever memcmp() was
non-zero, which broke the following expectations:
* Symmetry: If x < y, then y > x.
* Transitivity: If x < y and y < z, then x < z.
These violations could lead to incorrect sorting and failure to
correctly identify duplicate elements.
Fix the issue by directly returning the result of memcmp(), which
adheres to the required comparison properties.
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
9d9f204bdf |
genirq/proc: Add missing space separator back
The recent conversion of show_interrupts() to seq_put_decimal_ull_width()
caused a formatting regression as it drops a previosuly existing space
separator.
Add it back by unconditionally inserting a space after the interrupt
counts and removing the extra leading space from the chip name prints.
Fixes:
|
||
|
|
bd74e238ae |
bpf: Zero index arg error string for dynptr and iter
Andrii spotted that process_dynptr_func's rejection of incorrect argument register type will print an error string where argument numbers are not zero-indexed, unlike elsewhere in the verifier. Fix this by subtracting 1 from regno. The same scenario exists for iterator messages. Fix selftest error strings that match on the exact argument number while we're at it to ensure clean bisection. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241203002235.3776418-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
12659d2861 |
bpf: Ensure reg is PTR_TO_STACK in process_iter_arg
Currently, KF_ARG_PTR_TO_ITER handling missed checking the reg->type and
ensuring it is PTR_TO_STACK. Instead of enforcing this in the caller of
process_iter_arg, move the check into it instead so that all callers
will gain the check by default. This is similar to process_dynptr_func.
An existing selftest in verifier_bits_iter.c fails due to this change,
but it's because it was passing a NULL pointer into iter_next helper and
getting an error further down the checks, but probably meant to pass an
uninitialized iterator on the stack (as is done in the subsequent test
below it). We will gain coverage for non-PTR_TO_STACK arguments in later
patches hence just change the declaration to zero-ed stack object.
Fixes:
|
||
|
|
cdd30ebb1b |
module: Convert symbol namespace to string literal
Clean up the existing export namespace code along the same lines of
commit
|
||
|
|
3bfb49d73f |
bpf: Refactor bpf_tracing_func_proto() and remove bpf_get_probe_write_proto()
With bpf_get_probe_write_proto() no longer printing a message, we can avoid it being a special case with its own permission check. Refactor bpf_tracing_func_proto() similar to bpf_base_func_proto() to have a section conditional on bpf_token_capable(CAP_SYS_ADMIN), where the proto for bpf_probe_write_user() is returned. Finally, remove the unnecessary bpf_get_probe_write_proto(). This simplifies the code, and adding additional CAP_SYS_ADMIN-only helpers in future avoids duplicating the same CAP_SYS_ADMIN check. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Marco Elver <elver@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20241129090040.2690691-2-elver@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
b28573ebfa |
bpf: Remove bpf_probe_write_user() warning message
The warning message for bpf_probe_write_user() was introduced in |
||
|
|
f788b5ef1c |
- Fix a case where posix timers with a thread-group-wide target would miss
signals if some of the group's threads are exiting - Fix a hang caused by ndelay() calling the wrong delay function __udelay() - Fix a wrong offset calculation in adjtimex(2) when using ADJ_MICRO (microsecond resolution) and a negative offset -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdMQ2sACgkQEsHwGGHe VUoRGBAAt8luiDBdMHIcD053RHsLr7Oocg5AI/t0PVxYxJ+89o0cSdDx2vaaXiyX +vRSkdvH5mfwvwW4XRJZkVWbzOjMiA6m7FwH667XGzEedIq4vtgs5Rd/1YStSfIx ceQfD2N+34esamxiGGBlzjNO2GdqI2XMo/Fc6LuPCTfPBqELCL8OpbEdOV8Ltwxr mRsmbCNazBtw31Yo3zp9UZIVVSAzJFmWOoK0M+xm6S91YPYaKQ9RYk2QQwLizVgR N++dniNV6yZuSLTzr4dNckrvl744Iqc4Sy8iy2CL9rNFZkb+3q5CAAQggGNlY2U9 0W95tgwpy/Qt6drfsyam3+PR5Smwjnh/0mrk3sLzUCdy9Y6L2HgKmrvHk4Rqq/66 N6uIjIDmou+L0FUcdUducRnMOgQnvfIB/l6hIAHHkDap7iL8oy74JDzzk0jnNKHw 1I5kGbKqXz0ucdxge6H1BHqCc/roobwC05/TWLPAQ5IG0BtQFPGAwd901AZtANkk /FfWUq7IT6PW05T2co7O75NjgMvU3QV0Sf5E9vkV/+R9WtTKT13FmZ8+rC6zaC7o Juml/lRWeTCyuot3vv29NtcvY6j+gy/RrKWL4iNWDlXznntR2DAhIkzRCF+1yTSb z0RSOrY2BSsk2iqeUh8ydet5OEyPiMXwiHVbxUHzJ4R/7qaxsB8= =X4bb -----END PGP SIGNATURE----- Merge tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Borislav Petkov: - Fix a case where posix timers with a thread-group-wide target would miss signals if some of the group's threads are exiting - Fix a hang caused by ndelay() calling the wrong delay function __udelay() - Fix a wrong offset calculation in adjtimex(2) when using ADJ_MICRO (microsecond resolution) and a negative offset * tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-timers: Target group sigqueue to current task only if not exiting delay: Fix ndelay() spuriously treated as udelay() ntp: Remove invalid cast in time offset math |
||
|
|
133577cad6 |
dma-mapping fix for Linux 6.13
- fix physical address calculation for struct dma_debug_entry
(Fedor Pchelkin)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmdKvyMLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYO6Sg/+LruMOlIBJ+X9E3H+c39JSiMteM5XVDPKLGpOXW01
W3UpOh1vRhvmsoYmQaL/6Nalr0tc/bxb+obklHzimBbBsztuwaUEuY0DPcmeYpZw
RZkUf/YX0lsf5cf5i2/bmozbiXnnbfp2g1FEv34m3W3ehLydLoBhyNZ8lqDGAt+a
JN4s30j1CG6k5/NOnhzpMa2qVfs9GNR1MC0XJaWWybdtGYQr9tFVibS/7X8K5IOk
dPUsoF2QFF5ODWBzhJqZnXlX23N0EC2EzVsgywTyKc2uCrSmcldidH2K8LnkmLPH
gdNDwSAA48AbIdL1WnfVT4zyJKBl6TBTGqAkvreY6DyIfGZN8u9++3FowLJ13jdK
vCJltoF1tf/66CBpMZAI+s9TnGT6YiwUqyheTVEIzbCSvH0Nby52iSci3FVTndoj
otVPQMBbtzo/ZgC0tWQ0Fb1030p4OJrQJsdqHH6Y/a8J6px6AqTFf1tVumeO52P8
pb3cadyX5VD3ACrqd5xl17AEwfatIBremFTq8XOlEohwRrSwSACsHValK+Mxrvzw
6NpRuNPpz51u+Ii4/AzAOTHAZ/8+9AcVc26/ARpIW04nw3sJzy5mL+ND56/6oMOd
J3T3fy+OTMZ6tKbmwTgjg/MAh8wQ7L+thlZaDGz5ubXVNqra/wHnTqFx+Gou9tRv
9cc=
=TU77
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.13-2024-11-30' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- fix physical address calculation for struct dma_debug_entry (Fedor
Pchelkin)
* tag 'dma-mapping-6.13-2024-11-30' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: fix physical address calculation for struct dma_debug_entry
|
||
|
|
55cb93fd24 |
Driver core changes for 6.13-rc1
Here is a small set of driver core changes for 6.13-rc1.
Nothing major for this merge cycle, except for the 2 simple merge
conflicts are here just to make life interesting.
Included in here are:
- sysfs core changes and preparations for more sysfs api cleanups that
can come through all driver trees after -rc1 is out
- fw_devlink fixes based on many reports and debugging sessions
- list_for_each_reverse() removal, no one was using it!
- last-minute seq_printf() format string bug found and fixed in many
drivers all at once.
- minor bugfixes and changes full details in the shortlog
As mentioned above, there is 2 merge conflicts with your tree, one is
where the file is removed (easy enough to resolve), the second is a
build time error, that has been found in linux-next and the fix can be
seen here:
https://lore.kernel.org/r/20241107212645.41252436@canb.auug.org.au
Other than that, the changes here have been in linux-next with no other
reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZ0lEog8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ym+0ACgw6wN+LkLVIHWhxTq5DYHQ0QCxY8AoJrRIcKe
78h0+OU3OXhOy8JGz62W
=oI5S
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is a small set of driver core changes for 6.13-rc1.
Nothing major for this merge cycle, except for the two simple merge
conflicts are here just to make life interesting.
Included in here are:
- sysfs core changes and preparations for more sysfs api cleanups
that can come through all driver trees after -rc1 is out
- fw_devlink fixes based on many reports and debugging sessions
- list_for_each_reverse() removal, no one was using it!
- last-minute seq_printf() format string bug found and fixed in many
drivers all at once.
- minor bugfixes and changes full details in the shortlog"
* tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits)
Fix a potential abuse of seq_printf() format string in drivers
cpu: Remove spurious NULL in attribute_group definition
s390/con3215: Remove spurious NULL in attribute_group definition
perf: arm-ni: Remove spurious NULL in attribute_group definition
driver core: Constify bin_attribute definitions
sysfs: attribute_group: allow registration of const bin_attribute
firmware_loader: Fix possible resource leak in fw_log_firmware_info()
drivers: core: fw_devlink: Fix excess parameter description in docstring
driver core: class: Correct WARN() message in APIs class_(for_each|find)_device()
cacheinfo: Use of_property_present() for non-boolean properties
cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap()
drivers: core: fw_devlink: Make the error message a bit more useful
phy: tegra: xusb: Set fwnode for xusb port devices
drm: display: Set fwnode for aux bus devices
driver core: fw_devlink: Stop trying to optimize cycle detection logic
driver core: Constify attribute arguments of binary attributes
sysfs: bin_attribute: add const read/write callback variants
sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR()
sysfs: treewide: constify attribute callback of bin_attribute::llseek()
sysfs: treewide: constify attribute callback of bin_attribute::mmap()
...
|
||
|
|
63dffecfba |
posix-timers: Target group sigqueue to current task only if not exiting
A sigqueue belonging to a posix timer, which target is not a specific
thread but a whole thread group, is preferrably targeted to the current
task if it is part of that thread group.
However nothing prevents a posix timer event from queueing such a
sigqueue from a reaped yet running task. The interruptible code space
between exit_notify() and the final call to schedule() is enough for
posix_timer_fn() hrtimer to fire.
If that happens while the current task is part of the thread group
target, it is proposed to handle it but since its sighand pointer may
have been cleared already, the sigqueue is dropped even if there are
other tasks running within the group that could handle it.
As a result posix timers with thread group wide target may miss signals
when some of their threads are exiting.
Fix this with verifying that the current task hasn't been through
exit_notify() before proposing it as a preferred target so as to ensure
that its sighand is still here and stable.
complete_signal() might still reconsider the choice and find a better
target within the group if current has passed retarget_shared_pending()
already.
Fixes:
|
||
|
|
7af08b57bc |
Tracing updates for 6.13:
- Add trace flag for NEED_RESCHED_LAZY Now that NEED_RESCHED_LAZY is upstream, add it to the status bits of the common_flags. This will now show when the NEED_RESCHED_LAZY flag is set that is used for debugging latency issues in the kernel via a trace. - Remove leftover "__idx" variable when SRCU was removed from the tracepoint code - Add rcu_tasks_trace guard To add a guard() around the tracepoint code, a rcu_tasks_trace guard needs to be created first. - Remove __DO_TRACE() macro and just call __DO_TRACE_CALL() directly The DO_TRACE() macro has conditional locking depending on what was passed into the macro parameters. As the guts of the macro has been moved to __DO_TRACE_CALL() to handle static call logic, there's no reason to keep the __DO_TRACE() macro around. It is better to just do the locking in place without the conditionals and call __DO_TRACE_CALL() from those locations. The "cond" passed in can also be moved out of that macro. This simplifies the code. - Remove the "cond" from the system call tracepoint macros The "cond" variable was added to allow some tracepoints to check a condition within the static_branch (jump/nop) logic. The system calls do not need this. Removing it simplifies the code. - Replace scoped_guard() with just guard() in the tracepoint logic guard() works just as well as scoped_guard() in the tracepoint logic and the scoped_guard() causes some issues. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ0dGmBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qsZkAP9cm2psIGp2n1BgVjA+0tBRQJUnexEG RualDkF5wAETLwD9FNFI/EUwDR/E8gNt0SY309EJZ1ijRiLjtU0spbQmdgs= =awid -----END PGP SIGNATURE----- Merge tag 'trace-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull more tracing updates from Steven Rostedt: - Add trace flag for NEED_RESCHED_LAZY Now that NEED_RESCHED_LAZY is upstream, add it to the status bits of the common_flags. This will now show when the NEED_RESCHED_LAZY flag is set that is used for debugging latency issues in the kernel via a trace. - Remove leftover "__idx" variable when SRCU was removed from the tracepoint code - Add rcu_tasks_trace guard To add a guard() around the tracepoint code, a rcu_tasks_trace guard needs to be created first. - Remove __DO_TRACE() macro and just call __DO_TRACE_CALL() directly The DO_TRACE() macro has conditional locking depending on what was passed into the macro parameters. As the guts of the macro has been moved to __DO_TRACE_CALL() to handle static call logic, there's no reason to keep the __DO_TRACE() macro around. It is better to just do the locking in place without the conditionals and call __DO_TRACE_CALL() from those locations. The "cond" passed in can also be moved out of that macro. This simplifies the code. - Remove the "cond" from the system call tracepoint macros The "cond" variable was added to allow some tracepoints to check a condition within the static_branch (jump/nop) logic. The system calls do not need this. Removing it simplifies the code. - Replace scoped_guard() with just guard() in the tracepoint logic guard() works just as well as scoped_guard() in the tracepoint logic and the scoped_guard() causes some issues. * tag 'trace-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Use guard() rather than scoped_guard() tracing: Remove cond argument from __DECLARE_TRACE_SYSCALL tracing: Remove conditional locking from __DO_TRACE() rcupdate_trace: Define rcu_tasks_trace lock guard tracing: Remove __idx variable from __DO_TRACE tracing: Move it_func[0] comment to the relevant context tracing: Record task flag NEED_RESCHED_LAZY. |
||
|
|
f5807b0606 |
ntp: Remove invalid cast in time offset math
Due to an unsigned cast, adjtimex() returns the wrong offest when using
ADJ_MICRO and the offset is negative. In this case a small negative offset
returns approximately 4.29 seconds (~ 2^32/1000 milliseconds) due to the
unsigned cast of the negative offset.
This cast was added when the kernel internal struct timex was changed to
use type long long for the time offset value to address the problem of a
64bit/32bit division on 32bit systems.
The correct cast would have been (s32), which is correct as time_offset can
only be in the range of [INT_MIN..INT_MAX] because the shift constant used
for calculating it is 32. But that's non-obvious.
Remove the cast and use div_s64() to cure the issue.
[ tglx: Fix white space damage, use div_s64() and amend the change log ]
Fixes:
|
||
|
|
aef7ee7649 |
dma-debug: fix physical address calculation for struct dma_debug_entry
Offset into the page should also be considered while calculating a physical
address for struct dma_debug_entry. page_to_phys() just shifts the value
PAGE_SHIFT bits to the left so offset part is zero-filled.
An example (wrong) debug assertion failure with CONFIG_DMA_API_DEBUG
enabled which is observed during systemd boot process after recent
dma-debug changes:
DMA-API: e1000 0000:00:03.0: cacheline tracking EEXIST, overlapping mappings aren't supported
WARNING: CPU: 4 PID: 941 at kernel/dma/debug.c:596 add_dma_entry
CPU: 4 UID: 0 PID: 941 Comm: ip Not tainted 6.12.0+ #288
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:add_dma_entry kernel/dma/debug.c:596
Call Trace:
<TASK>
debug_dma_map_page kernel/dma/debug.c:1236
dma_map_page_attrs kernel/dma/mapping.c:179
e1000_alloc_rx_buffers drivers/net/ethernet/intel/e1000/e1000_main.c:4616
...
Found by Linux Verification Center (linuxtesting.org).
Fixes:
|
||
|
|
b5361254c9 |
Modules changes for v6.13-rc1
Highlights for this merge window:
* The whole caching of module code into huge pages by Mike Rapoport is going
in through Andrew Morton's tree due to some other code dependencies. That's
really the biggest highlight for Linux kernel modules in this release. With
it we share huge pages for modules, starting off with x86. Expect to see that
soon through Andrew!
* Helge Deller addressed some lingering low hanging fruit alignment
enhancements by. It is worth pointing out that from his old patch series
I dropped his vmlinux.lds.h change at Masahiro's request as he would
prefer this to be specified in asm code [0].
[0] https://lore.kernel.org/all/20240129192644.3359978-5-mcgrof@kernel.org/T/#m9efef5e700fbecd28b7afb462c15eed8ba78ef5a
* Matthew Maurer and Sami Tolvanen have been tag teaming to help
get us closer to a modversions for Rust. In this cycle we take in
quite a lot of the refactoring for ELF validation. I expect modversions
for Rust will be merged by v6.14 as that code is mostly ready now.
* Adds a new modules selftests: kallsyms which helps us tests find_symbol()
and the limits of kallsyms on Linux today.
* We have a realtime mailing list to kernel-ci testing for modules now
which relies and combines patchwork, kpd and kdevops:
- https://patchwork.kernel.org/project/linux-modules/list/
- https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/README.md
- https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/kernel-ci-kpd.md
- https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/linux-modules-kdevops-ci.md
If you want to help avoid Linux kernel modules regressions, now its simple,
just add a new Linux modules sefltests under tools/testing/selftests/module/
That is it. All new selftests will be used and leveraged automatically by
the CI.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmdGbrcSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinIDEQAMa1H7hsneNT0Z/YewzOfdSKZIkTzpk3
/fLl7PfWyFvk7yHT1JiUXidS/80SEMnWb+u8Sn00/uvcJomnPcK9oTwTzBQ0vefl
FWIUM0DmBzBOi5xdjrPLjg5o6TFt7hVae3hoRJzIlLD02vGfrPYpyHo7XmRrLM4C
8p+3geziwZMpjcGM254eSiTGxNL8z1iZVRsz8QrrBruRfBDnHNgwtmK097v13Xdb
qmLX6CN2irmNPZSZwDqP8QL2sJk9qQpNdPmpjMvaY3VfaMVkM46FLy0k9yeXXNqw
E1p/GuylCZq4NG1hic9zB1I1CE910ugCztJnPcGw4C7CSm54YoLiUJrIeRyTZhk6
et9N25AlJHxyq72GIRTMQCA9Njxaavx5KilvuWYZmaILfeI0k/3gvcxUqp/EJQ9Q
axPu69HJFRSKMVh1o+QrSaPmEtSydpYwuuNJ6ONRpq5I3bzOVDSCroceAdXEMO9K
yoSfm4KwN/BSnmX6KVLonrSM91nv2/v9UokuaZMV/CsDpXIZs996PvAoopCm1Twb
K3fv0uD+2q2FTOOBInkuRJo2zBUvNnDRPAS2pE3DMXy8xhsQXdovEpjijuCGb8eC
y0R+I4RIugIB2n6YBUFfyma1veGlT3PtrWQnO6E3YJpv8bqIJoYVT5IGo9M9YRO9
lzjtR9NzGtmh
=Ny84
-----END PGP SIGNATURE-----
Merge tag 'modules-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull modules updates from Luis Chamberlain:
- The whole caching of module code into huge pages by Mike Rapoport is
going in through Andrew Morton's tree due to some other code
dependencies. That's really the biggest highlight for Linux kernel
modules in this release. With it we share huge pages for modules,
starting off with x86. Expect to see that soon through Andrew!
- Helge Deller addressed some lingering low hanging fruit alignment
enhancements by. It is worth pointing out that from his old patch
series I dropped his vmlinux.lds.h change at Masahiro's request as he
would prefer this to be specified in asm code [0].
[0] https://lore.kernel.org/all/20240129192644.3359978-5-mcgrof@kernel.org/T/#m9efef5e700fbecd28b7afb462c15eed8ba78ef5a
- Matthew Maurer and Sami Tolvanen have been tag teaming to help get us
closer to a modversions for Rust. In this cycle we take in quite a
lot of the refactoring for ELF validation. I expect modversions for
Rust will be merged by v6.14 as that code is mostly ready now.
- Adds a new modules selftests: kallsyms which helps us tests
find_symbol() and the limits of kallsyms on Linux today.
- We have a realtime mailing list to kernel-ci testing for modules now
which relies and combines patchwork, kpd and kdevops:
https://patchwork.kernel.org/project/linux-modules/list/
https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/README.md
https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/kernel-ci-kpd.md
https://github.com/linux-kdevops/kdevops/blob/main/docs/kernel-ci/linux-modules-kdevops-ci.md
If you want to help avoid Linux kernel modules regressions, now its
simple, just add a new Linux modules sefltests under
tools/testing/selftests/module/ That is it. All new selftests will be
used and leveraged automatically by the CI.
* tag 'modules-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
tests/module/gen_test_kallsyms.sh: use 0 value for variables
scripts: Remove export_report.pl
selftests: kallsyms: add MODULE_DESCRIPTION
selftests: add new kallsyms selftests
module: Reformat struct for code style
module: Additional validation in elf_validity_cache_strtab
module: Factor out elf_validity_cache_strtab
module: Group section index calculations together
module: Factor out elf_validity_cache_index_str
module: Factor out elf_validity_cache_index_sym
module: Factor out elf_validity_cache_index_mod
module: Factor out elf_validity_cache_index_info
module: Factor out elf_validity_cache_secstrings
module: Factor out elf_validity_cache_sechdrs
module: Factor out elf_validity_ehdr
module: Take const arg in validate_section_offset
modules: Add missing entry for __ex_table
modules: Ensure 64-bit alignment on __ksymtab_* sections
|
||
|
|
3b83203538
|
Revert "fs: don't block i_writecount during exec"
This reverts commit |
||
|
|
f5f4745a7f |
- The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code.
- The series "Improve the copy of task comm" from Yafang Shao addresses
possible race-induced overflows in the management of task_struct.comm[].
- The series "Remove unnecessary header includes from
{tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
small fix to the list_sort library code and to its selftest.
- The series "Enhance min heap API with non-inline functions and
optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
min_heap library code.
- The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
finishes off nilfs2's folioification.
- The series "add detect count for hung tasks" from Lance Yang adds more
userspace visibility into the hung-task detector's activity.
- Apart from that, singelton patches in many places - please see the
individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ0L6lQAKCRDdBJ7gKXxA
jmEIAPwMSglNPKRIOgzOvHh8MUJW1Dy8iKJ2kWCO3f6QTUIM2AEA+PazZbUd/g2m
Ii8igH0UBibIgva7MrCyJedDI1O23AA=
=8BIU
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "resource: A couple of cleanups" from Andy Shevchenko
performs some cleanups in the resource management code
- The series "Improve the copy of task comm" from Yafang Shao addresses
possible race-induced overflows in the management of
task_struct.comm[]
- The series "Remove unnecessary header includes from
{tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
small fix to the list_sort library code and to its selftest
- The series "Enhance min heap API with non-inline functions and
optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
min_heap library code
- The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
finishes off nilfs2's folioification
- The series "add detect count for hung tasks" from Lance Yang adds
more userspace visibility into the hung-task detector's activity
- Apart from that, singelton patches in many places - please see the
individual changelogs for details
* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
gdb: lx-symbols: do not error out on monolithic build
kernel/reboot: replace sprintf() with sysfs_emit()
lib: util_macros_kunit: add kunit test for util_macros.h
util_macros.h: fix/rework find_closest() macros
Improve consistency of '#error' directive messages
ocfs2: fix uninitialized value in ocfs2_file_read_iter()
hung_task: add docs for hung_task_detect_count
hung_task: add detect count for hung tasks
dma-buf: use atomic64_inc_return() in dma_buf_getfile()
fs/proc/kcore.c: fix coccinelle reported ERROR instances
resource: avoid unnecessary resource tree walking in __region_intersects()
ocfs2: remove unused errmsg function and table
ocfs2: cluster: fix a typo
lib/scatterlist: use sg_phys() helper
checkpatch: always parse orig_commit in fixes tag
nilfs2: convert metadata aops from writepage to writepages
nilfs2: convert nilfs_recovery_copy_block() to take a folio
nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
nilfs2: remove nilfs_writepage
nilfs2: convert checkpoint file to be folio-based
...
|
||
|
|
ab244dd7cf |
bpf: fix OOB devmap writes when deleting elements
Jordy reported issue against XSKMAP which also applies to DEVMAP - the
index used for accessing map entry, due to being a signed integer,
causes the OOB writes. Fix is simple as changing the type from int to
u32, however, when compared to XSKMAP case, one more thing needs to be
addressed.
When map is released from system via dev_map_free(), we iterate through
all of the entries and an iterator variable is also an int, which
implies OOB accesses. Again, change it to be u32.
Example splat below:
[ 160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000
[ 160.731662] #PF: supervisor read access in kernel mode
[ 160.736876] #PF: error_code(0x0000) - not-present page
[ 160.742095] PGD 0 P4D 0
[ 160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP
[ 160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ #487
[ 160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 160.767642] Workqueue: events_unbound bpf_map_free_deferred
[ 160.773308] RIP: 0010:dev_map_free+0x77/0x170
[ 160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 <48> 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff
[ 160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202
[ 160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024
[ 160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000
[ 160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001
[ 160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122
[ 160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000
[ 160.838310] FS: 0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000
[ 160.846528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0
[ 160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 160.874092] PKRU: 55555554
[ 160.876847] Call Trace:
[ 160.879338] <TASK>
[ 160.881477] ? __die+0x20/0x60
[ 160.884586] ? page_fault_oops+0x15a/0x450
[ 160.888746] ? search_extable+0x22/0x30
[ 160.892647] ? search_bpf_extables+0x5f/0x80
[ 160.896988] ? exc_page_fault+0xa9/0x140
[ 160.900973] ? asm_exc_page_fault+0x22/0x30
[ 160.905232] ? dev_map_free+0x77/0x170
[ 160.909043] ? dev_map_free+0x58/0x170
[ 160.912857] bpf_map_free_deferred+0x51/0x90
[ 160.917196] process_one_work+0x142/0x370
[ 160.921272] worker_thread+0x29e/0x3b0
[ 160.925082] ? rescuer_thread+0x4b0/0x4b0
[ 160.929157] kthread+0xd4/0x110
[ 160.932355] ? kthread_park+0x80/0x80
[ 160.936079] ret_from_fork+0x2d/0x50
[ 160.943396] ? kthread_park+0x80/0x80
[ 160.950803] ret_from_fork_asm+0x11/0x20
[ 160.958482] </TASK>
Fixes:
|
||
|
|
8618f5ffba |
bpf, lsm: Remove getlsmprop hooks BTF IDs
These hooks are not useful for BPF LSM currently.
Furthermore a recent renaming introduced build warnings:
BTFIDS vmlinux
WARN: resolve_btfids: unresolved symbol bpf_lsm_task_getsecid_obj
WARN: resolve_btfids: unresolved symbol bpf_lsm_current_getsecid_subj
Link: https://lore.kernel.org/lkml/20241123-bpf_lsm_task_getsecid_obj-v1-1-0d0f94649e05@weissschuh.net/
Fixes:
|
||
|
|
43a43faf53 |
futex: improve user space accesses
Josh Poimboeuf reports that he got a "will-it-scale.per_process_ops 1.9% improvement" report for his patch that changed __get_user() to use pointer masking instead of the explicit speculation barrier. However, that patch doesn't actually work in the general case, because some (very bad) architecture-specific code actually depends on __get_user() also working on kernel addresses. A profile showed that the offending __get_user() was the futex code, which really should be fixed up to not use that horrid legacy case. Rewrite futex_get_value_locked() to use the modern user acccess helpers, and inline it so that the compiler not only avoids the function call for a few instructions, but can do CSE on the address masking. It also turns out the x86 futex functions have unnecessary barriers in other places, so let's fix those up too. Link: https://lore.kernel.org/all/20241115230653.hfvzyf3aqqntgp63@jpoimboe/ Reported-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |