linux

mirror of https://github.com/torvalds/linux.git synced 2026-07-27 01:32:21 +02:00

Author	SHA1	Message	Date
Linus Torvalds	ae453eef92	BPF fixes: - Fix tcp_bpf_sendmsg() error path mistaking a concurrently-freed sk_psock->cork for the local temporary message and freeing it again. (Chengfeng Ye) - Reject passing scalar NULL to nonnull arg of a global subprog. Previously the verifier did not account for the cases directly passing scalars to a global subprog, e.g.: 'global_func(0);' would pass even if 'global_func' argument was marked nonnull. (Amery Hung) Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Eduard Zingerman: - Fix tcp_bpf_sendmsg() error path mistaking a concurrently-freed sk_psock->cork for the local temporary message and freeing it again (Chengfeng Ye) - Reject passing scalar NULL to nonnull arg of a global subprog. Previously the verifier did not account for the cases directly passing scalars to a global subprog, e.g.: 'global_func(0);' would pass even if 'global_func' argument was marked nonnull (Amery Hung) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Fix cork use-after-free in tcp_bpf_sendmsg() selftests/bpf: Test passing scalar NULL to nonnull global subprog bpf: Reject passing scalar NULL to nonnull arg of a global subprog	2026-07-24 19:31:12 -07:00
Chengfeng Ye	2d66a03386	bpf, sockmap: Fix cork use-after-free in tcp_bpf_sendmsg() tcp_bpf_sendmsg() keeps msg_tx across sk_stream_wait_memory(), which drops and reacquires the socket lock. Its error path tries to decide whether msg_tx names the local temporary message by comparing it with the current value of psock->cork. This comparison is unsafe when two threads send on the same socket: Thread A: msg_tx = psock->cork; sk_msg_alloc() fails; sk_stream_wait_memory() releases the socket lock. Thread B: acquires the socket lock; completes the cork; psock->cork = NULL; frees the cork. Thread A: reacquires the socket lock; msg_tx != psock->cork; sk_msg_free(msg_tx). The stale cork is therefore mistaken for the local temporary message and freed again. KASAN reported: BUG: KASAN: slab-use-after-free in sk_msg_free+0x49/0x50 Read of size 4 at addr ffff88810c908800 by task poc/90 Call Trace: sk_msg_free+0x49/0x50 tcp_bpf_sendmsg+0x14f5/0x1cc0 __sys_sendto+0x32c/0x3a0 __x64_sys_sendto+0xdb/0x1b0 Allocated by task 89: __kasan_kmalloc+0x8f/0xa0 tcp_bpf_sendmsg+0x16b3/0x1cc0 Freed by task 91: __kasan_slab_free+0x43/0x70 kfree+0x131/0x3c0 tcp_bpf_sendmsg+0xec3/0x1cc0 msg_tx can only name the stack-local tmp or the shared cork. Check for tmp directly so a changed psock->cork cannot turn a shared message into an apparent local one. Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Chengfeng Ye <nicoyip.dev@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/87fr18lmzo.fsf%40cloudflare.com/ Link: https://lore.kernel.org/netdev/20260719161630.2901208-1-nicoyip.dev%40gmail.com/ [v1] Link: https://patch.msgid.link/20260724103856.3399001-1-nicoyip.dev@gmail.com Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>	2026-07-24 15:04:11 -07:00
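The ownership test changed by this fix can be sketched in user-space C. This is an illustration under assumptions, not the kernel code: `struct msg` is a hypothetical stand-in for `struct sk_msg`, and the two helpers model only the pointer comparison, not the socket locking.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_msg; only its address matters here. */
struct msg {
	int len;
};

/* Old heuristic: msg_tx is "local" if it differs from the current cork.
 * If another thread cleared psock->cork (and freed the cork) while the
 * socket lock was dropped, a stale cork pointer also differs, so the
 * error path would free the shared cork a second time. */
static bool is_local_old(const struct msg *msg_tx, const struct msg *cur_cork)
{
	return msg_tx != cur_cork;
}

/* Fixed check: msg_tx can only name the stack-local tmp or the shared
 * cork, so compare against &tmp directly; later changes to psock->cork
 * cannot affect the answer. */
static bool is_local_fixed(const struct msg *msg_tx, const struct msg *tmp)
{
	return msg_tx == tmp;
}
```

In the race described above, Thread A holds `msg_tx == &cork` while Thread B sets the shared pointer to `NULL`. `is_local_old(&cork, NULL)` then returns true, but `is_local_fixed(&cork, &tmp)` correctly returns false.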
Linus Torvalds	dad0a87d79	A bunch of assorted fixes with the majority being hardening against malformed input and invalid data scenarios that don't happen in real deployments but can be utilized to trigger use-after-free and similar issues, some error path leak fixups and two patches from Max to avoid a potential hang in __ceph_get_caps() and unintended nesting of current->journal_info while handling replies from the MDS. All marked for stable. Merge tag 'ceph-for-7.2-rc5' of https://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "A bunch of assorted fixes with the majority being hardening against malformed input and invalid data scenarios that don't happen in real deployments but can be utilized to trigger use-after-free and similar issues, some error path leak fixups and two patches from Max to avoid a potential hang in __ceph_get_caps() and unintended nesting of current->journal_info while handling replies from the MDS. All marked for stable" * tag 'ceph-for-7.2-rc5' of https://github.com/ceph/ceph-client: ceph: avoid fs reclaim while using current->journal_info ceph: add owner/capability checks for CEPH_IOC_SET_LAYOUT* ceph: fix hanging __ceph_get_caps() with stale mds_wanted rbd: Reset positive result codes to zero in object map update path libceph: bound pg_{temp,upmap,upmap_items} length to CEPH_PG_MAX_SIZE libceph: refresh auth->authorizer_buf{,_len} after authorizer update ceph: fix refcount leak in ceph_readdir() libceph: guard missing CRUSH type name lookup libceph: remove debugfs files before client teardown libceph: bound get_version reply decode to front len ceph: fix writeback_count leak in write_folio_nounlock() libceph: fix two unsafe bare decodes in decode_lockers() ceph: fix pre-auth out-of-bounds read on snaptrace in ceph_handle_caps() libceph: Reject monmaps advertising zero monitors libceph: reject zero bucket types in crush_decode libceph: Fix multiplication overflow in decode_new_up_state_weight()	2026-07-24 13:22:41 -07:00
Linus Torvalds	d326f83e81	Lots of fixes, double the count even for the "new normal". Largely due to my time off followed by a networking conference which distracted most maintainers (less so the AI generators). Including fixes from Bluetooth and WiFi. Current release - regressions: - wifi: mt76: fix MAC address for non OF pcie cards Current release - new code bugs: - mptcp: fix BUILD_BUG_ON on legacy ARM config - wifi: cfg80211: guard optional PMSR nominal time Previous releases - regressions: - qrtr: ns: raise node count limit to 512, we arbitrarily picked 256 as a limit, turns out it was too low for real world deployments - vhost-net: fix TX stall when vhost owns virtio-net header - eth: amd-xgbe: fix MAC_AUTO_SW handling in CL37 AN - wifi: ath12k: fix low MLO RX throughput on WCN7850 Previous releases - always broken: - number of random AI fixes for SCTP, RDS and TIPC protocols - more AI-looking fixes for WiFi drivers - number of fixes for missing pointer reloading after skb pull - reject BPF redirect use from qdisc qevent block - tcp: initialize standalone TCP-AO response padding - vsock/virtio: collapse receive queue under memory pressure to avoid client OOMing the host with tiny messages - ipv4: icmp: fill flow parameters in icmp_route_lookup decoy lookup, make sure the ICMP response routing follows the routing policy - gro: fix double aggregation of flush-marked skbs - ovpn: fix various refcount bugs - tls: device: push pending open record on splice EOF - eth: mlx5: - use sender devcom for MPV master-up - fix MCIA register buffer overflow on 32 dword reads Signed-off-by: Jakub Kicinski <kuba@kernel.org> Merge tag 'net-7.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Lots of fixes, double the count even for the 'new normal'. Largely due to my time off followed by a networking conference which distracted most maintainers (less so the AI generators). Including fixes from Bluetooth and WiFi. Current release - regressions: - wifi: mt76: fix MAC address for non OF pcie cards Current release - new code bugs: - mptcp: fix BUILD_BUG_ON on legacy ARM config - wifi: cfg80211: guard optional PMSR nominal time Previous releases - regressions: - qrtr: ns: raise node count limit to 512, we arbitrarily picked 256 as a limit, turns out it was too low for real world deployments - vhost-net: fix TX stall when vhost owns virtio-net header - eth: amd-xgbe: fix MAC_AUTO_SW handling in CL37 AN - wifi: ath12k: fix low MLO RX throughput on WCN7850 Previous releases - always broken: - number of random AI fixes for SCTP, RDS and TIPC protocols - more AI-looking fixes for WiFi drivers - number of fixes for missing pointer reloading after skb pull - reject BPF redirect use from qdisc qevent block - tcp: initialize standalone TCP-AO response padding - vsock/virtio: collapse receive queue under memory pressure to avoid client OOMing the host with tiny messages - ipv4: icmp: fill flow parameters in icmp_route_lookup decoy lookup, make sure the ICMP response routing follows the routing policy - gro: fix double aggregation of flush-marked skbs - ovpn: fix various refcount bugs - tls: device: push pending open record on splice EOF - eth: mlx5: - use sender devcom for MPV master-up - fix MCIA register buffer overflow on 32 dword reads" * tag 'net-7.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (234 commits) drop_monitor: perform u64_stats updates under IRQ-disabled section drop_monitor: fix size calculations for 64-bit attributes net: drop_monitor: fix info leak in NET_DM_ATTR_PAYLOAD mptcp: fix BUILD_BUG_ON on legacy ARM config selftests: mptcp: userspace_pm: fix undefined variable port mptcp: fix stale skb->sk reference on subflow close mptcp: pm: userspace: fix use-after-free in get_local_id mptcp: decrement subflows counter on failed passive join mac802154: hold an interface reference across the scan worker sctp: don't free the ASCONF's own transport in DEL-IP processing phonet: check register_netdevice_notifier() error in phonet_device_init() phonet: pep: fix use-after-free in pep_get_sb() bnge/bng_re: fix ring ID widths tipc: fix integer overflow in tipc_recvmsg() and tipc_recvstream() net: airoha: fix ETS channel derivation in airoha_tc_setup_qdisc_ets() mctp: check register_netdevice_notifier() error in mctp_device_init() ptp: netc: explicitly clear TMR_OFF during initialization rds: tcp: unregister sysctl before tearing down listen socket ipv6: Change allocation flags to match rcu_read_lock section requirements net: slip: serialize receive against buffer reallocation ...	2026-07-23 12:58:08 -07:00
Xiang Mei	9f00f9cf2b	libceph: bound pg_{temp,upmap,upmap_items} length to CEPH_PG_MAX_SIZE __decode_pg_temp() decodes a user-controlled length but only rejects values large enough to overflow the allocation; it does not bound it to CEPH_PG_MAX_SIZE. The helper backs both pg_temp and pg_upmap decoding, and apply_upmap()/get_temp_osds() later copy the decoded list into the fixed-size on-stack array struct ceph_osds.osds[CEPH_PG_MAX_SIZE]. A monitor that sends an OSDMap with a pg_temp/pg_upmap entry longer than 32 thus causes a stack out-of-bounds write. An OSD set for a single PG can never exceed CEPH_PG_MAX_SIZE, so reject longer entries at decode time. The bound is well below the old overflow threshold, so it also covers the allocation-size overflow the previous check guarded against. BUG: KASAN: stack-out-of-bounds in ceph_pg_to_up_acting_osds Write of size 4 ... by task exploit kasan_report (mm/kasan/report.c:595) ceph_pg_to_up_acting_osds (net/ceph/osdmap.c:2617 net/ceph/osdmap.c:2833) calc_target (net/ceph/osd_client.c:1638) __submit_request (net/ceph/osd_client.c:2394) ceph_osdc_start_request (net/ceph/osd_client.c:2490) ceph_osdc_call (net/ceph/osd_client.c:5164) rbd_dev_image_probe (drivers/block/rbd.c:6899) do_rbd_add (drivers/block/rbd.c:7138) ... kernel BUG at net/ceph/osdmap.c:2670! [ idryomov: do the same in __decode_pg_upmap_items() ] Cc: stable@vger.kernel.org Fixes: `a303bb0e58` ("libceph: introduce and switch to decode_pg_mapping()") Reported-by: Weiming Shi <bestswngs@gmail.com> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:42 +02:00
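A minimal user-space sketch of the decode-time bound follows. It assumes a limit of 32 because the commit text says entries "longer than 32" overflow the array. `PG_MAX_SIZE`, `struct osd_set` and `load_osd_list()` are hypothetical stand-ins for `CEPH_PG_MAX_SIZE`, `struct ceph_osds` and the libceph decode path, not the real kernel code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for CEPH_PG_MAX_SIZE (the commit text implies it is 32). */
#define PG_MAX_SIZE 32

/* Hypothetical fixed-size OSD set, shaped like struct ceph_osds. */
struct osd_set {
	int osds[PG_MAX_SIZE];
	int size;
};

/* Copy a decoded OSD list of untrusted length into a fixed-size set.
 * Bounding len before the copy keeps memcpy inside osds[]; the bound is
 * also far below any len * sizeof(int) overflow threshold. */
static int load_osd_list(uint32_t len, const int *src, struct osd_set *out)
{
	if (len > PG_MAX_SIZE)
		return -EINVAL;
	memcpy(out->osds, src, len * sizeof(*src));
	out->size = (int)len;
	return 0;
}
```

Because the check happens before any copy, a list of 33 entries is rejected outright rather than truncated or written past the stack array.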
Shuangpeng Bai	937d61f86d	libceph: refresh auth->authorizer_buf{,_len} after authorizer update ceph_x_create_authorizer() caches au->buf->vec.iov_base and au->buf->vec.iov_len in struct ceph_auth_handshake. These cached values are then used by the messenger connect code when sending the authorizer. ceph_x_update_authorizer() can rebuild the authorizer when a newer service ticket is available. If the rebuilt authorizer no longer fits in the existing buffer, ceph_x_build_authorizer() drops its reference to au->buf and allocates a new one. If this is the final reference, ceph_buffer_put() frees the old ceph_buffer and its vec.iov_base, but auth->authorizer_buf still points at that freed memory. A subsequent msgr1 reconnect can therefore queue the stale pointer and trigger a KASAN slab-use-after-free in _copy_from_iter() while tcp_sendmsg() copies the authorizer. Refresh auth->authorizer_buf and auth->authorizer_buf_len after a successful authorizer rebuild so the messenger sends the current buffer. Cc: stable@vger.kernel.org Fixes: `0bed9b5c52` ("libceph: add update_authorizer auth method") Closes: https://lore.kernel.org/all/E378850E-106C-427B-A241-970EB2D054D7@gmail.com/ Signed-off-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:41 +02:00
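The underlying hazard is a cached pointer/length pair that outlives the buffer it points into. The sketch below shows the pattern in user-space C under stated assumptions. `struct buf`, `struct handshake` and `update_authorizer()` are hypothetical; they only mimic the roles of the authorizer `ceph_buffer` and the cached `auth->authorizer_buf{,_len}`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owned buffer: capacity, used length, data. */
struct buf {
	size_t cap;
	size_t len;
	char *data;
};

/* Hypothetical handshake state caching a view into struct buf. */
struct handshake {
	const char *cached;
	size_t cached_len;
};

/* Rebuild the buffer contents, replacing the allocation if too small. */
static int update_authorizer(struct buf *b, struct handshake *hs,
			     const char *content)
{
	size_t need = strlen(content);

	if (need > b->cap) {
		char *fresh = malloc(need);

		if (!fresh)
			return -1;
		free(b->data); /* any old cached pointer now dangles */
		b->data = fresh;
		b->cap = need;
	}
	memcpy(b->data, content, need);
	b->len = need;

	/* The fix: refresh the cached view after every successful rebuild. */
	hs->cached = b->data;
	hs->cached_len = b->len;
	return 0;
}
```

Refreshing unconditionally, rather than only when the allocation changed, keeps the cached length correct too when a smaller authorizer is rebuilt in place.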
Zhao Zhang	bbeae12fda	libceph: guard missing CRUSH type name lookup Localized read selection can walk a parent bucket whose name exists in the CRUSH map while its type has no matching entry in type_names. get_immediate_parent() then dereferences a NULL type_cn and passes an invalid pointer into strcmp(), causing a null-ptr-deref. Skip such malformed parent buckets unless both the bucket name and type name metadata are present. This keeps malformed hierarchy data from crashing locality lookup and safely falls back to "not local". [ idryomov: add WARN_ON_ONCE ] Cc: stable@vger.kernel.org Fixes: `117d96a04f` ("libceph: support for balanced and localized reads") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:GPT-5.4 Signed-off-by: Zhao Zhang <zzhan461@ucr.edu> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:41 +02:00
Douya Le	e4c804726c	libceph: remove debugfs files before client teardown ceph_destroy_client() tears down the monitor client before removing the per-client debugfs files. A concurrent read of the monmap debugfs file can enter monmap_show() after ceph_monc_stop() has freed monc->monmap, triggering a use-after-free. Remove the debugfs files before stopping the OSD and monitor clients. debugfs_remove() drains active handlers and prevents new accesses, so the debugfs callbacks can no longer race the rest of client teardown. Cc: stable@vger.kernel.org Fixes: `76aa844d5b` ("ceph: debugfs") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:GPT-5.4 Signed-off-by: Douya Le <ldy3087146292@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:41 +02:00
Douya Le	d3c32939fa	libceph: bound get_version reply decode to front len handle_get_version_reply() uses msg->front_alloc_len as the decode boundary for MON_GET_VERSION_REPLY. That is the size of the reused reply buffer, not the number of bytes actually received. A truncated reply can therefore pass ceph_decode_need() and decode the second u64 from stale tail bytes left in the buffer by an earlier message, causing an uninitialized memory read. Use msg->front.iov_len as the receive-side decode boundary, matching other libceph reply handlers and limiting decoding to the bytes that were actually read from the wire. Cc: stable@vger.kernel.org Fixes: `513a8243d6` ("libceph: mon_get_version request infrastructure") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:GPT-5.4 Signed-off-by: Douya Le <ldy3087146292@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:41 +02:00
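The difference between the two decode boundaries can be sketched as follows. `struct rx_msg` and `decode_two_u64()` are hypothetical. `alloc_len` plays the role of `msg->front_alloc_len` and `front_len` that of `msg->front.iov_len`. Fields are read in host byte order for simplicity, whereas the Ceph wire format is little-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reused receive buffer: alloc_len is its capacity,
 * front_len is how many bytes the current message actually carried. */
struct rx_msg {
	uint8_t front[64];
	size_t alloc_len;
	size_t front_len;
};

/* Decode two u64 fields. Checking against front_len rejects a truncated
 * reply; checking against alloc_len would accept it and read stale bytes
 * left in the buffer by an earlier message. */
static int decode_two_u64(const struct rx_msg *m, uint64_t *a, uint64_t *b)
{
	if (m->front_len < 2 * sizeof(uint64_t))
		return -1;
	memcpy(a, m->front, sizeof(*a));
	memcpy(b, m->front + sizeof(*a), sizeof(*b));
	return 0;
}
```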
Pavitra Jha	a109a55611	libceph: fix two unsafe bare decodes in decode_lockers() decode_lockers() in cls_lock_client.c contains two bare decode operations that allow a malicious or compromised OSD to trigger slab-out-of-bounds reads: 1. ceph_decode_32(p) at the num_lockers field has no preceding bounds check. ceph_start_decoding() accepts struct_len=0 as valid -- the internal ceph_decode_need(p, end, 0, bad) always passes -- so when an OSD sends struct_len=0, ceph_start_decoding() returns success with p == end. The immediately following bare ceph_decode_32(p) then reads 4 bytes past the validated buffer boundary. The garbage value is passed directly to kzalloc_objs() as the locker count. The sibling function decode_watchers() in osd_client.c already uses ceph_decode_32_safe() after its own ceph_start_decoding() call. decode_lockers() was the only site using the bare variant. 2. ceph_decode_8(p) after the decode_locker() loop has no preceding bounds check. If an OSD crafts num_lockers such that the loop advances p exactly to end, the subsequent bare ceph_decode_8(p) reads one byte past the validated buffer boundary. The result is passed directly into type, which is used as a lock type discriminator by callers, giving an OSD-controlled one-byte OOB read with direct influence over the lock type field. Fix both by replacing bare operations with their safe variants: ceph_decode_32(p) -> ceph_decode_32_safe(p, end, num_lockers, err_inval) ceph_decode_8(p) -> ceph_decode_8_safe(p, end, type, err_free_lockers) The goto targets differ intentionally: err_inval: is a new label returning -EINVAL directly. It is used for the pre-allocation failure path where lockers is not yet allocated and must not be passed to ceph_free_lockers(). err_free_lockers: is the existing label. It is used for the post-allocation failure path where *lockers is allocated and must be freed. ret is set to -EINVAL before ceph_decode_8_safe() so that err_free_lockers returns the correct error code on bounds violation. Without this, err_free_lockers would return a stale ret value (0 from the successful decode_locker() loop), silently swallowing the error. -EINVAL is correct for both failure paths. The data received from the OSD is structurally malformed. -ENOMEM would misrepresent the failure class to callers and to stable@ backporters triaging error paths. Attacker model: a malicious or compromised OSD in a multi-tenant Ceph deployment can trigger this against any kernel client that issues the lock.get_info class method (e.g. during RBD exclusive lock acquisition). [ idryomov: trim changelog, formatting ] Cc: stable@vger.kernel.org Fixes: `d4ed4a5305` ("libceph: support for lock.lock_info") Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:41 +02:00
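The idea behind the "safe" decode variants is to check the remaining bytes before every read and fail instead of reading past `end`. The sketch below is a user-space analogue of that idea. `decode_32_safe()` and `decode_8_safe()` are hypothetical functions, not libceph's macros (which jump to a label rather than return an error). Values are read in host byte order for simplicity.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a u32 only if at least 4 bytes remain before end. */
static int decode_32_safe(const uint8_t **p, const uint8_t *end, uint32_t *v)
{
	if (end - *p < (ptrdiff_t)sizeof(*v))
		return -EINVAL;
	memcpy(v, *p, sizeof(*v));
	*p += sizeof(*v);
	return 0;
}

/* Read a u8 only if at least 1 byte remains before end. */
static int decode_8_safe(const uint8_t **p, const uint8_t *end, uint8_t *v)
{
	if (end - *p < 1)
		return -EINVAL;
	*v = **p;
	*p += 1;
	return 0;
}
```

With `p == end` (the `struct_len=0` case), `decode_32_safe()` fails without reading. After a loop that advances `p` exactly to `end`, `decode_8_safe()` fails instead of reading one byte out of bounds.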
Raphael Zimmer	40480eee36	libceph: Reject monmaps advertising zero monitors A message of type CEPH_MSG_MON_MAP contains a monmap that is sent from a monitor to the client. This monmap contains information about the existing monitors in the cluster. Currently, a monmap indicating that there are zero monitors in the cluster is treated as valid. However, it is impossible to have zero monitors in the cluster and still receive a valid monmap from a monitor. Therefore, such a monmap must be corrupted and should be treated as invalid. Furthermore, a monmap with a monitor count of zero can subsequently crash the client when attempting to open a session with a monitor in __open_session(). This happens because the "BUG_ON(monc->monmap->num_mon < 1)" assertion in pick_new_mon() is triggered. This patch extends a check in ceph_monmap_decode() to also reject arriving mon_maps with num_mon == 0 rather than only with num_mon > CEPH_MAX_MON. [ idryomov: drop "log output for unusual values of num_mon" part ] Cc: stable@vger.kernel.org Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:40 +02:00
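The extended validity check amounts to a two-sided range test on `num_mon`. In this sketch, `MAX_MON` is an assumed stand-in for `CEPH_MAX_MON`; the value 31 should be verified against the kernel headers. `num_mon_valid()` is a hypothetical helper, not the body of `ceph_monmap_decode()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for CEPH_MAX_MON; verify against the kernel headers. */
#define MAX_MON 31

/* A monmap is only usable if it names at least one monitor (the
 * BUG_ON in pick_new_mon() requires num_mon >= 1) and no more than
 * the supported maximum. */
static bool num_mon_valid(uint32_t num_mon)
{
	return num_mon >= 1 && num_mon <= MAX_MON;
}
```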
Douya Le	05f9028422	libceph: reject zero bucket types in crush_decode CRUSH bucket type 0 is reserved for devices. The mapper relies on that invariant and uses type 0 to identify leaf devices. If crush_decode() accepts a bucket with type 0, a malformed CRUSH map can make the mapper treat a negative bucket ID as a device and pass it to is_out(), which then indexes the OSD weight array with a negative value. Reject zero bucket types while decoding the CRUSH map so the invalid state never reaches the mapper. Cc: stable@vger.kernel.org Fixes: `f24e9980eb` ("ceph: OSD client") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:GPT-5.4 Signed-off-by: Douya Le <ldy3087146292@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:40 +02:00
Raphael Zimmer	98917a499e	libceph: Fix multiplication overflow in decode_new_up_state_weight() If a message of type CEPH_MSG_OSD_MAP contains a (maliciously) corrupted osdmap, out-of-bounds memory accesses may occur in decode_new_up_state_weight(). This happens because the bounds check for the new_state part is based on calculating its length depending on a len value read from the incoming message. This calculation may overflow leading to an incorrect bounds check. Subsequently, out-of-bounds reads may occur when decoding this part. This patch switches the multiplication to use check_mul_overflow() to abort processing the osdmap if an overflow occurred. Therefore, osdmaps/messages containing large values for len that result in a multiplication overflow are treated as invalid. [ idryomov: rename new_state_len -> new_state_item_size, formatting ] Cc: stable@vger.kernel.org Fixes: `930c532869` ("libceph: apply new_state before new_up_client on incrementals") Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-07-23 20:29:40 +02:00
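A minimal sketch of the overflow-safe bounds check, assuming GCC or Clang. The kernel's `check_mul_overflow()` is a wrapper around the `__builtin_mul_overflow()` used here. `array_fits()` and its parameters are hypothetical and do not reproduce the real `decode_new_up_state_weight()` logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does an array of len items of item_size bytes fit in `remaining`?
 * A plain len * item_size can wrap to a small value and let an
 * oversized len pass the bounds check; the builtin reports the wrap. */
static bool array_fits(uint32_t len, uint32_t item_size, size_t remaining)
{
	uint32_t need;

	if (__builtin_mul_overflow(len, item_size, &need))
		return false; /* treat the map as invalid */
	return need <= remaining;
}
```

For example, with 32-bit arithmetic `0x40000000 * 8` wraps to 0, so a naive check would pass. `array_fits()` rejects it.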
Eric Dumazet	fd098a23bf	drop_monitor: perform u64_stats updates under IRQ-disabled section In net_dm_packet_trace_kfree_skb_hit() and net_dm_hw_trap_packet_probe(), u64_stats_update_begin() / u64_stats_inc() / u64_stats_update_end() were called after spin_unlock_irqrestore(&...drop_queue.lock, flags), when local IRQs had already been re-enabled. Tracepoint probes can execute in IRQ or softirq context. On 32-bit architectures, u64_stats_update_begin() disables preemption but not interrupts, relying on seqcount writes. If a nested interrupt occurs on the same CPU during the 64-bit stats update, the reentrant seqcount update can corrupt the seqcount state or stats value. Fix this by performing the 64-bit per-CPU stats update before releasing drop_queue.lock via spin_unlock_irqrestore(), ensuring local interrupts remain disabled during the u64_stats update. Fixes: `e9feb58020` ("drop_monitor: Expose tail drop counter") Fixes: `5e58109b1e` ("drop_monitor: Add support for packet alert mode for hardware drops") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260722141743.3266924-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 11:00:01 -07:00
Eric Dumazet	7089f7ab99	drop_monitor: fix size calculations for 64-bit attributes net_dm_packet_report_fill() and net_dm_hw_packet_report_fill() use nla_put_u64_64bit() to append 64-bit attributes (NET_DM_ATTR_PC and NET_DM_ATTR_TIMESTAMP). On 32-bit architectures without CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, nla_put_u64_64bit() may append a 4-byte NET_DM_ATTR_PAD attribute for 64-bit alignment. However, net_dm_packet_report_size() and net_dm_hw_packet_report_size() used nla_total_size(sizeof(u64)) instead of nla_total_size_64bit(sizeof(u64)), budgeting 12 bytes instead of up to 16 bytes. This under-estimation of SKB size can lead to an skb_over_panic() when __nla_reserve() or skb_put() is subsequently called. Fix this by using nla_total_size_64bit(sizeof(u64)) in both size calculations. Fixes: `ca30707dee` ("drop_monitor: Add packet alert mode") Fixes: `5e58109b1e` ("drop_monitor: Add support for packet alert mode for hardware drops") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260722141743.3266924-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 11:00:01 -07:00
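The 12-versus-16-byte arithmetic can be checked with a user-space restatement of the netlink sizing macros from `<linux/netlink.h>` (`sizeof(struct nlattr)` is 4). `total_size()` and `total_size_64bit()` are hypothetical names mirroring `nla_total_size()` and the no-efficient-unaligned-access branch of `nla_total_size_64bit()`.

```c
/* Userspace restatement of the uapi netlink alignment macros. */
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN ((int)NLA_ALIGN(4))

/* What nla_total_size(payload) budgets: header + payload, aligned. */
static int total_size(int payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Worst case for a 64-bit attribute without efficient unaligned access:
 * the attribute itself plus an empty pad attribute. */
static int total_size_64bit(int payload)
{
	return total_size(payload) + total_size(0);
}
```

For a `u64` payload this gives 12 and 16 bytes respectively, matching the under-estimate described in the commit.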
Yehyeong Lee	5e9c8baee0	net: drop_monitor: fix info leak in NET_DM_ATTR_PAYLOAD net_dm_packet_report_fill() and net_dm_hw_packet_report_fill() open code the NET_DM_ATTR_PAYLOAD attribute to avoid zeroing the packet payload before overwriting it with skb_copy_bits(). skb_put() reserves nla_total_size(payload_len), i.e. the header plus the NLA_ALIGN() padding, but only payload_len bytes are copied in. When payload_len is not a multiple of 4 the 1-3 padding bytes are never initialized and are leaked to user space inside the netlink message. KMSAN confirms the leak for the software path when the packet payload length is not 4-byte aligned: BUG: KMSAN: kernel-infoleak in _copy_to_iter _copy_to_iter __skb_datagram_iter skb_copy_datagram_iter netlink_recvmsg sock_recvmsg __sys_recvfrom Uninit was created at: kmem_cache_alloc_node_noprof __alloc_skb net_dm_packet_work Bytes 173-175 of 176 are uninitialized Use __nla_reserve(), which sets up the attribute header and zeroes the padding, instead of open coding the attribute construction. Fixes: `ca30707dee` ("drop_monitor: Add packet alert mode") Fixes: `5e58109b1e` ("drop_monitor: Add support for packet alert mode for hardware drops") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yehyeong Lee <yhlee@isslab.korea.ac.kr> Link: https://patch.msgid.link/20260722122817.5548-1-yhlee@isslab.korea.ac.kr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:59:27 -07:00
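Why the padding leaks, and how zeroing it closes the leak, can be sketched by building one attribute by hand. `put_attr()` is hypothetical. It writes a 4-byte `(len, type)` header in host order and, like `__nla_reserve()`, zeroes the alignment padding before only `payload_len` bytes are copied in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN4(len) (((len) + 3) & ~(size_t)3)

/* Build one attribute: header, payload, then zeroed padding up to a
 * 4-byte boundary. Without the memset, the 1-3 trailing bytes would
 * keep whatever the buffer held before. Returns the bytes consumed. */
static size_t put_attr(uint8_t *buf, uint16_t type,
		       const uint8_t *payload, size_t payload_len)
{
	uint16_t hdr[2] = { (uint16_t)(4 + payload_len), type };

	memcpy(buf, hdr, sizeof(hdr));
	memset(buf + 4 + payload_len, 0, ALIGN4(payload_len) - payload_len);
	memcpy(buf + 4, payload, payload_len);
	return 4 + ALIGN4(payload_len);
}
```

A 5-byte payload consumes 12 bytes, and bytes 9-11 come out as zero even if the buffer previously held other data.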
Matthieu Baerts (NGI0)	133cca19d7	mptcp: fix BUILD_BUG_ON on legacy ARM config The 0-day bot managed to find kernel configs that cause build failures, e.g. when using the StrongARM SA1100 target (ARMv4). On such a legacy ARM architecture, all structures are apparently aligned to 32 bits, causing a build issue here. Indeed, on such an architecture, 'flags' size is not equivalent to sizeof(u16) as expected, but to sizeof(u32). Instead, use memset(). It was not used before to ensure a simple clear operation was used by the compiler. In the end, it shouldn't matter, and the compiler should optimise this to the same operation with or without memset() when -O above 0 is used. So let's switch to memset() to fix this issue, and reduce this complexity. Fixes: `5e939544f9` ("mptcp: fix uninit-value in mptcp_established_options") Cc: stable@vger.kernel.org Suggested-by: Frank Ranner <frank.ranner@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605312026.Srgsz7Tp-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607031100.upQfRZTM-lkp@intel.com/ Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260722-net-mptcp-misc-fixes-7-2-rc5-v1-5-6fb595bc86ef@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:50:38 -07:00
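The portability point can be sketched with a small structure. `struct flags` and `clear_flags()` are hypothetical, not the mptcp option layout. On mainstream ABIs `sizeof(struct flags)` is 2, but an ABI that aligns every structure to 32 bits (as the commit describes for old ARM targets) makes it 4. A build-time assertion that it equals `sizeof(uint16_t)` would then fail, while clearing by `sizeof()` works either way.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical two-byte flags block; its sizeof() is ABI-dependent. */
struct flags {
	uint8_t lo;
	uint8_t hi;
};

/* Clear the block without assuming its size. An optimising compiler
 * typically turns this into one store of the appropriate width. */
static void clear_flags(struct flags *f)
{
	memset(f, 0, sizeof(*f));
}
```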
Kalpan Jani	bd7aae448f	mptcp: fix stale skb->sk reference on subflow close The backlog list is updated by mptcp_data_ready() under mptcp_data_lock(). The cleanup of backlog references to a closing subflow, however, was performed in mptcp_close_ssk(), before __mptcp_close_ssk() acquires the ssk lock, and while holding neither the ssk lock nor mptcp_data_lock(). Because that traversal ran without mptcp_data_lock(), concurrent softirq RX processing on another CPU (subflow_data_ready() -> mptcp_data_ready() -> __mptcp_add_backlog(), under mptcp_data_lock()) could add a backlog entry referencing the ssk while the cleanup loop was in progress. Such an entry could be missed by the cleanup, or the concurrent list update could corrupt the traversal, leaving skb->sk pointing at the ssk after it is freed. A later mptcp_backlog_purge() then dereferences the stale pointer, triggering a warning in inet_sock_destruct() (ssk->sk_rmem_alloc != 0) followed by a use-after-free in mptcp_backlog_purge(). Fix this by moving the backlog cleanup into __mptcp_close_ssk(), after subflow->closing is set to 1 and while the ssk lock is still held, serialized under mptcp_data_lock(). The cleanup runs only on the push path (MPTCP_CF_PUSH), where backlog references accumulate; on other teardown paths the caller already handles cleanup. With subflow->closing set and mptcp_data_lock() held across the purge, any concurrent mptcp_data_ready() either completes its enqueue before the purge runs and is caught, or observes closing=1 and bails out. Once mptcp_data_unlock() is reached, no new skb referencing the ssk can be enqueued, so the cleanup is exhaustive. Remove the unprotected traversal from mptcp_close_ssk() entirely. Fixes: `ee458a3f31` ("mptcp: introduce mptcp-level backlog") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Reported-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/621 Signed-off-by: Kalpan Jani <kalpan.jani@mpiricsoftware.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260722-net-mptcp-misc-fixes-7-2-rc5-v1-3-6fb595bc86ef@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:50:38 -07:00
Geliang Tang	9bc6d5e4ca	mptcp: pm: userspace: fix use-after-free in get_local_id In mptcp_pm_userspace_get_local_id(), the address entry is looked up under spinlock, but its id is read after dropping the lock. A concurrent deletion can free the entry between the unlock and the read, leading to UAF. The race window is narrow. It was reproduced only with a locally constructed stress test that repeatedly overlaps an MP_JOIN SYN with a MPTCP_PM_CMD_SUBFLOW_DESTROY request. However, the KASAN report below confirms that the race is reachable: [ 666.319376] BUG: KASAN: slab-use-after-free in mptcp_userspace_pm_get_local_id+0x1dc/0x1f0 [ 666.319386] Read of size 1 at addr ffff888124845610 by task swapper/0/0 ... [ 666.319401] Call Trace: [ 666.319405] <IRQ> [ 666.319408] dump_stack_lvl+0x53/0x70 [ 666.319412] print_address_description.constprop.0+0x2c/0x3b0 [ 666.319418] print_report+0xbe/0x2b0 [ 666.319421] ? mptcp_userspace_pm_get_local_id+0x1dc/0x1f0 [ 666.319423] kasan_report+0xce/0x100 [ 666.319426] ? mptcp_userspace_pm_get_local_id+0x1dc/0x1f0 [ 666.319429] mptcp_userspace_pm_get_local_id+0x1dc/0x1f0 [ 666.319433] mptcp_pm_get_local_id+0x371/0x440 ... [ 666.319821] Allocated by task 45539: [ 666.319844] kasan_save_stack+0x33/0x60 [ 666.319855] kasan_save_track+0x14/0x30 [ 666.319858] __kasan_kmalloc+0x8f/0xa0 [ 666.319863] __kmalloc_noprof+0x1e7/0x520 [ 666.319867] sock_kmalloc+0xdf/0x130 [ 666.319885] sock_kmemdup+0x1b/0x40 [ 666.319888] mptcp_userspace_pm_append_new_local_addr+0x261/0x500 [ 666.319910] mptcp_pm_nl_announce_doit+0x16a/0x610 ... 
[ 666.319967] Freed by task 45560: [ 666.319988] kasan_save_stack+0x33/0x60 [ 666.319991] kasan_save_track+0x14/0x30 [ 666.319994] kasan_save_free_info+0x3b/0x60 [ 666.319998] __kasan_slab_free+0x43/0x70 [ 666.320000] kfree+0x166/0x440 [ 666.320003] sock_kfree_s+0x1d/0x50 [ 666.320007] mptcp_userspace_pm_delete_local_addr.isra.0+0x157/0x200 [ 666.320011] mptcp_pm_nl_subflow_destroy_doit+0x51d/0xea0 Fix by copying the id into a local variable while still holding the lock, and use -1 as a "not found" sentinel. Fixes: `f012d796a6` ("mptcp: check addrs list in userspace_pm_get_local_id") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Tested-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260722-net-mptcp-misc-fixes-7-2-rc5-v1-2-6fb595bc86ef@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:50:38 -07:00
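The fix pattern here is to copy the field into a local while the lock is still held, and to use -1 as a "not found" sentinel. Below is a minimal userspace sketch of that pattern, not the kernel code. `struct addr_entry`, `lookup_id()` and the C11 `atomic_flag` spinlock are hypothetical stand-ins for the MPTCP userspace PM address list and its spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an MPTCP PM address entry. */
struct addr_entry {
	unsigned int addr;
	unsigned char id;
	struct addr_entry *next;
};

/* Minimal spinlock on a C11 atomic_flag (stand-in for spin_lock()). */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

static void lock(void)
{
	while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire))
		; /* spin */
}

static void unlock(void)
{
	atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/*
 * Return the id of the entry matching @addr, or -1 if there is none.
 * The id is copied while the lock is held, so a concurrent delete that
 * frees the entry after unlock() can no longer be observed here.
 */
static int lookup_id(struct addr_entry *head, unsigned int addr)
{
	int id = -1; /* "not found" sentinel */

	lock();
	for (struct addr_entry *e = head; e; e = e->next) {
		if (e->addr == addr) {
			id = e->id; /* read under the lock */
			break;
		}
	}
	unlock();

	return id; /* the entry is never dereferenced after unlock() */
}
```

The sentinel works in this sketch because ids are stored as `unsigned char`, so -1 can never collide with a valid id.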
Chenguang Zhao	f3ca0ee2cc	mptcp: decrement subflows counter on failed passive join mptcp_pm_allow_new_subflow() increments extra_subflows before __mptcp_finish_join() on the passive MP_JOIN path. In case of race conditions, the subflow is dropped without calling mptcp_close_ssk(), so the counter is not rolled back. Call mptcp_pm_close_subflow() when the join completion fails to decrement the subflows counter. Fixes: `10f6d46c94` ("mptcp: fix race between MP_JOIN and close") Cc: stable@vger.kernel.org Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260722-net-mptcp-misc-fixes-7-2-rc5-v1-1-6fb595bc86ef@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:50:38 -07:00
Ibrahim Hashimov	234e5e898b	mac802154: hold an interface reference across the scan worker mac802154_scan_worker() captures the scanning sub-interface under RCU and then keeps dereferencing sdata->dev after rcu_read_unlock() and outside the rtnl -- in the failure traces, in mac802154_transmit_beacon_req() (skb->dev = sdata->dev), and in the end_scan cleanup. Nothing keeps that netdev alive across the worker iteration. A concurrent DEL_INTERFACE or PHY removal can unregister the interface once the worker drops the rtnl between its two drv_set_channel() sections. unregister_netdevice() frees the netdev asynchronously from netdev_run_todo() with the rtnl already dropped, so neither holding the rtnl nor the per-PHY IEEE802154_IS_SCANNING flag prevents a stale worker iteration from dereferencing the freed netdev -- a KASAN slab-use-after-free, reachable by racing TRIGGER_SCAN against DEL_INTERFACE (both CAP_NET_ADMIN). Pin the netdev with netdev_hold() while the RCU read lock is still held, and release it at every worker exit. Fixes: `57588c7117` ("mac802154: Handle passive scanning") Cc: stable@vger.kernel.org Signed-off-by: Ibrahim Hashimov <security@auditcode.ai> Link: https://patch.msgid.link/20260721211228.34578-1-security@auditcode.ai Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:37:46 -07:00
Jun Yang	9b2854f86f	sctp: don't free the ASCONF's own transport in DEL-IP processing sctp_process_asconf() caches the transport the ASCONF chunk is processed against in asconf->transport (== chunk->transport, set once in sctp_rcv()). For an ASCONF located through its Address Parameter by __sctp_rcv_asconf_lookup(), that cached transport corresponds to the Address Parameter, which need not be the packet's source address. sctp_process_asconf_param() rejects a DEL-IP for the packet source address (ADDIP D8, SCTP_ERROR_DEL_SRC_IP), but nothing protects asconf->transport. A single ASCONF can therefore carry, in order: [Address Parameter L] [DEL-IP L] [DEL-IP 0.0.0.0] where L differs from the source. The DEL-IP for L passes the D8 check and calls sctp_assoc_rm_peer() on the transport that asconf->transport still points at, freeing it (RCU-deferred). The following wildcard DEL-IP then reuses the now-dangling asconf->transport in sctp_assoc_set_primary() and sctp_assoc_del_nonprimary_peers(): set_primary() dereferences the freed transport (->ipaddr, ->state) and plants the dangling pointer into asoc->peer.primary_path / active_path, and del_nonprimary_peers(), keeping only the pointer that is no longer on the list, removes every real transport, leaving the association with a transport_count of 0 and primary_path/active_path pointing at freed memory. Reject a DEL-IP that targets the transport the ASCONF is being processed against, mirroring the existing source-address guard, so the wildcard branch can never reuse a freed transport. Fixes: `42e30bf346` ("[SCTP]: Handle the wildcard ADD-IP Address parameter") Cc: stable@kernel.org Signed-off-by: Jun Yang <junvyyang@tencent.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/tencent_73762ED1DF08CC9D5F5F61954B01350CFE0A@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:30:31 -07:00
Minhong He	d1ff66b661	phonet: check register_netdevice_notifier() error in phonet_device_init() phonet_device_init() registers a netdevice notifier before calling phonet_netlink_register(), but does not check whether notifier registration succeeded. On failure, netlink setup still proceeds and init may return success without the notifier in place. Also, the existing phonet_netlink_register() failure path called phonet_device_exit(), which runs rtnl_unregister_all() even though rtnl_register_many() already unwound any partial registration. Calling the full exit helper on a partial init is not correct. Check each registration error, including proc_create_net(), and unwind only the steps that have succeeded so far, in reverse order. Signed-off-by: Minhong He <heminhong@kylinos.cn> Link: https://patch.msgid.link/20260721093956.162617-1-heminhong@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:28:50 -07:00
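In kernel C, "unwind only the steps that have succeeded so far, in reverse order" is usually written as a goto ladder. The userspace sketch below shows that shape. `try_step()`, `undo_step()` and `device_init()` are hypothetical placeholders for the three registrations, and the order of the three calls is assumed for illustration only. `fail_at` simulates which registration returns an error.

```c
#include <assert.h>

/* Records the order in which completed steps are undone. */
static char undo_log[4];
static int undo_len;

/* Which step should fail (0 = none); simulates a registration error. */
static int fail_at;

/* Hypothetical registration step n: returns 0 on success, -1 on error. */
static int try_step(int n)
{
	return n == fail_at ? -1 : 0;
}

/* Hypothetical undo of a step; logs it so the order can be checked. */
static void undo_step(char n)
{
	undo_log[undo_len++] = n;
}

static int device_init(void)
{
	int err;

	err = try_step(1);      /* e.g. register_netdevice_notifier() */
	if (err)
		return err;     /* nothing has succeeded yet */

	err = try_step(2);      /* e.g. phonet_netlink_register() */
	if (err)
		goto undo_1;

	err = try_step(3);      /* e.g. proc_create_net() */
	if (err)
		goto undo_2;

	return 0;

undo_2:
	undo_step('2');         /* undo the most recent success first */
undo_1:
	undo_step('1');
	return err;
}
```

Each label undoes exactly one step and falls through to the next older one. A failure therefore never tears down something that was not set up, which is the problem with calling a full exit helper on a partial init.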
Breno Leitao	0f71f852a9	phonet: pep: fix use-after-free in pep_get_sb() pep_get_sb() does not account for pskb_may_pull() possibly relocating the skb data, and continues to access the old pointer, causing a use-after-free. Reproduced under KASAN: BUG: KASAN: slab-use-after-free in pep_get_sb+0x234/0x3b0 Read of size 1 at addr ff11000105510f50 by task repro/157 pep_get_sb+0x234/0x3b0 pipe_handler_do_rcv+0x5f7/0xa10 pep_do_rcv+0x203/0x410 __sk_receive_skb+0x471/0x4a0 phonet_rcv+0x5b3/0x6c0 __netif_receive_skb+0xcc/0x1d0 Refetch the header with skb_header_pointer() after pskb_may_pull(), so the possibly stale pointer is no longer dereferenced. There are better ways to solve this, but this is the least intrusive one. Fixes: `9641458d3e` ("Phonet: Pipe End Point for Phonet Pipes protocol") Cc: stable@vger.kernel.org Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260721-phonet_get_sb_uaf-v1-1-95fd7881cc4e@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:27:08 -07:00
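The underlying hazard is general: any call that may reallocate a buffer invalidates pointers derived from it earlier. Below is a userspace analogue, assuming `realloc()` plays the role of pskb_may_pull() relocating skb->data. `struct buf`, `buf_may_pull()` and `second_byte()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct buf {
	unsigned char *data;
	size_t cap;
};

/* Ensure at least @need bytes are addressable; may move b->data. */
static int buf_may_pull(struct buf *b, size_t need)
{
	if (need <= b->cap)
		return 1;
	unsigned char *p = realloc(b->data, need);
	if (!p)
		return 0;
	memset(p + b->cap, 0, need - b->cap); /* zero the new tail */
	b->data = p;
	b->cap = need;
	return 1;
}

/* Read header byte 1, refetching the pointer after the pull. */
static int second_byte(struct buf *b)
{
	const unsigned char *hdr = b->data; /* may become stale below */

	if (hdr[0] == 0)
		return -1;
	if (!buf_may_pull(b, 64))
		return -1;
	hdr = b->data; /* refetch: the data may have moved */
	return hdr[1];
}
```

Dropping the `hdr = b->data;` line would make the final read a use-after-free whenever `realloc()` moves the block. This is the same shape as the pep_get_sb() bug.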
Cen Zhang (Microsoft)	47f42ff521	tipc: fix integer overflow in tipc_recvmsg() and tipc_recvstream() In tipc_recvmsg(), the copy length is computed as: copy = min_t(int, dlen - offset, buflen); buflen is size_t but min_t(int, ...) casts it to int. When buflen exceeds INT_MAX (e.g. 0xFFFFFFFF via io_uring provided buffers), it wraps negative, wins the comparison, and the negative copy length propagates to simple_copy_to_iter() where int-to-size_t promotion makes it SIZE_MAX, triggering a WARN_ON. tipc_recvstream() has the same pattern. Kernel panic - not syncing: kernel: panic_on_warn set ... RIP: 0010:simple_copy_to_iter+0x9e/0xd0 (net/core/datagram.c:521) Call Trace: __skb_datagram_iter+0x123/0x8b0 (net/core/datagram.c:402) skb_copy_datagram_iter+0x77/0x1a0 (net/core/datagram.c:534) tipc_recvmsg+0x3d7/0xe80 (net/tipc/socket.c:1934) io_recvmsg+0x47e/0xda0 Fix by changing min_t(int, ...) to min_t(size_t, ...) in both functions. The result is always <= (dlen - offset), which is bounded by TIPC maximum message size (0x1ffff bytes), so the implicit narrowing on assignment to int copy is always safe. Fixes: `e9f8b10101` ("tipc: refactor function tipc_sk_recvmsg()") Fixes: `ec8a09fbbe` ("tipc: refactor function tipc_sk_recv_stream()") Reported-by: AutonomousCodeSecurity@microsoft.com Signed-off-by: Cen Zhang (Microsoft) <blbllhy@gmail.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20260720214103.47732-1-blbllhy@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 10:11:48 -07:00
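The truncation is easy to reproduce in userspace C. `min_int()` and `min_size()` below are hypothetical helpers that model min_t(int, ...) and min_t(size_t, ...). Converting an out-of-range size_t to int is implementation-defined. The comments assume the two's-complement wrap that GCC and Clang produce.

```c
#include <assert.h>
#include <stddef.h>

/* Models min_t(int, a, b): both operands are cast to int first. */
static int min_int(size_t a, size_t b)
{
	int x = (int)a, y = (int)b; /* 0xFFFFFFFF becomes -1 with GCC/Clang */
	return x < y ? x : y;
}

/* Models min_t(size_t, a, b): the comparison stays unsigned. */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}
```

With the size_t comparison, the result is never larger than `dlen - offset`. The commit notes that this value is bounded by the TIPC maximum message size, so narrowing it to int afterwards is safe.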
Minhong He	d9a33cadc7	mctp: check register_netdevice_notifier() error in mctp_device_init() mctp_device_init() handles errors from rtnl_af_register() and rtnl_register_many(), but ignores the return value of register_netdevice_notifier(). If notifier registration fails, init can still return success while the module is only partially initialized. Check the notifier registration error and fail module init early. Fixes: `583be982d9` ("mctp: Add device handling and netlink interface") Signed-off-by: Minhong He <heminhong@kylinos.cn> Link: https://patch.msgid.link/20260720072518.112614-1-heminhong@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 09:17:36 -07:00
Cen Zhang (Microsoft)	167e54c703	rds: tcp: unregister sysctl before tearing down listen socket rds_tcp_exit_net() frees the per-netns RDS TCP listen socket via rds_tcp_kill_sock() before unregistering the per-netns sysctl table. Since rds_tcp_skbuf_handler() derives the netns from rtn->rds_tcp_listen_sock->sk, a concurrent sysctl write can race with netns teardown and dereference the freed socket/sk. KASAN reports the race as: BUG: KASAN: slab-use-after-free in rds_tcp_skbuf_handler+0x2aa/0x2e0 rds_tcp_skbuf_handler net/rds/tcp.c:721 proc_sys_call_handler fs/proc/proc_sysctl.c vfs_write fs/read_write.c __x64_sys_pwrite64 fs/read_write.c Fix this by unregistering the RDS TCP sysctl table before calling rds_tcp_kill_sock(). unregister_net_sysctl_table() prevents new sysctl handlers from starting and waits for in-flight handlers to finish, so the listen socket can then be released safely. The fix was tested against the linked reproducer. Fixes: `7f5611cbc4` ("rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy") Reported-by: AutonomousCodeSecurity@microsoft.com Link: https://lore.kernel.org/all/20260719203718.9680-1-blbllhy@gmail.com Reviewed-by: Allison Henderson <achender@kernel.org> Signed-off-by: Cen Zhang (Microsoft) <blbllhy@gmail.com> Link: https://patch.msgid.link/20260719210357.10179-1-blbllhy@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 09:15:20 -07:00
Nikola Z. Ivanov	313a123e1f	ipv6: Change allocation flags to match rcu_read_lock section requirements Since __ip6_del_rt_siblings is now called under the RCU read lock and has only one call site, it must no longer block or yield. Our stack trace from the syzbot reproducer looks as follows: __ip6_del_rt_siblings rtnl_notify (Here we pass gfp_any() -> GFP_KERNEL) nlmsg_notify nlmsg_multicast nlmsg_multicast_filtered netlink_broadcast_filtered (GFP_KERNEL passed from earlier) netlink_broadcast_filtered can yield if GFP_KERNEL is passed, which must not happen here. Fix this by changing the allocation flag of rtnl_notify. Also change the flag passed to nlmsg_new. Even though it is not related to the syzbot-generated bug, it is subject to the same requirements. Reported-by: syzbot+84d4a405ed798b40c96d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=84d4a405ed798b40c96d Fixes: `bd11ff421d` ("ipv6: Get rid of RTNL for SIOCDELRT and RTM_DELROUTE.") Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260719105759.558050-1-zlatistiv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 09:13:13 -07:00
Li RongQing	440e274da4	net: ipv6: fix dif and sdif mismatch in raw6_icmp_error In raw6_icmp_error(), raw_v6_match() is called with inet6_iif(skb) passed to both the 'dif' and 'sdif' arguments. This is a copy-paste or typo error, as the last argument should represent the secondary interface index (sdif). This mismatch breaks ICMPv6 error handling for IPv6 raw sockets in VRF (Virtual Routing and Forwarding) environments. When a raw socket is bound to a VRF master device, raw_v6_match() fails to find a match because it is not given the correct sdif value, causing the socket to miss relevant ICMPv6 error notifications. Fix this by properly passing inet6_sdif(skb) as the last argument to raw_v6_match(). Fixes: `5108ab4bf4` ("net: ipv6: add second dif to raw socket lookups") Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260717143230.1836-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 08:31:44 -07:00
Yuxiang Yang	a28c4fcbf7	tcp: challenge ACK for non-exact RST in SYN-RECEIVED The SYN-RECEIVED request-socket path in tcp_check_req() accepts an in-window RST without requiring SEG.SEQ to exactly match RCV.NXT. A non-exact RST therefore removes the request instead of eliciting a challenge ACK. RFC 9293 section 3.10.7.4 applies the RFC 5961 reset check in SYN-RECEIVED: an exact RST resets the connection, while a non-exact in-window RST must trigger a challenge ACK and be dropped. Apply that check before the ACK-field validation, following the RFC sequence-number, RST, then ACK processing order. Factor the per-netns challenge ACK quota out of tcp_send_challenge_ack() so request sockets can share it. Use the request socket's send_ack() callback and its own out-of-window ACK timestamp to send and rate-limit the response. Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reported-by: Ao Wang <wangao@seu.edu.cn> Reported-by: Xuewei Feng <fengxw06@126.com> Reported-by: Qi Li <qli01@tsinghua.edu.cn> Reported-by: Ke Xu <xuke@tsinghua.edu.cn> Fixes: `282f23c6ee` ("tcp: implement RFC 5961 3.2") Cc: stable@vger.kernel.org Signed-off-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260717081443.809393-2-yangyx22@mails.tsinghua.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 08:27:22 -07:00
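The RFC 5961 reset check applied here classifies an incoming RST by its sequence number. The sketch below rests on stated assumptions. `classify_rst()` and `enum rst_action` are hypothetical names. The receive window is taken as [rcv_nxt, rcv_nxt + rcv_wnd). `seq_before()` mirrors the wraparound-safe comparison performed by the kernel's before() helper.

```c
#include <assert.h>
#include <stdint.h>

enum rst_action { RST_ACCEPT, RST_CHALLENGE_ACK, RST_DROP };

/* True if sequence number a precedes b, modulo 2^32. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static enum rst_action classify_rst(uint32_t seq, uint32_t rcv_nxt,
				    uint32_t rcv_wnd)
{
	if (seq == rcv_nxt)
		return RST_ACCEPT;        /* exact match: reset */
	if (!seq_before(seq, rcv_nxt) &&
	    seq_before(seq, rcv_nxt + rcv_wnd))
		return RST_CHALLENGE_ACK; /* in window, not exact */
	return RST_DROP;                  /* out of window: drop */
}
```

The kernel path also rate-limits challenge ACKs through the per-netns quota and the request socket's timestamp. This sketch omits that part.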
Doruk Tan Ozturk	fd3a3f28ed	mac802154: llsec: reject frames shorter than the authentication tag llsec_do_decrypt_auth() computes the associated-data length for the AEAD request as assoclen += datalen - authlen; where datalen is the number of bytes after the MAC header and authlen (4, 8 or 16) is the length of the authentication tag. Nothing verifies that the frame actually carries at least authlen payload bytes. A secured frame whose payload is shorter than the tag makes datalen - authlen negative; assoclen is then passed to aead_request_set_ad() as an unsigned value close to 4 GiB, so crypto_aead_decrypt() walks far off the end of the scatterlist that only spans the real frame. The frame is fully attacker-controlled and reaches this path from any IEEE 802.15.4 peer in radio range. Reject frames whose payload is shorter than the authentication tag before the subtraction. Dynamically reproduced on a KASAN kernel as a general-protection-fault in the AEAD scatterwalk, and the fix confirmed. Fixes: `4c14a2fb5d` ("mac802154: add llsec decryption method") Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai> Link: https://patch.msgid.link/20260716193423.32498-1-doruk@0sec.ai Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 08:12:25 -07:00
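The arithmetic failure is an unsigned wraparound. When datalen < authlen the subtraction goes negative, and once the result is treated as an unsigned length it becomes a value close to 4 GiB. Below is a minimal sketch with a hypothetical `compute_assoclen()` helper. It assumes the lengths are plain ints and that the result is passed on as an unsigned 32-bit length.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the associated-data length for a secured frame.
 * Returns 0 and stores the result on success, or -1 if the payload
 * is shorter than the authentication tag (the check the fix adds).
 */
static int compute_assoclen(int hdrlen, int datalen, int authlen,
			    uint32_t *assoclen)
{
	if (datalen < authlen)
		return -1; /* reject: the subtraction would underflow */
	*assoclen = (uint32_t)(hdrlen + datalen - authlen);
	return 0;
}
```

Without the check, a 10-byte header with 4 payload bytes and a 16-byte tag yields `(uint32_t)-2`, which is 0xFFFFFFFE.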
Runyu Xiao	18f116931f	raw: annotate lockless match fields in raw_v4_match() raw_v4_match() is a lockless match helper under sk_for_each_rcu(). It still reads inet->inet_daddr, inet->inet_rcv_saddr and sk->sk_bound_dev_if with plain loads while bind, connect and bind-to-device paths can update the same match fields concurrently. Annotate only those mutable match fields in raw_v4_match(), and do so at the point of use instead of hoisting the bound-device read before the earlier short-circuit tests. Also annotate the raw bind writer and the shared IPv4 datagram connect writer used by raw sockets, so the address fields updated on bind and connect match explicit WRITE_ONCE() updates. This version intentionally leaves the shared disconnect-side IPv4 writers to follow-up cleanup and limits the writer changes here to the raw bind path and the datagram connect path directly exercised by raw sockets. Fixes: `0daf07e527` ("raw: convert raw sockets to RCU") Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn> Link: https://patch.msgid.link/20260716142958.3064224-1-runyu.xiao@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 07:46:12 -07:00
Aldo Ariel Panzardo	3b536db8fb	net: qrtr: restrict socket creation to the initial network namespace QRTR keeps its entire port and node state in module-global variables that are not partitioned per network namespace: qrtr_local_nid is a single global node id (always 1) and qrtr_ports is a single global xarray. qrtr_port_lookup() and qrtr_local_enqueue() operate on that global state with no network-namespace check, and qrtr_create() places no restriction on the namespace a socket is created in. As a result an unprivileged process that creates an AF_QIPCRTR socket in a separate network namespace, e.g. via unshare(CLONE_NEWUSER | CLONE_NEWNET), can send QRTR datagrams - including control-plane messages such as QRTR_TYPE_NEW_SERVER - to QRTR sockets owned by another namespace, and vice versa. The receiving socket sees such a message as coming from node id 1, indistinguishable from a legitimate local client, breaking the isolation that network namespaces are expected to provide. QRTR is a transport to global hardware endpoints (the modem and other remote processors) and has no per-namespace semantics; its in-kernel name service already creates its socket in init_net only. Confine the socket family to the initial network namespace, as other non-namespace-aware socket families do (see llc_ui_create() and the ieee802154 socket code). Fixes: `bdabad3e36` ("net: Add Qualcomm IPC router") Signed-off-by: Aldo Ariel Panzardo <qwe.aldo@gmail.com> Link: https://patch.msgid.link/20260716154319.3297699-1-qwe.aldo@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 07:44:00 -07:00
Eric Dumazet	853e164c2b	ipv4: icmp: fill flow parameters in icmp_route_lookup decoy lookup When Linux forwards a packet and needs to generate an ICMP error, icmp_route_lookup() performs a reverse-path relookup. For non-local destinations, it performs a decoy lookup to find the expected egress interface (rt2->dst.dev) before validating the path with ip_route_input(). Currently, the decoy flow structure (fl4_2) only sets .daddr = fl4_dec.saddr, leaving .saddr, .flowi4_dscp, .flowi4_proto, .flowi4_mark, .flowi4_oif, .fl4_sport, .fl4_dport, and .flowi4_uid zeroed out. When policy routing rules (such as ip rule add from $SRC lookup 100, or dscp/fwmark/ipproto/port rules, or VRF bindings) are configured: 1. The decoy lookup fails to match the policy rule because saddr and other key flow selectors are missing in fl4_2. 2. It resolves a route using the default table instead, returning an incorrect egress netdev. 3. Passing the wrong netdev to ip_route_input() causes strict reverse-path filtering (rp_filter=1) to fail, logging false-positive "martian source" warnings and causing the relookup to fail. Fix this by initializing fl4_2 from fl4_dec and: - Swapping source/destination IP addresses. - Swapping L4 ports for transport protocols with ports (TCP, UDP, SCTP, DCCP) so port-based policy routing matches correctly. Non-port protocols (such as ICMP or GRE) leave the flowi_uli union fields intact to prevent corruption. - Setting .flowi4_oif = l3mdev_master_ifindex(route_lookup_dev) to ensure VRF routing tables are respected. - Setting .flowi4_flags |= FLOWI_FLAG_ANYSRC to allow output route lookups for non-local source IP addresses. - Using __ip_route_output_key() instead of ip_route_output_key() for fl4_2 so that raw FIB routing is used without triggering spurious XFRM policy lookups on the decoy flow (the actual XFRM lookup is performed later using fl4_dec).
Fixes: `415b3334a2` ("icmp: Fix regression in nexthop resolution during replies.") Reported-by: Muhammad Ziad <muhzi100@gmail.com> Closes: https://lore.kernel.org/netdev/CAOAwikA60AYKdFr_UDLyja3oU4hqyAE7uFZWqum5uRdaQsgRYg@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260722104236.2938082-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-23 07:01:24 -07:00
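The address and port swap can be sketched as follows. `struct flow4` and `reverse_flow()` are hypothetical. The struct is a heavily reduced stand-in for struct flowi4, which keeps the ports in a union; that is why non-port protocols must leave those bytes untouched. The protocol numbers are the IANA values for TCP (6), UDP (17), DCCP (33) and SCTP (132).

```c
#include <assert.h>
#include <stdint.h>

struct flow4 {
	uint32_t saddr, daddr;
	uint8_t proto;
	uint16_t sport, dport; /* meaningful only for port-based protocols */
};

static int proto_has_ports(uint8_t proto)
{
	return proto == 6 || proto == 17 || proto == 33 || proto == 132;
}

/* Build the reverse-direction flow for the decoy route lookup. */
static struct flow4 reverse_flow(const struct flow4 *in)
{
	struct flow4 out = *in; /* start from the full original key */

	out.saddr = in->daddr;
	out.daddr = in->saddr;
	if (proto_has_ports(in->proto)) {
		out.sport = in->dport;
		out.dport = in->sport;
	}
	return out;
}
```

Copying the whole key first is what lets selectors such as DSCP, mark or UID reach the policy-routing lookup. The commit describes this as the missing piece in the original zero-initialized fl4_2.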
Kuniyuki Iwashima	3671f0419d	mpls: Set rt->rt_nhn just before returning from mpls_nh_build_multi(). Commit `f0914b8436` ("mpls: Hold dev refcnt for mpls_nh.") added a change_nexthops() loop to call netdev_put() for the nexthop devices before freeing mpls_route. Then, mpls_nh_build_multi() was also changed to avoid iterating uninitialised nexthops in mpls_rt_free_rcu(). However, setting rt->rt_nhn to 0 at the entry of mpls_nh_build_multi() makes the following change_nexthops() a no-op. Let's set rt->rt_nhn just before returning from mpls_nh_build_multi(). Fixes: `f0914b8436` ("mpls: Hold dev refcnt for mpls_nh.") Reported-by: Anthony Doeraene <anthony.doeraene@uclouvain.be> Closes: https://lore.kernel.org/netdev/036a0c95-f5d4-46ab-88e7-1eab567d7a84@uclouvain.be/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260716170609.804629-1-kuniyu@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-07-23 13:36:46 +02:00
Yun Zhou	675ed582c1	net: gre: fix lltx regression for GRE tunnels with SEQ/CSUM Before commit `00d066a4d4` ("netdev_features: convert NETIF_F_LLTX to dev->lltx"), NETIF_F_LLTX was set unconditionally in both __gre_tunnel_init() and ip6gre_tnl_init_features() alongside GRE_FEATURES: dev->features |= GRE_FEATURES | NETIF_F_LLTX; When that commit converted NETIF_F_LLTX to the dev->lltx flag, it placed 'dev->lltx = true' after the SEQ/CSUM early returns instead of before them. This causes GRE/GRETAP/ip6gre tunnels with SEQ or CSUM+encap to lose lockless TX, reintroducing _xmit_lock acquisition around their ndo_start_xmit. Since GRE xmit re-enters the stack via ip_tunnel_xmit(), holding _xmit_lock risks ABBA deadlock with the underlay device. CPU0 CPU1 ---- ---- lock(&qdisc_xmit_lock_key#6); lock(&qdisc_xmit_lock_key#3); lock(&qdisc_xmit_lock_key#6); lock(&qdisc_xmit_lock_key#3); Fix by moving dev->lltx = true before the early returns in both functions, restoring the original unconditional behavior. Fixes: `00d066a4d4` ("netdev_features: convert NETIF_F_LLTX to dev->lltx") Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260713150945.1779628-1-yun.zhou@windriver.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-07-23 13:01:57 +02:00
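The regression is a statement-ordering bug: a flag that must always be set ended up after early returns. Below is a reduced sketch using a hypothetical `struct tun_dev` and `tunnel_init_features()`. This is not the kernel function. `sw_offload` stands in for whatever the real code configures after the SEQ/CSUM checks.

```c
#include <assert.h>
#include <stdbool.h>

struct tun_dev {
	bool lltx;       /* lockless TX */
	bool seq;        /* tunnel uses GRE sequence numbers */
	bool sw_offload; /* stand-in for features set after the checks */
};

static void tunnel_init_features(struct tun_dev *dev)
{
	dev->lltx = true; /* unconditional: must precede every early return */

	if (dev->seq)
		return;   /* SEQ tunnels skip the remaining setup */

	dev->sw_offload = true;
}
```

Placing the unconditional assignment first keeps it independent of how many early-exit conditions are added later.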
Daehyeon Ko	ba0533fc16	tipc: clear sock->sk on the failed-insert path in tipc_sk_create() When tipc_sk_create() fails to insert the new socket (tipc_sk_insert() returns non-zero), its error path frees the sk with sk_free() but leaves sock->sk pointing at the freed object: if (tipc_sk_insert(tsk)) { sk_free(sk); pr_warn("Socket create failed; port number exhausted\n"); return -EINVAL; } This is harmless for plain socket(): the syscall layer clears sock->ops before releasing, so tipc_release() is never called. It is not harmless on the accept() path. tipc_accept() creates the pre-allocated child socket with tipc_sk_create(net, new_sock, 0, kern); on failure it leaves new_sock->sk dangling and new_sock->ops non-NULL, and do_accept() then fput()s the new file, so __sock_release() -> tipc_release() runs lock_sock(new_sock->sk) on the freed sk -- a use-after-free write of the sk_lock spinlock. tipc_release() already guards this exact "failed accept() releases a pre-allocated child" case with "if (sk == NULL) return 0;", but the guard is bypassed because tipc_sk_create() left sock->sk non-NULL (dangling) rather than NULL. Clear sock->sk on the failed-insert path so the existing tipc_release() NULL check fires and the use-after-free is avoided. The tipc_sk_insert() failure is reached when the per-netns socket rhashtable hits its max_size (tsk_rht_params.max_size = 1048576, ~2M elements) -- i.e. once a netns holds ~2M TIPC sockets every insert returns -E2BIG. 
BUG: KASAN: slab-use-after-free in lock_sock_nested (net/core/sock.c:3839) Write of size 8 at addr ffff8880047cdc38 by task init/1 lock_sock_nested (net/core/sock.c:3839) tipc_release (net/tipc/socket.c:638) __sock_release (net/socket.c:710) sock_close (net/socket.c:1501) __fput (fs/file_table.c:512) Allocated by task 1: sk_alloc (net/core/sock.c:2308) tipc_sk_create (net/tipc/socket.c:487) tipc_accept (net/tipc/socket.c:2744) do_accept (net/socket.c:2034) Freed by task 1: __sk_destruct (net/core/sock.c:2391) tipc_sk_create (net/tipc/socket.c:504) tipc_accept (net/tipc/socket.c:2744) do_accept (net/socket.c:2034) Fixes: `00aff3590f` ("net: tipc: fix possible refcount leak in tipc_sk_create()") Cc: stable@vger.kernel.org Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Daehyeon Ko <4ncienth@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260714131939.1255974-1-4ncienth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-07-23 12:58:06 +02:00
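The fix clears the owner's pointer on the same error path that frees the object, so that the release path's existing NULL guard fires. Below is a userspace sketch with hypothetical `struct socket_sketch`, `sk_create()` and `sk_release()`. The `insert_fails` argument simulates tipc_sk_insert() returning non-zero.

```c
#include <assert.h>
#include <stdlib.h>

struct sock_obj { int lock; };
struct socket_sketch { struct sock_obj *sk; };

static int sk_create(struct socket_sketch *sock, int insert_fails)
{
	struct sock_obj *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return -1;
	sock->sk = sk;
	if (insert_fails) {
		free(sk);
		sock->sk = NULL; /* the fix: do not leave a dangling pointer */
		return -1;
	}
	return 0;
}

static int sk_release(struct socket_sketch *sock)
{
	struct sock_obj *sk = sock->sk;

	if (sk == NULL)
		return 0;  /* existing guard for a failed create */
	sk->lock = 1;      /* a use-after-free write if sk were dangling */
	free(sk);
	sock->sk = NULL;
	return 0;
}
```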
Xiang Mei (Microsoft)	980a813452	bpf: tcp: fix double sock release on batch realloc bpf_iter_tcp_batch() releases the current batch via bpf_iter_tcp_put_batch(), which drops the socket refs and rewrites each slot with the socket cookie, then grows the batch. cur_sk/end_sk are kept for bpf_iter_tcp_resume(), but on realloc failure the function returns ERR_PTR() before resume runs, leaving cur_sk < end_sk over slots that now hold cookies rather than sock pointers. bpf_iter_tcp_seq_stop() then calls bpf_iter_tcp_put_batch() again and dereferences a cookie as a struct sock. Empty the batch on the failure path so stop() does not release it again. The sockets were already freed by the first bpf_iter_tcp_put_batch(), so nothing leaks, and a later read() rescans the bucket from the start instead of skipping it. The sibling GFP_NOWAIT failure path still holds real socket references and is left for stop() to release. BUG: KASAN: null-ptr-deref in __sock_gen_cookie Read of size 8 at addr 0000000000000059 by task exploit ... __sock_gen_cookie (net/core/sock_diag.c:28) bpf_iter_tcp_put_batch (net/ipv4/tcp_ipv4.c:2918) bpf_iter_tcp_seq_stop (net/ipv4/tcp_ipv4.c:3270) bpf_seq_read (kernel/bpf/bpf_iter.c:205) vfs_read (fs/read_write.c:572) ksys_read (fs/read_write.c:716) do_syscall_64 entry_SYSCALL_64_after_hwframe Kernel panic - not syncing: Fatal exception Fixes: `cdec67a489` ("bpf: tcp: Make sure iter->batch always contains a full bucket snapshot") Reported-by: AutonomousCodeSecurity@microsoft.com Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jordan Rife <jordan@jrife.io> Link: https://patch.msgid.link/20260713233230.3553593-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-07-23 12:20:34 +02:00
David Lee	5499e0602d	net/x25: fix use-after-free in x25_kill_by_neigh() x25_kill_by_neigh() walks the global X.25 socket list looking for sockets attached to a terminating neighbour. x25_list_lock protects list membership while the lookup is in progress, but it does not pin a socket's lifetime after the lock is dropped. The function currently drops x25_list_lock before calling lock_sock(s). A concurrent close can run x25_release(), remove the same socket from x25_list, and drop the last socket reference in that window. The neighbour teardown path can then lock or inspect a freed struct sock/struct x25_sock. Take sock_hold(s) while x25_list_lock still proves that the list entry is live, then drop the temporary reference after the socket has been locked, rechecked, and released. Recheck x25_sk(s)->neighbour after lock_sock(), because another path may have disconnected the socket before this path acquired the socket lock. Restart the list walk after each disconnect because the list lock was dropped and the previous iterator state may no longer be valid. A QEMU/KASAN run against origin/master reproduced a slab-use-after-free in x25_kill_by_neigh(). Fixes: `7781607938` ("net/x25: Fix null-ptr-deref caused by x25_disconnect") Cc: stable@vger.kernel.org Signed-off-by: David Lee <david.lee@trailofbits.com> Assisted-by: Codex:gpt-5.5 Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://patch.msgid.link/20260713104752.241175-1-david.lee@trailofbits.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-07-23 11:58:20 +02:00
Cen Zhang (Microsoft)	9f29cd8a8e	tipc: fix u16 MTU truncation in media and bearer MTU validation Both TIPC_NL_MEDIA_SET and TIPC_NL_BEARER_SET accept user-supplied MTU values but only enforce a minimum bound, not a maximum. When a user sets the MTU to a value exceeding U16_MAX (65535), it passes validation but is silently truncated when assigned to u16 fields l->mtu and l->advertised_mtu in tipc_link_create(). Values like 65536 (0x10000) truncate to 0, causing a division by zero in tipc_link_set_queue_limits() which computes TIPC_MAX_PUBL / (l->mtu / ITEM_SIZE). Other overflowing values (e.g. 65537-131071) produce small, incorrect MTU values that cause the link to malfunction. Crash stack (triggered as unprivileged user via user namespace): tipc_link_set_queue_limits net/tipc/link.c:2531 tipc_link_create net/tipc/link.c:520 tipc_node_check_dest net/tipc/node.c:1279 tipc_disc_rcv net/tipc/discover.c:252 tipc_rcv net/tipc/node.c:2129 tipc_udp_recv net/tipc/udp_media.c:392 Two independent paths lack the upper bound check: 1. tipc_udp_mtu_bad() -- called from __tipc_nl_media_set() (MEDIA_SET) 2. inline check in __tipc_nl_bearer_set() at bearer.c:1160 (BEARER_SET) Fix both by rejecting MTU values above U16_MAX. Fixes: `901271e040` ("tipc: implement configuration of UDP media MTU") Reported-by: AutonomousCodeSecurity@microsoft.com Closes: https://lore.kernel.org/all/CAB8m9WgETt0AjmFwE=F-CKjGXsK6_WDv0=kbYRcC8-noo+amnA@mail.gmail.com Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Cen Zhang (Microsoft) <blbllhy@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260714041541.307702-1-blbllhy@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-07-23 11:41:03 +02:00
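The truncation and the resulting zero divisor can both be shown in a few lines of C. `mtu_bad()` is a hypothetical validator modelled on the fix: the caller-supplied lower bound plus the new U16_MAX upper bound. `min_mtu` is a parameter because TIPC's actual minimum is not given here. `store_u16()` shows what assignment to a u16 field silently does.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if @mtu is below the minimum or cannot fit a u16 field. */
static int mtu_bad(uint32_t mtu, uint32_t min_mtu)
{
	return mtu < min_mtu || mtu > UINT16_MAX;
}

/* Assignment to a u16 keeps only the low 16 bits. */
static uint16_t store_u16(uint32_t mtu)
{
	return (uint16_t)mtu;
}
```

For example, `store_u16(65536)` is 0, which is exactly the l->mtu value that leads to the division by zero described above. `mtu_bad(65536, min)` rejects it before it is stored.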
Aldo Ariel Panzardo	f43ee0c073	net/sched: serialize qdisc_rtab_list against concurrent get/put qdisc_get_rtab() and qdisc_put_rtab() mutate the process-global singly linked list qdisc_rtab_list and a plain non-atomic 'int refcnt' with no lock. This was only safe because every caller historically held the RTNL mutex, which serialized all rate-table lookups, inserts and frees. That invariant no longer holds. cls_flower sets TCF_PROTO_OPS_DOIT_UNLOCKED, so tc_new_tfilter() keeps rtnl_held == false for it and sets TCA_ACT_FLAGS_NO_RTNL. That flag propagates through tcf_exts_validate_ex() -> tcf_action_init() -> tcf_action_init_1() -> tcf_police_init(), which calls qdisc_get_rtab()/qdisc_put_rtab() with the RTNL mutex NOT held. Two RTM_NEWTFILTER requests on different CPUs, each adding a flower filter with a police action carrying the same rate, then race on qdisc_rtab_list and on the non-atomic refcnt, leading to a use-after-free / double-free of the kmalloc-2k struct qdisc_rate_table. qdisc_rtab_list is a single global (not per-netns), so the corrupted object is shared system-wide. BUG: KASAN: slab-use-after-free in qdisc_put_rtab+0x12f/0x160 qdisc_put_rtab+0x12f/0x160 tcf_police_init+0xda9/0x1590 tcf_action_init_1+0x460/0x6b0 tcf_action_init+0x439/0xa40 tcf_exts_validate_ex+0x42d/0x550 fl_change+0xddd/0x7da0 tc_new_tfilter+0xaa7/0x2420 rtnetlink_rcv_msg+0x95e/0xe90 which belongs to the cache kmalloc-2k of size 2048 Protect qdisc_rtab_list and the refcount with a dedicated spinlock. The (sleeping, GFP_KERNEL) allocation in qdisc_get_rtab() is performed before taking the lock; if a concurrent inserter added an identical table in the meantime the freshly allocated one is freed under the lock, so no duplicate is leaked. qdisc_put_rtab() now decrements the refcount and unlinks under the same lock. 
Fixes: `470502de5b` ("net: sched: unlock rules update API") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Aldo Ariel Panzardo <qwe.aldo@gmail.com> Cc: stable@vger.kernel.org Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260715114114.446841-1-qwe.aldo@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-22 15:07:53 -07:00
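The locking scheme described for qdisc_get_rtab()/qdisc_put_rtab() can be sketched in userspace roughly as follows. This is an illustration only: a pthread mutex stands in for the kernel spinlock, and `struct rtab`, `rtab_get()` and `rtab_put()` are hypothetical simplifications keyed by a single rate value.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct qdisc_rate_table. */
struct rtab {
	unsigned int rate;
	int refcnt;          /* protected by rtab_lock */
	struct rtab *next;   /* protected by rtab_lock */
};

static struct rtab *rtab_list;   /* one process-wide list, like qdisc_rtab_list */
static pthread_mutex_t rtab_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold rtab_lock. */
static struct rtab *rtab_find_locked(unsigned int rate)
{
	for (struct rtab *r = rtab_list; r; r = r->next)
		if (r->rate == rate)
			return r;
	return NULL;
}

static struct rtab *rtab_get(unsigned int rate)
{
	struct rtab *r, *fresh;

	pthread_mutex_lock(&rtab_lock);
	r = rtab_find_locked(rate);
	if (r)
		r->refcnt++;
	pthread_mutex_unlock(&rtab_lock);
	if (r)
		return r;

	/* Allocate without the lock held (the kernel allocation may sleep). */
	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return NULL;
	fresh->rate = rate;
	fresh->refcnt = 1;

	pthread_mutex_lock(&rtab_lock);
	r = rtab_find_locked(rate);
	if (r) {
		/* A concurrent caller inserted the same table meanwhile:
		 * reuse it and free ours under the lock so nothing leaks. */
		r->refcnt++;
		free(fresh);
	} else {
		fresh->next = rtab_list;
		rtab_list = fresh;
		r = fresh;
	}
	pthread_mutex_unlock(&rtab_lock);
	return r;
}

static void rtab_put(struct rtab *tab)
{
	struct rtab **pp;
	int last;

	if (!tab)
		return;
	pthread_mutex_lock(&rtab_lock);
	/* Decrement and unlink under the same lock used for lookups. */
	last = --tab->refcnt == 0;
	if (last) {
		for (pp = &rtab_list; *pp; pp = &(*pp)->next) {
			if (*pp == tab) {
				*pp = tab->next;
				break;
			}
		}
	}
	pthread_mutex_unlock(&rtab_lock);
	if (last)
		free(tab);
}
```

The re-lookup after allocation is what lets the allocation stay outside the critical section without producing duplicate entries for the same rate.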
Michael Bommarito	92d3817649	ila: reload IPv6 header after pskb_may_pull in checksum adjust ila_csum_adjust_transport() caches ip6h = ipv6_hdr(skb) before calling pskb_may_pull(). On a non-linear skb whose transport header sits in a page fragment, pskb_may_pull() can call __pskb_pull_tail() / pskb_expand_head() and free the old skb head, leaving ip6h dangling; the following get_csum_diff(ip6h, p) then reads freed memory. ila_update_ipv6_locator() uses ip6h (and the iaddr derived from it) again after the csum-adjust call and additionally writes the new locator through that pointer. Impact: a remote IPv6 packet routed through a configured ILA csum-adjust-transport route or receive-side mapping triggers a slab-use-after-free in ila_update_ipv6_locator() (KASAN). The route or mapping requires CAP_NET_ADMIN to configure, but trigger packets are unauthenticated once it exists. Reload ip6h after each pskb_may_pull() in ila_csum_adjust_transport() before the csum-diff read. In ila_update_ipv6_locator() only the ILA_CSUM_ADJUST_TRANSPORT case pulls the skb, so reload ip6h and iaddr in that case alone before the destination-address write; the neutral-map modes never pull and keep their cached pointers. Fixes: `33f11d1614` ("ila: Create net/ipv6/ila directory") Cc: stable@vger.kernel.org Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20260714114903.3763420-1-michael.bommarito@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-22 14:00:41 -07:00
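A userspace sketch of the bug class fixed in ILA: a pointer derived from a buffer head before a "pull" that may reallocate that head. The `struct pkt`, `pkt_may_pull()` and `read_byte_after_pull()` names are hypothetical stand-ins for an skb, pskb_may_pull() and the ip6h reload; the model always moves the head so any stale pointer really would dangle.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet: `data` holds total_len bytes, of which only
 * linear_len may be accessed directly before a pull. */
struct pkt {
	unsigned char *data;
	size_t linear_len;
	size_t total_len;
};

/* Stand-in for pskb_may_pull(): may replace and free the old head. */
static bool pkt_may_pull(struct pkt *p, size_t len)
{
	unsigned char *fresh;

	if (len <= p->linear_len)
		return true;
	if (len > p->total_len)
		return false;
	fresh = malloc(p->total_len);
	if (!fresh)
		return false;
	memcpy(fresh, p->data, p->total_len);
	free(p->data);          /* any pointer into the old head now dangles */
	p->data = fresh;
	p->linear_len = len;
	return true;
}

static int read_byte_after_pull(struct pkt *p, size_t off, size_t need)
{
	const unsigned char *hdr;

	if (off >= need || !pkt_may_pull(p, need))
		return -1;
	/* Derive (or reload) the header pointer only after the pull,
	 * as the fix does for ip6h. */
	hdr = p->data + off;
	return hdr[0];
}
```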
Qing Luo	8e04823c12	sctp: auth: verify auth requirement when auth_chunk is NULL sctp_auth_chunk_verify() returns true unconditionally when chunk->auth_chunk is NULL, silently skipping authentication. This is incorrect when: 1. skb_clone() failed in the BH receive path, leaving auth_chunk NULL. In sctp_endpoint_bh_rcv() asoc is NULL for new connections, so the early sctp_auth_recv_cid() check cannot catch this. 2. No AUTH chunk precedes COOKIE-ECHO, so skb_clone() is never called and auth_chunk remains NULL. Fix by checking sctp_auth_recv_cid() when auth_chunk is NULL: if authentication is required, return false to drop the chunk; otherwise continue normally. Fixes: `bbd0d59809` ("[SCTP]: Implement the receive and verification of AUTH chunk") Signed-off-by: Qing Luo <luoqing@kylinos.cn> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260721015532.120157-2-l1138897701@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-22 13:16:08 -07:00
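The corrected decision in sctp_auth_chunk_verify() can be reduced to a small truth table. In this hypothetical model, `auth_required` stands for what sctp_auth_recv_cid() would report and `mac_ok` for the result of the real MAC verification.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a received chunk. */
struct chunk_model {
	const void *auth_chunk; /* NULL when no AUTH chunk was cloned */
	bool auth_required;     /* as sctp_auth_recv_cid() would report */
	bool mac_ok;            /* outcome of the MAC check when it runs */
};

static bool auth_chunk_verify(const struct chunk_model *c)
{
	/* Previously a NULL auth_chunk meant "accept" unconditionally;
	 * now it is accepted only if authentication is not required. */
	if (!c->auth_chunk)
		return !c->auth_required;
	return c->mac_ok;
}
```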
Eric Dumazet	dcf15eaf56	net: hsr: fix memory leak on slave unregistration by removing synced VLANs When an HSR master device is brought UP, it auto-adds VLAN 0 via vlan_vid0_add(), which propagates VID 0 to its slave devices (slave A and B). If a slave device is later unregistered while HSR is active (e.g., during netns cleanup or interface destruction), hsr_del_port() is called to detach the slave port from the HSR master. However, hsr_del_port() currently does not delete the VLAN IDs that were synced to the slave device by HSR. As a result, the slave device retains a refcount on VID 0 (and any other synced VLANs). When the slave device is destroyed, its vlan_info / vlan_vid_info structure remains allocated, leading to a memory leak. Fix this by calling vlan_vids_del_by_dev(port->dev, master->dev) in hsr_del_port() before unlinking slave A or slave B ports, matching the propagation logic in hsr_ndo_vlan_rx_add_vid() / hsr_ndo_vlan_rx_kill_vid() and the cleanup behavior in bonding and team drivers. Fixes: `1a8a63a530` ("net: hsr: Add VLAN CTAG filter support") Reported-by: syzbot+456957213f32970c0762@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a4cb6ca.57639fcc.86d58.000b.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Felix Maurer <fmaurer@redhat.com> Link: https://patch.msgid.link/20260721101240.995597-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-22 10:24:44 -07:00
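The HSR leak is an asymmetric refcount: VIDs synced to a slave on the add path were never dropped on the port-removal path. A hypothetical model of one slave's per-VID refcounts (not the kernel's vlan_info structures) shows why the removal path must mirror the add path before the slave can be freed.

```c
#include <stdbool.h>

/* Hypothetical per-VID reference count held on a slave device. */
struct slave_vid {
	int refcnt;
};

/* What syncing a VID from the master (e.g. VID 0 on UP) does. */
static void vid_add(struct slave_vid *v)
{
	v->refcnt++;
}

static void vid_del(struct slave_vid *v)
{
	if (v->refcnt > 0)
		v->refcnt--;
}

/* Port removal drops every VID the master synced, mirroring the
 * vlan_vids_del_by_dev() call the fix adds to hsr_del_port(). */
static void del_port(struct slave_vid *synced, int n)
{
	for (int i = 0; i < n; i++)
		vid_del(&synced[i]);
}

/* The slave's VLAN state is releasable only once no VID is referenced. */
static bool slave_can_be_freed(const struct slave_vid *vids, int n)
{
	for (int i = 0; i < n; i++)
		if (vids[i].refcnt != 0)
			return false;
	return true;
}
```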
Nikolay Aleksandrov	43171c97e4	net: bridge: vlan: fix vlan range dumps starting with pvid There is a bug in all range dumps that rely on br_vlan_can_enter_range(): when the PVID is the VLAN that starts a range, all following VLANs that match its flags can enter the range, but when the range is filled in, only the PVID VLAN is dumped and the rest of the range is discarded because br_vlan_fill_vids() checks for the PVID flag. Since there can be only one PVID VLAN, ranges must be broken around it; the most consistent way to do that for all dumps is to alter br_vlan_can_enter_range() to take the PVID into account and return false to break the range when it is matched. Before the fix: $ ip l add br0 type bridge vlan_filtering 1 $ ip l add dumdum type dummy $ ip l set dumdum master br0 $ ip l set br0 up $ ip l set dumdum up $ bridge vlan add dev dumdum vid 1 pvid untagged master $ bridge vlan add dev dumdum vid 2 untagged master $ bridge vlan show dev dumdum # use legacy dump to show all vlans port vlan-id dumdum 1 PVID Egress Untagged 2 Egress Untagged $ bridge -d vlan show dev dumdum # use the new dump (RTM_GETVLAN) port vlan-id dumdum 1 PVID Egress Untagged state forwarding mcast_router 1 VLAN 2 is missing, and if there are more matching VLANs afterwards they would be missing too. After the fix: [ same setup steps ] $ bridge vlan show dev dumdum port vlan-id dumdum 1 PVID Egress Untagged 2 Egress Untagged $ bridge -d vlan show dev dumdum # use the new dump (RTM_GETVLAN) port vlan-id dumdum 1 PVID Egress Untagged state forwarding mcast_router 1 2 Egress Untagged state forwarding mcast_router 1 Fixes: `0ab5587951` ("net: bridge: vlan: add rtm range support") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260721140922.682265-2-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-22 10:23:18 -07:00
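A simplified sketch of range compression that breaks ranges around the PVID. The `struct vlan`, `can_enter_range()` and `count_ranges()` names are hypothetical, and in this model the PVID is passed as a separate VID (0 meaning none) rather than read from bridge state; it is not the kernel's br_vlan_can_enter_range().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VLAN entry: VID plus flag bits (e.g. "untagged"). */
struct vlan {
	unsigned short vid;
	unsigned int flags;
};

/* May `cur` extend the range currently ending at `end`?  The PVID on
 * either side always breaks the range, since only one VLAN can be PVID. */
static bool can_enter_range(const struct vlan *cur, const struct vlan *end,
			    unsigned short pvid)
{
	if (cur->vid == pvid || end->vid == pvid)
		return false;
	return cur->vid == end->vid + 1 && cur->flags == end->flags;
}

/* Number of ranges a dump would emit for a VID-sorted array. */
static size_t count_ranges(const struct vlan *v, size_t n, unsigned short pvid)
{
	size_t ranges = 0;

	for (size_t i = 0; i < n; i++)
		if (i == 0 || !can_enter_range(&v[i], &v[i - 1], pvid))
			ranges++;
	return ranges;
}
```

For VIDs 1-3 with identical flags and PVID 1, the sketch yields two ranges ([1] and [2-3]), so VID 1 is emitted alone instead of swallowing the VLANs that follow it.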
Jakub Kicinski	9a67bbfe48	bluetooth pull request for net: - hci_sync: Protect UUID list traversal - RFCOMM: Fix session UAF in set_termios - btusb: validate Realtek vendor event length -----BEGIN PGP SIGNATURE----- iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmpfl74ZHGx1aXoudm9u LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKZFrD/9Rz76Gvy+aZyGh2+UuESoJ vHf9SVxbrm9x2wEYE2rAoAQJuJu8ZgqDQdcYfJV83DHgNoiPrhSCRd+4jtPo685S 8GCgzoqNU/xBiKGoFZ8ZGxp0RvmfuUmsxfB72Zv8mUY1wknyl8d+Paro2xkz2N72 aqD20OKjhbqoe9tUwY901Oshct2IQlalaIuc5GjabGH7dGM7c4MALrj+0noBYWEJ XIgdcKe6cum2wnUL+ENEUi1Fg4T2u3b1L3aK8YY8Uz6Afx95vSRFLGThmNblDP4k X1QgPYRuRTZhFVCJdVcsJY/3gGl8pOYQAYZyAvxkJaxmKQSx7tAIMW3iuU6Bn6HX tzgQQ0gXHKbzMx1pEYnxzC5x5UHri4LNjoUhmkcy2J7/LT83pRw021OVyi7j6ll2 dslrATixXFD6qQsVib+N+J07bSSuD6vSy6ys0tEUeWvxLvkocoGv8nsqQf9b+G6Y dbD6REScKk2U9meiEYfoIRFMzEFvZAsjieKoG/GFs9rNJjRfKsKopWxBAxg6Q2+K D2YYlrAc4kIbjE7iPBTMwJD98M7wywY93NQ/AR1+Ba1PNToJcGJp0XQ2LMq+nVte z8C8/0Yud6UVS58aJTHMN9hEuVCmyeE4OYNNfispH8QNi/vNZX5GSgYmnoqcrFaP uG5sKL8odHeNPAUh4xyBbQ== =Au1d -----END PGP SIGNATURE----- Merge tag 'for-net-2026-07-21' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: Protect UUID list traversal - RFCOMM: Fix session UAF in set_termios - btusb: validate Realtek vendor event length * tag 'for-net-2026-07-21' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: btusb: validate Realtek vendor event length Bluetooth: RFCOMM: Fix session UAF in set_termios Bluetooth: hci_sync: Protect UUID list traversal ==================== Link: https://patch.msgid.link/20260721160240.884274-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-22 10:18:59 -07:00
Jakub Kicinski	fec15bb3da	Lots of fixes: - mostly driver security/robustness/warning fixes - ath12k: fix MLO throughput regression - iwlwifi: add UNII-9 to avoid regression - brcmfmac: - fix 802.1X-SHA256 - SDIO fix for some boards - mac80211: - fix traffic indication for sleeping STAs - fix NAN throughput -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmpgjBQACgkQ10qiO8sP aACNoQ//dgLEKav5m9IfUvhZUV/G++3klPkCTUNU3dQesrHrGRxOk1sRuzoODINz rC4IfDtoznIX964Lxc+RGLtTE1zUXJ9lR9YDvjXHyF+9kk8CvMbiKJo37IG4GCTc DsQWL3cSK7X4yDbosd5gCXihjxOKFT65S2e9inFPLUWevi7mbb+Qny/jMd5yqh32 /q/YUKUPSjZqqUFnG1x9nrp4n+0ygBsqUK9o8w6cU9ed7bw2r6fJo+j0BkzxUsPH Ntde80QKNYgqY73qJ0qEybm7o7XldBMSdHnaO0Iz9qOnlTclpyV7zrfca8ELx2Sx GQ/1ws1G+/rg3AWsWkXsu8QrUk0e4WuKFel59BBYyfRsvmDjz9Hta9qW9WMioykQ eGEM0EMJ683BtpYN4RKMmPG20jCxCi8dUBE8OGuRQX/c6exXyVysTNi3rt2oKsls VMtd74DrdjYrm+MwnxAfnB1CIDlgCbbGMvoZMcQLSQTPmSM5822XKAlGCcNQjkL9 HC0D7KItFXp/LYEoELkXGy+D8ta0BVxJTtHzuf5Msx4DgVu+fmFOwyHud13Hfwei JfrMMYjym3Aaa+X5y21HgzkqGmPJ9zg7vD5kXRt2vtwax9NiGxWqI/4EPSHD19l1 m6dhiOzYWmHPBsJMtLuvkG9Xmc8V3K40YDK6Dw/1i9yAbUBIpZE= =BfIb -----END PGP SIGNATURE----- Merge tag 'wireless-2026-07-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Lots of fixes: - mostly driver security/robustness/warning fixes - ath12k: fix MLO throughput regression - iwlwifi: add UNII-9 to avoid regression - brcmfmac: - fix 802.1X-SHA256 - SDIO fix for some boards - mac80211: - fix traffic indication for sleeping STAs - fix NAN throughput * tag 'wireless-2026-07-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (80 commits) wifi: brcmfmac: fix 802.1X-SHA256 call trace warning wifi: mt76: mt7996: fix possible NULL-pointer deref in mt7996_mcu_sta_bfer_eht() wifi: mt76: mt7925: fix crash in reset link replay wifi: mt76: fix airoha_npu dependency tracking wifi: mt76: restrict NPU/PPE active checks to MMIO devices wifi: mt76: fix MAC address for 
non OF pcie cards wifi: mt76: mt7996: check pointer returned by mt76_connac_get_he_phy_cap() wifi: mt76: mt7925: fix possible NULL-pointer deref in mt7925_mcu_bss_he_tlv() wifi: mt76: connac: fix possible NULL-pointer deref in mt76_connac_mcu_uni_bss_he_tlv() wifi: mt76: mt7915: guard HE capability lookups wifi: mt76: mt7925: guard link STA in decap offload wifi: mt76: Disable napi when removing device wifi: mt76: mt7615: drop TXRX_NOTIFY on non-mmio buses wifi: mt76: mt7925: drop TXRX_NOTIFY on non-mmio buses wifi: mt76: mt7921: drop TXRX_NOTIFY on non-mmio buses wifi: brcmfmac: set F2 blocksize to 256 for BCM43752 wifi: cfg80211: guard optional PMSR nominal time wifi: mac80211_hwsim: reject undersized HWSIM_ATTR_TX_INFO wifi: brcmfmac: drain bus_reset work on device removal wifi: brcmfmac: make release_scratchbuffers idempotent ... ==================== Link: https://patch.msgid.link/20260722092647.119094-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-22 09:07:36 -07:00
Yizhou Zhao	e1a9d3cc11	tcp: initialize standalone TCP-AO response padding tcp_v4_send_ack() and tcp_v6_send_response() construct standalone TCP responses with TCP-AO options. The option length carries the actual MAC length, but the TCP header length includes the option rounded up to a four-byte boundary. tcp_ao_hash_hdr() writes the MAC only. Thus, when the MAC length is not four-byte aligned, the one to three bytes after the MAC are left uninitialized and may be transmitted. For the normal TCP-AO hashing mode, those bytes also have to be initialized before computing the MAC. Initialize only the alignment padding in the TCP-AO branches, before hashing the header. Use TCPOPT_NOP, as in the normal TCP-AO output path. This avoids adding work to non-AO TCP responses while preserving a valid authenticated header. Fixes: `decde2586b` ("net/tcp: Add TCP-AO sign to twsk") Fixes: `da7dfaa6d6` ("net/tcp: Consistently align TCP-AO option in the header") Cc: stable@vger.kernel.org Reported-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn> Reported-by: Ao Wang <wangao@seu.edu.cn> Reported-by: Xuewei Feng <fengxw06@126.com> Reported-by: Qi Li <qli01@tsinghua.edu.cn> Reported-by: Ke Xu <xuke@tsinghua.edu.cn> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260713105631.8616-1-zhaoyz24@mails.tsinghua.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-21 15:24:36 -07:00
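The padding arithmetic behind the TCP-AO fix, as a hedged sketch. It assumes the RFC 5925 layout of a 4-byte option header (kind, length, KeyID, RNextKeyID) followed by the MAC, and places the NOP padding after the MAC; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCPOPT_NOP 1 /* TCP No-Operation option kind */

/* Option length field: 4-byte TCP-AO header plus the MAC. */
static size_t tcp_ao_opt_len(size_t maclen)
{
	return 4 + maclen;
}

/* Header space the option occupies, rounded up to a 4-byte boundary. */
static size_t tcp_ao_aligned_len(size_t maclen)
{
	return (tcp_ao_opt_len(maclen) + 3) & ~(size_t)3;
}

/* Fill the 0-3 alignment bytes after the MAC with NOPs so they are
 * neither hashed nor transmitted uninitialized.  `opt` points at the
 * start of the TCP-AO option. */
static void tcp_ao_pad(uint8_t *opt, size_t maclen)
{
	size_t used = tcp_ao_opt_len(maclen);

	memset(opt + used, TCPOPT_NOP, tcp_ao_aligned_len(maclen) - used);
}
```

With a 12-byte MAC the option is already 16 bytes and no padding is written; a 13-byte MAC gives a 17-byte option and three NOP bytes.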
Youssef Samir	ff194cffd5	net: qrtr: ns: Raise node count limit to 512 The current node limit of 64 breaks the functionality for a number of AI200 deployments that have up to 384 nodes. Raise the limit to 512. Fixes: `27d5e84e81` ("net: qrtr: ns: Limit the total number of nodes") Cc: stable@vger.kernel.org Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Link: https://patch.msgid.link/20260713145901.212396-1-youssef.abdulrahman@oss.qualcomm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-21 15:13:14 -07:00
Helen Koike	22f8aa3596	tipc: fix infinite loop in __tipc_nl_compat_dumpit The cmd->dumpit callback can return a negative errno, causing an infinite loop due to the while(len) condition. As the loop never terminates, genl_mutex is never released, and other tasks waiting on it starve in D state. Check dumpit's return value, propagate it and jump to err_out on error. Reported-by: syzbot+85d0bec020d805014a3a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=85d0bec020d805014a3a Fixes: `d0796d1ef6` ("tipc: convert legacy nl bearer dump to nl compat") Signed-off-by: Helen Koike <koike@igalia.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20260713204940.647668-1-koike@igalia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-07-21 15:12:23 -07:00
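A hedged sketch of the loop shape after the tipc fix: a negative return from the dump callback ends the loop and is propagated. `dumpit_fn`, `compat_dump()` and the two example callbacks are hypothetical; in the kernel the error path jumps to err_out instead of returning directly.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: bytes produced, 0 when finished, or -errno. */
typedef int (*dumpit_fn)(void *ctx);

static int compat_dump(dumpit_fn dumpit, void *ctx)
{
	int len;

	do {
		len = dumpit(ctx);
		if (len < 0)
			return len; /* without this, while (len) spins on -errno */
	} while (len);
	return 0;
}

/* Example callbacks. */
static int always_fails(void *ctx)
{
	(void)ctx;
	return -EMSGSIZE;
}

static int counts_down(void *ctx)
{
	int *left = ctx;

	return (*left)-- > 0 ? 1 : 0;
}
```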

85644 Commits