Commit Graph

1446950 Commits

Author SHA1 Message Date
Lorenzo Bianconi
4ca01292ea net: airoha: Do not return err in ndo_stop() callback
Always complete the airoha_dev_stop() routine regardless of the
airoha_set_vip_for_gdm_port() return value, since errors from
ndo_stop() are ignored by the networking stack and the interface is
always considered down after the call.

Fixes: 23020f0493 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260428-airoha-ndo-stop-not-err-v1-1-674506d29a91@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-29 18:29:54 -07:00
Hamza Mahfooz
b31681206e hv_sock: fix ARM64 support
VMBUS ring buffers must be page aligned. Therefore, the current value of
24K presents a challenge on ARM64 kernels (with 64K pages). So, use
VMBUS_RING_SIZE() to ensure they are always aligned and large enough to
hold all of the relevant data.

Cc: stable@vger.kernel.org
Fixes: 77ffe33363 ("hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication")
Tested-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260428125339.13963-1-hamzamahfooz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-29 17:30:45 -07:00
Jakub Kicinski
e73cafaf4a MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel
The IPv4/IPv6 and routing code is not very well separated from
the TCP/UDP code. Scope it down properly by providing a more
accurate file list, instead of net/ipv4/ and net/ipv6/

Now that the entry is more accurately representing layer 3
and routing merge in the nexthop entry into it.

Add Ido Schimmel as a co-maintainer, Ido's git history speaks
for itself.

Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260428203924.1229169-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-29 16:52:33 -07:00
Jakub Kicinski
72e9647e2b selftests: drv-net: clarify linters and frameworks in README
Minor clarifications in the README:
 - call out what linters we expect to be clean
 - make it clear that by "frameworks" we mean code under lib/
   not just factoring code out in the same file

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-29 16:48:21 -07:00
David Heidelberg
c3388f8c1c MAINTAINERS: Add myself as NFC subsystem maintainer
Add myself and update the mailing list.

Signed-off-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-29 16:46:51 -07:00
Jakub Kicinski
735a309b4b net: add net_iov_init() and use it to initialize ->page_type
Commit db359fccf2 ("mm: introduce a new page type for page pool in
page type") added a page_type field to struct net_iov at the same
offset as struct page::page_type, so that page_pool_set_pp_info() can
call __SetPageNetpp() uniformly on both pages and net_iovs.

The page-type API requires the field to hold the UINT_MAX "no type"
sentinel before a type can be set; for real struct page that invariant
is established by the page allocator on free. struct net_iov is not
allocated through the page allocator, so the field is left as zero
(io_uring zcrx, which uses __GFP_ZERO) or as slab garbage (devmem,
which uses kvmalloc_objs() without zeroing). When the page pool then
calls page_pool_set_pp_info() on a freshly-bound niov,
__SetPageNetpp()'s VM_BUG_ON_PAGE(page->page_type != UINT_MAX) fires
and the kernel BUGs. Triggered in selftests by io_uring zcrx setup
through the fbnic queue restart path:

 kernel BUG at ./include/linux/page-flags.h:1062!
 RIP: 0010:page_pool_set_pp_info (./include/linux/page-flags.h:1062
                                  net/core/page_pool.c:716)
 Call Trace:
  <TASK>
  net_mp_niov_set_page_pool (net/core/page_pool.c:1360)
  io_pp_zc_alloc_netmems (io_uring/zcrx.c:1089 io_uring/zcrx.c:1110)
  fbnic_fill_bdq (./include/net/page_pool/helpers.h:160
                  drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:906)
  __fbnic_nv_restart (drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:2470
                      drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:2874)
  fbnic_queue_start (drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:2903)
  netdev_rx_queue_reconfig (net/core/netdev_rx_queue.c:137)
  __netif_mp_open_rxq (net/core/netdev_rx_queue.c:234)
  io_register_zcrx (io_uring/zcrx.c:818 io_uring/zcrx.c:903)
  __io_uring_register (io_uring/register.c:931)
  __do_sys_io_uring_register (io_uring/register.c:1029)
  do_syscall_64 (arch/x86/entry/syscall_64.c:63
                 arch/x86/entry/syscall_64.c:94)
  </TASK>

The same path is reachable through devmem dmabuf binding via
netdev_nl_bind_rx_doit() -> net_devmem_bind_dmabuf_to_queue().

Add a net_iov_init() helper that stamps ->owner, ->type and the
->page_type sentinel, and use it from both the devmem and io_uring
zcrx niov init loops.

Fixes: db359fccf2 ("mm: introduce a new page type for page pool in page type")
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/20260428025320.853452-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-29 16:40:08 -07:00
Weiming Shi
1d47b55b36 netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
nft_fwd_neigh can be used in egress chains (NF_NETDEV_EGRESS). When the
forwarding rule targets the same device or two devices forward to each
other, neigh_xmit() triggers dev_queue_xmit() which re-enters
nf_hook_egress(), causing infinite recursion and stack overflow.

Move the nf_get_nf_dup_skb_recursion() accessor and NF_RECURSION_LIMIT
to the shared header nf_dup_netdev.h as a static inline, so that
nft_fwd_netdev can use the recursion counter directly without exported
function call overhead. Guard neigh_xmit() with the same recursion
limit already used in nf_do_netdev_egress().

[ Updated to cache the nf_get_nf_dup_skb_recursion pointer. --pablo ]

Fixes: f87b9464d1 ("netfilter: nft_fwd_netdev: Support egress hook")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30 00:57:42 +02:00
Pablo Neira Ayuso
0a0b35f0bf netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
The ttl field has been decremented already and evaluation of this rule
would proceed, just drop this packet instead if there is no destination
device to forwards this packet. This is exactly what nf_dup already does
in this case.

Moreover, check for headroom and call skb_expand_head() like in the IP
output path to ensure there is sufficient headroom when forwarding this
via neigh_xmit().

Fixes: d32de98ea7 ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30 00:57:42 +02:00
Pablo Neira Ayuso
1049970d75 netfilter: replace skb_try_make_writable() by skb_ensure_writable()
skb_try_make_writable() only works on clones and uncloned packets might
have their network header in paged fragments.

nft_fwd needs to work for the ingress and egress hooks, but the egress
hook where skb->data points to the mac header, use skb_network_offset()
to include the mac header. The flowtable is fine since it already uses
the transport offset.

Fixes: d32de98ea7 ("netfilter: nft_fwd_netdev: allow to forward packets via neighbour layer")
Fixes: 7d20868717 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-30 00:57:42 +02:00
Shyam Prasad N
c208a2b958 cifs: change_conf needs to be called for session setup
Today we skip calling change_conf for negotiates and session setup
requests. This can be a problem for mchan as the immediate next call
after session setup could be due to an I/O that is made on the
mount point. For single channel, this is not a problem as
there will be several calls after setting up session.

This change enforces calling change_conf when the total credits contain
enough for reservations for echoes and oplocks. We expect this to happen
during the last session setup response. This way, echoes and oplocks are
not disabled before the first request to the server. So if that first
request is an open, it does not need to disable requesting leases.

Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-29 17:26:37 -05:00
Fredric Cover
8e13b1b409 smb: client: change allocation requirements in smb2_compound_op
Currently, smb2_compound_op() allocates
struct smb2_compound_vars *vars using GFP_ATOMIC, although
smb2_compound_op() can sleep when it calls compound_send_recv()
before vars is freed.

Allocate vars using GFP_KERNEL.

Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-29 17:26:07 -05:00
Nathan Chancellor
9e9354075d ntfs: Use return instead of goto in ntfs_mapping_pairs_decompress()
Clang warns (or errors with CONFIG_WERROR=y / W=e):

  fs/ntfs/runlist.c:755:6: error: variable 'rl' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
    755 |         if (overflows_type(lowest_vcn, vcn)) {
        |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ...
  fs/ntfs/runlist.c:971:9: note: uninitialized use occurs here
    971 |         kvfree(rl);
        |                ^~
  ...

rl has not been allocated at this point so the 'goto err_out' should
really just be a return of the error pointer -EIO.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-30 07:05:48 +09:00
Hyunchul Lee
4ebcf3f949 ntfs: drop nlink once for WIN32/DOS aliases
NTFS could store a filename as paired WIN32 and DOS $FILE_NAME attributes
for directories. But ntfs_delete() deleted both attributes for unlinking
a directory, but it also called drop_nlink() for each attributes.
This could trigger warnings when unlinking directories.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-30 07:05:46 +09:00
Steven Rostedt
b2aa3b4d64 tracing/probes: Limit size of event probe to 3K
There currently isn't a max limit an event probe can be. One could make an
event greater than PAGE_SIZE, which makes the event useless because if
it's bigger than the max event that can be recorded into the ring buffer,
then it will never be recorded.

A event probe should never need to be greater than 3K, so make that the
max size. As long as the max is less than the max that can be recorded
onto the ring buffer, it should be fine.

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 93ccae7a22 ("tracing/kprobes: Support basic types on dynamic events")
Link: https://patch.msgid.link/20260428122302.706610ba@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-04-29 16:07:38 -04:00
Tejun Heo
20e81c64c9 workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)
alloc_workqueue_va() forwards its va_list to __alloc_workqueue() which
ultimately feeds vsnprintf(). __alloc_workqueue() already carries
__printf(1, 0); the new wrapper needs the same annotation so format
string checking propagates through the forwarding.

Fixes: 0de4cb473a ("workqueue: fix devm_alloc_workqueue() va_list misuse")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604300347.2LgXyteh-lkp@intel.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
2026-04-29 09:44:16 -10:00
Michael Guralnik
6009cca96f RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation
Raw Packet QPs are unique in that they support separate send and receive
queues, using 2 different user-provided buffers.
They can also be created with one of the queues having size 0, allowing
a send-only or receive-only QP.

The Raw Packet RQ umem is created in the common user QP creation path,
which allows zero-length queues. Add a later validation of the RQ umem
in Raw Packet QP creation path when an RQ was requested.

This prevents possible null-ptr dereference crashes, as seen in the
below trace:

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
  CPU: 6 UID: 0 PID: 3539 Comm: raw_packet_umem Not tainted 6.19.0-rc1+ #166 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  RIP: 0010:__mlx5_umem_find_best_quantized_pgoff+0x37/0x280 [mlx5_ib]
  Code: ff df 41 57 49 89 ff 41 56 41 55 41 89 d5 41 54 4d 89 cc 4c 8d 4f 30 55 4c 89 ca 48 89 f5 53 48 c1 ea 03 48 89 cb 48 83 ec 18 <80> 3c 02 00 44 89 04 24 0f 85 01 02 00 00 48 ba 00 00 00 00 00 fc
  RSP: 0018:ff1100013966f4e0 EFLAGS: 00010282
  RAX: dffffc0000000000 RBX: 00000000ffffffc0 RCX: 00000000ffffffc0
  RDX: 0000000000000006 RSI: 00000ffffffff000 RDI: 0000000000000000
  RBP: 00000ffffffff000 R08: 0000000000000040 R09: 0000000000000030
  R10: 0000000000000000 R11: 0000000000000000 R12: ff1100013966f648
  R13: 0000000000000005 R14: ff1100013966f980 R15: 0000000000000000
  FS:  00007fae6c82f740(0000) GS:ff11000898ba1000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000200000000000 CR3: 000000010f96c005 CR4: 0000000000373eb0
  Call Trace:
   <TASK>
   create_qp+0x747d/0xc740 [mlx5_ib]
   ? is_module_address+0x18/0x110
   ? _create_user_qp.constprop.0+0x18e0/0x18e0 [mlx5_ib]
   ? __module_address+0x49/0x210
   ? is_module_address+0x68/0x110
   ? static_obj+0x67/0x90
   ? lockdep_init_map_type+0x58/0x200
   mlx5_ib_create_qp+0xc85/0x2620 [mlx5_ib]
   ? find_held_lock+0x2b/0x80
   ? create_qp+0xc740/0xc740 [mlx5_ib]
   ? lock_release+0xcb/0x260
   ? lockdep_init_map_type+0x58/0x200
   ? __init_swait_queue_head+0xcb/0x150
   create_qp.part.0+0x558/0x7c0 [ib_core]
   ib_create_qp_user+0xa0/0x4f0 [ib_core]
   ? rdma_lookup_get_uobject+0x1e4/0x400 [ib_uverbs]
   create_qp+0xe4f/0x1d10 [ib_uverbs]
   ? ib_uverbs_rereg_mr+0xd40/0xd40 [ib_uverbs]
   ? ib_uverbs_cq_event_handler+0x120/0x120 [ib_uverbs]
   ? __might_fault+0x81/0x100
   ? lock_release+0xcb/0x260
   ? _copy_from_user+0x3e/0x90
   ib_uverbs_create_qp+0x10a/0x150 [ib_uverbs]
   ? ib_uverbs_ex_create_qp+0xe0/0xe0 [ib_uverbs]
   ? __might_fault+0x81/0x100
   ? lock_release+0xcb/0x260
   ib_uverbs_write+0x7e5/0xc90 [ib_uverbs]
   ? uverbs_devnode+0xc0/0xc0 [ib_uverbs]
   ? lock_acquire+0xfa/0x2b0
   ? find_held_lock+0x2b/0x80
   ? finish_task_switch.isra.0+0x189/0x6c0
   vfs_write+0x1c0/0xf70
   ? lockdep_hardirqs_on_prepare+0xde/0x170
   ? kernel_write+0x5a0/0x5a0
   ? __switch_to+0x527/0xe60
   ? __schedule+0x10a3/0x3950
   ? io_schedule_timeout+0x110/0x110
   ksys_write+0x170/0x1c0
   ? __x64_sys_read+0xb0/0xb0
   ? trace_hardirqs_off.part.0+0x4e/0xe0
   do_syscall_64+0x70/0x1360
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7fae6ca3118d
  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5b cc 0c 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe678ca308 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 00007ffe678ca448 RCX: 00007fae6ca3118d
  RDX: 0000000000000070 RSI: 0000200000000280 RDI: 0000000000000003
  RBP: 00007ffe678ca320 R08: 00000000ffffffff R09: 00007fae6c8ec5b8
  R10: 0000000000000064 R11: 0000000000000213 R12: 0000000000000001
  R13: 0000000000000000 R14: 00007fae6cb71000 R15: 0000000000404df0
   </TASK>
  Modules linked in: mlx5_ib mlx5_fwctl mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_ucm ib_uverbs rdma_cm iw_cm ib_ipoib ib_cm ib_umad ib_core rpcsec_gss_krb5 auth_rpcgss oid_registry overlay nfnetlink zram zsmalloc fuse scsi_transport_iscsi [last unloaded: mlx5_core]
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:__mlx5_umem_find_best_quantized_pgoff+0x37/0x280 [mlx5_ib]

Fixes: 0fb2ed66a1 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-5-4621fa52de0e@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29 16:37:12 -03:00
Michael Guralnik
1f3b337af2 RDMA/core: Fix rereg_mr use-after-free race
When a driver creates a new MR during rereg_user_mr, a race window
exists between rdma_alloc_commit_uobject() for the new MR and the point
where the code reads that MR to populate the response keys.

A concurrent rereg_mr or destroy_mr could destroy the MR in this window
and cause UAF in the first thread.

Racing flow between two rereg_mr calls:

 CPU0                           CPU1
 ----                           ----
 rereg_user_mr(mr_handle)
   uobj_get_write(mr_handle) -> mr0
   mr1 = driver→rereg()
   rdma_alloc_commit_uobject(mr1)
   // mr1 replaced mr0 and is unlocked
   uobj_put_destroy(mr0)
                                rereg_user_mr(mr_handle)
                                  uobj_get_write(mr_handle) -> mr1
                                  mr2 = driver→rereg()
                                  rdma_alloc_commit_uobject(mr2)
                                  // mr2 replaced mr1 and is unlocked
                                  uobj_put_destroy(mr1)
                                  // Destroys mr1!

   resp.lkey = mr1->lkey; // UAF - mr1 was freed!
   resp.rkey = mr1->rkey; // UAF - mr1 was freed!

Fix by storing lkey/rkey in local variables before the new MR is
unlocked and using the local variables to set the user response.

Fixes: 6e0954b11c ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr")
Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-4-4621fa52de0e@nvidia.com
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29 16:37:12 -03:00
Maher Sanalla
610771c62e IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()
When resolving an RDMA-CM IPv6 address, ib_nl_ip_send_msg() sends a
netlink request to the userspace daemon to perform IP-to-GID
resolution in certain cases. The function allocates the netlink message
buffer using nla_total_size(sizeof(size)), which passes 8 bytes (the
size of size_t) instead of 16 bytes (the size of an IPv6 address).
This results in an 8-byte under-allocation.

This is currently masked by nlmsg_new() over-allocation of the skb
in its internal logic. However, the code remains incorrect.

Fix the issue by supplying the proper IPv6 address length to
nla_total_size().

Fixes: ae43f82867 ("IB/core: Add IP to GID netlink offload")
Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-3-4621fa52de0e@nvidia.com
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29 16:37:12 -03:00
Edward Srouji
9bee81cc5e RDMA/mlx5: Fix UAF in DCT destroy due to race with create
A potential race condition exists between mlx5_core_destroy_dct() and
mlx5_core_create_dct() that can lead to a use-after-free.

After _mlx5_core_destroy_dct() releases the DCT to firmware, the DCTN
can be immediately reallocated for a new DCT being created concurrently.
If the create path stores the new DCT in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new DCT's entry.
Later accesses then hit freed memory.

Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created DCT.

Fixes: afff248998 ("RDMA/mlx5: Handle DCT QP logic separately from low level QP interface")
Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-2-4621fa52de0e@nvidia.com
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29 16:37:11 -03:00
Edward Srouji
38694f4639 RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
A race condition exists between mlx5_cmd_destroy_srq() and
mlx5_cmd_create_srq() that can lead to a use-after-free (UAF) [1].

After destroy_srq_split() releases the SRQ to firmware, the SRQN can be
immediately reallocated for a new SRQ being created concurrently. If the
create path stores the new SRQ in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new SRQ's entry.
Later accesses then hit freed memory.

Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created SRQ.

[1] RIP: 0010:mlx5_cmd_destroy_srq+0xd8/0x110 [mlx5_ib]
Code: 89 e1 ba 06 04 00 00 4c 89 f6 48 89 ef e8 80 19 70 e1 c6 83 a0 0f 00 00 00 fb 5b 44 89 e8 5d 41 5c 41 5d 41 5e c3 cc cc cc cc <0f> 0b 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d 05 c0 ff ff 77 08
RSP: 0018:ff110001037b7d08 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ff1100010bb9c000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff110001037b7c90
RBP: ff1100010bb9cfa0 R08: 0000000000000000 R09: 0000000000000000
R10: ff110001037b7da0 R11: ff11000104f29580 R12: ff1100010e2ac090
R13: 000000000000000d R14: 0000000000000001 R15: ff11000105336300
FS:  00007fa24787c740(0000) GS:ff1100046eb8d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa247984e90 CR3: 0000000109d59005 CR4: 0000000000373eb0
Call Trace:
 <TASK>
 mlx5_ib_destroy_srq+0x25/0xa0 [mlx5_ib]
 ib_destroy_srq_user+0x21/0x90 [ib_core]
 uverbs_free_srq+0x1b/0x50 [ib_uverbs]
 destroy_hw_idr_uobject+0x1e/0x50 [ib_uverbs]
 uverbs_destroy_uobject+0x35/0x180 [ib_uverbs]
 __uverbs_cleanup_ufile+0xdd/0x140 [ib_uverbs]
 uverbs_destroy_ufile_hw+0x38/0xf0 [ib_uverbs]
 ib_uverbs_close+0x17/0xa0 [ib_uverbs]
 __fput+0xe0/0x2a0
 __x64_sys_close+0x3a/0x80
 do_syscall_64+0x55/0xac0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa247984ea4
Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d a5 51 0e 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d
RSP: 002b:00007ffecfa79498 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000200000000080 RCX: 00007fa247984ea4
RDX: 0000000000000040 RSI: 0000200000000200 RDI: 0000000000000003
RBP: 00007ffecfa794e0 R08: 00007ffecfa794e0 R09: 00007ffecfa794e0
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 0000200000000000 R15: 0000200000000009
 </TASK>
 ---[ end trace 0000000000000000 ]---

Fixes: fd89099d63 ("RDMA/mlx5: Issue FW command to destroy SRQ on reentry")
Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-1-4621fa52de0e@nvidia.com
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29 16:37:11 -03:00
Dmitry Torokhov
76b0d0baa9 Input: elan_i2c - validate firmware size before use
Ensure that the firmware file is large enough to contain the expected
number of pages and the signature (which resides at the end of the
firmware blob) before accessing them to prevent potential out-of-bounds
reads.

Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/ae2dOgiFvXRm4BHo@google.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2026-04-29 11:53:33 -07:00
Jia Yao
662f9ddc80
drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR
Add validation in xe_vm_bind_ioctl() to reject PAT indices
with XE_COH_NONE coherency mode when used with
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR.

CPU address mirror mappings use system memory that is CPU
cached, which makes them incompatible with COH_NONE PAT
indices. Allowing COH_NONE with CPU cached buffers is a
security risk, as the GPU may bypass CPU caches and read
stale sensitive data from DRAM.

Although CPU_ADDR_MIRROR does not create an immediate
mapping, the backing system memory is still CPU cached.
Apply the same PAT coherency restrictions as
DRM_XE_VM_BIND_OP_MAP_USERPTR.

v2:
- Correct fix tag

v6:
- No change

v7:
- Correct fix tag

v8:
- Rebase

v9:
- Limit the restrictions to iGPU

v10:
- Just add the iGPU logic but keep dGPU logic

Fixes: b43e864af0 ("drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR")
Cc: <stable@vger.kernel.org> # v6.15+
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Mathew Alwin <alwin.mathew@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260417055917.2027459-3-jia.yao@intel.com
(cherry picked from commit 4d58d7535e826a3175527b6174502f0db319d7f6)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:21 -04:00
Jia Yao
4e5591c2fc
drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise
Add validation in xe_vm_madvise_ioctl() to reject PAT indices with
XE_COH_NONE coherency mode when applied to CPU cached memory.

Using coh_none with CPU cached buffers is a security issue. When the
kernel clears pages before reallocation, the clear operation stays in
CPU cache (dirty). GPU with coh_none can bypass CPU caches and read
stale sensitive data directly from DRAM, potentially leaking data from
previously freed pages of other processes.

This aligns with the existing validation in vm_bind path
(xe_vm_bind_ioctl_validate_bo).

v2(Matthew brost)
- Add fixes
- Move one debug print to better place

v3(Matthew Auld)
- Should be drm/xe/uapi
- More Cc

v4(Shuicheng Lin)
- Fix kmem leak issues by the way

v5
- Remove kmem leak because it has been merged by another patch

v6
- Remove the fix which is not related to current fix

v7
- No change

v8
- Rebase

v9
- Limit the restrictions to iGPU

v10
- No change

Fixes: ada7486c56 ("drm/xe: Implement madvise ioctl for xe")
Cc: <stable@vger.kernel.org> # v6.18+
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Mathew Alwin <alwin.mathew@intel.com>
Cc: Michal Mrozek <michal.mrozek@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260417055917.2027459-2-jia.yao@intel.com
(cherry picked from commit 016ccdb674b8c899940b3944952c96a6a490d10a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:21 -04:00
Tvrtko Ursulin
0df99689eb
drm/xe/xelp: Fix Wa_18022495364
Command parser relative MMIO addressing needs to be enabled when writing
to the register.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: ca33cd271e ("drm/xe/xelp: Add Wa_18022495364")
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260420131603.70357-1-tvrtko.ursulin@igalia.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 5627392001802a98ed6cf8cf79a303abd00d1c0f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:21 -04:00
Shuicheng Lin
3762d6c365
drm/xe/gsc: Fix BO leak on error in query_compatibility_version()
When xe_gsc_read_out_header() fails, query_compatibility_version()
returns directly instead of jumping to the out_bo label. This skips
the xe_bo_unpin_map_no_vm() call, leaving the BO pinned and mapped
with no remaining reference to free it.

Fix by using goto out_bo so the error path properly cleans up the BO,
consistent with the other error handling in the same function.

Fixes: 0881cbe040 ("drm/xe/gsc: Query GSC compatibility version")
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260417163308.3416147-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 8de86d0a843c32ca9d36864bdb92f0376a830bce)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:20 -04:00
Shuicheng Lin
dc2d9842c6
drm/xe/eustall: Fix drm_dev_put called before stream disable in close
In xe_eu_stall_stream_close(), drm_dev_put() is called before the
stream is disabled and its resources are freed. If this drops the
last reference, the device structures could be freed while the
subsequent cleanup code still accesses them, leading to a
use-after-free.

Fix this by moving drm_dev_put() after all device accesses are
complete. This matches the ordering in xe_oa_release().

Fixes: 9a0b11d4cf ("drm/xe/eustall: Add support to init, enable and disable EU stall sampling")
Cc: Harish Chegondi <harish.chegondi@intel.com>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Link: https://patch.msgid.link/20260415225428.3399934-1-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 35aff528f7297e949e5e19c9cd7fd748cf1cf21c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:20 -04:00
Shuicheng Lin
f3cc22d4df
drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl()
Two error handling issues exist in xe_exec_queue_create_ioctl():

1. When xe_hw_engine_group_add_exec_queue() fails, the error path jumps
   to put_exec_queue which skips xe_exec_queue_kill(). If the VM is in
   preempt fence mode, xe_vm_add_compute_exec_queue() has already added
   the queue to the VM's compute exec queue list. Skipping the kill
   leaves the queue on that list, leading to a dangling pointer after
   the queue is freed.

2. When xa_alloc() fails after xe_hw_engine_group_add_exec_queue() has
   succeeded, the error path does not call
   xe_hw_engine_group_del_exec_queue() to remove the queue from the hw
   engine group list. The queue is then freed while still linked into
   the hw engine group, causing a use-after-free.

Fix both by:
- Changing the xe_hw_engine_group_add_exec_queue() failure path to jump
  to kill_exec_queue so that xe_exec_queue_kill() properly removes the
  queue from the VM's compute list.
- Adding a del_hw_engine_group label before kill_exec_queue for the
  xa_alloc() failure path, which removes the queue from the hw engine
  group before proceeding with the rest of the cleanup.

Fixes: 7970cb3696 ("'drm/xe/hw_engine_group: Register hw engine group's exec queues")
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Assisted-by: Claude:claude-opus-4.6
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260408020647.3397933-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 37c831f401746a45d510b312b0ed7a77b1e06ec8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:20 -04:00
Shuicheng Lin
111ab67847
drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import()
When xe_dma_buf_init_obj() fails, the attachment from
dma_buf_dynamic_attach() is not detached. Add dma_buf_detach() before
returning the error. Note: we cannot use goto out_err here because
xe_dma_buf_init_obj() already frees bo on failure, and out_err would
double-free it.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4.6
Reviewed-by: Mattheq Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260408175255.3402838-5-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit a828eb185aac41800df8eae4b60501ccc0dbbe51)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:20 -04:00
Shuicheng Lin
93a528f67c
drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure
When drm_gpuvm_resv_object_alloc() fails, the pre-allocated storage bo
is not freed. Add xe_bo_free(storage) before returning the error.

xe_dma_buf_init_obj() calls xe_bo_init_locked(), which frees the bo on
error. Therefore, xe_dma_buf_init_obj() must also free the bo on its own
error paths. Otherwise, since xe_gem_prime_import() cannot distinguish
whether the failure originated from xe_dma_buf_init_obj() or from
xe_bo_init_locked(), it cannot safely decide whether the bo should be
freed.

Add comments documenting the ownership semantics: on success, ownership
of storage is transferred to the returned drm_gem_object; on failure,
storage is freed before returning.

v2: Add comments to explain the free logic.

Fixes: eb289a5f6c ("drm/xe: Convert xe_dma_buf.c for exhaustive eviction")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4.6
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260408175255.3402838-4-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 78a6c5f899f22338bbf48b44fb8950409c5a69b9)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:20 -04:00
Shuicheng Lin
1d0adf2fd9
drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked()
When XE_BO_FLAG_GGTT_ALL is set without XE_BO_FLAG_GGTT, the function
returns an error without freeing a caller-provided bo, violating the
documented contract that bo is freed on failure.

Add xe_bo_free(bo) before returning the error.

Fixes: 5a3b0df25d ("drm/xe: Allow bo mapping on multiple ggtts")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4.6
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260408175255.3402838-3-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 3fbd6cf43cac7b60757f3ce3d95195d3843a902c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:20 -04:00
Shuicheng Lin
09a8f3c1c1
drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked()
When type is ttm_bo_type_device and aligned_size != size, the function
returns an error without freeing a caller-provided bo, violating the
documented contract that bo is freed on failure.

Add xe_bo_free(bo) before returning the error.

Fixes: 4e03b58414 ("drm/xe/uapi: Reject bo creation of unaligned size")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4.6
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260408175255.3402838-2-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 601c2aa087b6f21014300a3f107a08ee4dde7bdf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:20 -04:00
Shuicheng Lin
f8c4151d50
drm/xe: Fix potential NULL deref in xe_exec_queue_tlb_inval_last_fence_put_unlocked
xe_exec_queue_tlb_inval_last_fence_put_unlocked() uses q->vm->xe as the
first argument to xe_assert(). This function is called unconditionally
from xe_exec_queue_destroy() for all queues, including kernel queues
that have q->vm == NULL (e.g., queues created during GT init in
xe_gt_record_default_lrcs() with vm=NULL).

While current compilers optimize away the q->vm->xe dereference (even
in CONFIG_DRM_XE_DEBUG=y builds, the compiler pushes the dereference
into the WARN branch that is only taken when the assert condition is
false), the code is semantically incorrect and constitutes undefined
behavior in the C abstract machine for the NULL pointer case.

Use gt_to_xe(q->gt) instead, which is always valid for any exec queue.
This is consistent with how xe_exec_queue_destroy() itself obtains the
xe_device pointer in its own xe_assert at the top of the function.

Fixes: b2d7ec41f2 ("drm/xe: Attach last fence to TLB invalidation job queues")
Assisted-by: Claude:claude-opus-4.6
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260409003449.3405767-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 96078a1c68bf97f17fd1d08c3f58f5c5cc9ccd65)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:19 -04:00
Satyanarayana K V P
1460eae74f
drm/xe/vf: Use drm mm instead of drm sa for CCS read/write
The suballocator algorithm tracks a hole cursor at the last allocation
and tries to allocate after it. This is optimized for fence-ordered
progress, where older allocations are expected to become reusable first.

In fence-enabled mode, that ordering assumption holds. In fence-disabled
mode, allocations may be freed in arbitrary order, so limiting allocation
to the current hole window can miss valid free space and fail allocations
despite sufficient total space.

Use DRM memory manager instead of sub-allocator to get rid of this issue
as CCS read/write operations do not use fences.

Fixes: 864690cf4d ("drm/xe/vf: Attach and detach CCS copy commands with BO")
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260408110145.1639937-6-satyanarayana.k.v.p@intel.com
(cherry picked from commit 6c84b493012aeb05dec29c709377bf0e17ac6815)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:19 -04:00
Satyanarayana K V P
36c6bac158
drm/xe: Add memory pool with shadow support
Add a memory pool to allocate sub-ranges from a BO-backed pool
using drm_mm.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260408110145.1639937-5-satyanarayana.k.v.p@intel.com
(cherry picked from commit 1ce3229f8f269a245ff3b8c65ffae36b4d6afb93)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 12:51:19 -04:00
Sudeep Holla
6d3daa9b8d firmware: arm_ffa: Unregister bus notifier on teardown for FF-A v1.0
For FF-A v1.0 the driver registers a bus notifier to backfill UUID
matching, but the notifier was never unregistered on cleanup paths.
Track the registration state and unregister it during teardown and early
partition-setup failure.

Fixes: 9dd15934f6 ("firmware: arm_ffa: Move the FF-A v1.0 NULL UUID workaround to bus notifier")
Link: https://patch.msgid.link/20260428-ffa_fixes-v2-5-8595ae450034@kernel.org
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
2026-04-29 16:50:34 +01:00
Sudeep Holla
9985d5357e firmware: arm_ffa: Fix per-vcpu self notifications handling in workqueue
Per-vcpu notification handling already runs from a per-cpu work item on
the target cpu. Routing that path back through smp_call_function_single()
re-enters the call-function IPI path and executes the notification
handler with interrupts disabled. That makes the framework path unsafe,
since it takes a mutex, allocates memory with GFP_KERNEL, and invokes
client callbacks.

Handle per-vcpu self notifications directly from the existing per-cpu
work item instead. This keeps the per-vcpu path in task context and
avoids the extra IPI hop entirely.

Fixes: 3a3e2b83e8 ("firmware: arm_ffa: Avoid queuing work when running on the worker queue")
Link: https://patch.msgid.link/20260428-ffa_fixes-v2-4-8595ae450034@kernel.org
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
2026-04-29 16:50:34 +01:00
Sudeep Holla
9b5597af8b firmware: arm_ffa: Avoid collapsing NPI work from different CPUs
Notification pending interrupts are registered as per-CPU IRQs, but the
driver queues all NPI handling through a single shared work_struct.

That allows queue_work_on() calls from different CPUs to collapse onto a
single pending work item even though the work function uses the CPU it
runs on to fetch and handle per-CPU notifications.

Move notif_pcpu_work into the per-CPU ffa_pcpu_irq state and initialize
one work item per CPU. This keeps NPI handling independent per CPU and
avoids losing notifications when multiple CPUs queue work concurrently.

Link: https://patch.msgid.link/20260428-ffa_fixes-v2-3-8595ae450034@kernel.org
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
2026-04-29 16:50:34 +01:00
Sudeep Holla
09527e2c53 firmware: arm_ffa: Skip free_pages on RX buffer alloc failure
If the RX buffer allocation fails in ffa_init(), the error path jumps to
free_pages even though no buffer has been allocated yet. Route that case
directly to free_drv_info so the cleanup path is only used after at
least one RX/TX buffer allocation has succeeded.

Fixes: 3bbfe98710 ("firmware: arm_ffa: Add initial Arm FFA driver support")
Link: https://patch.msgid.link/20260428-ffa_fixes-v2-2-8595ae450034@kernel.org
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
2026-04-29 16:50:34 +01:00
Sudeep Holla
0a5e695095 firmware: arm_ffa: Check for NULL FF-A ID table while driver registration
The bus match callback assumes that every FF-A driver provides an
id_table and dereferences it unconditionally. Enforce that contract at
registration time so a buggy client driver cannot crash the bus during
match.

Fixes: 9274307146 ("firmware: arm_ffa: Ensure drivers provide a probe function")
Link: https://patch.msgid.link/20260428-ffa_fixes-v2-1-8595ae450034@kernel.org
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
2026-04-29 16:50:34 +01:00
Matt Roper
03f2499c51
drm/xe/debugfs: Correct printing of register whitelist ranges
The register-save-restore debugfs prints whitelist entries as offset
ranges.  E.g.,

        REG[0x39319c-0x39319f]: allow read access

for a single dword-sized register.  However the GENMASK value used to
set the lower bits to '1' for the upper bound of the whitelist range
incorrectly included one more bit than it should have, causing the
whitelist ranges to sometimes appear twice as large as they really were.
For example,

        REG[0x6210-0x6217]: allow rw access

was also intended to be a single dword-sized register whitelist (with a
range 0x6210-0x6213) but was printed incorrectly as a qword-sized range
because one too many bits was flipped on.  Similar 'off by one' logic
was applied when printing 4-dword register ranges and 64-dword register
ranges as well.

Correct the GENMASK logic to print these ranges in debugfs correctly.
No impact outside of correcting the misleading debugfs output.

Fixes: d855d2246e ("drm/xe: Print whitelist while applying")
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20260408-regsr_wl_range-v1-1-e9a28c8b4264@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 1a2a722ff96749734a5585dfe7f0bea7719caa8b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 11:28:24 -04:00
Matt Roper
9407936237
drm/xe: Mark ROW_CHICKEN5 as a masked register
ROW_CHICKEN5 is a masked register (i.e., to adjust the value of any of
the lower 16 bits, the corresponding bit in the upper 16 bits must also
be set).  Add the XE_REG_OPTION_MASKED to its definition; failure to do
so will cause workaround updates of this register to not apply properly.

Bspec: 56853
Fixes: 835cd6cbb0 ("drm/xe/xe3p_lpg: Add initial workarounds for graphics version 35.10")
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20260410-xe3p_tuning-v1-3-e206a62ee38f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit cd84bfbba7feb4c1e72356f14de026dfda1a9e2a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 11:28:17 -04:00
Matt Roper
2299d73562
drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL
From Xe2 onward (i.e., all platforms officially supported by the Xe
driver), the GAMSTLB_CTRL register is located at offset 0x477C and
represented by the macro "GAMSTLB_CTRL" in code.  However the register
formerly resided at offset 0xCF4C on Xe1-era platforms, and we also have
macro XEHP_GAMSTLB_CTRL that represents this old offset in the
unofficial/developer-only Xe1 code.  When tuning for the register was
added for Xe3p_LPG, the old Xe1-era macro was accidentally used instead
of the proper macro for Xe2 and beyond, causing the tuning to not be
applied properly.  Use the proper definition so that the correct offset
is written to.

Bspec: 59298
Fixes: 377c89bfaa ("drm/xe/xe3p_lpg: Set STLB bank hash mode to 4KB")
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20260410-xe3p_tuning-v1-2-e206a62ee38f@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 0b1676eafdd1ba5a5436bdca0d2a25ce56699783)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 11:28:10 -04:00
Gustavo Sousa
68fdf2c943
drm/xe/xe3p_lpg: Add missing indirect ring state feature flag
Even though commit 8fcb7dfb8b ("drm/xe/xe3p_lpg: Add support for
graphics IP 35.10") mentions that the support for Indirect Ring State
exists for Xe3p_LPG, it missed actually setting the feature flag in
graphics_xe3p_lpg.  Fix that by adding the missing member.

Fixes: 8fcb7dfb8b ("drm/xe/xe3p_lpg: Add support for graphics IP 35.10")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20260401-xe3p_lpg-indirect-ring-state-v1-1-0e4b5edf6898@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
(cherry picked from commit ec4f4970eb744fd7d6d135f40f5c83bd05982e72)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 11:28:02 -04:00
Matt Roper
9d7ca81b30
drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138
There appears to have been a silent merge conflict between some commits
updating the workaround tables on Xe's -fixes and -next branches:

 - Commit bc6387a2e0 ("drm/xe/xe2_hpg: Fix handling of Wa_14019988906
   & Wa_14019877138") from the fixes branch moved the Xe2_HPG instance
   of two workarounds touching the PSS_CHICKEN register from the
   engine_was[] table to the lrc_was[] table; the equivalent
   implementation for all other platforms/IPs were already properly
   located on lrc_was[].  This commit on the fixes branch is a
   cherry-pick of commit e04c609eed ("drm/xe/xe2_hpg: Fix handling of
   Wa_14019988906 & Wa_14019877138") that already existed on the next
   branch.

 - Commit 55b19abb6c ("drm/xe: Consolidate workaround entries for
   Wa_14019877138") and commit c2142a1a84 ("drm/xe: Consolidate
   workaround entries for Wa_14019988906") consolidated the individual
   entries per IP generation for each workaround into single, larger
   range-based entries.

During merge conflict resolution the Xe2_HPG-specific entries (i.e.,
those with rule "GRAPHICS_VERSION_RANGE(2001, 2002)") were accidentally
resurrected, even though the table already contains the consolidated
entries that match a superset of thse ranges.  These redundant entries
don't cause any build failures but do trigger a dmesg error during probe
on BMG-G21 devices:

  xe 0000:03:00.0: [drm] *ERROR* Tile0: GT0: discarding save-restore reg 7044 (clear: 00000400, set: 00000400, masked: yes, mcr: yes): ret=-22
  xe 0000:03:00.0: [drm] *ERROR* Tile0: GT0: discarding save-restore reg 7044 (clear: 00000020, set: 00000020, masked: yes, mcr: yes): ret=-22

Re-drop the Xe2_HPG-specific table entries to eliminate the error.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7433
Fixes: 17b95278ae ("Merge tag 'drm-xe-next-2026-03-02' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next")
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260401-wa_merge_conflict-v1-1-b477ab53fedc@intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
(cherry picked from commit c79bc999442ff3c0908ab8bce92b2a3cb7d59861)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 11:27:54 -04:00
Jonathan Cavitt
2bc0cce272
drm/xe/vm: Add missing pad and extensions check
Add missing pad and extensions check to xe_vm_get_property_ioctl

v2:
- Combine with other check (Auld)

Fixes: 50c577eab0 ("drm/xe/xe_vm: Implement xe_vm_get_property_ioctl")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260331181216.37775-2-jonathan.cavitt@intel.com
(cherry picked from commit 896070686b16cc45cca7854be2049923b2b303d3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 11:27:48 -04:00
Matthew Brost
a0fc362f09
drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge()
xe_guc_submit_wedge() runs in the DMA-fence signaling path, where
GFP_KERNEL memory allocations are not permitted. However, registering
guc_submit_wedged_fini via drmm_add_action_or_reset() triggers such an
allocation.

Avoid this by moving the logic from guc_submit_wedged_fini() into
guc_submit_fini(), where wedged exec queue references are dropped during
normal teardown.

Fixes: 8ed9aaae39 ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260326210116.202585-3-matthew.brost@intel.com
(cherry picked from commit 4a706bd93c4fb156a13477e26ffdf2e633edeb10)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-29 11:27:38 -04:00
DaeMyung Kang
c444139cb7 ksmbd: rewrite stop_sessions() with restartable iteration
stop_sessions() walks conn_list with hash_for_each() and, for every
entry, drops conn_list_lock across the transport ->shutdown() call
before re-acquiring the read lock to continue the loop.  The hash
walk relies on cross-iteration state (the current bucket and the
hlist position), which is not preserved across unlock/relock: if
another thread performs a list mutation during the unlocked window,
the ongoing iteration becomes unreliable and can re-visit
connections that have already been handled or skip connections that
have not.  The outer `if (!hash_empty(conn_list)) goto again;` retry
masks the symptom in the common case but does not address the
unsafe iteration itself.

Reframe the loop so it never relies on iterator state across
unlock/relock.  Under conn_list_lock held for read, pick the first
connection whose ->shutdown() has not yet been issued by this path,
pin it by taking an extra reference, record that fact on the
connection and mark it EXITING while still inside the locked walk,
then drop the lock.  Then call ->shutdown() outside the lock, drop
the pin (freeing the connection if the handler already released its
reference), and restart from the top.

Use a new per-connection flag, conn->stop_called, as the "shutdown
issued from stop_sessions()" marker rather than reusing the status
state.  ksmbd_conn_set_exiting() is also invoked by
ksmbd_sessions_deregister() on sibling channels of a multichannel
session without issuing a transport shutdown, so treating
KSMBD_SESS_EXITING as "already handled here" would skip connections
that still need shutdown() to wake their handler out of recv(),
leaving the outer retry waiting indefinitely for the hash to drain.
stop_sessions() is serialised by init_lock in
ksmbd_conn_transport_destroy(), so writing stop_called under the
read lock has no other writer.

Set EXITING inside the locked walk so the selection, the stop_called
marker, and the status transition all happen together, and guard
against regressing a connection that has already advanced to
KSMBD_SESS_RELEASING on its own (for example, if the handler exited
its receive loop for an unrelated reason between teardown steps).

When the pin drop is the last put, release the transport and pair
ida_destroy(&target->async_ida) with the ida_init() done in
ksmbd_conn_alloc(), so stop_sessions() retiring a connection on its
own does not leak the xarray backing of the embedded async_ida.

The outer retry with msleep() is kept to wait for handler threads to
reach ksmbd_conn_free() and drain the hash.

Observed with an instrumented build that logs one line per visit and
widens the unlocked window before ->shutdown() by 200 ms, under
five concurrent cifs mounts (nosharesock, one connection each):

  * Current code: the same connection address is revisited many
    times during a single stop_sessions() call and ->shutdown() is
    invoked well beyond the number of live connections before the
    hash finally drains.

  * Rewritten code: each live connection produces exactly one
    ->shutdown() call; the function returns as soon as the hash is
    empty.

Functional teardown via `ksmbd.control --shutdown` with the same
five mounts completes cleanly on the rewritten path.

Performance is observably unchanged.  Tearing down N concurrent
nosharesock cifs connections with `ksmbd.control --shutdown` +
`rmmod ksmbd` takes essentially the same wall time before and after
the rewrite:

    N        before        after
    10       4.93s         5.34s
    30       7.34s         7.03s
    50       7.31s         7.01s     (3-run avg: 7.04s vs 7.25s)
   100       6.98s         6.78s
   200       6.77s         6.89s

and the number of ->shutdown() calls equals the number of live
connections on both paths when the race is not widened.  The
teardown is dominated by the msleep(100)-based outer retry waiting
for handler threads to run ksmbd_conn_free(), not by the iteration
itself; the restartable loop's worst-case O(N^2) visit cost is in
the microseconds even at N=200 and sits far below the msleep(100)
granularity.

Applied alone on top of ksmbd-for-next-next, this patch does not
introduce a new leak site.  Under the same reproducer (10x
concurrent-holders + ss -K + ksmbd.control --shutdown + rmmod), the
tree still shows the pre-existing per-connection transport leak
count that arises when the last refcount drop lands in one of
ksmbd_conn_r_count_dec(), __free_opinfo() or session_fd_check() -
all of which end with a bare kfree() today.  kmemleak backtraces
for the unreferenced objects point into the TCP accept path
(sk_clone -> inet_csk_clone_lock, sock_alloc_inode) and none
involve stop_sessions().  Plugging those bare-kfree sites is the
responsibility of the follow-up patch.

Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-29 10:25:37 -05:00
Marios Makassikis
ab4ad35e58 smb: server: handle readdir_info_level_struct_sz() error
early exit in smb2_populate_readdir_entry() if the requested info_level
is unknown.

Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-29 10:25:37 -05:00
Timur Kristóf
019155e2bd drm/amd/display: Use EDID from VBIOS embedded panel info
When an embedded panel has no DDC, read the EDID from
the VBIOS embedded panel info and use that.

Fixes: 7c7f5b15be ("drm/amd/display: Refactor edid read.")
Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/5192
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 399b9abc353c62f6e37d38325edbdb6c2c00411c)
2026-04-29 10:41:46 -04:00
Timur Kristóf
9ea16f6418 drm/amd/display: Read EDID from VBIOS embedded panel info
Some board manufacturers hardcode the EDID for the embedded
panel in the VBIOS. This EDID should be used when the panel
doesn't have a DDC.

For reference, see the legacy non-DC display code:
amdgpu_atombios_encoder_get_lcd_info()

This is necessary to support embedded connectors without DDC.

Fixes: 4562236b3b ("drm/amd/dc: Add dc display driver (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/5192
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit eb105e63b474c11ef6a84a1c6b18100d851ff364)
2026-04-29 10:41:40 -04:00