linux/drivers/infiniband/core
Zhu Yanjun 86dfdd8288 RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency
In the commit aee2424246 ("RDMA/iwcm: Fix a use-after-free related to
destroying CM IDs"), the function flush_workqueue is invoked to flush the
work queue iwcm_wq.

But at that time, the work queue iwcm_wq was created via the function
alloc_ordered_workqueue without the flag WQ_MEM_RECLAIM.

Because the current process is trying to flush the whole iwcm_wq, if
iwcm_wq doesn't have the flag WQ_MEM_RECLAIM, verify that the current
process is not reclaiming memory or running on a workqueue which doesn't
have the flag WQ_MEM_RECLAIM as that can break forward-progress guarantee
leading to a deadlock.

The call trace is as below:

[  125.350876][ T1430] Call Trace:
[  125.356281][ T1430]  <TASK>
[ 125.361285][ T1430] ? __warn (kernel/panic.c:693)
[ 125.367640][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.375689][ T1430] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 125.382505][ T1430] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 125.388987][ T1430] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
[ 125.395831][ T1430] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 125.403125][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.410984][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.418764][ T1430] __flush_workqueue (kernel/workqueue.c:3970)
[ 125.426021][ T1430] ? __pfx___might_resched (kernel/sched/core.c:10151)
[ 125.433431][ T1430] ? destroy_cm_id (drivers/infiniband/core/iwcm.c:375) iw_cm
[ 125.441209][ T1430] ? __pfx___flush_workqueue (kernel/workqueue.c:3910)
[ 125.473900][ T1430] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 125.473909][ T1430] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161)
[ 125.482537][ T1430] _destroy_id (drivers/infiniband/core/cma.c:2044) rdma_cm
[ 125.495072][ T1430] nvme_rdma_free_queue (drivers/nvme/host/rdma.c:656 drivers/nvme/host/rdma.c:650) nvme_rdma
[ 125.505827][ T1430] nvme_rdma_reset_ctrl_work (drivers/nvme/host/rdma.c:2180) nvme_rdma
[ 125.505831][ T1430] process_one_work (kernel/workqueue.c:3231)
[ 125.515122][ T1430] worker_thread (kernel/workqueue.c:3306 kernel/workqueue.c:3393)
[ 125.515127][ T1430] ? __pfx_worker_thread (kernel/workqueue.c:3339)
[ 125.531837][ T1430] kthread (kernel/kthread.c:389)
[ 125.539864][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.550628][ T1430] ret_from_fork (arch/x86/kernel/process.c:147)
[ 125.558840][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.558844][ T1430] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[  125.566487][ T1430]  </TASK>
[  125.566488][ T1430] ---[ end trace 0000000000000000 ]---

Fixes: aee2424246 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs")
Link: https://patch.msgid.link/r/20240820113336.19860-1-yanjun.zhu@linux.dev
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-08-23 11:44:43 -03:00
..
addr.c inet: introduce dst_rtable() helper 2024-04-30 18:32:38 -07:00
agent.c RDMA/core: Create GSI QP only when CM is supported 2024-07-01 15:38:05 +03:00
agent.h
cache.c RDMA/cache: Release GID table even if leak is detected 2024-06-21 10:17:44 -03:00
cgroup.c
cm_msgs.h RDMA/core: Add necessary spaces 2021-04-12 14:52:22 -03:00
cm_trace.c RDMA/cm: Replace pr_debug() call sites with tracepoints 2020-08-24 19:41:41 -03:00
cm_trace.h trace: Relocate event helper files 2022-12-10 11:01:12 -05:00
cm.c RDMA/cm: Print the old state when cm_destroy_id gets timeout 2024-04-01 15:16:36 +03:00
cma_configfs.c RDMA/cma: Fix truncation compilation warning in make_cma_ports 2023-09-18 11:52:10 +03:00
cma_priv.h RDMA/core: Add an rb_tree that stores cm_ids sorted by ifindex and remote IP 2022-06-16 09:54:35 +03:00
cma_trace.c
cma_trace.h tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
cma.c RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw 2024-05-12 13:04:11 +03:00
core_priv.h RDMA/core: Remove unused declaration rdma_resolve_ip_route() 2024-08-19 15:19:52 -03:00
counters.c RDMA/counter: Add optional counter support 2021-10-12 12:48:05 -03:00
cq.c RDMA/core: Delete useless module.h include 2022-01-28 13:03:12 -04:00
device.c RDMA: Fix netdev tracker in ib_device_set_netdev 2024-07-14 10:32:57 +03:00
ib_core_uverbs.c
iwcm.c RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency 2024-08-23 11:44:43 -03:00
iwcm.h RDMA/core: Use refcount_t instead of atomic_t on refcount of iwcm_id_private 2021-06-08 14:35:44 -03:00
iwpm_msg.c RDMA/iwpm: Rely on the rdma_nl_[un]register() to ensure that requests are valid 2021-07-30 10:01:41 -03:00
iwpm_util.c RDMA: Remove unnecessary NULL values 2023-08-07 16:56:57 +03:00
iwpm_util.h RDMA/core: Delete useless module.h include 2022-01-28 13:03:12 -04:00
lag.c RDMA/core: Remove NULL check before dev_{put, hold} 2024-05-05 15:12:35 +03:00
mad_priv.h RDMA/core: Remove refcount from struct ib_mad_snoop_private 2021-06-08 14:43:28 -03:00
mad_rmpp.c RDMA/core: Remove redundant spaces 2021-04-12 14:56:48 -03:00
mad_rmpp.h
mad.c RDMA/mad: Simplify an alloc_ordered_workqueue() invocation 2024-08-23 11:37:49 -03:00
Makefile RDMA/umem: Support importing dma-buf as user memory region 2021-01-20 16:07:52 -04:00
mr_pool.c
multicast.c RDMA/core: Use refcount_t instead of atomic_t on refcount of mcast_port 2021-06-08 14:45:07 -03:00
netlink.c RDMA: Remove unnecessary ternary operators 2023-07-31 15:16:12 +03:00
nldev.c RDMA/nldev: Enhance netlink message parsing and validation 2024-08-11 11:12:49 +03:00
opa_smi.h RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
packer.c
rdma_core.c RDMA: Correct duplicated words in comments 2022-06-24 16:52:28 -03:00
rdma_core.h IB/uverbs: Introduce create/destroy QP commands over ioctl 2020-05-21 20:39:36 -03:00
restrack.c RDMA/core: Add an option to display driver-specific QPs in the rdmatool 2024-04-30 11:19:29 +03:00
restrack.h RDMA/restrack: Improve readability in task name management 2020-09-22 19:47:35 -03:00
roce_gid_mgmt.c RDMA/core: Remove NULL check before dev_{put, hold} 2024-05-05 15:12:35 +03:00
rw.c RDMA/core: Fix a couple of obvious typos in comments 2023-10-04 21:55:44 +03:00
sa_query.c RDMA/core: Use size_{add,sub,mul}() in calls to struct_size() 2023-09-19 10:33:45 +03:00
sa.h RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
security.c IB/core: Removed port validity check from ib_get_cached_subnet_prefix 2021-06-21 20:49:32 -03:00
smi.c RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
smi.h RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
sysfs.c IB/core: Add support for XDR link speed 2023-09-26 12:38:39 +03:00
trace.c RDMA/core: Clean up tracepoint headers 2020-07-06 14:54:46 -03:00
ucma.c infiniband: Remove the now superfluous sentinel element from ctl_table array 2023-10-11 12:16:13 -07:00
ud_header.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
umem_dmabuf.c RDMA/umem: Introduce an option to revoke DMABUF umem 2024-08-11 11:12:49 +03:00
umem_odp.c Linux 6.0 2022-10-06 19:48:45 -03:00
umem.c RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz 2023-12-04 20:02:41 -04:00
user_mad.c RDMA/core: Create "issm*" device nodes only when SMI is supported 2024-07-01 15:10:15 +03:00
uverbs_cmd.c RDMA: Pass entire uverbs attr bundle to create cq function 2024-06-27 16:28:21 -03:00
uverbs_ioctl.c RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings 2024-03-03 15:38:44 +02:00
uverbs_main.c RDMA/core: Support IB sub device with type "SMI" 2024-07-01 15:38:04 +03:00
uverbs_marshall.c RDMA/core: Don't infoleak GRH fields 2022-01-05 16:30:19 -04:00
uverbs_std_types_async_fd.c RDMA/core: Make FD destroy callback void 2020-11-12 12:32:17 -04:00
uverbs_std_types_counters.c IB/uverbs: Fix an potential error pointer dereference 2023-08-07 16:49:59 +03:00
uverbs_std_types_cq.c RDMA: Pass entire uverbs attr bundle to create cq function 2024-06-27 16:28:21 -03:00
uverbs_std_types_device.c IB/core: Add support for XDR link speed 2023-09-26 12:38:39 +03:00
uverbs_std_types_dm.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types_flow_action.c RDMA/core: Delete IPsec flow action logic from the core 2022-04-09 08:25:06 +03:00
uverbs_std_types_mr.c RDMA: Pass uverbs_attr_bundle as part of '.reg_user_mr_dmabuf' API 2024-08-11 11:12:50 +03:00
uverbs_std_types_qp.c IB/uverbs: fix the typo of optional 2022-10-19 09:46:45 +03:00
uverbs_std_types_srq.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types_wq.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types.c RDMA/core: Make FD destroy callback void 2020-11-12 12:32:17 -04:00
uverbs_uapi.c RDMA/uverbs: Check for null return of kmalloc_array 2022-01-05 14:16:53 -04:00
uverbs.h RDMA/core: Use refcount_t instead of atomic_t on refcount of ib_uverbs_device 2021-06-08 15:04:36 -03:00
verbs.c IB/core: add support for draining Shared receive queues 2024-06-26 10:53:29 -03:00