linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-04 12:35:52 +02:00

Author	SHA1	Message	Date
David Howells	daeb443b92	netfs: Defer the emission of trace_netfs_folio() Change netfs_perform_write() to keep the netfs_folio trace value in a variable and emit it later to make it easier to choose the value displayed. This is a prerequisite for a subsequent patch. Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-13-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:31 +02:00
David Howells	156ac2ec2e	netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone If a streaming write is made, this will leave the relevant modified folio in a not-uptodate, but dirty state with a netfs_folio struct hung off of folio->private indicating the dirty range. Subsequently truncating the file such that the dirty data in the folio is removed, but the first part of the folio theoretically remains will cause the netfs_folio struct to be discarded... but will leave the dirty flag set. If the folio is then read via mmap(), netfs_read_folio() will see that the page is dirty and jump to netfs_read_gaps() to fill in the missing bits. netfs_read_gaps(), however, expects there to be a netfs_folio struct present and can oops because truncate removed it. Fix this by calling folio_cancel_dirty() in netfs_invalidate_folio() in the event that all the dirty data in the folio is erased (as nfs does). Also add some tracepoints to log modifications to a dirty page. This can be reproduced with something like: dd if=/dev/zero of=/xfstest.test/foo bs=1M count=1 umount /xfstest.test mount /xfstest.test xfs_io -c "w 0xbbbf 0xf96c" \ -c "truncate 0xbbbf" \ -c "mmap -r 0xb000 0x11000" \ -c "mr 0xb000 0x11000" \ /xfstest.test/foo with fscaching disabled (otherwise streaming writes are suppressed) and a change to netfs_perform_write() to disallow streaming writes if the fd is open O_RDWR: if (//(file->f_mode & FMODE_READ) \|\| <--- comment this out netfs_is_cache_enabled(ctx)) { It should be reproducible even without this change, but if prevents the above trivial xfs_io command from reproducing it. Note that the initial dd is important: the file must start out sufficiently large that the zero-point logic doesn't just clear the gaps because it knows there's nothing in the file to read yet. Unmounting and mounting is needed to clear the pagecache (there are other ways to do that that may also work). This was initially reproduced with the generic/522 xfstest on some patches that remove the FMODE_READ restriction. Fixes: `9ebff83e64` ("netfs: Prep to use folio->private for write grouping and streaming write") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-12-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:31 +02:00
David Howells	0ef37eef83	netfs: Fix overrun check in netfs_extract_user_iter() Fix netfs_extract_user_iter() so that if iov_iter_extract_pages() overfills pages[], then those pages don't get included in the iterator constructed at the end of the function. If there was an overfill, memory corruption has already happened. Fixes: `85dd2c8ff3` ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Closes: https://sashiko.dev/#/patchset/20260427154639.180684-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-11-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:30 +02:00
Paulo Alcantara	0aad5704c6	netfs: fix error handling in netfs_extract_user_iter() In netfs_extract_user_iter(), if iov_iter_extract_pages() failed to extract user pages, bail out on -ENOMEM, otherwise return the error code only if @npages == 0, allowing short DIO reads and writes to be issued. This fixes mmapstress02 from LTP tests against CIFS. Fixes: `85dd2c8ff3` ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-10-dhowells@redhat.com Cc: netfs@lists.linux.dev Cc: stable@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:30 +02:00
David Howells	7e3d8db899	netfs: Fix potential uninitialised var in netfs_extract_user_iter() In netfs_extract_user_iter(), if it's given a zero-length iterator, it will fall through the loop without setting ret, and so the error handling behaviour will be undefined, depending on whether ret happens to be negative. The value of ret then propagates back up the callstack. Fix this by presetting ret to 0. Fixes: `85dd2c8ff3` ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-9-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:30 +02:00
Viacheslav Dubeyko	dc7832d05d	netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call The multiple runs of generic/013 test-case is capable to reproduce a kernel BUG at mm/filemap.c:1504 with probability of 30%. while true; do sudo ./check generic/013 done [ 9849.452376] page: refcount:3 mapcount:0 mapping:00000000e58ff252 index:0x10781 pfn:0x1c322 [ 9849.452412] memcg:ffff8881a1915800 [ 9849.452417] aops:ceph_aops ino:1000058db9e dentry name(?):"f9XXXXXX" [ 9849.452432] flags: 0x17ffffc0000000(node=0\|zone=2\|lastcpupid=0x1fffff) [ 9849.452441] raw: 0017ffffc0000000 0000000000000000 dead000000000122 ffff88816110d248 [ 9849.452445] raw: 0000000000010781 0000000000000000 00000003ffffffff ffff8881a1915800 [ 9849.452447] page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio)) [ 9849.452474] ------------[ cut here ]------------ [ 9849.452476] kernel BUG at mm/filemap.c:1504! [ 9849.478635] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI [ 9849.481772] CPU: 2 UID: 0 PID: 84223 Comm: fsstress Not tainted 7.0.0-rc1+ #18 PREEMPT(full) [ 9849.482881] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/1 0/2025 [ 9849.484539] RIP: 0010:folio_unlock+0x85/0xa0 [ 9849.485076] Code: 89 df 31 f6 e8 1c f3 ff ff 48 8b 5d f8 c9 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc 48 c7 c6 80 6c d9 a7 48 89 df e8 4b b3 10 00 <0f> 0b 48 89 df e8 21 e6 2c 00 eb 9d 0f 1f 40 00 66 66 2e 0f 1f 84 [ 9849.493818] RSP: 0018:ffff8881bb8076b0 EFLAGS: 00010246 [ 9849.495740] RAX: 0000000000000000 RBX: ffffea00070c8980 RCX: 0000000000000000 [ 9849.498678] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 9849.500559] RBP: ffff8881bb8076b8 R08: 0000000000000000 R09: 0000000000000000 [ 9849.501097] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000010782000 [ 9849.502108] R13: ffff8881935de738 R14: ffff88816110d010 R15: 0000000000001000 [ 9849.502516] FS: 00007e36cbe94740(0000) GS:ffff88824a899000(0000) knlGS:0000000000000000 [ 9849.502996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9849.503810] CR2: 000000c0002b0000 CR3: 000000011bbf6004 CR4: 0000000000772ef0 [ 9849.504459] PKRU: 55555554 [ 9849.504626] Call Trace: [ 9849.505242] <TASK> [ 9849.505379] netfs_write_begin+0x7c8/0x10a0 [ 9849.505877] ? __kasan_check_read+0x11/0x20 [ 9849.506384] ? __pfx_netfs_write_begin+0x10/0x10 [ 9849.507178] ceph_write_begin+0x8c/0x1c0 [ 9849.507934] generic_perform_write+0x391/0x8f0 [ 9849.508503] ? __pfx_generic_perform_write+0x10/0x10 [ 9849.509062] ? file_update_time_flags+0x19a/0x4b0 [ 9849.509581] ? ceph_get_caps+0x63/0xf0 [ 9849.510259] ? ceph_get_caps+0x63/0xf0 [ 9849.510530] ceph_write_iter+0xe79/0x1ae0 [ 9849.511282] ? __pfx_ceph_write_iter+0x10/0x10 [ 9849.511839] ? lock_acquire+0x1ad/0x310 [ 9849.512334] ? ksys_write+0xf9/0x230 [ 9849.512582] ? lock_is_held_type+0xaa/0x140 [ 9849.513128] vfs_write+0x512/0x1110 [ 9849.513634] ? __fget_files+0x33/0x350 [ 9849.513893] ? __pfx_vfs_write+0x10/0x10 [ 9849.514143] ? mutex_lock_nested+0x1b/0x30 [ 9849.514394] ksys_write+0xf9/0x230 [ 9849.514621] ? __pfx_ksys_write+0x10/0x10 [ 9849.514887] ? do_syscall_64+0x25e/0x1520 [ 9849.515122] ? __kasan_check_read+0x11/0x20 [ 9849.515366] ? trace_hardirqs_on_prepare+0x178/0x1c0 [ 9849.515655] __x64_sys_write+0x72/0xd0 [ 9849.515885] ? trace_hardirqs_on+0x24/0x1c0 [ 9849.516130] x64_sys_call+0x22f/0x2390 [ 9849.516341] do_syscall_64+0x12b/0x1520 [ 9849.516545] ? do_syscall_64+0x27c/0x1520 [ 9849.516783] ? do_syscall_64+0x27c/0x1520 [ 9849.517003] ? lock_release+0x318/0x480 [ 9849.517220] ? __x64_sys_io_getevents+0x143/0x2d0 [ 9849.517479] ? percpu_ref_put_many.constprop.0+0x8f/0x210 [ 9849.517779] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 9849.518073] ? do_syscall_64+0x25e/0x1520 [ 9849.518291] ? __kasan_check_read+0x11/0x20 [ 9849.518519] ? trace_hardirqs_on_prepare+0x178/0x1c0 [ 9849.518799] ? do_syscall_64+0x27c/0x1520 [ 9849.519024] ? local_clock_noinstr+0xf/0x120 [ 9849.519262] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 9849.519544] ? do_syscall_64+0x25e/0x1520 [ 9849.519781] ? __kasan_check_read+0x11/0x20 [ 9849.520008] ? trace_hardirqs_on_prepare+0x178/0x1c0 [ 9849.520273] ? do_syscall_64+0x27c/0x1520 [ 9849.520491] ? trace_hardirqs_on_prepare+0x178/0x1c0 [ 9849.520767] ? irqentry_exit+0x10c/0x6c0 [ 9849.520984] ? trace_hardirqs_off+0x86/0x1b0 [ 9849.521224] ? exc_page_fault+0xab/0x130 [ 9849.521472] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 9849.521766] RIP: 0033:0x7e36cbd14907 [ 9849.521989] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 9849.523057] RSP: 002b:00007ffff2d2a968 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 9849.523484] RAX: ffffffffffffffda RBX: 000000000000e549 RCX: 00007e36cbd14907 [ 9849.523885] RDX: 000000000000e549 RSI: 00005bd797ec6370 RDI: 0000000000000004 [ 9849.524277] RBP: 0000000000000004 R08: 0000000000000047 R09: 00005bd797ec6370 [ 9849.524652] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000000049 [ 9849.525062] R13: 0000000010781a37 R14: 00005bd797ec6370 R15: 0000000000000000 [ 9849.525447] </TASK> [ 9849.525574] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass ghash_clmulni_intel aesni_intel input_leds rapl mac_hid psmouse vga16fb serio_raw vgastate floppy i2c_piix4 bochs qemu_fw_cfg i2c_smbus pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore [ 9849.529150] ---[ end trace 0000000000000000 ]--- [ 9849.529502] RIP: 0010:folio_unlock+0x85/0xa0 [ 9849.530813] Code: 89 df 31 f6 e8 1c f3 ff ff 48 8b 5d f8 c9 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc 48 c7 c6 80 6c d9 a7 48 89 df e8 4b b3 10 00 <0f> 0b 48 89 df e8 21 e6 2c 00 eb 9d 0f 1f 40 00 66 66 2e 0f 1f 84 [ 9849.534986] RSP: 0018:ffff8881bb8076b0 EFLAGS: 00010246 [ 9849.536198] RAX: 0000000000000000 RBX: ffffea00070c8980 RCX: 0000000000000000 [ 9849.537718] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 9849.539321] RBP: ffff8881bb8076b8 R08: 0000000000000000 R09: 0000000000000000 [ 9849.540862] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000010782000 [ 9849.542438] R13: ffff8881935de738 R14: ffff88816110d010 R15: 0000000000001000 [ 9849.543996] FS: 00007e36cbe94740(0000) GS:ffff88824b899000(0000) knlGS:0000000000000000 [ 9849.545854] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9849.547092] CR2: 00007e36cb3ff000 CR3: 000000011bbf6006 CR4: 0000000000772ef0 [ 9849.548679] PKRU: 55555554 The race sequence: 1. Read completes -> netfs_read_collection() runs 2. netfs_wake_rreq_flag(rreq, NETFS_RREQ_IN_PROGRESS, ...) 3. netfs_wait_for_read() returns -EFAULT to netfs_write_begin() 4. The netfs_unlock_abandoned_read_pages() unlocks the folio 5. netfs_write_begin() calls folio_unlock(folio) -> VM_BUG_ON_FOLIO() The key reason of the issue that netfs_unlock_abandoned_read_pages() doesn't check the flag NETFS_RREQ_NO_UNLOCK_FOLIO and executes folio_unlock() unconditionally. This patch implements in netfs_unlock_abandoned_read_pages() logic similar to netfs_unlock_read_folio(). Fixes: `ee4cdf7ba8` ("netfs: Speed up buffered reading") Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-8-dhowells@redhat.com Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: Ceph Development <ceph-devel@vger.kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:30 +02:00
David Howells	4543a4d737	netfs: Fix zeropoint update where i_size > remote_i_size Fix the update of the zero point[] by netfs_release_folio() when there is uncommitted data in the pagecache beyond the folio being released but the on-server EOF is in this folio (ie. i_size > remote_i_size). The update needs to limit zero_point to remote_i_size, not i_size as i_size is a local phenomenon reflecting updates made locally to the pagecache, not stuff written to the server. remote_i_size tracks the server's i_size. [] The zero point is the file position from which we can assume that the server will just return zeros, so we can avoid generating reads. Note that netfs_invalidate_folio() probably doesn't need fixing as zero_point should be updated by setattr after truncation or fallocate. Found with: fsx -q -N 1000000 -p 10000 -o 128000 -l 600000 \ /xfstest.test/junk --replay-ops=junk.fsxops using the following as junk.fsxops: truncate 0x0 0x1bbae 0x82864 write 0x3ef2e 0xf9c8 0x1bbae write 0x67e05 0xcb5a 0x4e8f6 mapread 0x57781 0x85b6 0x7495f copy_range 0x5d3d 0x10329 0x54fac 0x7495f write 0x64710 0x1c2b 0x7495f mapread 0x64000 0x1000 0x7495f on cifs with the default cache option. It shows read-gaps on folio 0x64 failing with a short read (ie. it hits EOF) if the FMODE_READ check is commented out in netfs_perform_write(): if (//(file->f_mode & FMODE_READ) \|\| netfs_is_cache_enabled(ctx)) { and no fscache. This was initially found with the generic/522 xfstest. Fixes: `cce6bfa6ca` ("netfs: Fix trimming of streaming-write folios in netfs_inval_folio()") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-7-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:30 +02:00
David Howells	2c8f4742bb	netfs: Fix potential for tearing in ->remote_i_size and ->zero_point Fix potential tearing in using ->remote_i_size and ->zero_point by copying i_size_read() and i_size_write() and using the same seqcount as for i_size. We need to make sure that netfslib and the filesystems that use it always hold i_lock whilst updating any of the sizes to prevent i_size_seqcount from getting corrupted. Fixes: `4058f74210` ("netfs: Keep track of the actual remote file size") Fixes: `100ccd18bb` ("netfs: Optimise away reads above the point at which there can be no data") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-6-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:30 +02:00
David Howells	8a8c0cfdf4	netfs: Fix netfs_read_to_pagecache() to pause on subreq failure Fix netfs_read_to_pagecache() so that it pauses the generation of new subrequests if an already-issued subrequest fails. Fixes: `ee4cdf7ba8` ("netfs: Speed up buffered reading") Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-5-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:29 +02:00
David Howells	b5782e2d46	netfs: Fix missing barriers when accessing stream->subrequests locklessly The list of subrequests attached to stream->subrequests is accessed without locks by netfs_collect_read_results() and netfs_collect_write_results(), and then they access subreq->flags without taking a barrier after getting the subreq pointer from the list. Relatedly, the functions that build the list don't use any sort of write barrier when constructing the list to make sure that the NETFS_SREQ_IN_PROGRESS flag is perceived to be set first if no lock is taken. Fix this by: (1) Add a new list_add_tail_release() function that uses a release barrier to set the pointer to the new member of the list. (2) Add a new list_first_entry_or_null_acquire() function that uses an acquire barrier to read the pointer to the first member in a list (or return NULL). (3) Use list_add_tail_release() when adding a subreq to ->subrequests. (4) Use list_first_entry_or_null_acquire() when initially accessing the front of the list (when an item is removed, the pointer to the new front iterm is obtained under the same lock). Fixes: `e2d46f2ec3` ("netfs: Change the read result collector to only use one work item") Fixes: `288ace2f57` ("netfs: New writeback implementation") Link: https://sashiko.dev/#/patchset/20260326104544.509518-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-4-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:29 +02:00
David Howells	cce18c263e	netfs: Fix missing locking around retry adding new subreqs Fix netfs_retry_read_subrequests() and netfs_retry_write_stream() to take the appropriate lock when adding extra subrequests into stream->subrequests. Fixes: `e2d46f2ec3` ("netfs: Change the read result collector to only use one work item") Fixes: `288ace2f57` ("netfs: New writeback implementation") Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-3-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:29 +02:00
David Howells	6f0f7ac191	netfs: Fix cancellation of a DIO and single read subrequests When the preparation of a new subrequest for a read fails, if the subrequest has already been added to the stream->subrequests list, it can't simply be put and abandoned as the collector may see it. Also, if it hasn't been queued yet, it has two outstanding refs that both need to be put. Both DIO read and single-read dispatch fail at this; further, both differ in the order they do things to the way buffered read works. Fix cancellation of both DIO-read and single-read subrequests that failed preparation by the following steps: (1) Harmonise all three reads (buffered, dio, single) to queue the subreq before prepping it. (2) Make all three call netfs_queue_read() to do the queuing. (3) Set NETFS_RREQ_ALL_QUEUED independently of the queuing as we don't know the length of the subreq at this point. (4) In all cases, set the error and NETFS_SREQ_FAILED flag on the subreq and then call netfs_read_subreq_terminated() to deal with it. This will pass responsibility off to the collector for dealing with it. Fixes: `e2d46f2ec3` ("netfs: Change the read result collector to only use one work item") Closes: https://sashiko.dev/#/patchset/20260425125426.3855807-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-2-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:42:29 +02:00
Breno Leitao	859c199bb3	fs/select: reject negative timeval components in kern_select() kern_select() normalises the user-supplied struct __kernel_old_timeval with tv.tv_sec + (tv.tv_usec / USEC_PER_SEC) (tv.tv_usec % USEC_PER_SEC) * NSEC_PER_USEC before calling poll_select_set_timeout() -> timespec64_valid(). Both operands of the seconds sum are unbounded user-controlled signed long. A crafted pair where tv_usec is a negative multiple of USEC_PER_SEC drives the sum across the wrap boundary - e.g. { .tv_sec = LONG_MIN, .tv_usec = -1000000 } yields sec = LONG_MAX, nsec = 0, which passes timespec64_valid() and then flows through timespec64_add_safe(), which saturates the absolute deadline to TIME64_MAX (clamped further to KTIME_MAX downstream). select(2) therefore blocks effectively forever instead of returning -EINVAL as POSIX requires for a negative timeout. Only the legacy __NR_select syscall takes this path. pselect6, ppoll, poll and epoll_pwait2 all hand the user's two fields directly to poll_select_set_timeout(), which validates before doing any arithmetic: /* fs/select.c:271 -- the validator / int poll_select_set_timeout(struct timespec64 to, time64_t sec, long nsec) { struct timespec64 ts = {.tv_sec = sec, .tv_nsec = nsec}; if (!timespec64_valid(&ts)) return -EINVAL; ... } /* include/linux/time64.h:97 -- timespec64_valid / if (ts->tv_sec < 0) return false; if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC) return false; / fs/select.c:744 do_pselect() (pselect6, pselect6_time32) / if (get_timespec64(&ts, tsp)) return -EFAULT; if (poll_select_set_timeout(to, ts.tv_sec, ts.tv_nsec)) return -EINVAL; / fs/select.c:1097 ppoll / if (get_timespec64(&ts, tsp)) return -EFAULT; if (poll_select_set_timeout(to, ts.tv_sec, ts.tv_nsec)) return -EINVAL; / fs/select.c:1065 poll -- timeout_msecs is int; >= 0 gates the math / if (timeout_msecs >= 0) poll_select_set_timeout(to, timeout_msecs / MSEC_PER_SEC, NSEC_PER_MSEC (timeout_msecs % MSEC_PER_SEC)); /* fs/eventpoll.c:2512 epoll_pwait2 */ if (get_timespec64(&ts, timeout)) return -EFAULT; if (poll_select_set_timeout(to, ts.tv_sec, ts.tv_nsec)) return -EINVAL; In every one of these the wrap-prone arithmetic from kern_select() simply does not exist; the user fields reach timespec64_valid() unmodified. glibc routes the C-library select() through pselect6, so the bug is reachable only via a direct syscall(__NR_select, ...). The pre-validation negative check that used to live here was lost when the syscall was switched to the poll_select_set_timeout() helper. Restore it: reject tv_sec < 0 \|\| tv_usec < 0 up front, mirroring what glibc does in userspace. do_compat_select() has the same arithmetic pattern but is only reachable on 32-bit compat and from a different syscall entry; left for a follow-up so this change stays minimal. Reproducer (returns -1/EINVAL on a fixed kernel; blocks indefinitely on an unfixed one): struct timeval tv = { .tv_sec = LONG_MIN, .tv_usec = -1000000 }; fd_set r; int pfd[2]; pipe(pfd); FD_ZERO(&r); FD_SET(pfd[0], &r); syscall(__NR_select, pfd[0] + 1, &r, NULL, NULL, &tv); Fixes: `4d36a9e65d` ("select: deal with math overflow from borderline valid userland data") Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260429-timeval-v1-1-4448e2588bbf@debian.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-12 14:41:40 +02:00
Paolo Abeni	93d809adc1	Merge branch 'vsock-virtio-fix-vsockmon-tap-skb-construction' Stefano Garzarella says: ==================== vsock/virtio: fix vsockmon tap skb construction While reviewing the patch posted by Yiqi Sun [1] to fix an issue in virtio_transport_build_skb(), I discovered another issue related to the offset and length of the payload to be copied in the new skb. This was introduced when we did the skb conversion, and fixed by patch 1. Patch 2 fixes the issue found by Yiqi Sun in a different way: using iov_iter_kvec() to properly initialize all the iov_iter fields and removing the linear vs non-linear split like we alredy do in vhost-vsock. It could have been a single patch, but since there were two affected commits, I decided to keep the fixes separate. [1] https://lore.kernel.org/netdev/20260430071110.380509-1-sunyiqixm@gmail.com/ ==================== Link: https://patch.msgid.link/20260508164411.261440-1-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 12:52:18 +02:00
Stefano Garzarella	3a3e3d90cb	vsock/virtio: fix empty payload in tap skb for non-linear buffers For non-linear skbs, virtio_transport_build_skb() goes through virtio_transport_copy_nonlinear_skb() to copy the original payload in the new skb to be delivered to the vsockmon tap device. This manually initializes an iov_iter but does not set iov_iter.count. Since the iov_iter is zero-initialized, the copy length is zero and no payload is actually copied to the monitor interface, leaving data un-initialized. Fix this by removing the linear vs non-linear split and using skb_copy_datagram_iter() with iov_iter_kvec() for all cases, as vhost-vsock already does. This handles both linear and non-linear skbs, properly initializes the iov_iter, and removes the now unused virtio_transport_copy_nonlinear_skb(). While touching this code, let's also check the return value of skb_copy_datagram_iter(), even though it's unlikely to fail. Fixes: `4b0bf10eb0` ("vsock/virtio: non-linear skb handling for tap") Reported-by: Yiqi Sun <sunyiqixm@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Arseniy Krasnov <avkrasnov@rulkc.org> Link: https://patch.msgid.link/20260508164411.261440-3-sgarzare@redhat.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 12:52:15 +02:00
Stefano Garzarella	5f344d809e	vsock/virtio: fix length and offset in tap skb for split packets virtio_transport_build_skb() builds a new skb to be delivered to the vsockmon tap device. To build the new skb, it uses the original skb data length as payload length, but as the comment notes, the original packet stored in the skb may have been split in multiple packets, so we need to use the length in the header, which is correctly updated before the packet is delivered to the tap, and the offset for the data. This was also similar to what we did before commit `71dc9ec9ac` ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") where we probably missed something during the skb conversion. Also update the comment above, which was left stale by the skb conversion and still mentioned a buffer pointer that no longer exists. Fixes: `71dc9ec9ac` ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Arseniy Krasnov <avkrasnov@rulkc.org> Link: https://patch.msgid.link/20260508164411.261440-2-sgarzare@redhat.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 12:52:15 +02:00
Quan Sun	911f54771c	net: hsr: fix NULL pointer dereference in hsr_get_node_data() In the HSR (High-availability Seamless Redundancy) protocol, node information is maintained in the node_db. When a supervision frame is received, node->addr_B_port is updated to track the receiving port type (e.g., HSR_PT_SLAVE_B). If the underlying physical interface associated with this slave port is removed (e.g., via `ip link del`), hsr_del_port() frees the hsr_port object. However, the stale node->addr_B_port reference is kept in the node_db until the node ages out. Subsequently, if userspace queries the node status via the Netlink command HSR_C_GET_NODE_STATUS, the kernel calls hsr_get_node_data(). This function unconditionally dereferences the pointer returned by hsr_port_get_hsr(): if (node->addr_B_port != HSR_PT_NONE) { port = hsr_port_get_hsr(hsr, node->addr_B_port); *addr_b_ifindex = port->dev->ifindex; // <-- NULL deref } If the slave port has been deleted, hsr_port_get_hsr() returns NULL, resulting in a kernel panic. Oops: general protection fault, probably for non-canonical address KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] RIP: 0010:hsr_get_node_data+0x7b6/0x9e0 Call Trace: <TASK> hsr_get_node_status+0x445/0xa40 Fix this by adding a proper NULL pointer check. If the port lookup fails due to a stale port type, gracefully treat it as if no valid port exists and assign -1 to the interface index. Steps to reproduce: 1. Create an HSR interface with two slave devices. 2. Receive a supervision frame to populate node_db with addr_B_port assigned to SLAVE_B. 3. Delete the underlying slave device B. 4. Send an HSR_C_GET_NODE_STATUS Netlink message. Fixes: `c5a7591172` ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Quan Sun <2022090917019@std.uestc.edu.cn> Link: https://patch.msgid.link/20260508124636.1462346-1-2022090917019@std.uestc.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-05-12 12:28:34 +02:00
Chaitanya Kumar Borah	1ae15b6c79	drm/i915/dp: Fix VSC dynamic range signaling for RGB formats For RGB, set dynamic_range to CTA or VESA based on crtc_state->limited_color_range so sinks apply correct quantization. YCbCr remains limited (CTA) range. (DP v1.4, Table 5-1) v2: - Added Reported-by and Tested-by tags v3: - Add back YCbCr comment(Suraj) Cc: stable@vger.kernel.org #v5.8+ Reported-by: DeepChirp <DeepChirp@outlook.com> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/15874 Tested-by: DeepChirp <DeepChirp@outlook.com> Fixes: `9799c4c3b7` ("drm/i915/dp: Add compute routine for DP VSC SDP") Assisted-by: GitHub-Copilot:GPT-5.4 Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260505090920.2479112-1-chaitanya.kumar.borah@intel.com (cherry picked from commit 38e10ddae6f8d42a2e8437fcd25a1cac51106c64) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>	2026-05-12 08:05:24 +01:00
Sebastian Brzezinka	4cfe4c0efb	drm/i915: skip __i915_request_skip() for already signaled requests After a GPU reset the HWSP is zeroed, so previously completed requests appear incomplete. If such a request is picked up during reset_rewind() and marked guilty, i915_request_set_error_once() returns early (fence already signaled), leaving fence.error without a fatal error code. The subsequent __i915_request_skip() then hits: ``` GEM_BUG_ON(!fatal_error(rq->fence.error)) ``` Fixes a kernel BUG observed on Sandy Bridge (Gen6) during heartbeat-triggered engine resets. ``` kernel BUG at drivers/gpu/drm/i915/i915_request.c:556! RIP: __i915_request_skip+0x15e/0x1d0 [i915] ... __i915_request_reset+0x212/0xa70 [i915] reset_rewind+0xe4/0x280 [i915] intel_gt_reset+0x30d/0x5b0 [i915] heartbeat+0x516/0x530 [i915] ``` Guard __i915_request_skip() with i915_request_signaled(), if the fence is already signaled, the ring content is committed and there is nothing left to skip. Fixes: `36e191f064` ("drm/i915: Apply i915_request_skip() on submission") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/13729 Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Cc: stable@vger.kernel.org # v5.7+ Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/fe76921d35b6ae85aa651822726d0d9815aa5362.1776339012.git.sebastian.brzezinka@intel.com (cherry picked from commit 5ba54393dcd7adf75a9f39f5a933b1538349cad5) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>	2026-05-12 08:05:21 +01:00
Sven Eckelmann	99d9958fa1	batman-adv: tt: prevent TVLV entry number overflow The helpers to prepare the buffers for the local and global TT based replies are trying to sum up all TT entries which can be found for each VLAN. In theory, this sum can be too big for an u16 and therefore overflow. A too small buffer would then be allocated for the TVLV. The too small buffer will be handled gracefully by batadv_tt_tvlv_generate() and is not causing a buffer overflow - just a truncated reply. But this overflow shouldn't have happened in the first and the too small buffer should never have been allocated when an overflow was detected. Cc: stable@kernel.org Fixes: `7ea7b4a142` ("batman-adv: make the TT CRC logic VLAN specific") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-12 08:33:53 +02:00
Sven Eckelmann	fa1bd70494	batman-adv: tt: avoid empty VLAN responses The commit `16116dac23` ("batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs") added checks to the local (direct) TT response code. But the response can also be done indirectly by another node using the global TT state. To avoid such inconsistency states reported in the original fix, also avoid sending empty VLANs for replies from the global TT state. Cc: stable@kernel.org Fixes: `7ea7b4a142` ("batman-adv: make the TT CRC logic VLAN specific") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-12 08:33:53 +02:00
Sven Eckelmann	94d2700501	batman-adv: tt: fix TOCTOU race for reported vlans The local TT based TVLV is generated by first checking the number of VLANs which have at least one TT entry. A new buffer with the correct size for the VLANs is then allocated. Only then, the list of VLANs s used to fill the VLAN entries in the buffer. During this time, the meshif_vlan_list_lock is held. But the actual number of TT entries of each VLAN can still increase during this time - just not the number of VLANs in the list. But the prefilter used in the buffer size calculation might still cause an increase of the number of VLANs which need to be stored. Simply because a VLAN might now suddenly have at least one entry when it had none in the pre-alloc check - and then needs to occupy space which was not allocated. It is better to overestimate the buffer size at the beginning and then fill the buffer only with the VLANs which are not empty. Cc: stable@kernel.org Fixes: `16116dac23` ("batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-12 08:33:53 +02:00
Sven Eckelmann	fc92cdfcb2	batman-adv: tt: fix negative last_changeset_len batadv_piv_tt::last_changeset_len len was declared as s16, but the field is never intended to hold a negative value. When a value greater than 32767 is assigned, it wraps to a negative signed integer. In batadv_send_my_tt_response(), last_changeset_len is temporarily widened to s32. The incorrectly negative s16 value propagates into the s32, causing batadv_tt_prepare_tvlv_local_data() to allocate a full sized buffer but populates only a small portion of it with the collected changeset. All remaining bits are kept uninitialized. Using an u16 avoids this type confusion and ensures that no (negative) sign extension is performed in batadv_send_my_tt_response(). Cc: stable@kernel.org Fixes: `a73105b8d4` ("batman-adv: improved client announcement mechanism") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-12 08:33:52 +02:00
Sven Eckelmann	b64963a2ce	batman-adv: tt: fix negative tt_buff_len batadv_orig_node::tt_buff_len was declared as s16, but the field is never intended to hold a negative value. When a value greater than 32767 is assigned, it wraps to a negative signed integer. In batadv_send_other_tt_response(), tt_buff_len is temporarily widened to s32. The incorrectly negative s16 value propagates into the s32, causing batadv_tt_prepare_tvlv_global_data() to allocate a full sized buffer but populates only a small portion of it with the collected changeset. All remaining bits are kept uninitialized. Using an u16 avoids this type confusion and ensures that no (negative) sign extension is performed in batadv_send_other_tt_response(). Cc: stable@kernel.org Fixes: `a73105b8d4` ("batman-adv: improved client announcement mechanism") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-12 08:33:52 +02:00
Sven Eckelmann	1e9fab756f	batman-adv: tt: reject oversized local TVLV buffers The commit `3a359bf5c6` ("batman-adv: reject oversized global TT response buffers") added a check to ensure that a global return buffer size can be stored in an u16. The same buffer handling also exists for the local data buffer but was not touched. A similar check should be also be in place for the local TVLV buffer. It doesn't have the similar attack surface because it is only generated from locally discovered MAC addresses but the dynamic nature could still cause temporarily to large buffers. Cc: stable@kernel.org Fixes: `7ea7b4a142` ("batman-adv: make the TT CRC logic VLAN specific") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-12 08:33:52 +02:00
Aboorva Devarajan	dbc30a57bd	powerpc/hv-gpci: fix preempt count leak in sysfs show paths Four sysfs show() callbacks in hv-gpci take get_cpu_var(hv_gpci_reqb) (which calls preempt_disable()) but only call the matching put_cpu_var() on the error path under the 'out:' label. Every successful read leaks one preempt_disable(): processor_bus_topology_show() processor_config_show() affinity_domain_via_virtual_processor_show() affinity_domain_via_domain_show() (affinity_domain_via_partition_show() was already correct.) On a CONFIG_PREEMPT=y kernel, repeated reads raise preempt_count and eventually return to userspace with preemption still disabled. The next user-mode page fault then hits faulthandler_disabled() == 1, gets forced to SIGSEGV, and the resulting coredump trips 'BUG: scheduling while atomic' in call_usermodehelper_exec -> wait_for_completion_state -> schedule: BUG: scheduling while atomic: <task>/<pid>/0x00000004 ... __schedule_bug+0x6c/0x90 __schedule+0x58c/0x13a0 schedule+0x48/0x1a0 schedule_timeout+0x104/0x170 wait_for_completion_state+0x16c/0x330 call_usermodehelper_exec+0x254/0x2d0 vfs_coredump+0x1050/0x2590 get_signal+0xb9c/0xc80 do_notify_resume+0xf8/0x470 Add an out_success label that calls put_cpu_var() before returning the byte count, mirroring affinity_domain_via_partition_show(). Fixes: `71f1c39647` ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor bus topology information") Fixes: `1a160c2a13` ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor config information") Fixes: `71a7ccb478` ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via virtual processor information") Fixes: `a69a57cac1` ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via domain information") Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260508041256.3447113-1-aboorvad@linux.ibm.com	2026-05-12 11:51:17 +05:30
Julian Braha	aef656a0e6	powerpc: fix dead default for GUEST_STATE_BUFFER_TEST The GUEST_STATE_BUFFER_TEST config option should default to KUNIT_ALL_TESTS so that if all tests are enabled then it is included, but currently the 'default KUNIT_ALL_TESTS' statement is shadowed by 'def_tristate n', meaning that this second default statement is currently dead code. It looks to me like the commit `6ccbbc33f0` ("KVM: PPC: Add helper library for Guest State Buffers") intended to set the default to KUNIT_ALL_TESTS, but mistakenly missed the def_tristate. This dead code was found by kconfirm, a static analysis tool for Kconfig. Fixes: `6ccbbc33f0` ("KVM: PPC: Add helper library for Guest State Buffers") Signed-off-by: Julian Braha <julianbraha@gmail.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Reviewed-by: Amit Machhiwal <amachhiw@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260405161545.161006-1-julianbraha@gmail.com	2026-05-12 11:50:52 +05:30
Bart Van Assche	f98020c22a	powerpc/powermac: Remove pmac_low_i2c_{lock,unlock}() Commit `a28d3af2a2` ("[PATCH] 2/5 powerpc: Rework PowerMac i2c part 2") removed the last calls to the pmac_low_i2c_{lock,unlock}() functions. Hence, remove these two functions. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260316174747.3871924-1-bvanassche@acm.org	2026-05-12 11:50:09 +05:30
Ma Ke	108d7f9512	powerpc/warp: Fix error handling in pika_dtm_thread pika_dtm_thread() acquires client through of_find_i2c_device_by_node() but fails to release it in error handling path. This could result in a reference count leak, preventing proper cleanup and potentially leading to resource exhaustion. Add put_device() to release the reference in the error handling path. Found by code review. Cc: stable@vger.kernel.org Fixes: `3984114f05` ("powerpc/warp: Platform fix for i2c change") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251116024411.21968-1-make24@iscas.ac.cn	2026-05-12 11:49:30 +05:30
Ally Heev	acd1e47db0	powerpc: 82xx: fix uninitialized pointers with free attribute Uninitialized pointers with `__free` attribute can cause undefined behavior as the memory allocated to the pointer is freed automatically when the pointer goes out of scope. powerpc/km82xx doesn't have any bugs related to this as of now, but, it is better to initialize and assign pointers with `__free` attribute in one statement to ensure proper scope-based cleanup Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/aPiG_F5EBQUjZqsl@stanley.mountain/ Signed-off-by: Ally Heev <allyheev@gmail.com> Fixes: `4aa5cc1e00` ("powerpc-km82xx.c: replace of_node_put() with __free") Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251116-aheev-uninitialized-free-attr-km82xx-v2-1-4307e2b5300d@gmail.com	2026-05-12 11:48:47 +05:30
Linus Walleij	8d57bb6173	powerpc/g5: Enable all windfarms by default The G5 defconfig is clearly intended for the G5 Powermac series, and that should enable all the available windfarm drivers, or the machine will overheat a short while after booting and shut itself down, which is annoying. Signed-off-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260505-powermac-g5-config-v3-1-7747bf72f874@kernel.org	2026-05-12 11:48:01 +05:30
Prathyushi Nangia	c21b90f776	x86/CPU/AMD: Prevent improper isolation of shared resources in Zen2's op cache Make sure resources are not improperly shared in the op cache and cause instruction corruption this way. Signed-off-by: Prathyushi Nangia <prathyushi.nangia@amd.com> Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-05-11 20:06:36 -07:00
Guangshuo Li	53597deca0	drm/bridge: imx8qxp-pxl2dpi: avoid ERR_PTR with device_node cleanup imx8qxp_pxl2dpi_get_available_ep_from_port() returns ERR_PTR() on errors. imx8qxp_pxl2dpi_find_next_bridge() stores its return value in a __free(device_node) variable before checking IS_ERR(). When the function returns on the error path, the cleanup action calls of_node_put() on the ERR_PTR() value. Do not let a device_node cleanup variable hold error pointers. Change imx8qxp_pxl2dpi_get_available_ep_from_port() to return an int and pass the endpoint node through an output argument. Initialize the output argument to NULL so callers hold either NULL on error paths or a valid device_node pointer on successful path. Fixes: `ceea3f7806` ("drm/bridge: imx8qxp-pxl2dpi: simplify put of device_node pointers") Cc: stable@vger.kernel.org Reviewed-by: Liu Ying <victor.liu@nxp.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260507100604.667731-1-lgs201920130244@gmail.com Signed-off-by: Liu Ying <victor.liu@nxp.com>	2026-05-12 10:13:19 +08:00
Richard Fitzgerald	8b7c5cc7f6	ASoC: cs35l56: Check for successful runtime-resume in cs35l56_dsp_work() In cs35l56_dsp_work() check that the request for runtime-resume was successful instead of assuming that it can't fail. We may as well do this using the new PM_RUNTIME_ACQUIRE*() macros and remove the manual pm_runtime_put_autosuspend() and associated gotos. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20260511114239.44970-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>	2026-05-12 10:22:13 +09:00
Mario Limonciello	2c7b1227e5	ASoC: SOF: amd: Fix error code handling in psp_send_cmd() The smn_read_register() helper returns negative error codes on failure or the register value on success. When used with read_poll_timeout(), the return value is stored in the 'data' variable. Currently 'data' is declared as u32, which causes negative error codes to be cast to large positive values. This makes the condition 'data > 0' incorrectly treat errors as success. Fix by changing 'data' from u32 to int, matching the pattern used in psp_mbox_ready() which correctly handles the same helper function. Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/linux-sound/agGES8vWrLOrBu28@stanley.mountain/ Fixes: `f120cf33d2` ("ASoC: SOF: amd: Use AMD_NODE") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20260511153638.724810-1-mario.limonciello@amd.com Signed-off-by: Mark Brown <broonie@kernel.org>	2026-05-12 10:14:33 +09:00
Evgenii Burenchev	be48e5fe51	qed: fix division by zero in qed_init_wfq_param when all vports are configured In qed_init_wfq_param(), variable non_requested_count can become zero when the number of vports with the configured flag set (including the current vport being configured) equals total num_vports. This happens when configuring the last unconfigured vport or when re-configuring an already configured vport. The function then calculates left_rate_per_vp = total_left_rate / non_requested_count, which causes division by zero. Fix this by skipping the division when non_requested_count is zero. In that case, there is no remaining bandwidth to distribute, so just record the configuration for the current vport and return success. Fixes: `bcd197c81f` ("qed: Add vport WFQ configuration APIs") Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru> Link: https://patch.msgid.link/20260507145520.23106-1-evg28bur@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-11 18:11:06 -07:00
Davide Caratti	f8e64e956a	net/sched: dualpi2: initialize timer earlier in dualpi2_init() 'pi2_timer' needs to be initialized in all error paths of dualpi2_init(): otherwise, a failure in qdisc_create_dflt() causes the following crash in dualpi2_destroy(): # tc qdisc add dev crash0 handle 1: root dualpi2 BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 4 UID: 0 PID: 471 Comm: tc Tainted: G E 7.1.0-rc1-virtme #2 PREEMPT(full) Tainted: [E]=UNSIGNED_MODULE Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 RIP: 0010:hrtimer_active+0x39/0x60 Code: f9 eb 23 0f b6 41 38 3c 01 0f 87 87 64 c0 ff 83 e0 01 75 33 48 39 4a 50 74 28 44 3b 42 10 75 06 48 3b 51 30 74 21 48 8b 51 30 <44> 8b 42 10 41 f6 c0 01 74 cf f3 90 44 8b 42 10 41 f6 c0 01 74 c3 RSP: 0018:ffffd0db80b93620 EFLAGS: 00010282 RAX: ffffffffc0400320 RBX: ffff8cf24a4c86b8 RCX: ffff8cf24a4c86b8 RDX: 0000000000000000 RSI: ffff8cf2429c2ab0 RDI: ffff8cf24a4c86b8 RBP: 00000000fffffff4 R08: 0000000000000003 R09: 0000000000000000 R10: 0000000000000001 R11: ffff8cf24a39c500 R12: ffff8cf24822c000 R13: ffffd0db80b936c0 R14: ffffffffc02cf360 R15: 00000000ffffffff FS: 00007fbc01706580(0000) GS:ffff8cf2dc759000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000008e02003 CR4: 0000000000172ef0 Call Trace: <TASK> hrtimer_cancel+0x15/0x40 dualpi2_destroy+0x20/0x40 [sch_dualpi2] qdisc_create+0x230/0x570 tc_modify_qdisc+0x716/0xc10 rtnetlink_rcv_msg+0x188/0x780 netlink_rcv_skb+0xcd/0x150 netlink_unicast+0x1ba/0x290 netlink_sendmsg+0x242/0x4d0 ____sys_sendmsg+0x39e/0x3e0 ___sys_sendmsg+0xe1/0x130 __sys_sendmsg+0xad/0x110 do_syscall_64+0x14f/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fbc0188b08e Code: 4d 89 d8 e8 94 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 03 ff ff ff 0f 1f 00 f3 0f 1e fa RSP: 002b:00007fff593260e0 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc0188b08e RDX: 0000000000000000 RSI: 00007fff59326190 RDI: 0000000000000003 RBP: 00007fff593260f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 000055f06124f260 R13: 0000000069fca043 R14: 000055f061255640 R15: 000055f06124d3f8 </TASK> Modules linked in: sch_dualpi2(E) CR2: 0000000000000010 [1] https://lore.kernel.org/netdev/2e78e01c504c633ebdff18d041833cf2e079a3a4.1607020450.git.dcaratti@redhat.com/ [2] https://lore.kernel.org/netdev/20200725201707.16909-1-xiyou.wangcong@gmail.com/ v2: - rebased on top of latest net.git Fixes: `320d031ad6` ("sched: Struct definition and parsing of dualpi2 qdisc") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/1faca91179702b31da5d87653e1e036543e32722.1778259798.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-11 18:03:16 -07:00
Abel Vesa	12c97d1c15	arm64: dts: qcom: glymur: Drop RPMh CXO clocks from QMP PHYs On Glymur, all QMP PHYs except the one used by USB SS0 take their reference clock from the TCSR clock controller. Since these TCSR clocks already derive from RPMH_CXO_CLK as their sole parent, there is no need to provide an extra `clkref` clock to the PHY nodes. Drop the extra RPMh CXO clock inputs and use the TCSR clocks as the PHY reference clocks instead. This also fixes the devicetree schema validation, as the bindings do not allow a separate `clkref` clock. Fixes: `4eee57dd4d` ("arm64: dts: qcom: glymur: Add USB related nodes") Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Reported-by: Rob Herring <robh@kernel.org> Closes: https://lore.kernel.org/r/20260410145205.GA554754-robh@kernel.org/ Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260414-dts-glymur-drop-rpmh-cxo-clk-from-qmpphys-v1-1-ab12d77c4aec@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>	2026-05-11 20:00:11 -05:00
Kuniyuki Iwashima	03cb001ef8	tcp: Fix out-of-bounds access for twsk in tcp_ao_established_key(). lockdep_sock_is_held() was added in tcp_ao_established_key() by the cited commit. It can be called from tcp_v[46]_timewait_ack() with twsk. Since it does not have sk->sk_lock, the lockdep annotation results in out-of-bound access. $ pahole -C tcp_timewait_sock vmlinux \| grep size /* size: 288, cachelines: 5, members: 8 / $ pahole -C sock vmlinux \| grep sk_lock socket_lock_t sk_lock; / 440 192 */ Let's not use lockdep_sock_is_held() for TCP_TIME_WAIT. Fixes: `6b2d11e2d8` ("net/tcp: Add missing lockdep annotations for TCP-AO hlist traversals") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260508120853.4098365-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-11 17:50:15 -07:00
Arthur Kiyanovski	24a08d7d62	net: ena: PHC: Check return code before setting timestamp output ena_phc_gettimex64() is setting the output parameter regardless of whether ena_com_phc_get_timestamp() succeeded or failed. When ena_com_phc_get_timestamp() returns an error, the timestamp parameter may contain uninitialized stack memory (e.g., when PHC is disabled or in blocked state) or invalid hardware values. Passing these to userspace via the PTP ioctl is both a security issue (information leak) and a correctness bug. Fix by checking the return code after releasing the lock and only setting the output timestamp on success. Fixes: `e0ea34158e` ("net: ena: Add PHC support in the ENA driver") Cc: stable@vger.kernel.org Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260507003518.22554-1-akiyano@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-11 17:22:48 -07:00
Allison Henderson	e174929793	net/rds: reset op_nents when zerocopy page pin fails When iov_iter_get_pages2() fails in rds_message_zcopy_from_user(), the pinned pages are released with put_page(), and rm->data.op_mmp_znotifier is cleared. But we fail to properly clear rm->data.op_nents. Later when rds_message_purge() is called from rds_sendmsg() the cleanup loop iterates over the incorrectly non zero number of op_nents and frees them again. Fix this by properly resetting op_nents when it should be in rds_message_zcopy_from_user(). Fixes: `0cebaccef3` ("rds: zerocopy Tx support.") Signed-off-by: Allison Henderson <achender@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260505234336.2132721-1-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-11 17:20:02 -07:00
Florian Eckert	83dd9effef	MAINTAINERS: Remove Chuanhua Lei as PCIe intel-gw maintainer Chuanhua Lei's email address has been bouncing for months. Remove the entry and mark the PCI intel-gw driver as orphaned. Signed-off-by: Florian Eckert <fe@dev.tdt.de> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260417-pcie-intel-gw-v5-1-0a2b933fe04f@dev.tdt.de	2026-05-11 18:10:48 -05:00
Johannes Thumshirn	3a8389d42b	zonefs: handle integer overflow in zonefs_fname_to_fno In zonefs the file name in one of the two directories corresponds to the zone number. Here Alexey reported a possible integer overflow in zonefs_fname_to_fno(), where the parsing of the zone number from the file name can overflow the 'long' data type. Add a check for integer overflows and if the fno 'long' did overflow return -ENOENT. Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Fixes: `d207794aba` ("zonefs: Dynamically create file inodes when needed") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>	2026-05-12 07:53:20 +09:00
Linus Torvalds	50897c9559	linux_kselftest-kunit-fixes-7.1-rc4 Fix to decouple KUNIT_DEBUGFS and KUNIT_ALL_TESTS options and fix KUNIT_DEBUGFS dependencies so it depends on DEBUG_FS without which it will not be useful. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmoCUf0ACgkQCwJExA0N Qxxsog//Wd/HweNWaLJFG3z9QL4bCAl7cfDTV0uNAobnHge5ymqaKgWmXxi6Xpnt CVYOxDsqJ+Pt4AZedOZ8ZAx09rw0NkfbkRRdJMSvh+7EBJHNwguKxaWa99zcz96p O4qppKOWOeLWP5YDIDlDLK7atiDhBQjm/p5LdtDBa1oxnD3g2ETvu4mUmPwpNT1W t07duEbjLblvOL+hAZ9oRn56638pAyFQG3B9Gs7pjY/YsVWwEIpJnRmFjVu0ZdwU gSbxXYgsc8L/EKFWqsz1kfrZjb24s6z4amYCg12UMGBr1YVfSWp0CLkPjm+BbTjx SzSoPmUSg4g5m+nLwvGkcK9cGhEvAoyPsDdywcSPtE66vOdbkzdeRQ/ukzwcA+5o hm2eC4Z3GXIj+lw2yXzeLflrupUBzi2mZbj0Rcm/rn3h0fUmmDhQf9AoKbH4q/Q+ UKT/2U0R39WO/e7LTYMpcaajXCNFiAwl3eFpoJfYlWWYV56YgqIp4Q2/B9lmRsi1 YNTVJUNtesgtzVOUsRjlgjjKMJjXF6wZRVf8YU4e0Uj95gxEc9pxcilmoOnMqWFr VvutsERfhaCLbfRnEKx2MP9hsora66aqF3fdMaseM9JoNiuXdlXONIQSD8vAuOkR fVp38l2v/SvOb6zyVfAURFYPlPaXUfPGRHraeyQcBcXfew+B8vg= =QdSk -----END PGP SIGNATURE----- Merge tag 'linux_kselftest-kunit-fixes-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fixes from Shuah Khan: "Fix to decouple KUNIT_DEBUGFS and KUNIT_ALL_TESTS options and fix KUNIT_DEBUGFS dependencies so it depends on DEBUG_FS without which it will not be useful" * tag 'linux_kselftest-kunit-fixes-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: config: KUNIT_DEBUGFS should depend on DEBUG_FS kunit: config: Enable KUNIT_DEBUGFS by default	2026-05-11 15:38:49 -07:00
Tejun Heo	9a415cc537	sched_ext: Avoid UAF in scx_root_enable_workfn() init failure path In scx_root_enable_workfn(), put_task_struct(p) is called before scx_error() dereferences p->comm and p->pid. If the iterator's reference is the last drop, the task is freed synchronously and the deref becomes a UAF. Move put_task_struct() past scx_error(). Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://lore.kernel.org/all/20260511214031.AF5E9C2BCB0@smtp.kernel.org/ Fixes: `f0e1a0643a` ("sched_ext: Implement BPF extensible scheduler class") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-11 12:05:48 -10:00
Jesse Zhang	5d08559c91	drm/amdgpu/gfx_v12_0: set gfx.rs64_enable from PFP header on GFX12 gfx_v12_0_init_microcode() always loads RS64 CP ucode but never set adev->gfx.rs64_enable, so it stayed false and code that branches on it (e.g. MEC pipe reset) used the legacy CP_MEC_CNTL path incorrectly. Match GFX11: derive RS64 mode from the PFP firmware header (v2.0) via amdgpu_ucode_hdr_version(). Log at debug when RS64 is enabled. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b03d53598b0d2048e8fa7303b8d0784768ec4fa6)	2026-05-11 17:54:44 -04:00
Xiang Liu	6bbede02dc	drm/amd/ras: Fix CPER ring debugfs read overflow The legacy CPER debugfs reader can reach the payload path without a valid pointer snapshot. The remaining user byte count is also treated as the ring occupancy in dwords, so reads past the header can copy more than requested. Take the CPER lock before sampling pointers. Resample rptr/wptr for payload reads, bound the payload copy by available dwords and the remaining user size, and advance the file position for each dword copied. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1e40ef87ffdc291e05ccdade8b9170cc9c1c4249)	2026-05-11 17:54:28 -04:00
Mikhail Gavrilov	183182235f	drm/amd/display: Wrap DCN32 phantom-plane allocation in DC_RUN_WITH_PREEMPTION_ENABLED [Why] dcn32_validate_bandwidth() wraps dcn32_internal_validate_bw() with DC_FP_START()/DC_FP_END(). In x86 non-RT, DC_FP_START takes fpregs_lock(), which disables local softirqs. The DML1 path through dcn32_enable_phantom_plane() calls kvzalloc() to allocate ~335 KiB for dc_plane_state. This triggers the vmalloc path, which calls BUG_ON(in_interrupt()) because it's invoked within the FPU-enabled (softirq disabled) region, leading to a kernel crash. [How] Wrap the dc_state_create_phantom_plane() call with the DC_RUN_WITH_PREEMPTION_ENABLED() macro to allow preemption during this memory allocation. Fixes: `235c676342` ("drm/amd/display: add DCN32/321 specific files for Display Core") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4470 Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: James Lin <pinglei.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 885ccbef7b94a8b38f69c4211c679021aa27ad11) Cc: stable@vger.kernel.org	2026-05-11 17:53:31 -04:00
Christian König	0071e01c61	drm/amdgpu: fix userq hang detection and reset Fix lock inversions pointed out by Prike and Sunil. The hang detection timeout CAN'T grab locks under which we wait for fences, especially not the userq_mutex lock. Then instead of this completely broken handling with the hang_detect_fence just cancel the work when fences are processed and re-start if necessary. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1b62077f045ac6ffde7c97005c6659569ac5c1ec)	2026-05-11 17:47:11 -04:00
Christian König	d0053441ad	drm/amdgpu: remove almost all calls to amdgpu_userq_detect_and_reset_queues Well the reset handling seems broken on multiple levels. As first step of fixing this remove most calls to the hang detection. That function should only be called after we run into a timeout! And NOT as random check spread over the code in multiple places. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 71bea36b54ccfb14cbc90f94267af6369af4e702)	2026-05-11 17:47:04 -04:00

... 25 26 27 28 29 ...

1446786 Commits