cifs_copy_folioq_to_iter() copies a requested number of bytes from
a folio queue into the destination iterator. Since the encrypted
SMB2 READ path was changed to pass the server-declared payload
length (data_len) instead of the larger folioq buffer length, the
caller can ask for fewer bytes than the folio queue holds.
In that case the helper continues walking the remaining folios after
data_size has reached zero and calls copy_folio_to_iter() with
len = 0, which is unnecessary work.
The helper also returns 0 (success) when the folio queue is
exhausted before data_size bytes have been copied. The caller has
no way to distinguish that from a full copy and the reported
transfer count ends up larger than the amount of data placed in the
iterator.
Add an early exit when data_size reaches zero, and return an error
when the folio queue is exhausted before all requested bytes have
been copied.
Signed-off-by: Jeremy Erazo <mendozayt13@gmail.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In handle_read_data() the encrypted/folioq branch
(buf_len <= data_offset, reached via receive_encrypted_read for
transform PDUs > CIFSMaxBufSize + MAX_HEADER_SIZE) copies the READ
payload using buffer_len rather than data_len:
rdata->result = cifs_copy_folioq_to_iter(buffer, buffer_len,
cur_off,
&rdata->subreq.io_iter);
...
rdata->got_bytes = buffer_len;
buffer_len comes from the SMB3 transform header OriginalMessageSize
field (OriginalMessageSize - read_rsp_size); it represents the size
of the decrypted message after the SMB2 header. data_len comes from
the SMB2 READ response DataLength field; it represents the actual
READ payload size and may be smaller than buffer_len when the
decrypted message contains padding or other trailing bytes after the
READ payload. The existing check `data_len > buffer_len - pad_len`
only enforces an upper bound, so a server that emits
OriginalMessageSize larger than read_rsp_size + pad_len + data_len
passes the check and the kernel copies buffer_len bytes per response,
ignoring the server-asserted DataLength.
Two observable failures with a crafted server (DataLength=4,
buffer_len=20000):
- the kernel returns 20000 bytes per sub-request to userspace and
sets got_bytes = buffer_len, even though the response claimed
only 4 bytes of payload;
- on a partial netfs sub-request whose iterator is sized to
data_len, the over-large copy_folio_to_iter() short-reads,
cifs_copy_folioq_to_iter() returns -EIO via the n != len path,
and the entire netfs read collapses to -EIO even though the
leading sub-requests succeeded.
Use data_len for the copy length and for got_bytes so the kernel
honours the server-asserted READ payload size. For well-formed
servers (where buffer_len == pad_len + data_len) the change is
behaviour-equivalent.
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Erazo <mendozayt13@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Please consider pulling these changes from the signed vfs-7.1-rc5.fixes tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCagq67gAKCRCRxhvAZXjc
ooRHAP0Scrpsiloo7JPM1u0DZZwvTdb9JRlx6k/KXkeN0j5L/wD9FVA9AXarcta5
h37k+SZpz8FuWkoY5LxTvUNbV6mr0w0=
=Enhi
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.1-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a fixes for the current development cycle. Note that AI
related review sometimes delays fixes a bit because we find more fixes
for the fixes. I might try and send smaller but more fixes PRs if this
trend keeps up.
- Fix various netfslib bugs
- Fix an out-of-bounds write when listing idmappings
- Fix the return values in jfs_mkdir() and orangefs_mkdir()
- Fix a writeback writeback array overflow in fuse
- Fix a forced iversion increment on lazytime timestamp updates
- Reject a negative timeval component in kern_select()
- Fix error return when vfs_mkdir() fails in the cachefiles code
- Fix wrong error code returned for pidns ioctls"
* tag 'vfs-7.1-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
cachefiles: Fix error return when vfs_mkdir() fails
afs: Fix the locking used by afs_get_link()
netfs, afs: Fix write skipping in dir/link writepages
netfs: Fix netfs_read_folio() to wait on writeback
netfs: Fix folio->private handling in netfs_perform_write()
netfs: Fix partial invalidation of streaming-write folio
netfs: Fix potential UAF in netfs_unlock_abandoned_read_pages()
netfs: Fix leak of request in netfs_write_begin() error handling
netfs: Fix early put of sink folio in netfs_read_gaps()
netfs: Fix write streaming disablement if fd open O_RDWR
netfs: Fix read-gaps to remove netfs_folio from filled folio
netfs: Fix potential deadlock in write-through mode
netfs: Fix streaming write being overwritten
netfs: Defer the emission of trace_netfs_folio()
netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone
netfs: Fix overrun check in netfs_extract_user_iter()
netfs: fix error handling in netfs_extract_user_iter()
netfs: Fix potential uninitialised var in netfs_extract_user_iter()
netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call
netfs: Fix zeropoint update where i_size > remote_i_size
...
SMB2 READ response validation in cifs_readv_receive() and
handle_read_data() checks data_offset + data_len against the received
buffer length. Both values are attacker-controlled fields from the
server response and are stored as unsigned int, so the addition can
wrap before the bounds check:
fs/smb/client/transport.c:1259
if (!use_rdma_mr && (data_offset + data_len > buflen))
fs/smb/client/smb2ops.c:4839
else if (buf_len >= data_offset + data_len)
A malicious SMB server can use this to bypass validation. In the
non-encrypted receive path the client attempts an oversized socket
read and stalls for the SMB response timeout (180 seconds) before
reconnecting. In the SMB3 encrypted path, runtime testing shows the
malformed length can reach copy_to_iter() in handle_read_data() with
attacker-controlled size, where usercopy hardening stops the oversized
copy before bytes reach userspace.
Guard both call sites with check_add_overflow(), which is already
used elsewhere in this subsystem (smb2pdu.c). On overflow, treat the
response as malformed and reject with -EIO.
Signed-off-by: Jeremy Erazo <mendozayt13@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix potential tearing in using ->remote_i_size and ->zero_point by copying
i_size_read() and i_size_write() and using the same seqcount as for i_size.
We need to make sure that netfslib and the filesystems that use it always
hold i_lock whilst updating any of the sizes to prevent i_size_seqcount
from getting corrupted.
Fixes: 4058f74210 ("netfs: Keep track of the actual remote file size")
Fixes: 100ccd18bb ("netfs: Optimise away reads above the point at which there can be no data")
Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260512123404.719402-6-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Today we skip calling change_conf for negotiates and session setup
requests. This can be a problem for mchan as the immediate next call
after session setup could be due to an I/O that is made on the
mount point. For single channel, this is not a problem as
there will be several calls after setting up session.
This change enforces calling change_conf when the total credits contain
enough for reservations for echoes and oplocks. We expect this to happen
during the last session setup response. This way, echoes and oplocks are
not disabled before the first request to the server. So if that first
request is an open, it does not need to disable requesting leases.
Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb2_ioctl_query_info() has two response-copy branches: PASSTHRU_FSCTL
and the default QUERY_INFO path. The QUERY_INFO branch clamps
qi.input_buffer_length to the server-reported OutputBufferLength and then
copies qi.input_buffer_length bytes from qi_rsp->Buffer to userspace, but
it never verifies that the flexible-array payload actually fits within
rsp_iov[1].iov_len.
A malicious server can return OutputBufferLength larger than the actual
QUERY_INFO response, causing copy_to_user() to walk past the response
buffer and expose adjacent kernel heap to userspace.
Guard the QUERY_INFO copy with a bounds check on the actual Buffer
payload. Use struct_size(qi_rsp, Buffer, qi.input_buffer_length)
rather than an open-coded addition so the guard cannot overflow on
32-bit builds.
Fixes: f5778c3987 ("SMB3: Allow SMB3 FSCTL queries to be sent to server from tools")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Steve French <stfrench@microsoft.com>
In receive_encrypted_read(), the length of data to read from the socket
is computed as:
len = le32_to_cpu(tr_hdr->OriginalMessageSize) -
server->vals->read_rsp_size;
OriginalMessageSize comes from the server's transform header and is
untrusted. If a malicious server sends a value smaller than
read_rsp_size, the unsigned subtraction wraps to a very large value
(~4GB). This value is then passed to netfs_alloc_folioq_buffer() and
cifs_read_iter_from_socket(), causing either a massive allocation
attempt that fails with -ENOMEM (DoS), or under extreme memory
pressure, potential heap corruption.
Fix by adding a check that OriginalMessageSize is at least
read_rsp_size before the subtraction. On failure, jump to
discard_data to drain the remaining PDU from the socket, preventing
desync of subsequent reads on the connection.
Signed-off-by: Dudu Lu <phx0fer@gmail.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
When updating ->i_size, make sure to always update ->i_blocks as well
until we query new allocation size from the server.
generic/694 was failing because smb3_simple_falloc() was missing the
update of ->i_blocks after calling cifs_setsize(). So, fix this by
updating ->i_blocks directly in cifs_setsize(), so all places that
call it doesn't need to worry about updating ->i_blocks later.
Reported-by: Shyam Prasad N <sprasad@microsoft.com>
Closes: https://lore.kernel.org/r/CANT5p=rqgRwaADB=b_PhJkqXjtfq3SFv41SSTXSVEHnuh871pA@mail.gmail.com
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
When looking up open handles to be re-used in cifs_open(), calling
cifs_get_{writable,readable}_path() is wrong as it will look up for
the first matching open handle, and if @file->f_flags doesn't match,
it will ignore the remaining open handles in
cifsInodeInfo::openFileList that might potentially match
@file->f_flags.
For writable and readable handles, fix this by calling
__cifs_get_writable_file() and __find_readable_file(), respectively,
with FIND_OPEN_FLAGS set.
With the patch, the following program ends up with two opens instead
of three sent over the wire.
```
#define _GNU_SOURCE
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
int main(int argc, char *argv[])
{
int fd;
fd = open("/mnt/1/foo", O_CREAT | O_WRONLY | O_TRUNC, 0664);
close(fd);
fd = open("/mnt/1/foo", O_DIRECT | O_WRONLY);
close(fd);
fd = open("/mnt/1/foo", O_WRONLY);
close(fd);
fd = open("/mnt/1/foo", O_DIRECT | O_WRONLY);
close(fd);
return 0;
}
```
```
$ mount.cifs //srv/share /mnt/1 -o ...
$ gcc test.c && ./a.out
```
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
parse_server_interfaces() initializes interface socket addresses with
CIFS_PORT. When the mount uses a non-default port this overwrites the
configured destination port.
Later, cifs_chan_update_iface() copies this sockaddr into server->dstaddr,
causing reconnect attempts to use the wrong port after server interface
updates.
Use the existing port from server->dstaddr instead.
Cc: stable@vger.kernel.org
Fixes: fe856be475 ("CIFS: parse and store info on iface queries")
Tested-by: Dr. Thomas Orgis <thomas.orgis@uni-hamburg.de>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Use atomic_t for cifs_sb_info::mnt_cifs_flags as it's currently
accessed locklessly and may be changed concurrently in mount/remount
and reconnect paths.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
In several places in the code, we have a label to signify
the start of the code where a request can be replayed if
necessary. However, some of these places were missing the
necessary reinitializations of certain local variables
before replay.
This change makes sure that these variables get initialized
after the label.
Cc: stable@vger.kernel.org
Reported-by: Yuchan Nam <entropy1110@gmail.com>
Tested-by: Yuchan Nam <entropy1110@gmail.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This code was recently changed from manually decrementing
tcon ref to using cifs_put_tcon. But even before that change
this tracing happened after decrementing the ref count, which
is wrong. With cifs_put_tcon, tracing already happens inside it.
So just removing the extra tracing here.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Customer reported data corruption in some of their files. It turned
out the client would end up calling cacheless IO functions while
having RHW lease, bypassing the pagecache and then leaving gaps in the
file while writing to it. It was related to concurrent opens changing
the lease state while having writes in flight. Lease breaks and
re-opens due to reconnect could also cause same issue.
Fix this by serialising the lease updates with
cifsInodeInfo::open_file_lock. When handling oplock break, make sure
to use the downgraded oplock value rather than one in cifsInodeinfo as
it could be changed concurrently.
Reported-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
It was possible for two query interface works to be concurrently trying
to update the interfaces.
Prevent this by checking and updating iface_last_update under
iface_lock.
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We used to use the cifs_tcp_ses_lock to protect a lot of objects
that are not just the server, ses or tcon lists. We later introduced
srv_lock, ses_lock and tc_lock to protect fields within the
corresponding structs. This was done to provide a more granular
protection and avoid unnecessary serialization.
There were still a couple of uses of cifs_tcp_ses_lock to provide
tcon fields. In this patch, I've replaced them with tc_lock.
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
On replayable errors, we call smb2_should_replays that does these
things today:
1. decide if we need to replay the command again
2. sleep to back-off the failed request
3. update the next sleep value
We will not be able to use this for async requests, when this is
processed in callbacks (as this will be called in cifsd threads that
should not sleep in response processing).
Modify the behaviour by taking the sleep out of smb2_should_replay
and performing the sleep for back-off just before actually
performing the replay.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
struct copychunk_ioctl_req::ChunkCount is annotated with
__counted_by_le() as the number of elements in Chunks[].
smb2_copychunk_range reuses ChunkCount to store the number of chunks
sent in the current iteration. If a later iteration populates more
chunks than a previous one, the stale smaller value trips UBSAN.
Set ChunkCount to chunk_count (allocated capacity) before populating
Chunks[].
Fixes: cc26f593dc ("smb: move copychunk definitions to common/smb2pdu.h")
Link: https://lore.kernel.org/linux-cifs/CAH2r5ms9AWLy8WZ04Cpq5XOeVK64tcrUQ6__iMW+yk1VPzo1BA@mail.gmail.com
Tested-by: Youling Tang <tangyouling@kylinos.cn>
Acked-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make some preparatory cleanups prior to running a script to organise the
function declarations within the fs/smb/client/ headers. These include:
(1) Remove "inline" from the dummy cifs_proc_init/clean() functions as
they are in a .c file.
(2) Move should_compress()'s kdoc comment to the .c file and remove kdoc
markers from the comments.
(3) Rename CIFS_ALLOW_INSECURE_LEGACY in #endif comments to have CONFIG_
on the front to allow the script to recognise it.
(4) Don't let comments have bare words at the left margin as that confused
the simplistic function detection code in the script.
(5) Adjust some argument lists so that when and if the cleanup script is
run they don't end up over 100 chars.
(6) Fix a few comments to have missing '*' added or the "*/" moved to
their own lines so that checkpatch doesn't moan over the cleanup
script patch.
(7) Move struct cifs_calc_sig_ctx to cifsglob.h.
(8) Remove some __KERNEL__ conditionals.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Add a tracepoint to log EIO errors and give it the capacity to convey up to
two integers of information. This is then wrapped with three functions:
int smb_EIO(enum smb_eio_trace trace)
int smb_EIO1(enum smb_eio_trace trace, unsigned long info)
int smb_EIO2(enum smb_eio_trace trace, unsigned long info,
unsigned long info2)
depending on how many bits of info are desired to be logged with any
particular trace. The functions all return -EIO and can be used in place
of -EIO.
The trace argument is an enum value that gets translated to a string when
the trace is printed.
This makes is easier to log EIO instances when the client is under high
load than turning on a printk wrapper such as cifs_dbg(). Granted, EIO
could have its own separate EIO printing since EIO shouldn't happen.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Remove the server pointer from smb_message and instead pass it down to all
the things that access it.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com> (RDMA, smbdirect)
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Remove the RFC1002 header from struct smb_hdr as used for SMB-1.0. This
simplifies the SMB-1.0 code by simplifying a lot of places that have to add
or subtract 4 to work around the fact that the RFC1002 header isn't really
part of the message and the base for various offsets within the message is
from the base of the smb_hdr, not the RFC1002 header.
Further, clean up a bunch of places that require an extra kvec struct
specifically pointing to the RFC1002 header, such that kvec[0].iov_base
must be exactly 4 bytes before kvec[1].iov_base.
This allows the header preamble size stuff to be removed too.
The size of the request and response message are then handed around either
directly or by summing the size of all the iov_len members in the kvec
array for which we have a count.
Also, this simplifies and cleans up the common transmission and receive
paths for SMB1 and SMB2/3 as there no longer needs to be special handling
casing for SMB1 messages as the RFC1002 header is now generated on the fly
for SMB1 as it is for SMB2/3.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Use netfs_alloc/free_folioq_buffer() rather than doing its own version.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com> (RDMA, smbdirect)
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Rename 2 places:
- resume_key_req -> resume_key_ioctl_rsp
- server: ResumeKey -> ResumeKeyU64
Merge the struct members of the server and the client, then move duplicate
definitions to common header file.
Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Rename 3 places:
- copychunk_ioctl -> copychunk_ioctl_req
- copychunk -> srv_copychunk
- server: ResumeKey -> SourceKeyU64
Merge the struct members of the server and the client, then move duplicate
definitions to common header file.
Co-developed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
When smb2_query_info_compound() retries, a previously allocated cfid may
have been freed in the first attempt.
Because cfid wasn't reset on replay, later cleanup could act on a stale
pointer, leading to a potential use-after-free.
Reinitialize cfid to NULL under the replay label.
Example trace (trimmed):
refcount_t: underflow; use-after-free.
WARNING: CPU: 1 PID: 11224 at ../lib/refcount.c:28 refcount_warn_saturate+0x9c/0x110
[...]
RIP: 0010:refcount_warn_saturate+0x9c/0x110
[...]
Call Trace:
<TASK>
smb2_query_info_compound+0x29c/0x5c0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
? step_into+0x10d/0x690
? __legitimize_path+0x28/0x60
smb2_queryfs+0x6a/0xf0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
smb311_queryfs+0x12d/0x140 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
? kmem_cache_alloc+0x18a/0x340
? getname_flags+0x46/0x1e0
cifs_statfs+0x9f/0x2b0 [cifs f90b72658819bd21c94769b6a652029a07a7172f]
statfs_by_dentry+0x67/0x90
vfs_statfs+0x16/0xd0
user_statfs+0x54/0xa0
__do_sys_statfs+0x20/0x50
do_syscall_64+0x58/0x80
Cc: stable@kernel.org
Fixes: 4f1fffa237 ("cifs: commands that are retried should have replay flag set")
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
As the SMB1 and SMB2/3 calc_signature functions are called from separate
sign and verify paths, just call them directly rather than using a function
pointer. The SMB3 calc_signature then jumps to the SMB2 variant if
necessary.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Enzo Matsumiya <ematsumiya@suse.de>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix three refcount inconsistency issues related to `cifs_sb_tlink`.
Comments for `cifs_sb_tlink` state that `cifs_put_tlink()` needs to be
called after successful calls to `cifs_sb_tlink()`. Three calls fail to
update refcount accordingly, leading to possible resource leaks.
Fixes: 8ceb984379 ("CIFS: Move rename to ops struct")
Fixes: 2f1afe2599 ("cifs: Use smb 2 - 3 and cifsacl mount options getacl functions")
Fixes: 366ed846df ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function")
Cc: stable@vger.kernel.org
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Steve French <stfrench@microsoft.com>
AIO+DIO may extend the file size, hence we need to make sure ->i_size
is stable across the entire fallocate(2) operation, otherwise it would
become a truncate and then inode size reduced back down when it
finishes.
Fix this by calling netfs_wait_for_outstanding_io() right after
acquiring ->i_rwsem exclusively in cifs_fallocate() and then guarantee
a stable ->i_size across fallocate(2).
Also call netfs_wait_for_outstanding_io() after truncating pagecache
to avoid any potential races with writeback.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Fixes: 210627b0aca9 ("smb: client: fix missing timestamp updates with O_TRUNC")
Cc: Frank Sorenson <sorenson@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
The return value of copy_to_iter() function will never be negative,
it is the number of bytes copied, or zero if nothing was copied.
Update the check to treat 0 as an error, and return -1 in that case.
Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Acked-by: Tom Talpey <tom@talpey.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb2_copychunk_range() used to send a single SRV_COPYCHUNK per
SRV_COPYCHUNK_COPY IOCTL.
Implement variable Chunks[] array in struct copychunk_ioctl and fill it
with struct copychunk (MS-SMB2 2.2.31.1.1), bounded by server-advertised
limits.
This reduces the number of IOCTL requests for large copies.
While we are at it, rename a couple variables to follow the terminology
used in the specification.
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
open_cached_dir() will only return a valid cfid, which has both
has_lease = true and time != 0.
Remove the pointless check of cfid->has_lease right after
open_cached_dir() returns no error.
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
The crypto API, through the scatterlist API, expects input buffers to be
in linear memory. We handle this with the cifs_sg_set_buf() helper
that converts vmalloc'd memory to their corresponding pages.
However, when we allocate our aead_request buffer (@creq in
smb2ops.c::crypt_message()), we do so with kvzalloc(), which possibly
puts aead_request->__ctx in vmalloc area.
AEAD algorithm then uses ->__ctx for its private/internal data and
operations, and uses sg_set_buf() for such data on a few places.
This works fine as long as @creq falls into kmalloc zone (small
requests) or vmalloc'd memory is still within linear range.
Tasks' stacks are vmalloc'd by default (CONFIG_VMAP_STACK=y), so too
many tasks will increment the base stacks' addresses to a point where
virt_addr_valid(buf) will fail (BUG() in sg_set_buf()) when that
happens.
In practice: too many parallel reads and writes on an encrypted mount
will trigger this bug.
To fix this, always alloc @creq with kmalloc() instead.
Also drop the @sensitive_size variable/arguments since
kfree_sensitive() doesn't need it.
Backtrace:
[ 945.272081] ------------[ cut here ]------------
[ 945.272774] kernel BUG at include/linux/scatterlist.h:209!
[ 945.273520] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 945.274412] CPU: 7 UID: 0 PID: 56 Comm: kworker/u33:0 Kdump: loaded Not tainted 6.15.0-lku-11779-g8e9d6efccdd7-dirty #1 PREEMPT(voluntary)
[ 945.275736] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
[ 945.276877] Workqueue: writeback wb_workfn (flush-cifs-2)
[ 945.277457] RIP: 0010:crypto_gcm_init_common+0x1f9/0x220
[ 945.278018] Code: b0 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 c7 c0 00 00 00 80 48 2b 05 5c 58 e5 00 e9 58 ff ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 48 c7 04 24 01 00 00 00 48 8b
[ 945.279992] RSP: 0018:ffffc90000a27360 EFLAGS: 00010246
[ 945.280578] RAX: 0000000000000000 RBX: ffffc90001d85060 RCX: 0000000000000030
[ 945.281376] RDX: 0000000000080000 RSI: 0000000000000000 RDI: ffffc90081d85070
[ 945.282145] RBP: ffffc90001d85010 R08: ffffc90001d85000 R09: 0000000000000000
[ 945.282898] R10: ffffc90001d85090 R11: 0000000000001000 R12: ffffc90001d85070
[ 945.283656] R13: ffff888113522948 R14: ffffc90001d85060 R15: ffffc90001d85010
[ 945.284407] FS: 0000000000000000(0000) GS:ffff8882e66cf000(0000) knlGS:0000000000000000
[ 945.285262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 945.285884] CR2: 00007fa7ffdd31f4 CR3: 000000010540d000 CR4: 0000000000350ef0
[ 945.286683] Call Trace:
[ 945.286952] <TASK>
[ 945.287184] ? crypt_message+0x33f/0xad0 [cifs]
[ 945.287719] crypto_gcm_encrypt+0x36/0xe0
[ 945.288152] crypt_message+0x54a/0xad0 [cifs]
[ 945.288724] smb3_init_transform_rq+0x277/0x300 [cifs]
[ 945.289300] smb_send_rqst+0xa3/0x160 [cifs]
[ 945.289944] cifs_call_async+0x178/0x340 [cifs]
[ 945.290514] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[ 945.291177] smb2_async_writev+0x3e3/0x670 [cifs]
[ 945.291759] ? find_held_lock+0x32/0x90
[ 945.292212] ? netfs_advance_write+0xf2/0x310
[ 945.292723] netfs_advance_write+0xf2/0x310
[ 945.293210] netfs_write_folio+0x346/0xcc0
[ 945.293689] ? __pfx__raw_spin_unlock_irq+0x10/0x10
[ 945.294250] netfs_writepages+0x117/0x460
[ 945.294724] do_writepages+0xbe/0x170
[ 945.295152] ? find_held_lock+0x32/0x90
[ 945.295600] ? kvm_sched_clock_read+0x11/0x20
[ 945.296103] __writeback_single_inode+0x56/0x4b0
[ 945.296643] writeback_sb_inodes+0x229/0x550
[ 945.297140] __writeback_inodes_wb+0x4c/0xe0
[ 945.297642] wb_writeback+0x2f1/0x3f0
[ 945.298069] wb_workfn+0x300/0x490
[ 945.298472] process_one_work+0x1fe/0x590
[ 945.298949] worker_thread+0x1ce/0x3c0
[ 945.299397] ? __pfx_worker_thread+0x10/0x10
[ 945.299900] kthread+0x119/0x210
[ 945.300285] ? __pfx_kthread+0x10/0x10
[ 945.300729] ret_from_fork+0x119/0x1b0
[ 945.301163] ? __pfx_kthread+0x10/0x10
[ 945.301601] ret_from_fork_asm+0x1a/0x30
[ 945.302055] </TASK>
Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
In future struct smbdirect_socket_parameters will be the only
public structure for the smb layer. This prepares this...
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Rename of open files in SMB2+ has been broken for a very long time,
resulting in data loss as the CIFS client would fail the rename(2)
call with -ENOENT and then removing the target file.
Fix this by implementing ->rename_pending_delete() for SMB2+, which
will rename busy files to random filenames (e.g. silly rename) during
unlink(2) or rename(2), and then marking them to delete-on-close.
Besides, introduce a FIND_WR_NO_PENDING_DELETE flag to prevent open(2)
from reusing open handles that had been marked as delete pending.
Handle it in cifs_get_readable_path() as well.
Reported-by: Jean-Baptiste Denis <jbdenis@pasteur.fr>
Closes: https://marc.info/?i=16aeb380-30d4-4551-9134-4e7d1dc833c0@pasteur.fr
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Cc: Frank Sorenson <sorenson@redhat.com>
Cc: Olga Kornievskaia <okorniev@redhat.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Scott Mayhew <smayhew@redhat.com>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
The encryption layer can't handle the padding iovs, so flatten the
compound request into a single buffer with required padding to prevent
the server from dropping the connection when finding unaligned
compound requests.
Fixes: bc925c1216 ("smb: client: improve compound padding in encryption")
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: linux-cifs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix smb3_init_transform_rq() to initialise buffer to NULL before calling
netfs_alloc_folioq_buffer() as netfs assumes it can append to the buffer it
is given. Setting it to NULL means it should start a fresh buffer, but the
value is currently undefined.
Fixes: a2906d3316 ("cifs: Switch crypto buffer to use a folio_queue rather than an xarray")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
This is step 4/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.
In compound_send_recv(), when wait_for_response() is interrupted by
signals, the code attempts to cancel pending requests by changing
their callbacks to cifs_cancelled_callback. However, there's a race
condition between signal interruption and network response processing
that causes both mid_q_entry and server buffer leaks:
```
User foreground process cifsd
cifs_readdir
open_cached_dir
cifs_send_recv
compound_send_recv
smb2_setup_request
smb2_mid_entry_alloc
smb2_get_mid_entry
smb2_mid_entry_alloc
mempool_alloc // alloc mid
kref_init(&temp->refcount); // refcount = 1
mid[0]->callback = cifs_compound_callback;
mid[1]->callback = cifs_compound_last_callback;
smb_send_rqst
rc = wait_for_response
wait_event_state TASK_KILLABLE
cifs_demultiplex_thread
allocate_buffers
server->bigbuf = cifs_buf_get()
standard_receive3
->find_mid()
smb2_find_mid
__smb2_find_mid
kref_get(&mid->refcount) // +1
cifs_handle_standard
handle_mid
/* bigbuf will also leak */
mid->resp_buf = server->bigbuf
server->bigbuf = NULL;
dequeue_mid
/* in for loop */
mids[0]->callback
cifs_compound_callback
/* Signal interrupts wait: rc = -ERESTARTSYS */
/* if (... || midQ[i]->mid_state == MID_RESPONSE_RECEIVED) *?
midQ[0]->callback = cifs_cancelled_callback;
cancelled_mid[i] = true;
/* The change comes too late */
mid->mid_state = MID_RESPONSE_READY
release_mid // -1
/* cancelled_mid[i] == true causes mid won't be released
in compound_send_recv cleanup */
/* cifs_cancelled_callback won't executed to release mid */
```
The root cause is that there's a race between callback assignment and
execution.
Fix this by introducing per-mid locking:
- Add spinlock_t mid_lock to struct mid_q_entry
- Add mid_execute_callback() for atomic callback execution
- Use mid_lock in cancellation paths to ensure atomicity
This ensures that either the original callback or the cancellation
callback executes atomically, preventing reference count leaks when
requests are interrupted by signals.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220404
Fixes: ee258d7915 ("CIFS: Move credit processing to mid callbacks for SMB3")
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This is step 3/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.
Replace the mid_flags bitmask with dedicated boolean fields to
simplify locking logic and improve code readability:
- Replace MID_DELETED with bool deleted_from_q
- Replace MID_WAIT_CANCELLED with bool wait_cancelled
- Remove mid_flags field entirely
The new boolean fields have clearer semantics:
- deleted_from_q: whether mid has been removed from pending_mid_q
- wait_cancelled: whether request was cancelled during wait
This change reduces memory usage (from 4-byte bitmask to 2 boolean
flags) and eliminates confusion about which lock protects which
flag bits, preparing for per-mid locking in the next patch.
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Acked-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
This is step 2/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.
Add a dedicated mid_counter_lock to protect current_mid counter,
separating it from mid_queue_lock which protects pending_mid_q
operations. This reduces lock contention and prepares for finer-
grained locking in subsequent patches.
Changes:
- Add TCP_Server_Info->mid_counter_lock spinlock
- Rename CurrentMid to current_mid for consistency
- Use mid_counter_lock to protect current_mid access
- Update locking documentation in cifsglob.h
This separation allows mid allocation to proceed without blocking
queue operations, improving performance under heavy load.
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Acked-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
This is step 1/4 of a patch series to fix mid_q_entry memory leaks
caused by race conditions in callback execution.
The current mid_lock name is somewhat ambiguous about what it protects.
To prepare for splitting this lock into separate, more granular locks,
this patch renames mid_lock to mid_queue_lock to clearly indicate its
specific responsibility for protecting the pending_mid_q list and
related queue operations.
No functional changes are made in this patch - it only prepares the
codebase for the lock splitting that follows.
- mid_queue_lock for queue operations
- mid_counter_lock for mid counter operations
- per-mid locks for individual mid state management
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Acked-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
SMB3.1.1 POSIX mounts support native symlinks that are created with
IO_REPARSE_TAG_SYMLINK reparse points, so skip the checking of
FILE_SUPPORTS_REPARSE_POINTS as some servers might not have it set.
Cc: linux-cifs@vger.kernel.org
Cc: Ralph Boehme <slow@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson@ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
SMB1 already supports querying reparse points and detecting types of
symlink, fifo, socket, block and char.
This change implements the missing part - ability to create a new reparse
points over SMB1. This includes everything which SMB2+ already supports:
- native SMB symlinks and sockets
- NFS style of special files (symlinks, fifos, sockets, char/block devs)
- WSL style of special files (symlinks, fifos, sockets, char/block devs)
Attaching a reparse point to an existing file or directory is done via
SMB1 SMB_COM_NT_TRANSACT/NT_TRANSACT_IOCTL/FSCTL_SET_REPARSE_POINT command
and implemented in a new cifs_create_reparse_inode() function.
This change introduce a new callback ->create_reparse_inode() which creates
a new reperse point file or directory and returns inode. For SMB1 it is
provided via that new cifs_create_reparse_inode() function.
Existing reparse.c code was only slightly updated to call new protocol
callback ->create_reparse_inode() instead of hardcoded SMB2+ function.
This make the whole reparse.c code to work with every SMB dialect.
The original callback ->create_reparse_symlink() is not needed anymore as
the implementation of new create_reparse_symlink() function is dialect
agnostic too. So the link.c code was updated to call that function directly
(and not via callback).
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>