syzbot reported a kernel BUG triggered from pn_socket_sendmsg() via
pn_socket_autobind():
kernel BUG at net/phonet/socket.c:213!
RIP: 0010:pn_socket_autobind net/phonet/socket.c:213 [inline]
RIP: 0010:pn_socket_sendmsg+0x240/0x250 net/phonet/socket.c:421
Call Trace:
sock_sendmsg_nosec+0x112/0x150 net/socket.c:797
__sock_sendmsg net/socket.c:812 [inline]
__sys_sendto+0x402/0x590 net/socket.c:2280
...
pn_socket_autobind() calls pn_socket_bind() with port 0 and, on
-EINVAL, assumes the socket was already bound and asserts that the
port is non-zero:
err = pn_socket_bind(sock, ..., sizeof(struct sockaddr_pn));
if (err != -EINVAL)
return err;
BUG_ON(!pn_port(pn_sk(sock->sk)->sobject));
return 0; /* socket was already bound */
However pn_socket_bind() also returns -EINVAL when sk->sk_state is not
TCP_CLOSE, even when the socket has never been bound and pn_port() is
still 0. In that case the BUG_ON() fires and panics the kernel from a
user-triggerable path.
Treat the "bind returned -EINVAL but pn_port() is still 0" case as a
regular error and propagate -EINVAL to the caller instead of crashing.
Existing callers already translate a non-zero return from
pn_socket_autobind() into -ENOBUFS/-EAGAIN, so returning -EINVAL here
only changes behaviour from panic to a normal errno.
Fixes: ba113a94b7 ("Phonet: common socket glue")
Reported-by: syzbot+706f5eb79044e686c794@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=706f5eb79044e686c794
Suggested-by: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Morduan Zang <zhangdandan@uniontech.com>
Signed-off-by: zhanjun <zhanjun@uniontech.com>
Acked-by: Rémi Denis-Courmont <remi@remlab.net>
Link: https://patch.msgid.link/87A8960A2045AF3C+20260423010557.138124-1-zhangdandan@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core & protocols
----------------
- Support HW queue leasing, allowing containers to be granted access
to HW queues for zero-copy operations and AF_XDP.
- Number of code moves to help the compiler with inlining.
Avoid output arguments for returning drop reason where possible.
- Rework drop handling within qdiscs to include more metadata
about the reason and dropping qdisc in the tracepoints.
- Remove the rtnl_lock use from IP Multicast Routing.
- Pack size information into the Rx Flow Steering table pointer
itself. This allows making the table itself a flat array of u32s,
thus making the table allocation size a power of two.
- Report TCP delayed ack timer information via socket diag.
- Add ip_local_port_step_width sysctl to allow distributing the randomly
selected ports more evenly throughout the allowed space.
- Add support for per-route tunsrc in IPv6 segment routing.
- Start work of switching sockopt handling to iov_iter.
- Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid
buffer size drifting up.
- Support MSG_EOR in MPTCP.
- Add stp_mode attribute to the bridge driver for STP mode selection.
This addresses concerns about call_usermodehelper() usage.
- Remove UDP-Lite support (as announced in 2023).
- Remove support for building IPv6 as a module.
Remove the now unnecessary function calling indirection.
Cross-tree stuff
----------------
- Move Michael MIC code from generic crypto into wireless,
it's considered insecure but some WiFi networks still need it.
Netfilter
---------
- Switch nft_fib_ipv6 module to no longer need temporary dst_entry
object allocations by using fib6_lookup() + RCU.
Florian W reports this gets us ~13% higher packet rate.
- Convert IPVS's global __ip_vs_mutex to per-net service_mutex and
switch the service tables to be per-net. Convert some code that
walks the service lists to use RCU instead of the service_mutex.
- Add more opinionated input validation to lower security exposure.
- Make IPVS hash tables to be per-netns and resizable.
Wireless
--------
- Finished assoc frame encryption/EPPKE/802.1X-over-auth.
- Radar detection improvements.
- Add 6 GHz incumbent signal detection APIs.
- Multi-link support for FILS, probe response templates and
client probing.
- New APIs and mac80211 support for NAN (Neighbor Aware Networking,
aka Wi-Fi Aware) so less work must be in firmware.
Driver API
----------
- Add numerical ID for devlink instances (to avoid having to create
fake bus/device pairs just to have an ID). Support shared devlink
instances which span multiple PFs.
- Add standard counters for reporting pause storm events
(implement in mlx5 and fbnic).
- Add configuration API for completion writeback buffering
(implement in mana).
- Support driver-initiated change of RSS context sizes.
- Support DPLL monitoring input frequency (implement in zl3073x).
- Support per-port resources in devlink (implement in mlx5).
Misc
----
- Expand the YAML spec for Netfilter.
Drivers
-------
- Software:
- macvlan: support multicast rx for bridge ports with shared source
MAC address
- team: decouple receive and transmit enablement for IEEE 802.3ad
LACP "independent control"
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- support high order pages in zero-copy mode (for payload
coalescing)
- support multiple packets in a page (for systems with 64kB pages)
- Broadcom 25-400GE (bnxt):
- implement XDP RSS hash metadata extraction
- add software fallback for UDP GSO, lowering the IOMMU cost
- Broadcom 800GE (bnge):
- add link status and configuration handling
- add various HW and SW statistics
- Marvell/Cavium:
- NPC HW block support for cn20k
- Huawei (hinic3):
- add mailbox / control queue
- add rx VLAN offload
- add driver info and link management
- Ethernet NICs:
- Marvell/Aquantia:
- support reading SFP module info on some AQC100 cards
- Realtek PCI (r8169):
- add support for RTL8125cp
- Realtek USB (r8152):
- support for the RTL8157 5Gbit chip
- add 2500baseT EEE status/configuration support
- Ethernet NICs embedded and off-the-shelf IP:
- Synopsys (stmmac):
- cleanup and reorganize SerDes handling and PCS support
- cleanup descriptor handling and per-platform data
- cleanup and consolidate MDIO defines and handling
- shrink driver memory use for internal structures
- improve Tx IRQ coalescing
- improve TCP segmentation handling
- add support for Spacemit K3
- Cadence (macb):
- support PHYs that have inband autoneg disabled with GEM
- support IEEE 802.3az EEE
- rework usrio capabilities and handling
- AMD (xgbe):
- improve power management for S0i3
- improve TX resilience for link-down handling
- Virtual:
- Google cloud vNIC:
- support larger ring sizes in DQO-QPL mode
- improve HW-GRO handling
- support UDP GSO for DQO format
- PCIe NTB:
- support queue count configuration
- Ethernet PHYs:
- automatically disable PHY autonomous EEE if MAC is in charge
- Broadcom:
- add BCM84891/BCM84892 support
- Micrel:
- support for LAN9645X internal PHY
- Realtek:
- add RTL8224 pair order support
- support PHY LEDs on RTL8211F-VD
- support spread spectrum clocking (SSC)
- Maxlinear:
- add PHY-level statistics via ethtool
- Ethernet switches:
- Maxlinear (mxl862xx):
- support for bridge offloading
- support for VLANs
- support driver statistics
- Bluetooth:
- large number of fixes and new device IDs
- Mediatek:
- support MT6639 (MT7927)
- support MT7902 SDIO
- WiFi:
- Intel (iwlwifi):
- UNII-9 and continuing UHR work
- MediaTek (mt76):
- mt7996/mt7925 MLO fixes/improvements
- mt7996 NPU support (HW eth/wifi traffic offload)
- Qualcomm (ath12k):
- monitor mode support on IPQ5332
- basic hwmon temperature reporting
- support IPQ5424
- Realtek:
- add USB RX aggregation to improve performance
- add USB TX flow control by tracking in-flight URBs
- Cellular:
- IPA v5.2 support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmnelNoACgkQMUZtbf5S
IrtWFw//WyiXuEiGawVQONnbu1dtR+3nw/cvNpSYi0IM66vbRUB9n+9fxm2MIyG4
4jI/c/X/fxIvUxEqGez3yPn5P7KqkQR8WRYwkxrMYKRpXeukN0IDk5Euew5DskCe
wtBKNJOQWKdKXff0bLQoJ9dHWYuJ2IMRVil5M3fhUbeUOXeyJD7Yn1w2ICvJAbj+
T/Hw7sEtchNaHp6h6SbaQfahkUFHQG5peNoETkZF4UDF6ALGY29WH91GXeO2lrgN
IxX203KtaavV0oU8T0oixZgOc57Ns081YfFL/F1JP2HV6lgkwhuq+zxCrRTi1c9M
HPTXgwD7Z80Y74nM3YTLrPfoMOP8GLBZgdV3rUpwmteM26+gMTm+O1zHUur5ZoGy
D6TaMFguPTIqiRyrARa9xY/J6r9TQkc2Wfu4bIuPndKFg8xPoepuEObODnh0+5Hg
4j4pdFhIo2huENhSg7kVb/yl+1q68SFwM3RqTmx+OhCa0AyjcKIKgt/UBhismdnG
r8obxzb+nXeJc2rRDuwNMwlBlcMSbep27uGt64zeHMMXVhTVqOoytNaL/X/ZpH2m
A0DscUrpHvb36IoDPtanc6irP+JOh5Xe7Nw5qhkgwsMc7hlf8SyyHB4OUBBaz1qA
ETSnHlfwklRmXSpWqH2LyGXjdOQpDKP46+h0W3dttMD2/cRBqYo=
=EhQZ
-----END PGP SIGNATURE-----
Merge tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Support HW queue leasing, allowing containers to be granted access
to HW queues for zero-copy operations and AF_XDP
- Number of code moves to help the compiler with inlining. Avoid
output arguments for returning drop reason where possible
- Rework drop handling within qdiscs to include more metadata about
the reason and dropping qdisc in the tracepoints
- Remove the rtnl_lock use from IP Multicast Routing
- Pack size information into the Rx Flow Steering table pointer
itself. This allows making the table itself a flat array of u32s,
thus making the table allocation size a power of two
- Report TCP delayed ack timer information via socket diag
- Add ip_local_port_step_width sysctl to allow distributing the
randomly selected ports more evenly throughout the allowed space
- Add support for per-route tunsrc in IPv6 segment routing
- Start work of switching sockopt handling to iov_iter
- Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid
buffer size drifting up
- Support MSG_EOR in MPTCP
- Add stp_mode attribute to the bridge driver for STP mode selection.
This addresses concerns about call_usermodehelper() usage
- Remove UDP-Lite support (as announced in 2023)
- Remove support for building IPv6 as a module. Remove the now
unnecessary function calling indirection
Cross-tree stuff:
- Move Michael MIC code from generic crypto into wireless, it's
considered insecure but some WiFi networks still need it
Netfilter:
- Switch nft_fib_ipv6 module to no longer need temporary dst_entry
object allocations by using fib6_lookup() + RCU.
Florian W reports this gets us ~13% higher packet rate
- Convert IPVS's global __ip_vs_mutex to per-net service_mutex and
switch the service tables to be per-net. Convert some code that
walks the service lists to use RCU instead of the service_mutex
- Add more opinionated input validation to lower security exposure
- Make IPVS hash tables to be per-netns and resizable
Wireless:
- Finished assoc frame encryption/EPPKE/802.1X-over-auth
- Radar detection improvements
- Add 6 GHz incumbent signal detection APIs
- Multi-link support for FILS, probe response templates and client
probing
- New APIs and mac80211 support for NAN (Neighbor Aware Networking,
aka Wi-Fi Aware) so less work must be in firmware
Driver API:
- Add numerical ID for devlink instances (to avoid having to create
fake bus/device pairs just to have an ID). Support shared devlink
instances which span multiple PFs
- Add standard counters for reporting pause storm events (implement
in mlx5 and fbnic)
- Add configuration API for completion writeback buffering (implement
in mana)
- Support driver-initiated change of RSS context sizes
- Support DPLL monitoring input frequency (implement in zl3073x)
- Support per-port resources in devlink (implement in mlx5)
Misc:
- Expand the YAML spec for Netfilter
Drivers
- Software:
- macvlan: support multicast rx for bridge ports with shared
source MAC address
- team: decouple receive and transmit enablement for IEEE 802.3ad
LACP "independent control"
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- support high order pages in zero-copy mode (for payload
coalescing)
- support multiple packets in a page (for systems with 64kB
pages)
- Broadcom 25-400GE (bnxt):
- implement XDP RSS hash metadata extraction
- add software fallback for UDP GSO, lowering the IOMMU cost
- Broadcom 800GE (bnge):
- add link status and configuration handling
- add various HW and SW statistics
- Marvell/Cavium:
- NPC HW block support for cn20k
- Huawei (hinic3):
- add mailbox / control queue
- add rx VLAN offload
- add driver info and link management
- Ethernet NICs:
- Marvell/Aquantia:
- support reading SFP module info on some AQC100 cards
- Realtek PCI (r8169):
- add support for RTL8125cp
- Realtek USB (r8152):
- support for the RTL8157 5Gbit chip
- add 2500baseT EEE status/configuration support
- Ethernet NICs embedded and off-the-shelf IP:
- Synopsys (stmmac):
- cleanup and reorganize SerDes handling and PCS support
- cleanup descriptor handling and per-platform data
- cleanup and consolidate MDIO defines and handling
- shrink driver memory use for internal structures
- improve Tx IRQ coalescing
- improve TCP segmentation handling
- add support for Spacemit K3
- Cadence (macb):
- support PHYs that have inband autoneg disabled with GEM
- support IEEE 802.3az EEE
- rework usrio capabilities and handling
- AMD (xgbe):
- improve power management for S0i3
- improve TX resilience for link-down handling
- Virtual:
- Google cloud vNIC:
- support larger ring sizes in DQO-QPL mode
- improve HW-GRO handling
- support UDP GSO for DQO format
- PCIe NTB:
- support queue count configuration
- Ethernet PHYs:
- automatically disable PHY autonomous EEE if MAC is in charge
- Broadcom:
- add BCM84891/BCM84892 support
- Micrel:
- support for LAN9645X internal PHY
- Realtek:
- add RTL8224 pair order support
- support PHY LEDs on RTL8211F-VD
- support spread spectrum clocking (SSC)
- Maxlinear:
- add PHY-level statistics via ethtool
- Ethernet switches:
- Maxlinear (mxl862xx):
- support for bridge offloading
- support for VLANs
- support driver statistics
- Bluetooth:
- large number of fixes and new device IDs
- Mediatek:
- support MT6639 (MT7927)
- support MT7902 SDIO
- WiFi:
- Intel (iwlwifi):
- UNII-9 and continuing UHR work
- MediaTek (mt76):
- mt7996/mt7925 MLO fixes/improvements
- mt7996 NPU support (HW eth/wifi traffic offload)
- Qualcomm (ath12k):
- monitor mode support on IPQ5332
- basic hwmon temperature reporting
- support IPQ5424
- Realtek:
- add USB RX aggregation to improve performance
- add USB TX flow control by tracking in-flight URBs
- Cellular:
- IPA v5.2 support"
* tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits)
net: pse-pd: fix kernel-doc function name for pse_control_find_by_id()
wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit
wireguard: allowedips: remove redundant space
tools: ynl: add sample for wireguard
wireguard: allowedips: Use kfree_rcu() instead of call_rcu()
MAINTAINERS: Add netkit selftest files
selftests/net: Add additional test coverage in nk_qlease
selftests/net: Split netdevsim tests from HW tests in nk_qlease
tools/ynl: Make YnlFamily closeable as a context manager
net: airoha: Add missing PPE configurations in airoha_ppe_hw_init()
net: airoha: Fix VIP configuration for AN7583 SoC
net: caif: clear client service pointer on teardown
net: strparser: fix skb_head leak in strp_abort_strp()
net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete()
selftests/bpf: add test for xdp_master_redirect with bond not up
net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master
net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration
sctp: disable BH before calling udp_tunnel_xmit_skb()
sctp: fix missing encap_port propagation for GSO fragments
net: airoha: Rely on net_device pointer in ETS callbacks
...
Please consider pulling these changes from the signed vfs-7.1-rc1.kino tag.
Thanks!
Christian
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc
otmnAP4sbsxZQdz2TG2hJuOwnEZOkkxZQOUMc3ERVyZaWXIeTAEA7e5M+8FpoG9n
8ipO76UoaXdGLESrqVdp9EOhLqOW7QY=
=uMeJ
-----END PGP SIGNATURE-----
Merge tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs i_ino updates from Christian Brauner:
"For historical reasons, the inode->i_ino field is an unsigned long,
which means that it's 32 bits on 32 bit architectures. This has caused
a number of filesystems to implement hacks to hash a 64-bit identifier
into a 32-bit field, and deprives us of a universal identifier field
for an inode.
This changes the inode->i_ino field from an unsigned long to a u64.
This shouldn't make any material difference on 64-bit hosts, but
32-bit hosts will see struct inode grow by at least 4 bytes. This
could have effects on slabcache sizes and field alignment.
The bulk of the changes are to format strings and tracepoints, since
the kernel itself doesn't care that much about the i_ino field. The
first patch changes some vfs function arguments, so check that one out
carefully.
With this change, we may be able to shrink some inode structures. For
instance, struct nfs_inode has a fileid field that holds the 64-bit
inode number. With this set of changes, that field could be
eliminated. I'd rather leave that sort of cleanups for later just to
keep this simple"
* tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group()
EVM: add comment describing why ino field is still unsigned long
vfs: remove externs from fs.h on functions modified by i_ino widening
treewide: fix missed i_ino format specifier conversions
ext4: fix signed format specifier in ext4_load_inode trace event
treewide: change inode->i_ino from unsigned long to u64
nilfs2: widen trace event i_ino fields to u64
f2fs: widen trace event i_ino fields to u64
ext4: widen trace event i_ino fields to u64
zonefs: widen trace event i_ino fields to u64
hugetlbfs: widen trace event i_ino fields to u64
ext2: widen trace event i_ino fields to u64
cachefiles: widen trace event i_ino fields to u64
vfs: widen trace event i_ino fields to u64
net: change sock.sk_ino and sock_i_ino() to u64
audit: widen ino fields to u64
vfs: widen inode hash/lookup functions to u64
bond_header_parse() can loop if a stack of two bonding devices is setup,
because skb->dev always points to the hierarchy top.
Add new "const struct net_device *dev" parameter to
(struct header_ops)->parse() method to make sure the recursion
is bounded, and that the final leaf parse method is called.
Fixes: 950803f725 ("bonding: fix type confusion in bond_setup_by_slave()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Tested-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Link: https://patch.msgid.link/20260315104152.1436867-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
inode->i_ino is being converted to a u64. sock.sk_ino (which caches the
inode number) must also be widened to avoid truncation on 32-bit
architectures where unsigned long is only 32 bits.
Change sk_ino from unsigned long to u64, and update the return type
of sock_i_ino() to match. Fix all format strings that print the
result of sock_i_ino() (%lu -> %llu), and widen the intermediate
variables and function parameters in the diag modules that were
using int to hold the inode number.
Note that the UAPI socket diag structures (inet_diag_msg.idiag_inode,
unix_diag_msg.udiag_ino, etc.) are all __u32 and cannot be changed
without breaking the ABI. The assignments to those fields will
silently truncate, which is the existing behavior.
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260304-iino-u64-v3-3-2257ad83d372@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
Convert struct proto pre_connect(), connect(), bind(), and bind_add()
callback function prototypes from struct sockaddr to struct sockaddr_unsized.
This does not change per-implementation use of sockaddr for passing around
an arbitrarily sized sockaddr struct. Those will be addressed in future
patches.
Additionally removes the no longer referenced struct sockaddr from
include/net/inet_common.h.
No binary changes expected.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-5-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update all struct proto_ops connect() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.
No binary changes expected.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update all struct proto_ops bind() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.
No binary changes expected.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We want to split sk->sk_drops in the future to reduce
potential contention on this field.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-2-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
A new warning in clang [1] points out a place in pep_sock_accept() where
dst is uninitialized then passed as a const pointer to pep_find_pipe():
net/phonet/pep.c:829:37: error: variable 'dst' is uninitialized when passed as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
829 | newsk = pep_find_pipe(&pn->hlist, &dst, pipe_handle);
| ^~~:
Move the call to pn_skb_get_dst_sockaddr(), which initializes dst, to
before the call to pep_find_pipe(), so that dst is consistently used
initialized throughout the function.
Cc: stable@vger.kernel.org
Fixes: f7ae8d59f6 ("Phonet: allocate sock from accept syscall rather than soft IRQ")
Link: 00dacf8c22 [1]
Closes: https://github.com/ClangBuiltLinux/linux/issues/2101
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20250715-net-phonet-fix-uninit-const-pointer-v1-1-8efd1bd188b3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Difference between sock_i_uid() and sk_uid() is that
after sock_orphan(), sock_i_uid() returns GLOBAL_ROOT_UID
while sk_uid() returns the last cached sk->sk_uid value.
None of sock_i_uid() callers care about this.
Use sk_uid() which is much faster and inlined.
Note that diag/dump users are calling sock_i_ino() and
can not see the full benefit yet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://patch.msgid.link/20250620133001.4090592-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now only __dev_get_by_index() depends on RTNL in route_doit().
Let's use dev_get_by_index_rcu() and register route_doit() with
RTNL_FLAG_DOIT_UNLOCKED.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
route_doit() calls phonet_route_add() or phonet_route_del()
for RTM_NEWROUTE or RTM_DELROUTE, respectively.
Both functions only touch phonet_pernet(dev_net(dev))->routes,
which is currently protected by RTNL and its dedicated mutex,
phonet_routes.lock.
We will convert route_doit() to RCU and cannot use mutex inside RCU.
Let's convert the mutex to spinlock_t.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, rtm_phonet_notify() fetches netns and ifindex from dev.
Once route_doit() is converted to RCU, rtm_phonet_notify() will be
called outside of RCU due to GFP_KERNEL, and dev will be unavailable
there.
Let's pass net and ifindex to rtm_phonet_notify().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We will convert route_doit() to RCU.
route_doit() will call rtm_phonet_notify() outside of RCU due
to GFP_KERNEL, so dev will not be available in fill_route().
Let's pass ifindex directly to fill_route().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
getaddr_dumpit() already relies on RCU and does not need RTNL.
Let's use READ_ONCE() for ifindex and register getaddr_dumpit()
with RTNL_FLAG_DUMP_UNLOCKED.
While at it, the retval of getaddr_dumpit() is changed to combine
NLMSG_DONE and save recvmsg() as done in 58a4ff5d77 ("phonet: no
longer hold RTNL in route_dumpit()").
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Now only __dev_get_by_index() depends on RTNL in addr_doit().
Let's use dev_get_by_index_rcu() and register addr_doit() with
RTNL_FLAG_DOIT_UNLOCKED.
While at it, I changed phonet_rtnl_msg_handlers[]'s init to C99
style like other core networking code.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
addr_doit() calls phonet_address_add() or phonet_address_del()
for RTM_NEWADDR or RTM_DELADDR, respectively.
Both functions only touch phonet_device_list(dev_net(dev)),
which is currently protected by RTNL and its dedicated mutex,
phonet_device_list.lock.
We will convert addr_doit() to RCU and cannot use mutex inside RCU.
Let's convert the mutex to spinlock_t.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, phonet_address_notify() fetches netns and ifindex from dev.
Once addr_doit() is converted to RCU, phonet_address_notify() will be
called outside of RCU due to GFP_KERNEL, and dev will be unavailable
there.
Let's pass net and ifindex to phonet_address_notify().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
We will convert addr_doit() and getaddr_dumpit() to RCU, both
of which call fill_addr().
The former will call phonet_address_notify() outside of RCU
due to GFP_KERNEL, so dev will not be available in fill_addr().
Let's pass ifindex directly to fill_addr().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Before commit addf9b90de ("net: rtnetlink: use rcu to free rtnl
message handlers"), once the first rtnl_register_module() allocated
rtnl_msg_handlers[PF_PHONET], the following calls never failed.
However, after the commit, rtnl_register_module() could fail silently
to allocate rtnl_msg_handlers[PF_PHONET][msgtype] and requires error
handling for each call.
Handling the error allows users to view a module as an all-or-nothing
thing in terms of the rtnetlink functionality. This prevents syzkaller
from reporting spurious errors from its tests, where OOM often occurs
and module is automatically loaded.
Let's use rtnl_register_many() to handle the errors easily.
Fixes: addf9b90de ("net: rtnetlink: use rcu to free rtnl message handlers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Rémi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch
@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }
@r3@
identifier func;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);
@r4@
identifier func, ctl;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);
@r5@
identifier func, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.
Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
Rather than pass in flags, error pointer, and whether this is a kernel
invocation or not, add a struct proto_accept_arg struct as the argument.
This then holds all of these arguments, and prepares accept for being
able to pass back more information.
No functional changes in this patch.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
35d92abfba ("net: hns3: fix kernel crash when devlink reload during initialization")
2a1a1a7b5f ("net: hns3: add command queue trace for hns3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
route_dumpit() already relies on RCU, RTNL is not needed.
Also change return value at the end of a dump.
This allows NLMSG_DONE to be appended to the current
skb at the end of a dump, saving a couple of recvmsg()
system calls.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Remi Denis-Courmont <courmisch@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240507121748.416287-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
To avoid lots of small commits, this commit brings together network
changes from (as they appear in MAINTAINERS) LLC, MPTCP, NETROM NETWORK
LAYER, PHONET PROTOCOL, ROSE NETWORK LAYER, RXRPC SOCKETS, SCTP
PROTOCOL, SHARED MEMORY COMMUNICATIONS (SMC), TIPC NETWORK LAYER and
NETWORKING [IPSEC]
* Remove sentinel element from ctl_table structs.
* Replace empty array registration with the register_net_sysctl_sz call
in llc_sysctl_init
* Replace the for loop stop condition that tests for procname == NULL
with one that depends on array size in sctp_sysctl_net_register
* Remove instances where an array element is zeroed out to make it look
like a sentinel in xfrm_sysctl_init. This is not longer needed and is
safe after commit c899710fe7 ("networking: Update to
register_net_sysctl_sz") added the array size to the ctl_table
registration
* Use a table_size variable to keep the value of ARRAY_SIZE
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The receive queues are protected by their respective spin-lock, not
the socket lock. This could lead to skb_peek() unexpectedly
returning NULL or a pointer to an already dequeued socket buffer.
Fixes: 9641458d3e ("Phonet: Pipe End Point for Phonet Pipes protocol")
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Link: https://lore.kernel.org/r/20240218081214.4806-2-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The receive queue is protected by its embedded spin-lock, not the
socket lock, so we need the former lock here (and only that one).
Fixes: 107d0d9b8d ("Phonet: Phonet datagram transport protocol")
Reported-by: Luosili <rootlab@huawei.com>
Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240218081214.4806-1-remi@remlab.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Most of the ioctls to net protocols operates directly on userspace
argument (arg). Usually doing get_user()/put_user() directly in the
ioctl callback. This is not flexible, because it is hard to reuse these
functions without passing userspace buffers.
Change the "struct proto" ioctls to avoid touching userspace memory and
operate on kernel buffers, i.e., all protocol's ioctl callbacks is
adapted to operate on a kernel memory other than on userspace (so, no
more {put,get}_user() and friends being called in the ioctl callback).
This changes the "struct proto" ioctl format in the following way:
int (*ioctl)(struct sock *sk, int cmd,
- unsigned long arg);
+ int *karg);
(Important to say that this patch does not touch the "struct proto_ops"
protocols)
So, the "karg" argument, which is passed to the ioctl callback, is a
pointer allocated to kernel space memory (inside a function wrapper).
This buffer (karg) may contain input argument (copied from userspace in
a prep function) and it might return a value/buffer, which is copied
back to userspace if necessary. There is not one-size-fits-all format
(that is I am using 'may' above), but basically, there are three type of
ioctls:
1) Do not read from userspace, returns a result to userspace
2) Read an input parameter from userspace, and does not return anything
to userspace
3) Read an input from userspace, and return a buffer to userspace.
The default case (1) (where no input parameter is given, and an "int" is
returned to userspace) encompasses more than 90% of the cases, but there
are two other exceptions. Here is a list of exceptions:
* Protocol RAW:
* cmd = SIOCGETVIFCNT:
* input and output = struct sioc_vif_req
* cmd = SIOCGETSGCNT
* input and output = struct sioc_sg_req
* Explanation: for the SIOCGETVIFCNT case, userspace passes the input
argument, which is struct sioc_vif_req. Then the callback populates
the struct, which is copied back to userspace.
* Protocol RAW6:
* cmd = SIOCGETMIFCNT_IN6
* input and output = struct sioc_mif_req6
* cmd = SIOCGETSGCNT_IN6
* input and output = struct sioc_sg_req6
* Protocol PHONET:
* cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
* input int (4 bytes)
* Nothing is copied back to userspace.
For the exception cases, functions sock_sk_ioctl_inout() will
copy the userspace input, and copy it back to kernel space.
The wrapper that prepare the buffer and put the buffer back to user is
sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now
calls sk_ioctl(), which will handle all cases.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
callback implementations. For example:
<...>
iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
<...>
Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal recvmsg() functions have two parameters 'flags' and 'noblock'
that were merged inside skb_recv_datagram(). As a follow up patch to commit
f4b41f062c ("net: remove noblock parameter from skb_recv_datagram()")
this patch removes the separate 'noblock' parameter for recvmsg().
Analogue to the referenced patch for skb_recv_datagram() the 'flags' and
'noblock' parameters are unnecessarily split up with e.g.
err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);
or in
err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);
instead of simply using only flags all the time and check for MSG_DONTWAIT
where needed (to preserve for the formerly separated no(n)block condition).
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'
As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags'
into 'flags' and 'noblock' with finally obsolete bit operations like this:
skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);
And this is not even done consistently with the 'flags' parameter.
This patch removes the obsolete and costly splitting into two parameters
and only performs bit operations when really needed on the caller side.
One missing conversion thankfully reported by kernel test robot. I missed
to enable kunit tests to build the mctp code.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit
baebdf48c3 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
the function netif_rx() can be used in preemptible/thread context as
well as in interrupt context.
Use netif_rx().
Cc: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This ioctl() implicitly assumed that the socket was already bound to
a valid local socket name, i.e. Phonet object. If the socket was not
bound, two separate problems would occur:
1) We'd send an pipe enablement request with an invalid source object.
2) Later socket calls could BUG on the socket unexpectedly being
connected yet not bound to a valid object.
Reported-by: syzbot+2dc91e7fc3dea88b1e8a@syzkaller.appspotmail.com
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_hold(sk) is invoked in pep_sock_accept(), but __sock_put(sk) is not
invoked in subsequent failure branches(pep_accept_conn() != 0).
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20211209082839.33985-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 'if (dev)' statement already move into dev_{put , hold}, so remove
redundant if statements.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
phonet has a single private ioctl that is broken in compat
mode on big-endian machines today because the data returned
from it is never copied back to user space.
Move it over to the ndo_siocdevprivate callback, which also
fixes the compat issue.
Cc: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rémi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rework the remaining setsockopt code to pass a sockptr_t instead of a
plain user pointer. This removes the last remaining set_fs(KERNEL_DS)
outside of architecture specific code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154]
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just check for a NULL method instead of wiring up
sock_no_{get,set}sockopt.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the compat handling to sock_common_{get,set}sockopt instead,
keyed of in_compat_syscall(). This allow to remove the now unused
->compat_{get,set}sockopt methods from struct proto_ops.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>