* android12-5.10: (2274 commits)
FROMGIT: mm: slub: move sysfs slab alloc/free interfaces to debugfs
ANDROID: gki - CONFIG_NET_SCH_FQ=y
ANDROID: GKI: Kconfig.gki: Add GKI_HIDDEN_ETHERNET_CONFIGS
FROMLIST: media: Kconfig: Fix DVB_CORE can't be selected as module
ANDROID: Update ABI and symbol list
Revert "net: usb: cdc_ncm: don't spew notifications"
ANDROID: Fips 140: move fips symbols entirely in own list
ANDROID: core of xt_IDLETIMER send_nl_msg support
ANDROID: start to re-add xt_IDLETIMER send_nl_msg support
ANDROID: add fips140.ko symbols to module ABI
ANDROID: inject correct HMAC digest into fips140.ko at build time
ANDROID: crypto: fips140 - perform load time integrity check
FROMLIST: crypto: shash - stop comparing function pointers to avoid breaking CFI
ANDROID: arm64: module: preserve RELA sections for FIPS140 integrity selfcheck
ANDROID: arm64: simd: omit capability check in may_use_simd()
ANDROID: kbuild: lto: permit the use of .a archives in LTO modules
ANDROID: arm64: only permit certain alternatives in the FIPS140 module
ANDROID: crypto: lib/aes - add vendor hooks for AES library routines
ANDROID: crypto: lib/sha256 - add vendor hook for sha256() routine
UPSTREAM: KVM: arm64: Mark the host stage-2 memory pools static
...
Conflicts:
drivers/mmc/core/mmc_ops.c
drivers/usb/gadget/function/f_uac1.c
drivers/usb/gadget/function/f_uac2.c
drivers/usb/gadget/function/f_uvc.c
Changes in 5.10.43
btrfs: tree-checker: do not error out if extent ref hash doesn't match
net: usb: cdc_ncm: don't spew notifications
hwmon: (dell-smm-hwmon) Fix index values
hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_3 for RAA228228
netfilter: conntrack: unregister ipv4 sockopts on error unwind
efi/fdt: fix panic when no valid fdt found
efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
efi/libstub: prevent read overflow in find_file_option()
efi: cper: fix snprintf() use in cper_dimm_err_location()
vfio/pci: Fix error return code in vfio_ecap_init()
vfio/pci: zap_vma_ptes() needs MMU
samples: vfio-mdev: fix error handing in mdpy_fb_probe()
vfio/platform: fix module_put call in error flow
ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
HID: logitech-hidpp: initialize level variable
HID: pidff: fix error return code in hid_pidff_init()
HID: i2c-hid: fix format string mismatch
devlink: Correct VIRTUAL port to not have phys_port attributes
net/sched: act_ct: Offload connections with commit action
net/sched: act_ct: Fix ct template allocation for zone 0
mptcp: always parse mptcp options for MPC reqsk
nvme-rdma: fix in-casule data send for chained sgls
ACPICA: Clean up context mutex during object deletion
perf probe: Fix NULL pointer dereference in convert_variable_location()
net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs
net: sock: fix in-kernel mark setting
net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
net/tls: Fix use-after-free after the TLS device goes down and up
net/mlx5e: Fix incompatible casting
net/mlx5: Check firmware sync reset requested is set before trying to abort it
net/mlx5e: Check for needed capability for cvlan matching
net/mlx5: DR, Create multi-destination flow table with level less than 64
nvmet: fix freeing unallocated p2pmem
netfilter: nft_ct: skip expectations for confirmed conntrack
netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
bpf: Simplify cases in bpf_base_func_proto
bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks
ieee802154: fix error return code in ieee802154_add_iface()
ieee802154: fix error return code in ieee802154_llsec_getparams()
igb: add correct exception tracing for XDP
ixgbevf: add correct exception tracing for XDP
cxgb4: fix regression with HASH tc prio value update
ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
ice: Fix allowing VF to request more/less queues via virtchnl
ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
ice: handle the VF VSI rebuild failure
ice: report supported and advertised autoneg using PHY capabilities
ice: Allow all LLDP packets from PF to Tx
i2c: qcom-geni: Add shutdown callback for i2c
cxgb4: avoid link re-train during TC-MQPRIO configuration
i40e: optimize for XDP_REDIRECT in xsk path
i40e: add correct exception tracing for XDP
ice: simplify ice_run_xdp
ice: optimize for XDP_REDIRECT in xsk path
ice: add correct exception tracing for XDP
ixgbe: optimize for XDP_REDIRECT in xsk path
ixgbe: add correct exception tracing for XDP
arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
optee: use export_uuid() to copy client UUID
bus: ti-sysc: Fix am335x resume hang for usb otg module
arm64: dts: ls1028a: fix memory node
arm64: dts: zii-ultra: fix 12V_MAIN voltage
arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
ARM: dts: imx7d-pico: Fix the 'tuning-step' property
ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
tipc: add extack messages for bearer/media failure
tipc: fix unique bearer names sanity check
serial: stm32: fix threaded interrupt handling
riscv: vdso: fix and clean-up Makefile
io_uring: fix link timeout refs
io_uring: use better types for cflags
drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
Bluetooth: fix the erroneous flush_work() order
Bluetooth: use correct lock to prevent UAF of hdev object
wireguard: do not use -O3
wireguard: peer: allocate in kmem_cache
wireguard: use synchronize_net rather than synchronize_rcu
wireguard: selftests: remove old conntrack kconfig value
wireguard: selftests: make sure rp_filter is disabled on vethc
wireguard: allowedips: initialize list head in selftest
wireguard: allowedips: remove nodes in O(1)
wireguard: allowedips: allocate nodes in kmem_cache
wireguard: allowedips: free empty intermediate nodes when removing single node
net: caif: added cfserl_release function
net: caif: add proper error handling
net: caif: fix memory leak in caif_device_notify
net: caif: fix memory leak in cfusbl_device_notify
HID: i2c-hid: Skip ELAN power-on command after reset
HID: magicmouse: fix NULL-deref on disconnect
HID: multitouch: require Finger field to mark Win8 reports as MT
gfs2: fix scheduling while atomic bug in glocks
ALSA: timer: Fix master timer notification
ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
ALSA: hda: update the power_state during the direct-complete
ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
ext4: fix memory leak in ext4_fill_super
ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
ext4: fix fast commit alignment issues
ext4: fix memory leak in ext4_mb_init_backend on error path.
ext4: fix accessing uninit percpu counter variable with fast_commit
usb: dwc2: Fix build in periphal-only mode
pid: take a reference when initializing `cad_pid`
ocfs2: fix data corruption by fallocate
mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
mm/page_alloc: fix counting of free pages after take off from buddy
x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
x86/sev: Check SME/SEV support in CPUID first
nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
drm/amdgpu: Don't query CE and UE errors
drm/amdgpu: make sure we unpin the UVD BO
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
powerpc/kprobes: Fix validation of prefixed instructions across page boundary
btrfs: mark ordered extent and inode with error if we fail to finish
btrfs: fix error handling in btrfs_del_csums
btrfs: return errors from btrfs_del_csums in cleanup_ref_head
btrfs: fixup error handling in fixup_inode_link_counts
btrfs: abort in rename_exchange if we fail to insert the second ref
btrfs: fix deadlock when cloning inline extents and low on available space
mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
drm/msm/dpu: always use mdp device to scale bandwidth
btrfs: fix unmountable seed device after fstrim
KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
KVM: arm64: Fix debug register indexing
x86/kvm: Teardown PV features on boot CPU as well
x86/kvm: Disable kvmclock on all CPUs on shutdown
x86/kvm: Disable all PV features on crash
lib/lz4: explicitly support in-place decompression
i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
netfilter: nf_tables: missing error reporting for not selected expressions
xen-netback: take a reference to the RX task thread
neighbour: allow NUD_NOARP entries to be forced GCed
Linux 5.10.43
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8d7ec0878193e4e454076809b7fb71fcc4e3d810
This was reverted in ee6918c6f7 due to
conflicts with upstream, and this attempts to reapply the majority
of that change in an upstream compatible fashion.
Test: builds, and kernel net tests passes, booted on phone,
but no real testing
Bug: 183485987
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I0fa1da3cb38d37c6888444dc36d49e6d1828a855
This was reverted in ee6918c6f7 due to conflicts with
upstream, this first patch is just the minimum necessary to make the netfilter
IDLETIMER target with --send_nl_msg load successfully:
phone-5.10:/ # iptables-save | egrep IDLETIMER
-A idletimer_raw_PREROUTING -i rmnet0 -j IDLETIMER --timeout 10 --label 0 --send_nl_msg
-A idletimer_mangle_POSTROUTING -o rmnet0 -j IDLETIMER --timeout 10 --label 0 --send_nl_msg
phone-5.10:/ # ip6tables-save | egrep IDLETIMER
-A idletimer_raw_PREROUTING -i rmnet0 -j IDLETIMER --timeout 10 --label 0 --send_nl_msg
-A idletimer_mangle_POSTROUTING -o rmnet0 -j IDLETIMER --timeout 10 --label 0 --send_nl_msg
Test: builds, and kernel net tests passes, booted on phone, observed ip{,6}tables loading rules
Bug: 183485987
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Change-Id: I1fe2c4e41a092cc82c3d6d49d1217798b2728bcb
commit 7a6b1ab747 upstream.
IFF_POINTOPOINT interfaces use NUD_NOARP entries for IPv6. It's possible to
fill up the neighbour table with enough entries that it will overflow for
valid connections after that.
This behaviour is more prevalent after commit 58956317c8 ("neighbor:
Improve garbage collection") is applied, as it prevents removal from
entries that are not NUD_FAILED, unless they are more than 5s old.
Fixes: 58956317c8 (neighbor: Improve garbage collection)
Reported-by: Kasper Dupont <kasperd@gjkwv.06.feb.2021.kasperd.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c781471d67 upstream.
Sometimes users forget to turn on nftables extensions from Kconfig that
they need. In such case, the error reporting from userspace is
misleading:
$ sudo nft add rule x y counter
Error: Could not process rule: No such file or directory
add rule x y counter
^^^^^^^^^^^^^^^^^^^^
Add missing NL_SET_BAD_ATTR() to provide a hint:
$ nft add rule x y counter
Error: Could not process rule: No such file or directory
add rule x y counter
^^^^^^^
Fixes: 83d9dcba06 ("netfilter: nf_tables: extended netlink error reporting for expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ac06a1e01 upstream.
It's possible to trigger NULL pointer dereference by local unprivileged
user, when calling getsockname() after failed bind() (e.g. the bind
fails because LLCP_SAP_MAX used as SAP):
BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
Call Trace:
llcp_sock_getname+0xb1/0xe0
__sys_getpeername+0x95/0xc0
? lockdep_hardirqs_on_prepare+0xd5/0x180
? syscall_enter_from_user_mode+0x1c/0x40
__x64_sys_getpeername+0x11/0x20
do_syscall_64+0x36/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xae
This can be reproduced with Syzkaller C repro (bind followed by
getpeername):
https://syzkaller.appspot.com/x/repro.c?x=14def446e00000
Cc: <stable@vger.kernel.org>
Fixes: d646960f79 ("NFC: Initial LLCP support")
Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f5d86669f upstream.
In case of caif_enroll_dev() fail, allocated
link_support won't be assigned to the corresponding
structure. So simply free allocated pointer in case
of error.
Fixes: 7ad65bf68d ("caif: Add support for CAIF over CDC NCM USB interface")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b53558a950 upstream.
In case of caif_enroll_dev() fail, allocated
link_support won't be assigned to the corresponding
structure. So simply free allocated pointer in case
of error
Fixes: 7c18d2205e ("caif: Restructure how link caif link layer enroll")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+7ec324747ce876a29db6@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2805dca51 upstream.
caif_enroll_dev() can fail in some cases. Ingnoring
these cases can lead to memory leak due to not assigning
link_support pointer to anywhere.
Fixes: 7c18d2205e ("caif: Restructure how link caif link layer enroll")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e305509e67 upstream.
The hci_sock_dev_event() function will cleanup the hdev object for
sockets even if this object may still be in used within the
hci_sock_bound_ioctl() function, result in UAF vulnerability.
This patch replace the BH context lock to serialize these affairs
and prevent the race condition.
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a137caec2 upstream.
In the cleanup routine for failed initialization of HCI device,
the flush_work(&hdev->rx_work) need to be finished before the
flush_work(&hdev->cmd_work). Otherwise, the hci_rx_work() can
possibly invoke new cmd_work and cause a bug, like double free,
in late processings.
This was assigned CVE-2021-3564.
This patch reorder the flush_work() to fix this bug.
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: linux-bluetooth@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Hao Xiong <mart1n@zju.edu.cn>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f20a46c304 ]
When enabling a bearer by name, we don't sanity check its name with
higher slot in bearer list. This may have the effect that the name
of an already enabled bearer bypasses the check.
To fix the above issue, we just perform an extra checking with all
existing bearers.
Fixes: cb30a63384 ("tipc: refactor function tipc_enable_bearer()")
Cc: stable@vger.kernel.org
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b83e214b2e ]
Add extack error messages for -EINVAL errors when enabling bearer,
getting/setting properties for a media/bearer
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 373e864cf5 ]
Fix to return negative error code -ENOBUFS from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 3e9c156e2c ("ieee802154: add netlink interfaces for llsec")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20210519141614.3040055-1-weiyongjun1@huawei.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79c6b8ed30 ]
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: be51da0f3e ("ieee802154: Stop using NLA_PUT*().")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210508062517.2574-1-thunder.leizhen@huawei.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8971ee8b08 ]
The private helper data size cannot be updated. However, updates that
contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is
the same.
Fixes: 12f7a50533 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1710eb913b ]
nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed
conntrack entry. However, nf_ct_ext_add() can only be called for
!nf_ct_is_confirmed().
[ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack]
[ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00
[ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202
[ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887
[ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440
[ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447
[ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440
[ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20
[ 1825.352240] FS: 00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000
[ 1825.352343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0
[ 1825.352508] Call Trace:
[ 1825.352544] nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack]
[ 1825.352641] nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct]
[ 1825.352716] nft_do_chain+0x232/0x850 [nf_tables]
Add the ct helper extension only for unconfirmed conntrack. Skip rule
evaluation if the ct helper extension does not exist. Thus, you can
only create expectations from the first packet.
It should be possible to remove this limitation by adding a new action
to attach a generic ct helper to the first packet. Then, use this ct
helper extension from follow up packets to create the ct expectation.
While at it, add a missing check to skip the template conntrack too
and remove check for IPCT_UNTRACK which is implicit to !ct.
Fixes: 857b46027d ("netfilter: nft_ct: add ct expectations support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c55dcdd435 ]
When a netdev with active TLS offload goes down, tls_device_down is
called to stop the offload and tear down the TLS context. However, the
socket stays alive, and it still points to the TLS context, which is now
deallocated. If a netdev goes up, while the connection is still active,
and the data flow resumes after a number of TCP retransmissions, it will
lead to a use-after-free of the TLS context.
This commit addresses this bug by keeping the context alive until its
normal destruction, and implements the necessary fallbacks, so that the
connection can resume in software (non-offloaded) kTLS mode.
On the TX side tls_sw_fallback is used to encrypt all packets. The RX
side already has all the necessary fallbacks, because receiving
non-decrypted packets is supported. The thing needed on the RX side is
to block resync requests, which are normally produced after receiving
non-decrypted packets.
The necessary synchronization is implemented for a graceful teardown:
first the fallbacks are deployed, then the driver resources are released
(it used to be possible to have a tls_dev_resync after tls_dev_del).
A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
mode. It's used to skip the RX resync logic completely, as it becomes
useless, and some objects may be released (for example, resync_async,
which is allocated and freed by the driver).
Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05fc8b6cbd ]
RCU synchronization is guaranteed to finish in finite time, unlike a
busy loop that polls a flag. This patch is a preparation for the bugfix
in the next patch, where the same synchronize_net() call will also be
used to sync with the TX datapath.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dd9082f4a9 ]
This patch fixes the in-kernel mark setting by doing an additional
sk_dst_reset() which was introduced by commit 50254256f3 ("sock: Reset
dst when changing sk_mark via setsockopt"). The code is now shared to
avoid any further suprises when changing the socket mark value.
Fixes: 84d1c61740 ("net: sock: add sock_set_mark")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ef8d857b5 ]
When using sub-VLANs in the range of 1-7, the resulting value from:
rx_vid = dsa_8021q_rx_vid_subvlan(ds, port, subvlan);
is wrong according to the description from tag_8021q.c:
| 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+-----------+-----+-----------------+-----------+-----------------------+
| DIR | SVL | SWITCH_ID | SUBVLAN | PORT |
+-----------+-----+-----------------+-----------+-----------------------+
For example, when ds->index == 0, port == 3 and subvlan == 1,
dsa_8021q_rx_vid_subvlan() returns 1027, same as it returns for
subvlan == 0, but it should have returned 1043.
This is because the low portion of the subvlan bits are not masked
properly when writing into the 12-bit VLAN value. They are masked into
bits 4:3, but they should be masked into bits 5:4.
Fixes: 3eaae1d05f ("net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 06f9a435b3 ]
In subflow_syn_recv_sock() we currently skip options parsing
for OoO packet, given that such packets may not carry the relevant
MPC option.
If the peer generates an MPC+data TSO packet and some of the early
segments are lost or get reorder, we server will ignore the peer key,
causing transient, unexpected fallback to TCP.
The solution is always parsing the incoming MPTCP options, and
do the fallback only for in-order packets. This actually cleans
the existing code a bit.
Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fb91702b74 ]
Fix current behavior of skipping template allocation in case the
ct action is in zone 0.
Skipping the allocation may cause the datapath ct code to ignore the
entire ct action with all its attributes (commit, nat) in case the ct
action in zone 0 was preceded by a ct clear action.
The ct clear action sets the ct_state to untracked and resets the
skb->_nfct pointer. Under these conditions and without an allocated
ct template, the skb->_nfct pointer will remain NULL which will
cause the tc ct action handler to exit without handling commit and nat
actions, if such exist.
For example, the following rule in OVS dp:
recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \
in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \
recirc(0x37a)
Will result in act_ct skipping the commit and nat actions in zone 0.
The change removes the skipping of template allocation for zone 0 and
treats it the same as any other zone.
Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20210526170110.54864-1-lariel@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cc254e5aa ]
Currently established connections are not offloaded if the filter has a
"ct commit" action. This behavior will not offload connections of the
following scenario:
$ tc_filter add dev $DEV ingress protocol ip prio 1 flower \
ct_state -trk \
action ct commit action goto chain 1
$ tc_filter add dev $DEV ingress protocol ip chain 1 prio 1 flower \
action mirred egress redirect dev $DEV2
$ tc_filter add dev $DEV2 ingress protocol ip prio 1 flower \
action ct commit action goto chain 1
$ tc_filter add dev $DEV2 ingress protocol ip prio 1 chain 1 flower \
ct_state +trk+est \
action mirred egress redirect dev $DEV
Offload established connections, regardless of the commit flag.
Fixes: 46475bb20f ("net/sched: act_ct: Software offload of established flows")
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/1622029449-27060-1-git-send-email-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b28d8f0c25 ]
Physical port name, port number attributes do not belong to virtual port
flavour. When VF or SF virtual ports are registered they incorrectly
append "np0" string in the netdevice name of the VF/SF.
Before this fix, VF netdevice name were ens2f0np0v0, ens2f0np0v1 for VF
0 and 1 respectively.
After the fix, they are ens2f0v0, ens2f0v1.
With this fix, reading /sys/class/net/ens2f0v0/phys_port_name returns
-EOPNOTSUPP.
Also devlink port show example for 2 VFs on one PF to ensure that any
physical port attributes are not exposed.
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
pci/0000:06:00.3/196608: type eth netdev ens2f0v0 flavour virtual splittable false
pci/0000:06:00.4/262144: type eth netdev ens2f0v1 flavour virtual splittable false
This change introduces a netdevice name change on systemd/udev
version 245 and higher which honors phys_port_name sysfs file for
generation of netdevice name.
This also aligns to phys_port_name usage which is limited to switchdev
ports as described in [1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/networking/switchdev.rst
Fixes: acf1ee44ca ("devlink: Introduce devlink port flavour virtual")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210526200027.14008-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit c34cd7750e.
Bring back the commit in 5.10.38 that broke the kabi.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I570fc31f9e1f196136bbbef479fb2413011ddf0e
Changes in 5.10.42
ALSA: hda/realtek: the bass speaker can't output sound on Yoga 9i
ALSA: hda/realtek: Headphone volume is controlled by Front mixer
ALSA: hda/realtek: Chain in pop reduction fixup for ThinkStation P340
ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 15 G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 17 G8
ALSA: usb-audio: scarlett2: Fix device hang with ehci-pci
ALSA: usb-audio: scarlett2: Improve driver startup messages
cifs: set server->cipher_type to AES-128-CCM for SMB3.0
NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
iommu/vt-d: Fix sysfs leak in alloc_iommu()
perf intel-pt: Fix sample instruction bytes
perf intel-pt: Fix transaction abort handling
perf scripts python: exported-sql-viewer.py: Fix copy to clipboard from Top Calls by elapsed Time report
perf scripts python: exported-sql-viewer.py: Fix Array TypeError
perf scripts python: exported-sql-viewer.py: Fix warning display
proc: Check /proc/$pid/attr/ writes against file opener
net: hso: fix control-request directions
net/sched: fq_pie: re-factor fix for fq_pie endless loop
net/sched: fq_pie: fix OOB access in the traffic path
netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version
mac80211: assure all fragments are encrypted
mac80211: prevent mixed key and fragment cache attacks
mac80211: properly handle A-MSDUs that start with an RFC 1042 header
cfg80211: mitigate A-MSDU aggregation attacks
mac80211: drop A-MSDUs on old ciphers
mac80211: add fragment cache to sta_info
mac80211: check defrag PN against current frame
mac80211: prevent attacks on TKIP/WEP as well
mac80211: do not accept/forward invalid EAPOL frames
mac80211: extend protection against mixed key and fragment cache attacks
ath10k: add CCMP PN replay protection for fragmented frames for PCIe
ath10k: drop fragments with multicast DA for PCIe
ath10k: drop fragments with multicast DA for SDIO
ath10k: drop MPDU which has discard flag set by firmware for SDIO
ath10k: Fix TKIP Michael MIC verification for PCIe
ath10k: Validate first subframe of A-MSDU before processing the list
ath11k: Clear the fragment cache during key install
dm snapshot: properly fix a crash when an origin has no snapshots
drm/amd/pm: correct MGpuFanBoost setting
drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error
drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate
drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate
drm/amdgpu/jpeg2.0: add cancel_delayed_work_sync before power gate
selftests/gpio: Use TEST_GEN_PROGS_EXTENDED
selftests/gpio: Move include of lib.mk up
selftests/gpio: Fix build when source tree is read only
kgdb: fix gcc-11 warnings harder
Documentation: seccomp: Fix user notification documentation
seccomp: Refactor notification handler to prepare for new semantics
serial: core: fix suspicious security_locked_down() call
misc/uss720: fix memory leak in uss720_probe
thunderbolt: usb4: Fix NVM read buffer bounds and offset issue
thunderbolt: dma_port: Fix NVM read buffer bounds and offset issue
KVM: X86: Fix vCPU preempted state from guest's point of view
KVM: arm64: Prevent mixed-width VM creation
mei: request autosuspend after sending rx flow control
staging: iio: cdc: ad7746: avoid overwrite of num_channels
iio: gyro: fxas21002c: balance runtime power in error path
iio: dac: ad5770r: Put fwnode in error case during ->probe()
iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()
iio: adc: ad7124: Fix missbalanced regulator enable / disable on error.
iio: adc: ad7124: Fix potential overflow due to non sequential channel numbers
iio: adc: ad7923: Fix undersized rx buffer.
iio: adc: ad7793: Add missing error code in ad7793_setup()
iio: adc: ad7192: Avoid disabling a clock that was never enabled.
iio: adc: ad7192: handle regulator voltage error first
serial: 8250: Add UART_BUG_TXRACE workaround for Aspeed VUART
serial: 8250_dw: Add device HID for new AMD UART controller
serial: 8250_pci: Add support for new HPE serial device
serial: 8250_pci: handle FL_NOIRQ board flag
USB: trancevibrator: fix control-request direction
Revert "irqbypass: do not start cons/prod when failed connect"
USB: usbfs: Don't WARN about excessively large memory allocations
drivers: base: Fix device link removal
serial: tegra: Fix a mask operation that is always true
serial: sh-sci: Fix off-by-one error in FIFO threshold register setting
serial: rp2: use 'request_firmware' instead of 'request_firmware_nowait'
USB: serial: ti_usb_3410_5052: add startech.com device id
USB: serial: option: add Telit LE910-S1 compositions 0x7010, 0x7011
USB: serial: ftdi_sio: add IDs for IDS GmbH Products
USB: serial: pl2303: add device id for ADLINK ND-6530 GC
thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALID
usb: dwc3: gadget: Properly track pending and queued SG
usb: gadget: udc: renesas_usb3: Fix a race in usb3_start_pipen()
usb: typec: mux: Fix matching with typec_altmode_desc
net: usb: fix memory leak in smsc75xx_bind
Bluetooth: cmtp: fix file refcount when cmtp_attach_device fails
fs/nfs: Use fatal_signal_pending instead of signal_pending
NFS: fix an incorrect limit in filelayout_decode_layout()
NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
drm/meson: fix shutdown crash when component not probed
net/mlx5e: reset XPS on error flow if netdev isn't registered yet
net/mlx5e: Fix multipath lag activation
net/mlx5e: Fix error path of updating netdev queues
{net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table
net/mlx5e: Fix nullptr in add_vlan_push_action()
net/mlx5: Set reformat action when needed for termination rules
net/mlx5e: Fix null deref accessing lag dev
net/mlx4: Fix EEPROM dump support
net/mlx5: Set term table as an unmanaged flow table
SUNRPC in case of backlog, hand free slots directly to waiting task
Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv"
tipc: wait and exit until all work queues are done
tipc: skb_linearize the head skb when reassembling msgs
spi: spi-fsl-dspi: Fix a resource leak in an error handling path
netfilter: flowtable: Remove redundant hw refresh bit
net: dsa: mt7530: fix VLAN traffic leaks
net: dsa: fix a crash if ->get_sset_count() fails
net: dsa: sja1105: update existing VLANs from the bridge VLAN list
net: dsa: sja1105: use 4095 as the private VLAN for untagged traffic
net: dsa: sja1105: error out on unsupported PHY mode
net: dsa: sja1105: add error handling in sja1105_setup()
net: dsa: sja1105: call dsa_unregister_switch when allocating memory fails
net: dsa: sja1105: fix VL lookup command packing for P/Q/R/S
i2c: s3c2410: fix possible NULL pointer deref on read message after write
i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
i2c: i801: Don't generate an interrupt on bus reset
i2c: sh_mobile: Use new clock calculation formulas for RZ/G2E
afs: Fix the nlink handling of dir-over-dir rename
perf jevents: Fix getting maximum number of fds
nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response
mptcp: avoid error message on infinite mapping
mptcp: drop unconditional pr_warn on bad opt
mptcp: fix data stream corruption
platform/x86: hp_accel: Avoid invoking _INI to speed up resume
gpio: cadence: Add missing MODULE_DEVICE_TABLE
Revert "crypto: cavium/nitrox - add an error message to explain the failure of pci_request_mem_regions"
Revert "media: usb: gspca: add a missed check for goto_low_power"
Revert "ALSA: sb: fix a missing check of snd_ctl_add"
Revert "serial: max310x: pass return value of spi_register_driver"
serial: max310x: unregister uart driver in case of failure and abort
Revert "net: fujitsu: fix a potential NULL pointer dereference"
net: fujitsu: fix potential null-ptr-deref
Revert "net/smc: fix a NULL pointer dereference"
net/smc: properly handle workqueue allocation failure
Revert "net: caif: replace BUG_ON with recovery code"
net: caif: remove BUG_ON(dev == NULL) in caif_xmit
Revert "char: hpet: fix a missing check of ioremap"
char: hpet: add checks after calling ioremap
Revert "ALSA: gus: add a check of the status of snd_ctl_add"
Revert "ALSA: usx2y: Fix potential NULL pointer dereference"
Revert "isdn: mISDNinfineon: fix potential NULL pointer dereference"
isdn: mISDNinfineon: check/cleanup ioremap failure correctly in setup_io
Revert "ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()"
ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()
Revert "isdn: mISDN: Fix potential NULL pointer dereference of kzalloc"
isdn: mISDN: correctly handle ph_info allocation failure in hfcsusb_ph_info
Revert "dmaengine: qcom_hidma: Check for driver register failure"
dmaengine: qcom_hidma: comment platform_driver_register call
Revert "libertas: add checks for the return value of sysfs_create_group"
libertas: register sysfs groups properly
Revert "ASoC: cs43130: fix a NULL pointer dereference"
ASoC: cs43130: handle errors in cs43130_probe() properly
Revert "media: dvb: Add check on sp8870_readreg"
media: dvb: Add check on sp8870_readreg return
Revert "media: gspca: mt9m111: Check write_bridge for timeout"
media: gspca: mt9m111: Check write_bridge for timeout
Revert "media: gspca: Check the return value of write_bridge for timeout"
media: gspca: properly check for errors in po1030_probe()
Revert "net: liquidio: fix a NULL pointer dereference"
net: liquidio: Add missing null pointer checks
Revert "brcmfmac: add a check for the status of usb_register"
brcmfmac: properly check for bus register errors
btrfs: return whole extents in fiemap
scsi: ufs: ufs-mediatek: Fix power down spec violation
scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic
openrisc: Define memory barrier mb
scsi: pm80xx: Fix drives missing during rmmod/insmod loop
btrfs: release path before starting transaction when cloning inline extent
btrfs: do not BUG_ON in link_to_fixup_dir
platform/x86: hp-wireless: add AMD's hardware id to the supported list
platform/x86: intel_punit_ipc: Append MODULE_DEVICE_TABLE for ACPI
platform/x86: touchscreen_dmi: Add info for the Mediacom Winpad 7.0 W700 tablet
SMB3: incorrect file id in requests compounded with open
drm/amd/display: Disconnect non-DP with no EDID
drm/amd/amdgpu: fix refcount leak
drm/amdgpu: Fix a use-after-free
drm/amd/amdgpu: fix a potential deadlock in gpu reset
drm/amdgpu: stop touching sched.ready in the backend
platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet
block: fix a race between del_gendisk and BLKRRPART
linux/bits.h: fix compilation error with GENMASK
net: netcp: Fix an error message
net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_count
interconnect: qcom: bcm-voter: add a missing of_node_put()
interconnect: qcom: Add missing MODULE_DEVICE_TABLE
ASoC: cs42l42: Regmap must use_single_read/write
net: stmmac: Fix MAC WoL not working if PHY does not support WoL
net: ipa: memory region array is variable size
vfio-ccw: Check initialized flag in cp_init()
spi: Assume GPIO CS active high in ACPI case
net: really orphan skbs tied to closing sk
net: packetmmap: fix only tx timestamp on request
net: fec: fix the potential memory leak in fec_enet_init()
chelsio/chtls: unlock on error in chtls_pt_recvmsg()
net: mdio: thunder: Fix a double free issue in the .remove function
net: mdio: octeon: Fix some double free issues
cxgb4/ch_ktls: Clear resources when pf4 device is removed
openvswitch: meter: fix race when getting now_ms.
tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT
net: sched: fix packet stuck problem for lockless qdisc
net: sched: fix tx action rescheduling issue during deactivation
net: sched: fix tx action reschedule issue with stopped queue
net: hso: check for allocation failure in hso_create_bulk_serial_device()
net: bnx2: Fix error return code in bnx2_init_board()
bnxt_en: Include new P5 HV definition in VF check.
bnxt_en: Fix context memory setup for 64K page size.
mld: fix panic in mld_newpack()
net/smc: remove device from smcd_dev_list after failed device_add()
gve: Check TX QPL was actually assigned
gve: Update mgmt_msix_idx if num_ntfy changes
gve: Add NULL pointer checks when freeing irqs.
gve: Upgrade memory barrier in poll routine
gve: Correct SKB queue index validation.
iommu/virtio: Add missing MODULE_DEVICE_TABLE
net: hns3: fix incorrect resp_msg issue
net: hns3: put off calling register_netdev() until client initialize complete
iommu/vt-d: Use user privilege for RID2PASID translation
cxgb4: avoid accessing registers when clearing filters
staging: emxx_udc: fix loop in _nbu2ss_nuke()
ASoC: cs35l33: fix an error code in probe()
bpf, offload: Reorder offload callback 'prepare' in verifier
bpf: Set mac_len in bpf_skb_change_head
ixgbe: fix large MTU request from VF
ASoC: qcom: lpass-cpu: Use optional clk APIs
scsi: libsas: Use _safe() loop in sas_resume_port()
net: lantiq: fix memory corruption in RX ring
ipv6: record frag_max_size in atomic fragments in input path
ALSA: usb-audio: scarlett2: snd_scarlett_gen2_controls_create() can be static
net: ethernet: mtk_eth_soc: Fix packet statistics support for MT7628/88
sch_dsmark: fix a NULL deref in qdisc_reset()
net: hsr: fix mac_len checks
MIPS: alchemy: xxs1500: add gpio-au1000.h header file
MIPS: ralink: export rt_sysc_membase for rt2880_wdt.c
net: zero-initialize tc skb extension on allocation
net: mvpp2: add buffer header handling in RX
i915: fix build warning in intel_dp_get_link_status()
samples/bpf: Consider frame size in tx_only of xdpsock sample
net: hns3: check the return of skb_checksum_help()
bpftool: Add sock_release help info for cgroup attach/prog load command
SUNRPC: More fixes for backlog congestion
Revert "Revert "ALSA: usx2y: Fix potential NULL pointer dereference""
net: hso: bail out on interrupt URB allocation failure
scripts/clang-tools: switch explicitly to Python 3
neighbour: Prevent Race condition in neighbour subsytem
usb: core: reduce power-on-good delay time of root hub
Linux 5.10.42
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I05d98d1355a080e0951b4b2ae77f0a9ccb6dfc5d
commit eefb45eef5 upstream.
Following Race Condition was detected:
<CPU A, t0>: Executing: __netif_receive_skb() ->__netif_receive_skb_core()
-> arp_rcv() -> arp_process().arp_process() calls __neigh_lookup() which
takes a reference on neighbour entry 'n'.
Moves further along, arp_process() and calls neigh_update()->
__neigh_update(). Neighbour entry is unlocked just before a call to
neigh_update_gc_list.
This unlocking paves way for another thread that may take a reference on
the same and mark it dead and remove it from gc_list.
<CPU B, t1> - neigh_flush_dev() is under execution and calls
neigh_mark_dead(n) marking the neighbour entry 'n' as dead. Also n will be
removed from gc_list.
Moves further along neigh_flush_dev() and calls
neigh_cleanup_and_release(n), but since reference count increased in t1,
'n' couldn't be destroyed.
<CPU A, t3>- Code hits neigh_update_gc_list, with neighbour entry
set as dead.
<CPU A, t4> - arp_process() finally calls neigh_release(n), destroying
the neighbour entry and we have a destroyed ntry still part of gc_list.
Fixes: eb4e8fac00d1("neighbour: Prevent a dead entry from updating gc_list")
Signed-off-by: Chinmay Agarwal <chinagar@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e86be3a04b upstream.
Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well
as socket based transports.
Ensure we always initialise the request after waking up from the backlog
list.
Fixes: e877a88d1f ("SUNRPC in case of backlog, hand free slots directly to waiting task")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 48b491a5cc ]
Commit 2e9f60932a ("net: hsr: check skb can contain struct hsr_ethhdr
in fill_frame_info") added the following which resulted in -EINVAL
always being returned:
if (skb->mac_len < sizeof(struct hsr_ethhdr))
return -EINVAL;
mac_len was not being set correctly so this check completely broke
HSR/PRP since it was always 14, not 20.
Set mac_len correctly and modify the mac_len checks to test in the
correct places since sometimes it is legitimately 14.
Fixes: 2e9f60932a ("net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info")
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e29f011e8f ]
Commit dbd1759e6a ("ipv6: on reassembly, record frag_max_size")
filled the frag_max_size field in IP6CB in the input path.
The field should also be filled in case of atomic fragments.
Fixes: dbd1759e6a ('ipv6: on reassembly, record frag_max_size')
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84316ca4e1 ]
The skb_change_head() helper did not set "skb->mac_len", which is
problematic when it's used in combination with skb_redirect_peer().
Without it, redirecting a packet from a L3 device such as wireguard to
the veth peer device will cause skb->data to point to the middle of the
IP header on entry to tcp_v4_rcv() since the L2 header is not pulled
correctly due to mac_len=0.
Fixes: 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure")
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210519154743.2554771-2-joamaki@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 444d7be953 ]
If the device_add() for a smcd_dev fails, there's no cleanup step that
rolls back the earlier list_add(). The device subsequently gets freed,
and we end up with a corrupted list.
Add some error handling that removes the device from the list.
Fixes: c6ba7c9ba4 ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 020ef930b8 ]
mld_newpack() doesn't allow to allocate high order page,
only order-0 allocation is allowed.
If headroom size is too large, a kernel panic could occur in skb_put().
Test commands:
ip netns del A
ip netns del B
ip netns add A
ip netns add B
ip link add veth0 type veth peer name veth1
ip link set veth0 netns A
ip link set veth1 netns B
ip netns exec A ip link set lo up
ip netns exec A ip link set veth0 up
ip netns exec A ip -6 a a 2001:db8:0::1/64 dev veth0
ip netns exec B ip link set lo up
ip netns exec B ip link set veth1 up
ip netns exec B ip -6 a a 2001:db8:0::2/64 dev veth1
for i in {1..99}
do
let A=$i-1
ip netns exec A ip link add ip6gre$i type ip6gre \
local 2001:db8:$A::1 remote 2001:db8:$A::2 encaplimit 100
ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev ip6gre$i
ip netns exec A ip link set ip6gre$i up
ip netns exec B ip link add ip6gre$i type ip6gre \
local 2001:db8:$A::2 remote 2001:db8:$A::1 encaplimit 100
ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev ip6gre$i
ip netns exec B ip link set ip6gre$i up
done
Splat looks like:
kernel BUG at net/core/skbuff.c:110!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.12.0+ #891
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:skb_panic+0x15d/0x15f
Code: 92 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 00 ae 79 83
41 57 41 56 41 55 48 8b 54 24 a6 26 f9 ff <0f> 0b 48 8b 6c 24 20 89
34 24 e8 4a 4e 92 fe 8b 34 24 48 c7 c1 20
RSP: 0018:ffff88810091f820 EFLAGS: 00010282
RAX: 0000000000000089 RBX: ffff8881086e9000 RCX: 0000000000000000
RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed1020123efb
RBP: ffff888005f6eac0 R08: ffffed1022fc0031 R09: ffffed1022fc0031
R10: ffff888117e00187 R11: ffffed1022fc0030 R12: 0000000000000028
R13: ffff888008284eb0 R14: 0000000000000ed8 R15: 0000000000000ec0
FS: 0000000000000000(0000) GS:ffff888117c00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8b801c5640 CR3: 0000000033c2c006 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
skb_put.cold.104+0x22/0x22
ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
? rcu_read_lock_sched_held+0x91/0xc0
mld_newpack+0x398/0x8f0
? ip6_mc_hdr.isra.26.constprop.46+0x600/0x600
? lock_contended+0xc40/0xc40
add_grhead.isra.33+0x280/0x380
add_grec+0x5ca/0xff0
? mld_sendpack+0xf40/0xf40
? lock_downgrade+0x690/0x690
mld_send_initial_cr.part.34+0xb9/0x180
ipv6_mc_dad_complete+0x15d/0x1b0
addrconf_dad_completed+0x8d2/0xbb0
? lock_downgrade+0x690/0x690
? addrconf_rs_timer+0x660/0x660
? addrconf_dad_work+0x73c/0x10e0
addrconf_dad_work+0x73c/0x10e0
Allowing high order page allocation could fix this problem.
Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dcad9ee9e0 ]
The netdev qeueue might be stopped when byte queue limit has
reached or tx hw ring is full, net_tx_action() may still be
rescheduled if STATE_MISSED is set, which consumes unnecessary
cpu without dequeuing and transmiting any skb because the
netdev queue is stopped, see qdisc_run_end().
This patch fixes it by checking the netdev queue state before
calling qdisc_run() and clearing STATE_MISSED if netdev queue is
stopped during qdisc_run(), the net_tx_action() is rescheduled
again when netdev qeueue is restarted, see netif_tx_wake_queue().
As there is time window between netif_xmit_frozen_or_stopped()
checking and STATE_MISSED clearing, between which STATE_MISSED
may set by net_tx_action() scheduled by netif_tx_wake_queue(),
so set the STATE_MISSED again if netdev queue is restarted.
Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 102b55ee92 ]
Currently qdisc_run() checks the STATE_DEACTIVATED of lockless
qdisc before calling __qdisc_run(), which ultimately clear the
STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED
is set before clearing STATE_MISSED, there may be rescheduling
of net_tx_action() at the end of qdisc_run_end(), see below:
CPU0(net_tx_atcion) CPU1(__dev_xmit_skb) CPU2(dev_deactivate)
. . .
. set STATE_MISSED .
. __netif_schedule() .
. . set STATE_DEACTIVATED
. . qdisc_reset()
. . .
.<--------------- . synchronize_net()
clear __QDISC_STATE_SCHED | . .
. | . .
. | . some_qdisc_is_busy()
. | . return *false*
. | . .
test STATE_DEACTIVATED | . .
__qdisc_run() *not* called | . .
. | . .
test STATE_MISS | . .
__netif_schedule()--------| . .
. . .
. . .
__qdisc_run() is not called by net_tx_atcion() in CPU0 because
CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and
STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule
is called at the end of qdisc_run_end(), causing tx action
rescheduling problem.
qdisc_run() called by net_tx_action() runs in the softirq context,
which should has the same semantic as the qdisc_run() called by
__dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a
synchronize_net() between STATE_DEACTIVATED flag being set and
qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely
bail out for the deactived lockless qdisc in net_tx_action(), and
qdisc_reset() will reset all skb not dequeued yet.
So add the rcu_read_lock() explicitly to protect the qdisc_run()
and do the STATE_DEACTIVATED checking in net_tx_action() before
calling qdisc_run_begin(). Another option is to do the checking in
the qdisc_run_end(), but it will add unnecessary overhead for
non-tx_action case, because __dev_queue_xmit() will not see qdisc
with STATE_DEACTIVATED after synchronize_net(), the qdisc with
STATE_DEACTIVATED can only be seen by net_tx_action() because of
__netif_schedule().
The STATE_DEACTIVATED checking in qdisc_run() is to avoid race
between net_tx_action() and qdisc_reset(), see:
commit d518d2ed86 ("net/sched: fix race between deactivation
and dequeue for NOLOCK qdisc"). As the bailout added above for
deactived lockless qdisc in net_tx_action() provides better
protection for the race without calling qdisc_run() at all, so
remove the STATE_DEACTIVATED checking in qdisc_run().
After qdisc_reset(), there is no skb in qdisc to be dequeued, so
clear the STATE_MISSED in dev_reset_queue() too.
Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
V8: Clearing STATE_MISSED before calling __netif_schedule() has
avoid the endless rescheduling problem, but there may still
be a unnecessary rescheduling, so adjust the commit log.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a90c57f2ce ]
Lockless qdisc has below concurrent problem:
cpu0 cpu1
. .
q->enqueue .
. .
qdisc_run_begin() .
. .
dequeue_skb() .
. .
sch_direct_xmit() .
. .
. q->enqueue
. qdisc_run_begin()
. return and do nothing
. .
qdisc_run_end() .
cpu1 enqueue a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() return false
for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calling dequeue_skb() and before
cpu0 calling qdisc_run_end().
Lockless qdisc has below another concurrent problem when
tx_action is involved:
cpu0(serving tx_action) cpu1 cpu2
. . .
. q->enqueue .
. qdisc_run_begin() .
. dequeue_skb() .
. . q->enqueue
. . .
. sch_direct_xmit() .
. . qdisc_run_begin()
. . return and do nothing
. . .
clear __QDISC_STATE_SCHED . .
qdisc_run_begin() . .
return and do nothing . .
. . .
. qdisc_run_end() .
This patch fixes the above data race by:
1. If the first spin_trylock() return false and STATE_MISSED is
not set, set STATE_MISSED and retry another spin_trylock() in
case other CPU may not see STATE_MISSED after it releases the
lock.
2. reschedule if STATE_MISSED is set after the lock is released
at the end of qdisc_run_end().
For tx_action case, STATE_MISSED is also set when cpu1 is at the
end if qdisc_run_end(), so tx_action will be rescheduled again
to dequeue the skb enqueued by cpu2.
Clear STATE_MISSED before retrying a dequeuing when dequeuing
returns NULL in order to reduce the overhead of the second
spin_trylock() and __netif_schedule() calling.
Also clear the STATE_MISSED before calling __netif_schedule()
at the end of qdisc_run_end() to avoid doing another round of
dequeuing in the pfifo_fast_dequeue().
The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:
threads without+this_patch with+this_patch delta
1 2.61Mpps 2.60Mpps -0.3%
2 3.97Mpps 3.82Mpps -3.7%
4 5.62Mpps 5.59Mpps -0.5%
8 2.78Mpps 2.77Mpps -0.3%
16 2.22Mpps 2.22Mpps -0.0%
Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 974271e5ed ]
In tls_sw_splice_read, checkout MSG_* is inappropriate, should use
SPLICE_*, update tls_wait_data to accept nonblock arguments instead
of flags for recvmsg and splice.
Fixes: c46234ebb4 ("tls: RX path for ktls")
Signed-off-by: Jim Ma <majinjing3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e4df1b0c24 ]
We have observed meters working unexpected if traffic is 3+Gbit/s
with multiple connections.
now_ms is not pretected by meter->lock, we may get a negative
long_delta_ms when another cpu updated meter->used, then:
delta_ms = (u32)long_delta_ms;
which will be a large value.
band->bucket += delta_ms * band->rate;
then we get a wrong band->bucket.
OpenVswitch userspace datapath has fixed the same issue[1] some
time ago, and we port the implementation to kernel datapath.
[1] https://patchwork.ozlabs.org/project/openvswitch/patch/20191025114436.9746-1-i.maximets@ovn.org/
Fixes: 96fbc13d7e ("openvswitch: Add meter infrastructure")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 171c3b1511 ]
The packetmmap tx ring should only return timestamps if requested via
setsockopt PACKET_TIMESTAMP, as documented. This allows compatibility
with non-timestamp aware user-space code which checks
tp_status == TP_STATUS_AVAILABLE; not expecting additional timestamp
flags to be set in tp_status.
Fixes: b9c32fb271 ("packet: if hw/sw ts enabled in rx/tx ring, report which ts we got")
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Richard Sanger <rsanger@wand.net.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 098116e7e6 ]
If the owing socket is shutting down - e.g. the sock reference
count already dropped to 0 and only sk_wmem_alloc is keeping
the sock alive, skb_orphan_partial() becomes a no-op.
When forwarding packets over veth with GRO enabled, the above
causes refcount errors.
This change addresses the issue with a plain skb_orphan() call
in the critical scenario.
Fixes: 9adc89af72 ("net: let skb_orphan_partial wake-up waiters.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b94cbc909f ]
DSA implements a bunch of 'standardized' ethtool statistics counters,
namely tx_packets, tx_bytes, rx_packets, rx_bytes. So whatever the
hardware driver returns in .get_sset_count(), we need to add 4 to that.
That is ok, except that .get_sset_count() can return a negative error
code, for example:
b53_get_sset_count
-> phy_ethtool_get_sset_count
-> return -EIO
-EIO is -5, and with 4 added to it, it becomes -1, aka -EPERM. One can
imagine that certain error codes may even become positive, although
based on code inspection I did not see instances of that.
Check the error code first, if it is negative return it as-is.
Based on a similar patch for dsa_master_get_strings from Dan Carpenter:
https://patchwork.kernel.org/project/netdevbpf/patch/YJaSe3RPgn7gKxZv@mwanda/
Fixes: 91da11f870 ("net: Distributed Switch Architecture protocol support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bbeb18f27a ]
In smcd_alloc_dev(), if alloc_ordered_workqueue() fails, properly catch
it, clean up and return NULL to let the caller know there was a failure.
Move the call to alloc_ordered_workqueue higher in the function in order
to abort earlier without needing to unwind the call to device_initialize().
Cc: Ursula Braun <ubraun@linux.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20210503115736.2104747-18-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>