USBSTS is mostly RW1C, so to clear EINT we should write just this
one bit. Remove pointless code which ORs the bit with current value
of the register, even though the bit is already known to be set,
and writes the result back, which clears all active RW1C flags.
We used to inadvertently clear PCD and SRE in this way. PCD isn't
used by the driver and SRE is only used at resume, so clearing them
should make no difference. Don't clear them anymore.
Tested by connecting and mounting a storage device on a few HCs.
Before: xhci_irq USBSTS 0x00000018 EINT PCD -> 0x00000000
xhci_irq USBSTS 0x00000008 EINT -> 0x00000000
After: xhci_irq USBSTS 0x00000018 EINT PCD -> 0x00000010 PCD
xhci_irq USBSTS 0x00000018 EINT PCD -> 0x00000010 PCD
Some flags are RsvdZ - should be written as zero regardless of the
value read, so technically it was a bug. But no problems are known.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20260402131342.2628648-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
Device plug/unplug scenarios on Android devices. HCE is checked in
xhci_irq() function and causes an interrupt storm (since the interrupt
isn’t cleared), leading to severe system-level faults.
When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop as stated in xHCI
specification. An interrupt storm does however continue on some hosts
even after HCE, and only ceases after manually disabling xHC interrupt
and stopping the controller by calling xhci_halt().
Add xhci_halt() to xhci_irq() function where STS_HCE status is checked,
mirroring the existing error handling pattern used for STS_FATAL errors.
This only fixes the interrupt storm. Proper HCE recovery requires resetting
and re-initializing the xHC.
CC: stable@vger.kernel.org
Signed-off-by: Dayu Jiang <jiangdayu@xiaomi.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20260304223639.3882398-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here is the big set of USB and Thunderbolt driver updates for 6.19-rc1.
Nothing major here, just lots of tiny updates for most of the common USB
drivers. Included in here are:
- more xhci driver updates and fixes
- Thunderbolt driver cleanups
- usb serial driver updates
- typec driver updates
- USB tracepoint additions
- dwc3 driver updates, including support for Apple hardware
- lots of other smaller driver updates and cleanups
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaTTLNQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn29wCcD23xCSWuRIKHfSWlefCwptHKT0AAoM0NJdeO
C4/4pui5ifTu1g5Vud1r
=Sm/3
-----END PGP SIGNATURE-----
Merge tag 'usb-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt updates from Greg KH:
"Here is the big set of USB and Thunderbolt driver updates for
6.19-rc1. Nothing major here, just lots of tiny updates for most of
the common USB drivers. Included in here are:
- more xhci driver updates and fixes
- Thunderbolt driver cleanups
- usb serial driver updates
- typec driver updates
- USB tracepoint additions
- dwc3 driver updates, including support for Apple hardware
- lots of other smaller driver updates and cleanups
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (161 commits)
usb: gadget: tegra-xudc: Always reinitialize data toggle when clear halt
USB: serial: option: move Telit 0x10c7 composition in the right place
USB: serial: option: add Telit Cinterion FE910C04 new compositions
usb: typec: ucsi: fix use-after-free caused by uec->work
usb: typec: ucsi: fix probe failure in gaokun_ucsi_probe()
usb: dwc3: core: Remove redundant comment in core init
usb: phy: Initialize struct usb_phy list_head
USB: serial: option: add Foxconn T99W760
usb: usb-storage: No additional quirks need to be added to the EL-R12 optical drive.
usb: typec: hd3ss3220: Enable VBUS based on ID pin state
dt-bindings: usb: ti,hd3ss3220: Add support for VBUS based on ID state
usb: typec: anx7411: add WQ_PERCPU to alloc_workqueue users
USB: add WQ_PERCPU to alloc_workqueue users
dt-bindings: usb: dwc3-xilinx: Describe the reset constraint for the versal platform
drivers/usb/storage: use min() instead of min_t()
usb: raw-gadget: cap raw_io transfer length to KMALLOC_MAX_SIZE
usb: ohci-da8xx: remove unused platform data
usb: gadget: functionfs: use dma_buf_unmap_attachment_unlocked() helper
usb: uas: reduce time under spinlock
usb: dwc3: eic7700: Add EIC7700 USB driver
...
Convert single bit-field macros to simple masks. The change makes the
masks more universal. Multi bit-field macros are changed in the next
commit. After both changes, all masks in xhci-caps.h will follow the
same format. I plan to introduce this change to all xhci macros.
Bit shift operations on a 32-bit signed can be problematic on some
architectures. Instead use BIT() macro, which returns a 64-bit unsigned
value. This ensures that the shift operation is performed on an unsigned
type, which is safer and more portable across different architectures.
Using unsigned integers for bit shifts avoids issues related to sign bits
and ensures consistent behavior.
Switch from 32-bit to 64-bit?
As far as I am aware, this does not cause any issues.
Performing bitwise operations between 32 and 64 bit values, the smaller
operand is promoted to match the size of the larger one, resulting in a
64-bit operation. This promotion extends the 32-bit value to 64 bits,
by zero-padding (for unsigned).
Will the change to 64-bit slow down the xhci driver?
On a 64-bit architecture - No. On a 32-bit architecture, yes? but in my
opinion the performance decrease does not outweigh the readability and
other benefits of using BIT() macro.
Why not use FIELD_GET() and FIELD_PREP()?
While they can be used for single bit macros, I prefer to use simple
bitwise operation directly. Because, it takes less space, is less overhead
and is as clear as if using FIELD_GET() and FIELD_PREP().
Why not use test_bit() macro?
Same reason as with FIELD_GET() and FIELD_PREP().
Suggested-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-23-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The IST is represented by bits 2:0, with bit 3 indicating the unit of
measurement, Frames or Microframes. Introduce xhci_ist_microframes(),
which returns the IST value in Microframes, simplifying the code and
reducing duplication.
Improve documentation in xhci-caps.h to clarify the IST register specifics,
including the unit conversion details. These change removes the need to
explain it each time the IST values is retrieved.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-20-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 32-bit read-only HCSPARAMS1 register contains the following fields:
Bits 7:0 - Number of Device Slots (MaxSlots)
Bits 18:8 - Number of Interrupters (MaxIntrs)
Bits 23:19 - Reserved
Bits 31:24 - Number of Ports (MaxPorts)
Since the register value is constant for the lifetime of the controller,
it is cached in 'xhci->hcs_params1'. However, platform drivers may
override the number of interrupters through a separate variable,
'xhci->max_interrupters', leaving only the maximum slots and ports values
still derived from the cached register.
To simplify the code and improve readability, replace 'xhci->hcs_params1'
with two dedicated 'u8' fields: 'xhci->max_slots' and 'xhci->max_ports'.
These values are initialized once and used directly instead of calling
'HCS_MAX_SLOTS()' and 'HCS_MAX_PORTS()' macros.
This change reduces code clutter without increasing memory usage.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-16-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
Switch to using system_percpu_wq because system_wq is going away as part of
a workqueue restructuring.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-12-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some old HCs ignore transfer ring link TRBs whose chain bit is unset.
This breaks endpoint operation and sometimes makes it execute other
ring's TDs, which may corrupt their buffers or cause unwanted device
action. We avoid this by chaining all link TRBs on affected rings.
Fix an omission which allows them to be unchained by cancelling TDs.
The patch was tested by reproducing this condition on an isochronous
endpoint (non-power-of-two TDs are sometimes split not to cross 64K)
and printing link TRBs in trb_to_noop() on good and buggy HCs.
Actual hardware malfunction is rare since it requires Missed Service
Error shortly before the unchained link TRB, at least on NEC and AMD.
I have never seen it after commit bb0ba4cb10 ("usb: xhci: Apply the
link chain quirk on NEC isoc endpoints"), but it's Russian roulette
and I can't test all affected hosts and workloads. Fairly often MSEs
happen after cancellation because the endpoint was stopped.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-11-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xHCI 4.8.3 recommends that software should simply assume endpoints to
halt after certain events, without looking at the Endpoint Context for
confirmation, because HCs may be slow to update that.
While no cases of such "slowness" appear to be known, different problem
exists on AMD Promontory chipsets: they may halt and generate a transfer
event, but fail to ever update the Endpoint Context at all, at least not
until some command is queued and fails with Context State Error. This is
easily triggered by disconnecting D- of a full speed serial device.
Possibly similar bug in non-AMD hardware has been reported to linux-usb.
In such case, failed TD is given back without erasing from the ring and
endpoint isn't reset. If some URB is unlinked later, Stop Endpoint fails
and its handler resets the endpoint. On next submission it will restart
on the stale TD. Outcome is UAF on success, or another halt on error and
then Dequeue doesn't move and URBs are stuck. Unlinking and resubmitting
the URBs causes unlimited ring expansion if the situation repeats.
This can be solved by ignoring Endpoint Context State and trusting that
endpoints halt when required, except one known case in ancient hardware.
The check for "Already resolving halted ep" becomes redundant, because
for these completion codes we now jump to xhci_handle_halted_endpoint()
which deals with pending EP_HALTED internally.
Link: https://lore.kernel.org/linux-usb/20250311234139.0e73e138@foxbook/
Link: https://lore.kernel.org/linux-usb/20250918055527.4157212-1-zhangjinpeng@kylinos.cn/
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-10-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a dedicated helper function to read the USB Port Status and Control
(PORTSC) register. This complements xhci_portsc_writel() and improves code
clarity by providing a clear counterpart for reading the register.
Suggested-by: Peter Chen <peter.chen@kernel.org>
Reviewed-by: Peter Chen <peter.chen@kerne.org>
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-7-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The trb_in_td() checking is quite complex, especially when checking for
TRBs in ranges that can span several segments.
Simplify the search by creating a position index for each TRB on the
ring, and just compare the position indexes.
Add a more generic dma_in_range() helper that checks if a trb dma
address is in the range between a start and end trb and call it from
trb_in_td()
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a xhci_dma_to_trb() helper, and use it to find the transfer TRB
early in handle_tx_event() based on the dma address found in the
event TRB.
With this helper we can avoid using 'ep_seg' transfer TRB segment
variable as both a a boolean to indicate if the transfer TRB is part
of the next queued TD, and to actually find the transfer TRB based
on ep_seg and ep_trb_dma.
This is a first step in reworking and cleaning up trb_in_td() and
handle_tx_event()
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251119142417.2820519-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A usb device caught behind a link in ss.Inactive error state needs to
be reset to recover. A VDEV_PORT_ERROR flag is used to track this state,
preventing new transfers from being queued until error is cleared.
This flag may be left uncleared if link goes to error state between two
resets, and print the following message:
"xhci_hcd 0000:00:14.0: Can't queue urb, port error, link inactive"
Fix setting and clearing the flag.
The flag is cleared after hub driver has successfully reset the device
when hcd->reset_device is called. xhci-hcd issues an internal "reset
device" command in this callback, and clear all flags once the command
completes successfully.
This command may complete with a context state error if slot was recently
reset and is already in the defauilt state. This is treated as a success
but flag was left uncleared.
The link state field is also unreliable if port is currently in reset,
so don't set the flag in active reset cases.
Also clear the flag immediately when link is no longer in ss.Inactive
state and port event handler detects a completed reset.
This issue was discovered while debugging kernel bugzilla issue 220491.
It is likely one small part of the problem, causing some of the failures,
but root cause remains unknown
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220491
Fixes: b8c3b71808 ("usb: xhci: Don't try to recover an endpoint if port is in error state.")
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://patch.msgid.link/20251107162819.1362579-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Address the naming and usage of the TR Dequeue Pointer mask in the xhci
driver. The Endpoint Context Field at offset 0x08 is defined as follows:
Bit 0 Dequeue Cycle State (DCS)
Bits 3:1 RsvdZ (Reserved and Zero)
Bits 63:4 TR Dequeue Pointer
When extracting the TR Dequeue Pointer for an Endpoint without Streams,
in xhci_handle_cmd_set_deq(), the inverted Dequeue Cycle State mask
(~EP_CTX_CYCLE_MASK) is used, inadvertently including the Reserved bits.
Although bits 3:1 are typically zero, using the incorrect mask could cause
issues.
The existing mask, named "SCTX_DEQ_MASK," is misleading because "SCTX"
implies exclusivity to Stream Contexts, whereas the TR Dequeue Pointer is
applicable to both Stream and non-Stream Contexts.
Rename the mask to "TR_DEQ_PTR_MASK", utilize GENMASK_ULL() macro and use
the mask when handling the TR Dequeue Pointer field.
Function xhci_get_hw_deq() returns the Endpoint Context Field 0x08, either
directly from the Endpoint context or a Stream.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250917210726.97100-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Retries are no longer gated by a quirk, so remove that part.
Add a brief explanation of the timeout.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250917210726.97100-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 28a76fcc4c.
No actual HW bugs are known where Endpoint Context shows Running state
but Stop Endpoint fails repeatedly with Context State Error and leaves
the endpoint state unchanged. Stop Endpoint retries on Running EPs have
been performed since early 2021 with no such issues reported so far.
Trying to handle this hypothetical case brings a more realistic danger:
if Stop Endpoint fails on an endpoint which hasn't yet started after a
doorbell ring and enough latency occurs before this completion event is
handled, the driver may time out and begin removing cancelled TDs from
a running endpoint, even though one more retry would stop it reliably.
Such high latency is rare but not impossible, and removing TDs from a
running endpoint can cause more damage than not giving back a cancelled
URB (which wasn't happening anyway). So err on the side of caution and
revert to the old policy of always retrying if the EP appears running.
[Remove stable tag as we are dealing with theoretical cases -Mathias]
Fixes: 28a76fcc4c ("usb: xhci: Avoid Stop Endpoint retry loop if the endpoint seems Running")
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250917210726.97100-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sakari Ailus <sakari.ailus@linux.intel.com> says:
This series enables support for eUSB2 Double Isochronous IN Bandwidth UVC
devices specified in 'USB 2.0 Double Isochronous IN Bandwidth' ECN. In
short, it adds support for new integrated USB2 webcams that can send twice
the data compared to conventional USB2 webcams.
These devices are identified by the device descriptor bcdUSB 0x0220 value.
They have an additional eUSB2 Isochronous Endpoint Companion Descriptor,
and a zero max packet size in regular isoc endpoint descriptor. Support
for parsing that new descriptor was added in commit
c749f058b4 ("USB: core: Add eUSB2 descriptor and parsing in USB core")
This series adds support to UVC, USB core, and xHCI to identify eUSB2
double isoc devices, and allow and set proper max packet, iso frame desc
sizes, bytes per interval, and other values in URBs and xHCI endpoint
contexts needed to support the double data rates for eUSB2 double isoc
devices.
since v4:
https://lore.kernel.org/linux-usb/20250812132445.3185026-1-sakari.ailus@linux.intel.com
- New patch: use le16_to_cpu() to access endpoint descriptor's
wMaxPacketSize field, which is an __le16. This isn't a bugfix as the
value was compared to 0.
- New patch: add USB device speed check for eUSB2 isochronous endpoint
companion parsing. The check is then removed from sites checking the
existence of the companion (through companion's bDescriptorType field,
which is non-zero for valid descriptors).
- New patch: do not parse eUSB2 isoc double BW companion descriptor on
interrupt or OUT endpoints. It is not supposed to be found there,
according to the ECN.
- Rename usb_endpoint_max_isoc_bpi() as
usb_endpoint_max_periodic_payload() and move it right after
usb_maxpacket().
- Fixed @ep reference in kernel-doc documentation for
usb_endpoint_max_periodic_payload().
- In usb_endpoint_max_periodic_payload(), call struct usb_device pointer
argument "udev" instead of "dev", to align with naming elsewhere.
- Add support for interrupt endpoints in
usb_endpoint_max_periodic_payload(); eUSB2 double isoc BW is still
limited to isochronous endpoints though.
- In usb_endpoint_max_periodic_payload(), remove the separate case for
USB_SPEED_HIGH as the check is already done in parsing the eUSB isoc
double BW companion, which is checked for.
- New patch: use usb_endpoint_max_periodic_payload() in xHCI driver,
replacing xhci_get_max_esit_payload().
- Check non-zero bDescriptorType field of ep->eusb2_isoc_ep_comp instead
of dwBytesPerInterval value exceeding 3072, where
xhci_eusb2_is_isoc_bw_double() was used. This aligns the checks of eUSB2
isochronous double bandwidth support for an endpoint.
- New patch: introduce usb_endpoint_is_hs_isoc_double() to figure out
whether an endpoint uses isochronous double bandwidth and use the
function in the xHCI driver and the usb core.
xhci_eusb2_is_isoc_bw_double() is dropped, as well as the
MAX_ISOC_XFER_SIZE_HS macro. usb_endpoint_is_hs_isoc_double() also
includes check for bcdUSB == 0x220, to anticipate adding support for
eUSB2V2.
- Merge condition for checking eUSB2 isoc double bw support for
xHCI/endpoint in xhci_get_endpoint_mult().
- Improve comment regarding maximum packet size bits 12:11 in
xhci_get_endpoint_max_burst().
- Aligned subject prefixes with the recent patches to the same files.
since v3:
https://lore.kernel.org/linux-usb/20250807055355.1257029-1-sakari.ailus@linux.intel.com/
- Use spaces in aligning macro body for HCC2_EUSB2_DIC() (1st patch).
- Move usb_endpoint_max_isoc_bpi() to drivers/usb/core/usb.c (3rd patch).
since v2:
https://lore.kernel.org/linux-usb/20250711083413.1552423-1-sakari.ailus@linux.intel.com
- Use ep->eusb2_isoc_ep_comp.bDescriptorType to determined whether the
eUSB2 isochronous endpoint companion descriptor exists.
- Clean up eUSB2 double isoc bw maxp calculation.
- Drop le16_to_cpu(udev->descriptor.bcdUSB) == 0x220 check from
xhci_eusb2_is_isoc_bw_double() -- it's redundant as
ep->eusb2_isoc_ep_comp.dwBytesPerInterval will be zero otherwise.
- Add kernel-doc documentation for usb_endpoint_max_isoc_bpi().
- Check the endpoint has IN direction in usb_endpoint_max_isoc_bpi() and
usb_submit_urb() as a condition for eUSB2 isoc double bw.
since v1:
https://lore.kernel.org/linux-usb/20250616093730.2569328-2-mathias.nyman@linux.intel.com
- Introduce uvc_endpoint_max_isoc_bpi() to obtain maximum bytes per
interval value for an endpoint, in a new patch (3rd). This code has been
slightly reworked from the instance in the UVC driver, including support
for SuperSpeedPlus Isochronous Endpoint Companion.
- Use usb_endpoint_max_isoc_bpi() in the UVC driver instead of open-coding
eUSB2 support there, also drop now-redundant uvc_endpoint_max_bpi().
- Use u32 for maximum bpi and related information in the UVC driver -- the
value could be larger than a 16-bit type can hold.
- Assume max in usb_submit_urb() is a natural number as
usb_endpoint_maxp() returns only natural numbers (2nd patch).
Link: https://lore.kernel.org/r/20250820143824.551777-1-sakari.ailus@linux.intel.com
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Detect eUSB2 double isoc bw capable hosts and devices, and set the proper
xhci endpoint context values such as 'Mult', 'Max Burst Size', and 'Max
ESIT Payload' to enable the double isochronous bandwidth endpoints.
Intel xHC uses the endpoint context 'Mult' field for eUSB2 isoc
endpoints even if hosts supporting Large ESIT Payload Capability should
normally ignore the mult field.
Signed-off-by: Rai, Amardeep <amardeep.rai@intel.com>
Co-developed-by: Kannappan R <r.kannappan@intel.com>
Signed-off-by: Kannappan R <r.kannappan@intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Co-developed-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Co-developed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250820143824.551777-8-sakari.ailus@linux.intel.com
xHC controller may immediately reuse a slot_id after it's disabled,
giving it to a new enumerating device before the xhci driver freed
all resources related to the disabled device.
In such a scenario, device-A with slot_id equal to 1 is disconnecting
while device-B is enumerating, device-B will fail to enumerate in the
follow sequence.
1.[device-A] send disable slot command
2.[device-B] send enable slot command
3.[device-A] disable slot command completed and wakeup waiting thread
4.[device-B] enable slot command completed with slot_id equal to 1 and
wakeup waiting thread
5.[device-B] driver checks that slot_id is still in use (by device-A) in
xhci_alloc_virt_device, and fail to enumerate due to this
conflict
6.[device-A] xhci->devs[slot_id] set to NULL in xhci_free_virt_device
To fix driver's slot_id resources conflict, clear xhci->devs[slot_id] and
xhci->dcbba->dev_context_ptrs[slot_id] pointers in the interrupt context
when disable slot command completes successfully. Simultaneously, adjust
function xhci_free_virt_device to accurately handle device release.
[minor smatch warning and commit message fix -Mathias]
Cc: stable@vger.kernel.org
Fixes: 7faac1953e ("xhci: avoid race between disable slot command and host runtime suspend")
Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250819125844.2042452-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When encounters some errors like these:
xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command
xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment
usb usb5-port6: couldn't allocate usb_device
It's hard to know whether xhc_state is dying or halted. So it's better
to print xhc_state's value which can help locate the resaon of the bug.
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a USB4 dock is unplugged from a system it won't respond to ring
events. The PCI core handles the surprise removal event and notifies
all PCI drivers. The XHCI PCI driver sets a flag that the device is
being removed as well.
When that flag is set don't show messages in the cleanup path for
marking the controller dead.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250717073107.488599-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 6ccb83d6c4.
Commit 6ccb83d6c4 ("usb: xhci: Implement xhci_handshake_check_state()
helper") was introduced to workaround watchdog timeout issues on some
platforms, allowing xhci_reset() to bail out early without waiting
for the reset to complete.
Skipping the xhci handshake during a reset is a dangerous move. The
xhci specification explicitly states that certain registers cannot
be accessed during reset in section 5.4.1 USB Command Register (USBCMD),
Host Controller Reset (HCRST) field:
"This bit is cleared to '0' by the Host Controller when the reset
process is complete. Software cannot terminate the reset process
early by writinga '0' to this bit and shall not write any xHC
Operational or Runtime registers until while HCRST is '1'."
This behavior causes a regression on SNPS DWC3 USB controller with
dual-role capability. When the DWC3 controller exits host mode and
removes xhci while a reset is still in progress, and then tries to
configure its hardware for device mode, the ongoing reset leads to
register access issues; specifically, all register reads returns 0.
These issues extend beyond the xhci register space (which is expected
during a reset) and affect the entire DWC3 IP block, causing the DWC3
device mode to malfunction.
Cc: stable <stable@kernel.org>
Fixes: 6ccb83d6c4 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Signed-off-by: Roy Luo <royluo@google.com>
Link: https://lore.kernel.org/r/20250522190912.457583-3-royluo@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Interrupt Register Set contains Interrupt Management register (IMAN).
The IMAN register contains the following fields:
- Bit 0: Interrupt Pending (IP)
- Bit 1: Interrupt Enable (IE)
- Bits 31:2: RsvdP (Reserved and Preserved)
Tn the xhci driver, the pointer currently named 'irq_pending' refers to the
IMAN register. However, the name "irq_pending" only describes one of the
fields within the IMAN register, rather than the entire register itself.
To improve clarity and better align with the xHCI specification,
the pointer is renamed to 'iman'.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250515135621.335595-22-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xHCI specification 1.2, section 5.5.2.1.
Interrupt Pending bit is RW1C (Write-1-to-clear), which means that
writing '0' to is has no effect and is removed.
The Interrupt Pending (IP) bit is cleared at the start of interrupt
handling; xhci_clear_interrupt_pending(). This could theoretically
cause a new interrupt to be issued before the xhci driver reaches
the interrupter disable functions.
To address this, the IP bit is read after Interrupt Enable is
disabled, and a debug message is issued if the IP bit is still set.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250515135621.335595-18-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add read call to guarantee that the write to the IMAN register has
been flushed.
xHCI specification 1.2, section 5.5.2.1, Note:
"Most systems have write buffers that minimize overhead, but this may
require a read operation to guarantee that the write has been flushed
from the posted buffer."
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250515135621.335595-17-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In many projects, you need to obtain the available bandwidth of the
xhci roothub port. Refer to xhci rev1_2 and use the TRB_GET_BW
command to obtain it.
hardware tested:
03:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
(prog-if 30 [XHCI])
Subsystem: Huawei Technologies Co., Ltd. Raven USB 3.1
Flags: bus master, fast devsel, latency 0, IRQ 30
Memory at c0300000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=8 Masked-
Kernel driver in use: xhci_hcd
test progress:
1. cd /sys/kernel/debug/usb/xhci/0000:03:00.3/port_bandwidth# ls
FS_BW HS_BW SS_BW
2. test fs speed device
cat FS_BW
port[1] available bw: 90%.
port[2] available bw: 90%.
port[3] available bw: 90%.
port[4] available bw: 90%.
port[5] available bw: 0%.
port[6] available bw: 0%.
port[7] available bw: 0%.
port[8] available bw: 0%.
plug in fs usb audio ID 0d8c:013c
cat FS_BW
port[1] available bw: 76%.
port[2] available bw: 76%.
port[3] available bw: 76%.
port[4] available bw: 76%.
port[5] available bw: 0%.
port[6] available bw: 0%.
port[7] available bw: 0%.
port[8] available bw: 0%.
3. test hs speed device
cat HS_BW
port[1] available bw: 79%.
port[2] available bw: 79%.
port[3] available bw: 79%.
port[4] available bw: 79%.
port[5] available bw: 0%.
port[6] available bw: 0%.
port[7] available bw: 0%.
port[8] available bw: 0%.
plug in hs usb video ID 0408:1040
cat HS_BW
port[1] available bw: 39%.
port[2] available bw: 39%.
port[3] available bw: 39%.
port[4] available bw: 39%.
port[5] available bw: 0%.
port[6] available bw: 0%.
port[7] available bw: 0%.
port[8] available bw: 0%.
4.cat SS_BW
port[1] available bw: 0%.
port[2] available bw: 0%.
port[3] available bw: 0%.
port[4] available bw: 0%.
port[5] available bw: 90%.
port[6] available bw: 90%.
port[7] available bw: 90%.
port[8] available bw: 90%.
Signed-off-by: Xu Rao <raoxu@uniontech.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250515135621.335595-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The error message above used to span two lines, rarely more. A recent
cleanup concentrated useful information from it in one line, but then
it added printing the list of all ring segments, which is even longer
than before. It provides no new information in usual cases and little
in unusual ones, but adds noise to the log. Drop it.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250515135621.335595-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
VIA VL805 doesn't bother updating the EP Context cycle bit when the
endpoint halts. This is seen by patching xhci_move_dequeue_past_td()
to print the cycle bits of the EP Context and the TRB at hw_dequeue
and then disconnecting a flash drive while reading it. Actual cycle
state is random as expected, but the EP Context bit is always 1.
This means that the cycle state produced by this function is wrong
half the time, and then the endpoint stops working.
Work around it by looking at the cycle bit of TD's end_trb instead
of believing the Endpoint or Stream Context. Specifically:
- rename cycle_found to hw_dequeue_found to avoid confusion
- initialize new_cycle from td->end_trb instead of hw_dequeue
- switch new_cycle toggling to happen after end_trb is found
Now a workload which regularly stalls the device works normally for
a few hours and clearly demonstrates the HW bug - the EP Context bit
is not updated in a new cycle until Set TR Dequeue overwrites it:
[ +0,000298] sd 10:0:0:0: [sdc] Attached SCSI disk
[ +0,011758] cycle bits: TRB 1 EP Ctx 1
[ +5,947138] cycle bits: TRB 1 EP Ctx 1
[ +0,065731] cycle bits: TRB 0 EP Ctx 1
[ +0,064022] cycle bits: TRB 0 EP Ctx 0
[ +0,063297] cycle bits: TRB 0 EP Ctx 0
[ +0,069823] cycle bits: TRB 0 EP Ctx 0
[ +0,063390] cycle bits: TRB 1 EP Ctx 0
[ +0,063064] cycle bits: TRB 1 EP Ctx 1
[ +0,062293] cycle bits: TRB 1 EP Ctx 1
[ +0,066087] cycle bits: TRB 0 EP Ctx 1
[ +0,063636] cycle bits: TRB 0 EP Ctx 0
[ +0,066360] cycle bits: TRB 0 EP Ctx 0
Also tested on the buggy ASM1042 which moves EP Context dequeue to
the next TRB after errors, one problem case addressed by the rework
that implemented this loop. In this case hw_dequeue can be enqueue,
so simply picking the cycle bit of TRB at hw_dequeue wouldn't work.
Commit 5255660b20 ("xhci: add quirk for host controllers that
don't update endpoint DCS") tried to solve the stale cycle problem,
but it was more complex and got reverted due to a reported issue.
Cc: Jonathan Bell <jonathan@raspberrypi.org>
Cc: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250505125630.561699-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgOrWseHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFyIH/AhXcuA8y8rk43mo
t+0GO7JR4dnr4DIl74GgDjCXlXiKCT7EXMfD/ABdofTxV4Pbyv+pUODlg1E6eO9U
C1WWM5PPNBGDDEVSQ3Yu756nr0UoiFhvW0R6pVdou5cezCWAtIF9LTN8DEUgis0u
EUJD9+/cHAMzfkZwabjm/HNsa1SXv2X47MzYv/PdHKr0htEPcNHF4gqBrBRdACGy
FJtaCKhuPf6TcDNXOFi5IEWMXrugReRQmOvrXqVYGa7rfUFkZgsAzRY6n/rUN5Z9
FAgle4Vlv9ohVYj9bXX8b6wWgqiKRpoN+t0PpRd6G6ict1AFBobNGo8LH3tYIKqZ
b/dCGNg=
=xDGd
-----END PGP SIGNATURE-----
Merge 6.15-rc4 into usb-next
We need the USB fixes in here as well, and this resolves the following
merge conflicts that were reported in linux-next:
drivers/usb/chipidea/ci_hdrc_imx.c
drivers/usb/host/xhci.h
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This check is performed before prepare_transfer() and prepare_ring(), so
enqueue can already point at the final link TRB of a segment. And indeed
it will, some 0.4% of times this code is called.
Then enqueue + 1 is an invalid pointer. It will crash the kernel right
away or load some junk which may look like a link TRB and cause the real
link TRB to be replaced with a NOOP. This wouldn't end well.
Use a functionally equivalent test which doesn't dereference the pointer
and always gives correct result.
Something has crashed my machine twice in recent days while playing with
an Etron HC, and a control transfer stress test ran for confirmation has
just crashed it again. The same test passes with this patch applied.
Fixes: 5e1c67abc9 ("xhci: Fix control transfer error on Etron xHCI host")
Cc: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reviewed-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Link: https://lore.kernel.org/r/20250410151828.2868740-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A Short Packet event before the last TRB of a TD is followed by another
event on the final TRB on spec-compliant HCs, which is most of them.
A 'last_td_was_short' flag was added to know if a TD has just completed
as Short Packet and another event is to come. The flag was cleared after
seeing the event (unless no TDs are pending, but that's a separate bug)
or seeing a new TD complete as something other than Short Packet.
A rework replaced the flag with an 'old_trb_comp_code' variable. When
an event doesn't match the pending TD and the previous event was Short
Packet, the new event is silently ignored.
To preserve old behavior, 'old_trb_comp_code' should be cleared at this
point, but instead it is being set to current comp code, which is often
Short Packet again. This can cause more events to be silently ignored,
even though they are no longer connected with the old TD that completed
short and indicate a serious problem with the driver or the xHC.
Common device classes like UAC in async mode, UVC, serial or the UAS
status pipe complete as Short Packet routinely and could be affected.
Clear 'old_trb_comp_code' to zero, which is an invalid completion code
and the same value the variable starts with. This restores original
behavior on Short Packet and also works for illegal Etron events, which
the code has been extended to cover too.
Fixes: b331a3d809 ("xhci: Handle spurious events on Etron host isoc enpoints")
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250410151828.2868740-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As part of xHCI bus suspend, the xHCI is halted. However, if there are
pending events in the secondary event ring, it is observed that the xHCI
controller stops responding to further commands upon host or device
initiated bus resume. Iterate through all pending events and update the
dequeue pointer to the beginning of the event ring.
Tested-by: Puma Hsu <pumahsu@google.com>
Tested-by: Daehwan Jung <dh10.jung@samsung.com>
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250409194804.3773260-3-quic_wcheng@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nothing prevents a broken HC from claiming that an endpoint is Running
and repeatedly rejecting Stop Endpoint with Context State Error.
Avoid infinite retries and give back cancelled TDs.
No such cases known so far, but HCs have bugs.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250311154551.4035726-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the device stalls an endpoint, current TD is assigned -EPIPE
status and Reset Endpoint is queued. If a Stop Endpoint is pending
at the time, it will run before Reset Endpoint and fail due to the
stall. Its handler will change TD's status to -EPROTO before Reset
Endpoint handler runs and initiates giveback.
Check if the stall has already been handled and don't try to do it
again. Since xhci_handle_halted_endpoint() performs this check too,
not overwriting td->status is the only difference.
I haven't seen this case yet, but I have seen a related one where
the xHC has already executed Reset Endpoint, EP Context state is
now Stopped and EP_HALTED is set. If the xHC took a bit longer to
execute Reset Endpoint, said case would become this one.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250311154551.4035726-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Unplugging a USB3.0 webcam from Etron hosts while streaming results
in errors like this:
[ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
[ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650
[ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
[ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670
Etron xHC generates two transfer events for the TRB if an error is
detected while processing the last TRB of an isoc TD.
The first event can be any sort of error (like USB Transaction or
Babble Detected, etc), and the final event is Success.
The xHCI driver will handle the TD after the first event and remove it
from its internal list, and then print an "Transfer event TRB DMA ptr
not part of current TD" error message after the final event.
Commit 5372c65e13 ("xhci: process isoc TD properly when there was a
transaction error mid TD.") is designed to address isoc transaction
errors, but unfortunately it doesn't account for this scenario.
This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success
event follows a 'short transfer' event, but the TD the event points to
is already given back.
Expand the spurious success 'short transfer' event handling to cover
the spurious success after error on Etron hosts.
Kuangyi Chiang reported this issue and submitted a different solution
based on using error_mid_td. This commit message is mostly taken
from that patch.
Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Closes: https://lore.kernel.org/linux-usb/20241028025337.6372-6-ki.chiang65@gmail.com/
Tested-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Tested-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-16-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Extract a block of code copied from inc_enq() into a separate function
and call it from inc_enq() and the other function which used this code.
Remove the pointless 'next' variable which only aliases ring->enqueue.
Note: I don't know if any 0.95 xHC ever reached series production, but
"AMD 0.96 host" appears to be the "Llano" family APU. Example dmesg at
https://linux-hardware.org/?probe=79d5cfd4fd&log=dmesg
pci 0000:00:10.0: [1022:7812] type 00 class 0x0c0330
hcc params 0x014042c3 hci version 0x96 quirks 0x0000000000000608
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-15-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ensure that an endpoint halted due to device STALL is not
restarted before a Clear_Feature(ENDPOINT_HALT) request is sent to
the device.
The host side of the endpoint may otherwise be started early by the
'Set TR Deq' command completion handler which is called if dequeue
is moved past a cancelled or halted TD.
Prevent this with a new flag set for bulk and interrupt endpoints
when a Stall Error is received. Clear it in hcd->endpoint_reset()
which is called after Clear_Feature(ENDPOINT_HALT) is sent.
Also add a debug message if a class driver queues a new URB after the
STALL. Note that class driver might not be aware of the STALL
yet when it submits the URB as URBs are given back in BH.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-13-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Function trb_in_td() currently includes debug capabilities that are
triggered when its debug argument is set to true. The only consumer of
these debug capabilities is handle_tx_event(), which calls trb_in_td()
twice, once for its primary functionality and a second time solely for
debugging purposes if the first call returns 'NULL'.
This approach is inefficient and can lead to confusion, as trb_in_td()
executes the same code with identical arguments twice, differing only in
the debug output during the second execution.
To enhance clarity and efficiency, move the debug capabilities out of
trb_in_td() and integrates them directly into handle_tx_event().
This change reduces the argument count of trb_in_td() and ensures that
debug steps are executed only when necessary, streamlining the function's
operation.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-12-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Relocate trb_in_td() and marks it as static, as it's exclusively utilized
in xhci-ring.c. This adjustment lays the groundwork for future rework of
the function.
The function's logic remains unchanged; only its access specifier is
altered to static and a redundant "else" is removed on line 325
(due to checkpatch.pl complaining).
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-11-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If skipping is deferred to events other than Missed Service Error itsef,
it means we are running on an xHCI 1.0 host and don't know how many TDs
were missed until we reach some ordinary transfer completion event.
And in case of ring xrun, we can't know where the xrun happened either.
If we skip all pending TDs, we may prematurely give back TDs added after
the xrun had occurred, risking data loss or buffer UAF by the xHC.
If we skip none, a driver may become confused and stop working when all
its URBs are missed and appear to be "in flight" forever.
Skip exactly one TD on each xrun event - the first one that was missed,
as we can now be sure that the HC has finished processing it. Provided
that one more TD is queued before any subsequent doorbell ring, it will
become safe to skip another TD by the time we get an xrun again.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-8-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xHCI spec rev. 1.0 allowed the TRB pointer of Missed Service events
to be NULL. Having no idea which of the queued TDs were missed and
which are waiting, we can only set a flag to skip missed TDs later.
But HCs are also allowed to give us pointer to the last missed TRB,
and this became mandatory in spec rev. 1.1 and later.
Use this pointer, if available, to immediately skip all missed TDs.
This reduces latency and risk of skipping-related bugs, because we
can now leave the skip flag cleared for future events.
Handle Missed Service Error events as 'error mid TD', if applicable,
because rev. 1.0 spec excplicitly says so in notes to 4.10.3.2 and
later revs in 4.10.3.2 and 4.11.2.5.2. Notes to 4.9.1 seem to apply.
Tested on ASM1142 and ASM3142 v1.1 xHCs which provide TRB pointers.
Tested on AMD, Etron, Renesas v1.0 xHCs which provide TRB pointers.
Tested on NEC v0.96 and VIA v1.0 xHCs which send a NULL pointer.
Change inspired by a discussion about realtime USB audio.
Link: https://lore.kernel.org/linux-usb/76e1a191-020d-4a76-97f6-237f9bd0ede0@gmx.net/T/
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-7-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The TRB pointer of these events points at enqueue at the time of error
occurrence on xHCI 1.1+ HCs or it's NULL on older ones. By the time we
are handling the event, a new TD may be queued at this ring position.
I can trigger this race by rising interrupt moderation to increase IRQ
handling delay. Similar delay may occur naturally due to system load.
If this ever happens after a Missed Service Error, missed TDs will be
skipped and the new TD processed as if it matched the event. It could
be given back prematurely, risking data loss or buffer UAF by the xHC.
Don't complete TDs on xrun events and don't warn if queued TDs don't
match the event's TRB pointer, which can be NULL or a link/no-op TRB.
Don't warn if there are no queued TDs at all.
Now that it's safe, also handle xrun events if the skip flag is clear.
This ensures completion of any TD stuck in 'error mid TD' state right
before the xrun event, which could happen if a driver submits a finite
number of URBs to a buggy HC and then an error occurs on the last TD.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-6-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Missed Service Error after an error mid TD means that the failed TD has
already been passed by the xHC without acknowledgment of the final TRB,
a known hardware bug. So don't wait any more and give back the TD.
Reproduced on NEC uPD720200 under conditions of ludicrously bad USB link
quality, confirmed to behave as expected using dynamic debug.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Up until commit d56b0b2ab1 ("usb: xhci: ensure skipped isoc TDs are
returned when isoc ring is stopped") in v6.11, the driver didn't skip
missed isochronous TDs when handling Stoppend and Stopped - Length
Invalid events. Instead, it erroneously cleared the skip flag, which
would cause the ring to get stuck, as future events won't match the
missed TD which is never removed from the queue until it's cancelled.
This buggy logic seems to have been in place substantially unchanged
since the 3.x series over 10 years ago, which probably speaks first
and foremost about relative rarity of this case in normal usage, but
by the spec I see no reason why it shouldn't be possible.
After d56b0b2ab1, TDs are immediately skipped when handling those
Stopped events. This poses a potential problem in case of Stopped -
Length Invalid, which occurs either on completed TDs (likely already
given back) or Link and No-Op TRBs. Such event won't be recognized
as matching any TD (unless it's the rare Link TRB inside a TD) and
will result in skipping all pending TDs, giving them back possibly
before they are done, risking isoc data loss and maybe UAF by HW.
As a compromise, don't skip and don't clear the skip flag on this
kind of event. Then the next event will skip missed TDs. A downside
of not handling Stopped - Length Invalid on a Link inside a TD is
that if the TD is cancelled, its actual length will not be updated
to account for TRBs (silently) completed before the TD was stopped.
I had no luck producing this sequence of completion events so there
is no compelling demonstration of any resulting disaster. It may be
a very rare, obscure condition. The sole motivation for this patch
is that if such unlikely event does occur, I'd rather risk reporting
a cancelled partially done isoc frame as empty than gamble with UAF.
This will be fixed more properly by looking at Stopped event's TRB
pointer when making skipping decisions, but such rework is unlikely
to be backported to v6.12, which will stay around for a few years.
Fixes: d56b0b2ab1 ("usb: xhci: ensure skipped isoc TDs are returned when isoc ring is stopped")
Cc: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function is a remnant from a previous implementation and is now
redundant. There is no longer a need to search for the dequeue pointer,
as both the TRB and segment dequeue pointers are saved within
'queued_deq_seg' and 'queued_deq_ptr'.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250306144954.3507700-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace ternary (condition ? "enable" : "disable") syntax with helpers
from string_choices.h because:
1. Simple function call with one argument is easier to read. Ternary
operator has three arguments and with wrapping might lead to quite
long code.
2. Is slightly shorter thus also easier to read.
3. It brings uniformity in the text - same string.
4. Allows deduping by the linker, which results in a smaller binary
file.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250114-str-enable-disable-usb-v1-2-c8405df47c19@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xHC hosts can pass 24 bits of data with a command completion event TRB
as the completion code only uses 8 bits of the 32 bit status field.
Only Configure Endpoint, and Get Extended Property commands utilize
this "command completion parameter" 24 bit field.
For other command completion events the xHC should keep it RsvdZ.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241227120142.1035206-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>