linux/drivers
willy tarreau 8df2d2d784 net: mvneta: fix Tx interrupt delay
[ Upstream commit aebea2ba0f ]

The mvneta driver sets the amount of Tx coalesce packets to 16 by
default. Normally that does not cause any trouble since the driver
uses a much larger Tx ring size (532 packets). But some sockets
might run with very small buffers, much smaller than the equivalent
of 16 packets. This is what ping is doing for example, by setting
SNDBUF to 324 bytes rounded up to 2kB by the kernel.

The problem is that there is no documented method to force a specific
packet to emit an interrupt (e.g. the last of the ring), nor is it
possible to make the NIC emit an interrupt after a given delay.

In this case, it causes trouble: when ping sends packets over its raw
socket, the first 15 packets are emitted without an IRQ being
generated, and therefore without their skbs being freed. Since the
socket's buffer is small, it can never hold that many packets, and
ping ends up with "send: no buffer available" after sending 6 packets.
Running 3 instances of ping in parallel is enough to hide the problem,
because with 6 packets per instance, that's 18 packets total, which is
enough to trigger a Tx interrupt before all are sent.

The original driver in the LSP kernel worked around this design flaw
by using a software timer to clean up the Tx descriptors. This timer
was slow and caused terrible network performance on some Tx-bound
workloads (such as routing) but was enough to make tools like ping
work correctly.

Instead, here we simply set the number of packets sent before an
interrupt to 1.
This ensures that each packet sent will produce an interrupt. NAPI
takes care of coalescing interrupts since the interrupt is disabled
once generated.
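
The arithmetic above (a 16-packet coalescing threshold against a socket
that can only keep about 6 packets in flight) can be checked with a toy
model. This is only an illustrative sketch, not code from the driver:
the function name failed_sends, the MAX_INSTANCES bound and the
round-robin sending order are assumptions made for the example, and the
limit of 6 packets per socket is taken from the ping behaviour described
above.

```c
#define MAX_INSTANCES 16   /* hypothetical bound, for the example only */

/*
 * Toy model, not driver code. Each sender may keep at most
 * per_socket_limit packets in flight. Transmitted packets are released
 * only when `coalesce` packets have gone out since the last Tx
 * interrupt. Senders take turns, each trying `attempts` sends.
 * Returns the number of sends that fail for lack of buffer space,
 * or -1 if the arguments are out of range.
 */
int failed_sends(int instances, int per_socket_limit,
                 int coalesce, int attempts)
{
    int in_flight[MAX_INSTANCES] = {0};
    int pending = 0;    /* packets sent since the last interrupt */
    int failures = 0;

    if (instances < 1 || instances > MAX_INSTANCES ||
        per_socket_limit < 1 || coalesce < 1 || attempts < 0)
        return -1;

    for (int round = 0; round < attempts; round++) {
        for (int i = 0; i < instances; i++) {
            if (in_flight[i] >= per_socket_limit) {
                failures++;     /* "no buffer available" */
                continue;
            }
            in_flight[i]++;
            pending++;
            if (pending >= coalesce) {
                /* Tx interrupt: every completed skb is freed. */
                for (int j = 0; j < instances; j++)
                    in_flight[j] = 0;
                pending = 0;
            }
        }
    }
    return failures;
}
```

In this model, failed_sends(1, 6, 16, 20) returns 14: a single sender
stalls after 6 packets and never recovers. failed_sends(3, 6, 16, 20)
returns 0, which matches the observation that three parallel pings hide
the problem, and failed_sends(1, 6, 1, 20) returns 0, which corresponds
to the threshold of 1 chosen here.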

No measurable impact on performance or CPU usage was observed with
small or large packets, including when saturating the link on Tx, and
this fixes tools like ping which rely on too small a send buffer. If one
wants to increase this value for certain workloads where it is safe
to do so, "ethtool -C $dev tx-frames" will override this default
setting.

This fix needs to be applied to stable kernels starting with 3.10.

Tested-By: Maggie Mae Roxas <maggie.mae.roxas@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16 09:09:43 -08:00
accessibility
acpi ACPICA: Update to GPIO region handler interface. 2014-10-05 14:54:11 -07:00
amba
ata sata_fsl: fix error handling of irq_of_parse_and_map 2014-12-16 09:09:42 -08:00
atm
auxdisplay
base sysfs: driver core: Fix glue dir race condition by gdp_mutex 2014-11-14 08:48:01 -08:00
bcma
block sunvdc: don't call VD_OP_GET_VTOC 2014-11-21 09:22:52 -08:00
bluetooth Bluetooth: Fix issue with USB suspend in btusb driver 2014-10-30 09:35:12 -07:00
bus bus: mvebu-mbus: allow several windows with the same target/attribute 2014-06-07 13:25:37 -07:00
cdrom
char random: add and use memzero_explicit() for clearing data 2014-11-14 08:47:55 -08:00
clk clk: spear3xx: Use proper control register offset 2014-07-17 15:58:02 -07:00
clocksource clocksource: Exynos_mct: Register clock event after request_irq() 2014-06-07 13:25:29 -07:00
connector net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-06-26 15:12:37 -04:00
cpufreq cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy 2014-11-14 08:47:58 -08:00
cpuidle cpuidle: Check the result of cpuidle_get_driver() against NULL 2014-04-14 06:42:15 -07:00
crypto crypto: ux500 - make interrupt mode plausible 2014-09-05 16:28:35 -07:00
dca
devfreq
dio
dma ioat: fix tasklet tear down 2014-03-06 21:30:14 -08:00
edac cpc925_edac: Report UE events properly 2014-11-14 08:48:00 -08:00
eisa Revert "EISA: Initialize device before its resources" 2014-02-13 13:47:59 -08:00
extcon extcon: max77693: Fix two NULL pointer exceptions on missing pdata 2014-07-06 18:54:15 -07:00
firewire firewire: cdev: prevent kernel stack leaking into ioctl arguments 2014-11-21 09:22:53 -08:00
firmware firmware: Do not use WARN_ON(!spin_is_locked()) 2014-09-17 09:03:57 -07:00
gpio gpio: mxs: Allow for recursive enable_irq_wake() call 2014-05-13 13:59:45 +02:00
gpu drm/i915: Unlock panel even when LVDS is disabled 2014-12-16 09:09:42 -08:00
hid HID: logitech-dj: prevent false errors to be shown 2014-10-05 14:54:08 -07:00
hsi
hv Drivers: hv: vmbus: Fix a bug in vmbus_open() 2014-10-30 09:35:11 -07:00
hwmon hwmon: (dme1737) Prevent overflow problem when writing large limits 2014-09-05 16:28:35 -07:00
hwspinlock
i2c i2c: davinci: generate STP always when NACK is received 2014-12-16 09:09:42 -08:00
ide
idle x86 idle: Repair large-server 50-watt idle-power regression 2014-01-09 12:24:21 -08:00
iio iio:inkern: fix overwritten -EPROBE_DEFER in of_iio_channel_get_by_name 2014-10-05 14:54:12 -07:00
infiniband iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly 2014-12-06 15:05:49 -08:00
input Input: xpad - use proper endpoint type 2014-12-06 15:05:49 -08:00
iommu iommu/amd: Fix cleanup_domain for mass device removal 2014-09-17 09:03:57 -07:00
ipack
irqchip irqchip: gic: Fix core ID calculation when topology is read from DT 2014-07-28 08:00:06 -07:00
isdn isdnloop: several buffer overflows 2014-04-14 06:42:18 -07:00
leds leds: leds-pwm: properly clean up after probe failure 2014-06-07 13:25:34 -07:00
lguest x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED 2014-11-14 08:47:54 -08:00
macintosh
mailbox
md dm raid: ensure superblock's size matches device's logical block size 2014-11-21 09:22:53 -08:00
media media: smiapp: Only some selection targets are settable 2014-12-16 09:09:42 -08:00
memory
memstick
message mptfusion: enable no_write_same for vmware scsi disks 2014-10-30 09:35:10 -07:00
mfd mfd: rtsx_pcr: Fix MSI enable error handling 2014-11-14 08:47:55 -08:00
misc mei: bus: fix possible boundaries violation 2014-11-21 09:22:55 -08:00
mmc mmc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response 2014-11-14 08:47:53 -08:00
mtd UBI: add missing kmem_cache_free() in process_pool_aeb error path 2014-11-14 08:47:55 -08:00
net net: mvneta: fix Tx interrupt delay 2014-12-16 09:09:43 -08:00
nfc NFC: microread: Potential overflows in microread_target_discovered() 2014-10-05 14:54:12 -07:00
ntb
nubus
of of/base: Fix PowerPC address parsing hack 2014-12-06 15:05:47 -08:00
oprofile
parisc
parport parport: parport_pc: remove double PCI ID for NetMos 2014-02-06 11:08:15 -08:00
pci PCI/MSI: Add device flag indicating that 64-bit MSIs don't work 2014-12-06 15:05:47 -08:00
pcmcia
pinctrl pinctrl: protect pinctrl_list add 2014-02-20 11:06:11 -08:00
platform dell-wmi: Fix access out of memory 2014-11-21 09:22:55 -08:00
pnp PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failures 2014-03-23 21:38:22 -07:00
power power: max17040: Fix NULL pointer dereference when there is no platform_data 2014-02-22 12:41:29 -08:00
pps
ps3
ptp
pwm
rapidio rapidio/tsi721_dma: fix failure to obtain transaction descriptor 2014-08-07 14:30:25 -07:00
regulator regulator: arizona-ldo1: remove bypass functionality 2014-09-17 09:03:57 -07:00
remoteproc
reset
rpmsg
rtc rtc: rtc-at91rm9200: fix infinite wait for ACKUPD irq 2014-06-26 15:12:37 -04:00
s390 s390/chsc: fix SEI usage on old FW levels 2014-05-13 13:59:42 +02:00
sbus bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000 2014-08-14 09:24:16 +08:00
scsi bnx2fc: do not add shared skbs to the fcoe_rx_list 2014-12-06 15:05:49 -08:00
sfi
sh
sn
spi spi: dw: Fix dynamic speed change. 2014-12-06 15:05:49 -08:00
ssb
ssbi
staging staging:iio:ade7758: Remove "raw" from channel name 2014-11-14 08:47:58 -08:00
target target: Don't call TFO->write_pending if data_length == 0 2014-12-06 15:05:49 -08:00
tc
thermal
tty tty: Fix high cpu load if tty is unreleaseable 2014-11-14 08:48:00 -08:00
uio
usb USB: xhci: don't start a halted endpoint before its new dequeue is set 2014-12-06 15:05:48 -08:00
uwb
vfio mm: close PageTail race 2014-04-03 12:01:05 -07:00
vhost vhost: validate vhost_get_vq_desc return value 2014-04-14 06:42:18 -07:00
video framebuffer: fix border color 2014-11-14 08:47:56 -08:00
virt
virtio virtio_pci: fix virtio spec compliance on restore 2014-11-14 08:47:55 -08:00
vlynq
vme VME: Correct read/write alignment algorithm 2014-02-22 12:41:28 -08:00
w1 w1: fix w1_send_slave dropping a slave id 2014-05-06 07:55:28 -07:00
watchdog watchdog: ath79_wdt: avoid spurious restarts on AR934x 2014-07-06 18:54:14 -07:00
xen
zorro
Kconfig
Makefile