Use the mutex in struct mtk_mdp_ctx to protect the
capture and output vb2_queues. This allows to replace
the ad-hoc wait_{prepare, finish} with
vb2_ops_wait_{prepare, finish}.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Currently, this driver does not serialize its video4linux
ioctls, which is a bug, as race conditions might appear.
In addition, video_device and vb2_queue locks are now both
mandatory. Add them, and implement wait_prepare and
wait_finish.
To stay on the safe side, this commit uses a single mutex
for both locks. Better latency can be obtained by separating
these if needed.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
For m2m devices the vdev->queue lock was always taken instead of the
lock for the specific capture or output queue. Now that we pushed
the locking down into __video_do_ioctl() we can pick the correct
lock and potentially improve the performance of m2m devices.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
imx274_write_table() is a mere wrapper (and the only user) to
imx274_regmap_util_write_table_8(). Remove this useless indirection by
merging the two functions into one.
Also get rid of the wait_ms_addr and end_addr parameters since it does
not make any sense to give them any values other than
IMX274_TABLE_WAIT_MS and IMX274_TABLE_END.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
IMX274_DEFAULT_MODE is defined but not used. Start using it, so the
default can be more easily changed without digging into the code.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
After restructuring struct imx274_frmfmt, the mode_index field is
still in use only for two dev_dbg() calls in imx274_s_stream(). Let's
remove it and avoid duplicated information.
Replacing the first usage requires some rather annoying but trivial
pointer math. The other one can be removed entirely since it would
print the same value anyway.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Data about the implemented readout modes is partially stored in
imx274_formats[], the rest is scattered in several arrays. The latter
are then accessed using the mode index, e.g.:
min_frame_len[priv->mode_index]
Consolidate all these data in imx274_formats[], and store a pointer to
the selected mode (i.e. imx274_formats[priv->mode_index]) in the main
device struct. This way code to use these data becomes more readable,
e.g.:
priv->mode->min_frame_len
This removes lots of scaffolding code and keeps data about each mode
in a unique place.
Also remove a parameter to imx274_mode_regs() that is now unused.
While this adds the mode pointer to the device struct, it does not
remove the mode_index from it because mode_index is still used in two
dev_dbg() calls. This will be handled in a follow-up commit.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The current probe function calls v4l2_ctrl_handler_setup() before
initializing the format info. This triggers call paths such as:
imx274_probe -> v4l2_ctrl_handler_setup -> imx274_s_ctrl ->
imx274_set_exposure, where priv->mode_index is accessed before being
assigned.
This is wrong but does not trigger a visible bug because priv is
zero-initialized and 0 is the default value for priv->mode_index. But
this would become a crash in follow-up commits when mode_index is
replaced by a pointer that must always be valid.
Fix the bug before it shows up by initializing struct members early.
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Add a v4l2 sub-device driver for the ak7375 lens voice coil.
This is a voice coil module using the i2c bus to control the
focus position.
ak7375 can write multiple bytes of data at a time. If more
data is received instead of the stop condition after receiving
one byte of data, the address inside the chip is automatically
incremented and the data is written into the next address.
The ak7375 can control the position with 12 bits value and
consists of two 8 bit registers show as below:
register 0x00(AK7375_REG_POSITION):
+---+---+---+---+---+---+---+---+
|D11|D10|D09|D08|D07|D06|D05|D04|
+---+---+---+---+---+---+---+---+
register 0x01:
+---+---+---+---+---+---+---+---+
|D03|D02|D01|D00|---|---|---|---|
+---+---+---+---+---+---+---+---+
This driver support :
- set ak7375 to standby mode once suspend and
turn it back to active if resume
- set the position via V4L2_CID_FOCUS_ABSOLUTE ctrl
[Sakari Ailus: Rename val as ret in probe, drop redundant error message]
Signed-off-by: Tianshu Qiu <tian.shu.qiu@intel.com>
Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
If a PMU doesn't have any IRQs, we should return 0 from
armpmu_request_irqs(), rather than uninitialised stack.
Signed-off-by: Will Deacon <will.deacon@arm.com>
A hooking API was implemented for 4.17 in fa93854f7a followed
by hooks for Thinkpad laptops in 2801b9683f. The Thinkpad
drivers did not support the Thinkpad 13 and the hooking API crashes
on unsupported batteries by altering a list of hooks during unsafe
iteration. Thus, Thinkpad 13 laptops could no longer boot.
Additionally, a lock was kept in place and debugging information was
printed out of order.
Fixes: fa93854f7a (battery: Add the battery hooking API)
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add device tree bindings for AKM ak7375 voice coil lens
driver. This chip is used to drive a lens in a camera module.
Signed-off-by: Tianshu Qiu <tian.shu.qiu@intel.com>
Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Permanently enable the SYNA7500 touchscreen device on the Dell
Venue Pro 7139. The DSDT hides the touchscreen ACPI device on
the 7139 in the same fashion as the 7130, and needs to
be enabled in the same way.
Signed-off-by: Tristian Celestin <tristiancelestin@fastmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Dell Venue Pro 7140 supports the Low Power S0 Idle state, but
does not support any of the _DSM functions that the current heuristic
checks for.
Since suspend-to-mem can not be safely performed on this machine,
and since the bitfield check can't cover this case, it is safer
to enable s2idle by default by checking for the presence of the
_DSM alone and removing the bitfield check.
Signed-off-by: Tristian Celestin <tristiancelestin@fastmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jani spotted this when reviewing my earlier patch to remove the driver
internal usage of this field in:
Commit 3cf91adaa5 ("backlight: Nuke BL_CORE_DRIVER1")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The patch 'backlight: pwm_bl: compute brightness of LED linearly to
human eye' introduced a default brightness-levels table that is used
when brightness-levels is not available in the dts. So move
brightness-levels and default-brightness-level to be optional.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
When you want to change the brightness using a PWM signal, one thing you
need to consider is how human perceive the brightness. Human perceive
the brightness change non-linearly, we have better sensitivity at low
luminance than high luminance, so to achieve perceived linear dimming,
the brightness must be matches to the way our eyes behave. The CIE 1931
lightness formula is what actually describes how we perceive light.
This patch computes a default table with the brightness levels filled
with the numbers provided by the CIE 1931 algorithm, the number of the
brightness levels is calculated based on the PWM resolution.
The calculation of the table using the CIE 1931 algorithm is enabled by
default when you do not define the 'brightness-levels' propriety in your
device tree.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The num-interpolated-steps property specifies the number of
interpolated steps between each value of brightness-level table. This is
useful for high resolution PWMs to not have to list out every possible
value in the brightness-level array.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Setting num-interpolated-steps in the dts will allow you to have linear
interpolation between values of brightness-levels. This way a high
resolution pwm duty cycle can be used without having to list out every
possible value in the dts. This system also allows for gamma corrected
values.
The most simple example is interpolate between two brightness values a
number of steps, this can be done setting the following in the dts:
brightness-levels = <0 65535>;
num-interpolated-steps = <1024>;
default-brightness-level = <512>;
This will create a brightness-level table with the following values:
<0 63 126 189 252 315 378 441 ... 64260 64323 64386 64449 65535>
Another use case can be describe a gamma corrected curve, as we have
better sensitivity at low luminance than high luminance we probably
want have smaller steps for low brightness levels values and bigger
steps for high brightness levels values. This can be achieved with
the following in the dts:
brightness-levels = <0 4096 65535>;
num-interpolated-steps = <1024>;
default-brightness-level = <512>;
This will create a brightness-levels table with the following values:
<0 4 8 12 16 20 ... 4096 4156 4216 4276 ... 65535>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
This change updates the device headers to the latest device version.
Where renaming affects the existing code, it's updated accordingly.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The buffer object backing the user fence is reserved using the non-user
fence, i.e., as soon as the non-user fence is signaled, the user fence
buffer object can be moved or even destroyed.
Therefore, emit the user fence first.
Both fences have the same cache invalidation behavior, so this should
have no user-visible effect.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
In the quest to remove all stack VLA usage from the kernel[1], this moves
the "$#" replacement from being an argument to being inside the function,
which avoids generating VLAs.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Change the tasklets parameter type to fix W=1 warnings when building
with gcc 8 like below:
drivers/s390/block/dasd.c: In function 'dasd_alloc_device':
drivers/s390/block/dasd.c:129:8: warning: cast between incompatible function types
from 'void (*)(struct dasd_device *)' to 'void (*)(long unsigned int)' [-Wcast-function-type]
(void (*)(unsigned long)) dasd_device_tasklet,
^
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Acked-by: Jan Höppner <hoeppner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for format 3 function measurement blocks.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
NetworkManager likes to manage linklocal prefix routes and does so with
the NLM_F_APPEND flag, breaking attempts to simplify the IPv6 route
code and by extension enable multipath routes with device only nexthops.
Revert f34436a430 and these followup patches:
6eba08c362 ("ipv6: Only emit append events for appended routes").
ce45bded64 ("mlxsw: spectrum_router: Align with new route replace logic")
53b562df8c ("mlxsw: spectrum_router: Allow appending to dev-only routes")
Update the fib_tests cases to reflect the old behavior.
Fixes: f34436a430 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: David Ahern <dsahern@gmail.com>
Add support for DA9063L, which is a reduced variant of the DA9063
with less regulators and without RTC.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Move the LDOs present only on DA9063 at the end of the list, so that
the DA9063L can simply indicate less LDOs and still share the list of
regulators with DA9063.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The DA9063L does not contain RTC block, unlike the full DA9063.
Split the RTC block into separate mfd cell and register it only
on DA9063.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The DA9063L does not have an RTC. Add custom IRQ map for DA9063L to
ignore the Alarm and Tick IRQs from the PMIC.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The DA9063L does not have an RTC. Add custom regmap for DA9063L to
prevent access into that register block.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Add type for DA9063L, which is a reduced variant of the DA9063
without RTC block and with less regulators.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The model number stored in the struct da9063 is the same for all
variants of the da9063 since it is the chip ID, which is always
the same. Replace that with a separate identifier instead, which
allows us to discern the DA9063 variants by setting the type
based on either DT match or otherwise.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The PMIC_DA9063 is a complete misnomer, it denotes the value of the
DA9063 chip ID register, so rename it as such. It is also the value
of chip ID register of DA9063L though, so drop the enum as all the
DA9063 "models" share the same chip ID and thus the distinction will
have to be made using DT or otherwise.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Replace DA9063_NUM_IRQ macro which is not used anywhere with
plain ARRAY_SIZE().
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Use PLATFORM_DEVID_NONE instead of -1 in mfd_add_devices.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Convert the regmap_irq table to use REGMAP_IRQ_REG().
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Acked-by: Steve Twiss <stwiss.opensource@diasemi.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Convert the regmap_range tables to use regmap_reg_range() macro.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Use devm_mfd_add_devices() instead of plain mfd_add_devices(), which
removes the need for da9063_device_exit() altogether and also for the
.remove callback in da9063-i2c.c .
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Use devm_regmap_add_irq_chip() instead of plain regmap_add_irq_chip(),
which removes the need for da9063_irq_exit() altogether and also
fixes a bug in da9063_device_init() where the da9063_irq_exit() was
not called in a failpath.
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Check whether this EC instance has USBPD host command support and
instatiate the cros_usbpd-charger driver as a subdevice in such case.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The USBPD charger driver gets information from the ChromeOS EC, this
patch adds the USBPD charger definitions needed by this driver.
Signed-off-by: Sameer Nanda <snanda@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
The gen_stats facility will add a header for the toplevel nlattr of type
TCA_STATS2 that contains all stats added by qdisc callbacks. A reference
to this header is stored in the gnet_dump struct, and when all the
per-qdisc callbacks have finished adding their stats, the length of the
containing header will be adjusted to the right value.
However, on architectures that need padding (i.e., that don't set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS), the padding nlattr is added
before the stats, which means that the stored pointer will point to the
padding, and so when the header is fixed up, the result is just a very
big padding nlattr. Because most qdiscs also supply the legacy TCA_STATS
struct, this problem has been mostly invisible, but we exposed it with
the netlink attribute-based statistics in CAKE.
Fix the issue by fixing up the stored pointer if it points to a padding
nlattr.
Tested-by: Pete Heist <pete@heistp.net>
Tested-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata says:
====================
More mirror-to-gretap tests with bridge in UL
This patchset adds two more tests where the mirror-to-gretap has a
bridge in underlay packet path, without a VLAN above or below that
bridge.
In patch #1, a non-VLAN-filtering bridge is tested.
In patch #2, a VLAN-filtering bridge is tested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Test for "tc action mirred egress mirror" that mirrors to gretap when
the underlay route points at a VLAN-aware bridge (802.1q).
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test for "tc action mirred egress mirror" that mirrors to gretap when
the underlay route points at a VLAN-unaware bridge (802.1d).
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree says:
====================
Handle multiple received packets at each stage
This patch series adds the capability for the network stack to receive a
list of packets and process them as a unit, rather than handling each
packet singly in sequence. This is done by factoring out the existing
datapath code at each layer and wrapping it in list handling code.
The motivation for this change is twofold:
* Instruction cache locality. Currently, running the entire network
stack receive path on a packet involves more code than will fit in the
lowest-level icache, meaning that when the next packet is handled, the
code has to be reloaded from more distant caches. By handling packets
in "row-major order", we ensure that the code at each layer is hot for
most of the list. (There is a corresponding downside in _data_ cache
locality, since we are now touching every packet at every layer, but in
practice there is easily enough room in dcache to hold one cacheline of
each of the 64 packets in a NAPI poll.)
* Reduction of indirect calls. Owing to Spectre mitigations, indirect
function calls are now more expensive than ever; they are also heavily
used in the network stack's architecture (see [1]). By replacing 64
indirect calls to the next-layer per-packet function with a single
indirect call to the next-layer list function, we can save CPU cycles.
Drivers pass an SKB list to the stack at the end of the NAPI poll; this
gives a natural batch size (the NAPI poll weight) and avoids waiting at
the software level for further packets to make a larger batch (which
would add latency). It also means that the batch size is automatically
tuned by the existing interrupt moderation mechanism.
The stack then runs each layer of processing over all the packets in the
list before proceeding to the next layer. Where the 'next layer' (or
the context in which it must run) differs among the packets, the stack
splits the list; this 'late demux' means that packets which differ only
in later headers (e.g. same L2/L3 but different L4) can traverse the
early part of the stack together.
Also, where the next layer is not (yet) list-aware, the stack can revert
to calling the rest of the stack in a loop; this allows gradual/creeping
listification, with no 'flag day' patch needed to listify everything.
Patches 1-2 simply place received packets on a list during the event
processing loop on the sfc EF10 architecture, then call the normal stack
for each packet singly at the end of the NAPI poll. (Analogues of patch
#2 for other NIC drivers should be fairly straightforward.)
Patches 3-9 extend the list processing as far as the IP receive handler.
Patches 1-2 alone give about a 10% improvement in packet rate in the
baseline test; adding patches 3-9 raises this to around 25%.
Performance measurements were made with NetPerf UDP_STREAM, using 1-byte
packets and a single core to handle interrupts on the RX side; this was
in order to measure as simply as possible the packet rate handled by a
single core. Figures are in Mbit/s; divide by 8 to obtain Mpps. The
setup was tuned for maximum reproducibility, rather than raw performance.
Full details and more results (both with and without retpolines) from a
previous version of the patch series are presented in [2].
The baseline test uses four streams, and multiple RXQs all bound to a
single CPU (the netperf binary is bound to a neighbouring CPU). These
tests were run with retpolines.
net-next: 6.91 Mb/s (datum)
after 9: 8.46 Mb/s (+22.5%)
Note however that these results are not robust; changes in the parameters
of the test sometimes shrink the gain to single-digit percentages. For
instance, when using only a single RXQ, only a 4% gain was seen.
One test variation was the use of software filtering/firewall rules.
Adding a single iptables rule (UDP port drop on a port range not matching
the test traffic), thus making the netfilter hook have work to do,
reduced baseline performance but showed a similar gain from the patches:
net-next: 5.02 Mb/s (datum)
after 9: 6.78 Mb/s (+35.1%)
Similarly, testing with a set of TC flower filters (kindly supplied by
Cong Wang) gave the following:
net-next: 6.83 Mb/s (datum)
after 9: 8.86 Mb/s (+29.7%)
These data suggest that the batching approach remains effective in the
presence of software switching rules, and perhaps even improves the
performance of those rules by allowing them and their codepaths to stay
in cache between packets.
Changes from v3:
* Fixed build error when CONFIG_NETFILTER=n (thanks kbuild).
Changes from v2:
* Used standard list handling (and skb->list) instead of the skb-queue
functions (that use skb->next, skb->prev).
- As part of this, changed from a "dequeue, process, enqueue" model to
using list_for_each_safe, list_del, and (new) list_cut_before.
* Altered __netif_receive_skb_core() changes in patch 6 as per Willem de
Bruijn's suggestions (separate **ppt_prev from *pt_prev; renaming).
* Removed patches to Generic XDP, since they were producing no benefit.
I may revisit them later.
* Removed RFC tags.
Changes from v1:
* Rebased across 2 years' net-next movement (surprisingly straightforward).
- Added Generic XDP handling to netif_receive_skb_list_internal()
- Dealt with changes to PFMEMALLOC setting APIs
* General cleanup of code and comments.
* Skipped function calls for empty lists at various points in the stack
(patch #9).
* Added listified Generic XDP handling (patches 10-12), though it doesn't
seem to help (see above).
* Extended testing to cover software firewalls / netfilter etc.
[1] http://vger.kernel.org/netconf2018_files/DavidMiller_netconf2018.pdf
[2] http://vger.kernel.org/netconf2018_files/EdwardCree_netconf2018.pdf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>