linux/include
Yunsheng Lin 2f23d5bcd9 net: sched: fix tx action rescheduling issue during deactivation
[ Upstream commit 102b55ee92 ]

Currently qdisc_run() checks the STATE_DEACTIVATED of lockless
qdisc before calling __qdisc_run(), which ultimately clear the
STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED
is set before clearing STATE_MISSED, there may be rescheduling
of net_tx_action() at the end of qdisc_run_end(), see below:

CPU0(net_tx_atcion)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
          .                   .                     .
          .            set STATE_MISSED             .
          .           __netif_schedule()            .
          .                   .           set STATE_DEACTIVATED
          .                   .                qdisc_reset()
          .                   .                     .
          .<---------------   .              synchronize_net()
clear __QDISC_STATE_SCHED  |  .                     .
          .                |  .                     .
          .                |  .            some_qdisc_is_busy()
          .                |  .               return *false*
          .                |  .                     .
  test STATE_DEACTIVATED   |  .                     .
__qdisc_run() *not* called |  .                     .
          .                |  .                     .
   test STATE_MISS         |  .                     .
 __netif_schedule()--------|  .                     .
          .                   .                     .
          .                   .                     .

__qdisc_run() is not called by net_tx_atcion() in CPU0 because
CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and
STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule
is called at the end of qdisc_run_end(), causing tx action
rescheduling problem.

qdisc_run() called by net_tx_action() runs in the softirq context,
which should has the same semantic as the qdisc_run() called by
__dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a
synchronize_net() between STATE_DEACTIVATED flag being set and
qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely
bail out for the deactived lockless qdisc in net_tx_action(), and
qdisc_reset() will reset all skb not dequeued yet.

So add the rcu_read_lock() explicitly to protect the qdisc_run()
and do the STATE_DEACTIVATED checking in net_tx_action() before
calling qdisc_run_begin(). Another option is to do the checking in
the qdisc_run_end(), but it will add unnecessary overhead for
non-tx_action case, because __dev_queue_xmit() will not see qdisc
with STATE_DEACTIVATED after synchronize_net(), the qdisc with
STATE_DEACTIVATED can only be seen by net_tx_action() because of
__netif_schedule().

The STATE_DEACTIVATED checking in qdisc_run() is to avoid race
between net_tx_action() and qdisc_reset(), see:
commit d518d2ed86 ("net/sched: fix race between deactivation
and dequeue for NOLOCK qdisc"). As the bailout added above for
deactived lockless qdisc in net_tx_action() provides better
protection for the race without calling qdisc_run() at all, so
remove the STATE_DEACTIVATED checking in qdisc_run().

After qdisc_reset(), there is no skb in qdisc to be dequeued, so
clear the STATE_MISSED in dev_reset_queue() too.

Fixes: 6b3ba9146f ("net: sched: allow qdiscs to handle locking")
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
V8: Clearing STATE_MISSED before calling __netif_schedule() has
    avoid the endless rescheduling problem, but there may still
    be a unnecessary rescheduling, so adjust the commit log.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-03 09:00:47 +02:00
..
acpi ACPI: scan: Use unique number for instance_no 2021-03-30 14:32:06 +02:00
asm-generic static_call: Allow module use without exposing static_call_key 2021-03-30 14:31:53 +02:00
clocksource
crypto crypto: poly1305 - fix poly1305_core_setkey() declaration 2021-05-14 09:50:13 +02:00
drm drm/dp/mst: Export drm_dp_get_vc_payload_bw() 2021-02-10 09:29:18 +01:00
dt-bindings ASoC: dt-bindings: lpass: Fix and common up lpass dai ids 2021-02-03 23:28:46 +01:00
keys security: keys: trusted: fix TPM2 authorizations 2021-05-14 09:50:20 +02:00
kunit kunit: fix display of failed expectations for strings 2020-11-10 13:45:15 -07:00
kvm ARM: 2020-10-23 11:17:56 -07:00
linux linux/bits.h: fix compilation error with GENMASK 2021-06-03 09:00:45 +02:00
math-emu
media media: v4l2-ctrls: fix reference to freed memory 2021-05-11 14:47:39 +02:00
memory
misc
net net: sched: fix tx action rescheduling issue during deactivation 2021-06-03 09:00:47 +02:00
pcmcia
ras
rdma RDMA: Lift ibdev_to_node from rds to common code 2021-02-26 10:12:59 +01:00
scsi Fix misc new gcc warnings 2021-05-11 14:47:36 +02:00
soc net: dsa: felix: implement port flushing on .phylink_mac_link_down 2021-02-17 11:02:27 +01:00
sound ALSA: hda: intel-nhlt: verify config type 2021-03-09 11:11:14 +01:00
target scsi: target: core: Add cmd length set before cmd complete 2021-03-17 17:06:25 +01:00
trace SUNRPC: Remove trace_xprt_transmit_queued 2021-05-19 10:13:03 +02:00
uapi netfilter: xt_SECMARK: add new revision to fix structure layout 2021-05-19 10:13:06 +02:00
vdso
video gpu: ipu-v3: remove unused functions 2020-10-26 10:42:38 +01:00
xen Xen/gntdev: correct error checking in gntdev_map_grant_pages() 2021-02-23 15:53:24 +01:00