mirror of
https://github.com/torvalds/linux.git
synced 2026-06-01 11:03:43 +02:00
net: Introduce skb tc depth field to track packet loops
Add a 2-bit per-skb tc depth field to track packet loops across the stack. The previous per-CPU loop counters like MIRRED_NEST_LIMIT assume a single call stack and lose state in two cases: 1) When a packet is queued and reprocessed later (e.g., egress->ingress via backlog), the per-cpu state is gone by the time it is dequeued. 2) With XPS/RPS a packet may arrive on one CPU and be processed on another. A per-skb field solves both by travelling with the packet itself. The field fits in existing padding, using 2 bits that were previously a hole: pahole before(-) and after (+) diff looks like: __u8 slow_gro:1; /* 132: 3 1 */ __u8 csum_not_inet:1; /* 132: 4 1 */ __u8 unreadable:1; /* 132: 5 1 */ + __u8 tc_depth:2; /* 132: 6 1 */ - /* XXX 2 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ __u16 tc_index; /* 134 2 */ There used to be a ttl field which was removed as part of tc_verd in commitaec745e2c5("net-tc: remove unused tc_verd fields"). It was already unused by that time, due to remove earlier in commitc19ae86a51("tc: remove unused redirect ttl"). The first user of this field is netem, which increments tc_depth on duplicated packets before re-enqueueing them at the root qdisc. On re-entry, netem skips duplication for any skb with tc_depth already set, bounding recursion to a single level regardless of tree topology. The other user is mirred which increments it on each pass and limits to depth to MIRRED_DEFER_LIMIT (3). The new field was called ttl in earlier versions of this patch but renamed to tc_depth to avoid confusion with IP ttl. Note (looking at you Sashiko! Dont ignore me and continue bringing this up): 1. Since both mirred and netem utilize the same 2-bit tc_depth field it is possible when netem and mirred are used together that netem qdisc to skip the duplication step. This is a known trade-off, as a 2-bit field cannot independently track both features' recursion depths and it is not considered sane to have a setup that addresses both features on at the same time. 2. skb_scrub_packet does not clear tc_depth. This means a packet's loop history is preserved even across namespaces. While this might be restrictive for some topologies, it is also design intent to provide robustness against loops across namespaces. Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260525122556.973584-2-jhs@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This commit is contained in:
parent
9d5e7a46a9
commit
98b34f3e8c
|
|
@ -821,6 +821,7 @@ enum skb_tstamp_type {
|
|||
* @_sk_redir: socket redirection information for skmsg
|
||||
* @_nfct: Associated connection, if any (with nfctinfo bits)
|
||||
* @skb_iif: ifindex of device we arrived on
|
||||
* @tc_depth: counter for packet duplication
|
||||
* @tc_index: Traffic control index
|
||||
* @hash: the packet hash
|
||||
* @queue_mapping: Queue mapping for multiqueue devices
|
||||
|
|
@ -1030,6 +1031,7 @@ struct sk_buff {
|
|||
__u8 csum_not_inet:1;
|
||||
#endif
|
||||
__u8 unreadable:1;
|
||||
__u8 tc_depth:2;
|
||||
#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
|
||||
__u16 tc_index; /* traffic control index */
|
||||
#endif
|
||||
|
|
|
|||
Loading…
Reference in New Issue
Block a user