linux/net/bridge
Ido Schimmel 91b6dbced0 bridge: netfilter: Fix forwarding of fragmented packets
When netfilter defrag hooks are loaded (due to the presence of conntrack
rules, for example), fragmented packets entering the bridge will be
defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
ipv4_conntrack_defrag()).

Later on, in the bridge's post-routing hook, the defragged packet will
be fragmented again. If the size of the largest fragment is larger than
what the kernel has determined as the destination MTU (using
ip_skb_dst_mtu()), the defragged packet will be dropped.

Before commit ac6627a28d ("net: ipv4: Consolidate ipv4_mtu and
ip_dst_mtu_maybe_forward"), ip_skb_dst_mtu() would return dst_mtu() as
the destination MTU. Assuming the dst entry attached to the packet is
the bridge's fake rtable one, this would simply be the bridge's MTU (see
fake_mtu()).

However, after the above-mentioned commit, ip_skb_dst_mtu() ends up
returning the route's MTU stored in the dst entry's metrics. Ideally, in
case the dst entry is the bridge's fake rtable one, this should be the
bridge's MTU as the bridge takes care of updating this metric when its
MTU changes (see br_change_mtu()).

Unfortunately, this update is a no-op because the metrics attached
to the fake rtable entry are marked as read-only. Therefore,
ip_skb_dst_mtu() ends up returning 1500 (the initial MTU value) and
defragged packets are dropped during fragmentation when dealing with
large fragments and high MTU (e.g., 9k).

Fix by moving the fake rtable entry's metrics to be per-bridge (in a
similar fashion to the fake rtable entry itself) and marking them as
writable, thereby allowing MTU changes to be reflected.
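
The failure mode and the fix described above can be modelled in a few lines of
user-space C. This is only an illustrative sketch, not the kernel code: struct
metrics, struct bridge, and every function name below are hypothetical
stand-ins, and the real logic lives in br_nf_core.c and the dst metrics
helpers. It assumes that a write to read-only metrics is silently ignored and
that the route MTU is read back from the metrics, as the commit message states.

```c
#include <stdbool.h>

#define INITIAL_MTU 1500u

/* Hypothetical stand-in for the metrics block attached to a dst entry. */
struct metrics {
	unsigned int mtu;
	bool read_only;
};

/* Hypothetical stand-in for per-bridge state. */
struct bridge {
	unsigned int dev_mtu;        /* MTU configured on the bridge device */
	struct metrics *metrics;     /* metrics the fake rtable entry points at */
	struct metrics own_metrics;  /* per-bridge storage, used by the fixed setup */
};

/* Shared, read-only metrics: models the behaviour before the fix. */
static struct metrics shared_ro_metrics = { INITIAL_MTU, true };

/* Writing to read-only metrics is silently ignored (the no-op in the text). */
static void metric_set_mtu(struct metrics *m, unsigned int mtu)
{
	if (m->read_only)
		return;
	m->mtu = mtu;
}

/* Before the fix: the fake rtable entry uses shared read-only metrics. */
static void bridge_init_shared(struct bridge *br)
{
	br->dev_mtu = INITIAL_MTU;
	br->metrics = &shared_ro_metrics;
}

/* After the fix: the metrics are per-bridge and writable. */
static void bridge_init_per_bridge(struct bridge *br)
{
	br->dev_mtu = INITIAL_MTU;
	br->own_metrics.mtu = INITIAL_MTU;
	br->own_metrics.read_only = false;
	br->metrics = &br->own_metrics;
}

/* Models an MTU change: update the device MTU and try to update the metric. */
static void bridge_change_mtu(struct bridge *br, unsigned int mtu)
{
	br->dev_mtu = mtu;
	metric_set_mtu(br->metrics, mtu);
}

/* Models the post-consolidation lookup: the MTU comes from the metrics. */
static unsigned int route_mtu(const struct bridge *br)
{
	return br->metrics->mtu;
}

/* A defragmented packet survives only if its largest fragment fits. */
static bool refragment_ok(const struct bridge *br, unsigned int largest_frag)
{
	return largest_frag <= route_mtu(br);
}
```

In this model, a bridge set up with bridge_init_shared() and then raised to an
MTU of 9000 still reports a route MTU of 1500, so an 8000-byte fragment is
rejected. The same sequence on a bridge set up with bridge_init_per_bridge()
reports 9000 and the fragment is accepted.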

Fixes: 62fa8a846d ("net: Implement read-only protection and COW'ing of metrics.")
Fixes: 33eb9873a2 ("bridge: initialize fake_rtable metrics")
Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Closes: https://lore.kernel.org/netdev/PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com/
Tested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250515084848.727706-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16 16:02:06 -07:00
netfilter netfilter: nf_tables: replace deprecated strncpy with strscpy_pad 2024-10-15 17:29:51 +02:00
br_arp_nd_proxy.c bridge: Make br_is_nd_neigh_msg() accept pointer to "const struct sk_buff" 2025-01-07 15:13:10 +01:00
br_cfm_netlink.c bridge: cfm: fix enum typo in br_cc_ccm_tx_parse 2023-12-26 22:38:13 +00:00
br_cfm.c bridge: cfm: remove redundant return 2021-06-22 10:35:15 -07:00
br_device.c net: move misc netdev_lock flavors to a separate header 2025-03-08 09:06:50 -08:00
br_fdb.c rtnetlink: add ndo_fdb_dump_context 2024-12-10 18:32:32 -08:00
br_forward.c net: bridge: add skb drop reasons to the most common drop points 2024-12-23 10:11:04 -08:00
br_if.c net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode 2023-07-03 09:11:34 +01:00
br_input.c net: bridge: add skb drop reasons to the most common drop points 2024-12-23 10:11:04 -08:00
br_ioctl.c net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF. 2025-03-21 22:10:06 +01:00
br_mdb.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
br_mrp_netlink.c bridge: mrp: Use hlist_head instead of list_head for mrp 2020-11-09 16:42:12 -08:00
br_mrp_switchdev.c bridge: mrp: Extend br_mrp_switchdev to detect better the errors 2021-02-16 14:47:46 -08:00
br_mrp.c net: bridge: mrp: Update the Test frames for MRA 2021-06-28 15:46:10 -07:00
br_mst.c net: bridge: mst: fix suspicious rcu usage in br_mst_set_state 2024-06-12 18:24:24 -07:00
br_multicast_eht.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
br_multicast.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
br_netfilter_hooks.c netfilter: br_netfilter: remove unused conditional and dead code 2025-01-19 16:41:52 +01:00
br_netfilter_ipv6.c netfilter: bridge: replace physindev with physinif in nf_bridge_info 2024-01-17 12:02:49 +01:00
br_netlink_tunnel.c net: bridge: fix an inconsistent indentation 2024-06-05 10:04:47 +01:00
br_netlink.c rtnetlink: Pack newlink() params into struct 2025-02-21 15:28:02 -08:00
br_nf_core.c bridge: netfilter: Fix forwarding of fragmented packets 2025-05-16 16:02:06 -07:00
br_private_cfm.h bridge: cfm: Kernel space implementation of CFM. CCM frame RX added. 2020-10-29 18:39:43 -07:00
br_private_mcast_eht.h net: bridge: multicast: use multicast contexts instead of bridge or port 2021-07-20 05:41:19 -07:00
br_private_mrp.h net: bridge: mrp: Update the Test frames for MRA 2021-06-28 15:46:10 -07:00
br_private_stp.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
br_private_tunnel.h bridge: always declare tunnel functions 2023-05-17 21:28:58 -07:00
br_private.h bridge: netfilter: Fix forwarding of fragmented packets 2025-05-16 16:02:06 -07:00
br_stp_bpdu.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
br_stp_if.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
br_stp_timer.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
br_stp.c treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
br_switchdev.c net: bridge: switchdev: Ensure deferred event delivery on unoffload 2024-02-16 09:36:37 +00:00
br_sysfs_br.c net: bridge: constify 'struct bin_attribute' 2024-12-17 19:00:43 -08:00
br_sysfs_if.c bridge: move from strlcpy with unused retval to strscpy 2022-08-22 17:57:30 -07:00
br_vlan_options.c bridge: vlan: Allow setting VLAN neighbor suppression state 2023-04-21 08:25:50 +01:00
br_vlan_tunnel.c ip_tunnel: convert __be16 tunnel flags to bitmaps 2024-04-01 10:49:28 +01:00
br_vlan.c net: bridge: switchdev: do not notify new brentries as changed 2025-04-16 18:11:39 -07:00
br.c net: bridge: Handle changes in VLAN_FLAG_BRIDGE_BINDING 2024-12-20 13:14:17 -08:00
Kconfig bridge: cfm: Add BRIDGE_CFM to Kconfig. 2020-10-29 18:39:43 -07:00
Makefile net: bridge: mst: Multiple Spanning Tree (MST) mode 2022-03-17 16:49:57 -07:00