mirror of
https://github.com/torvalds/linux.git
synced 2026-05-12 16:18:45 +02:00
Some modern CPUs disable the X86_FEATURE_RETPOLINE feature, even though a direct call can still be beneficial.

Even when IBRS is present, an indirect call is more expensive than a direct one:

Direct calls: Compilers can perform powerful optimizations like inlining, where the function body is directly inserted at the call site, eliminating call overhead entirely.

Indirect calls: Inlining is much harder, if not impossible, because the compiler doesn't know the target function at compile time. Techniques like Indirect Call Promotion can help by using profile-guided optimization to turn frequently taken indirect calls into conditional direct calls, but they still add complexity and potential overhead compared to a truly direct call.

In this patch, I split tc_skip_wrapper into two different static keys, one for tc_act() (tc_skip_wrapper_act) and one for tc_classify() (tc_skip_wrapper_cls).

Then I enable tc_skip_wrapper_cls only if the count of builtin classifiers is above one. I enable tc_skip_wrapper_act only if the count of builtin actions is above one.

In our production kernels, we only have CONFIG_NET_CLS_BPF=y and CONFIG_NET_ACT_BPF=y. Others are modules or are not compiled.

Tested on AMD Turin CPUs, cls_bpf_classify() cost went from 1% down to 0.18%, and FDO will be able to inline it in tcf_classify() for further gains.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20260307133601.3863071-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
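The idea of two independent skip flags can be sketched in userspace C. This is not the patch itself: plain `bool` variables stand in for the kernel's static keys, and `wrapper_init()`, `classify()`, `bpf_classify()` and `other_classify()` are hypothetical names chosen for illustration. The sketch also assumes that the wrapper is only ever skipped when retpolines are disabled, which the commit message implies but does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for tc_skip_wrapper_cls / tc_skip_wrapper_act.
 * The kernel uses static keys (patched branches); plain bools are used here. */
static bool skip_wrapper_cls;
static bool skip_wrapper_act;

typedef int (*classify_fn)(int pkt);

/* Stand-in for the single built-in classifier. */
static int bpf_classify(int pkt) { return pkt + 1; }

/* Stand-in for a classifier that is not built in (e.g. a module). */
static int other_classify(int pkt) { return pkt + 2; }

/* Assumed policy: skip the wrapper only when retpolines are off AND more
 * than one built-in target exists. With a single built-in target, one
 * compare plus a direct call stays cheaper than an indirect call. */
static void wrapper_init(bool retpoline, int builtin_cls, int builtin_act)
{
	assert(builtin_cls >= 0 && builtin_act >= 0);
	skip_wrapper_cls = !retpoline && builtin_cls > 1;
	skip_wrapper_act = !retpoline && builtin_act > 1;
}

/* Wrapper: try the known built-in target first, else call indirectly. */
static int classify(classify_fn fn, int pkt)
{
	if (!skip_wrapper_cls && fn == bpf_classify)
		return bpf_classify(pkt);	/* direct call, inlinable */
	return fn(pkt);				/* indirect call */
}
```

With one built-in classifier and one built-in action, as in the production configuration described above, both flags stay off and `classify(bpf_classify, ...)` takes the direct-call path. The action side would follow the same pattern using `skip_wrapper_act`.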
| Name |
|---|
| acpi |
| asm-generic |
| clocksource |
| crypto |
| cxl |
| drm |
| dt-bindings |
| hyperv |
| keys |
| kunit |
| kvm |
| linux |
| math-emu |
| media |
| memory |
| misc |
| net |
| pcmcia |
| ras |
| rdma |
| rv |
| scsi |
| soc |
| sound |
| target |
| trace |
| uapi |
| ufs |
| vdso |
| video |
| xen |
| Kbuild |