Merge branch 'bpf-propagate-read-precision-marks-over-state-graph-backedges'

Eduard Zingerman says:

====================
bpf: propagate read/precision marks over state graph backedges

Current loop_entry-based states comparison logic does not handle the
following case:

 .-> A --.  Assume the states are visited in the order A, B, C.
 |   |   |  Assume that state B reaches a state equivalent to state A.
 |   v   v  At this point, state C is not processed yet, so state A
 '-- B   C  has not received any read or precision marks from C.
            As a result, these marks won't be propagated to B.

If B has incomplete marks, it is unsafe to use it in states_equal()
checks. This issue was first reported in [1].

This patch-set
--------------

Here is the gist of the algorithm implemented by this patch-set:
- Compute strongly connected components (SCCs) in the program CFG.
- When a verifier state enters an SCC, that state is recorded as the
  SCC's entry point.
- When a verifier state is found to be equivalent to another
  (e.g., B to A in the example above), it is recorded as a
  states-graph backedge.
- Backedges are accumulated per SCC (*).
- When an SCC entry state reaches `branches == 0`, propagate read and
  precision marks through the backedges until a fixed point is reached
  (e.g., from A to B, from C to A, and then again from A to B).

(*) This is an oversimplification, see patch #8 for details.

Unfortunately, this means that commit [2] needs to be reverted,
as precision propagation requires access to jump history,
and backedges represent history not belonging to `env->cur_state`.

Details are provided in patch #8; a comment in `is_state_visited()`
explains most of the mechanics.

Patch #2 adds a `compute_scc()` function, which computes SCCs in the
program CFG. This function was tested using property-based testing in
[3], but it is not included in selftests.

Previous attempt
----------------

A previous attempt to fix this is described in [4]:
1. Within the states loop, `states_equal(... RANGE_WITHIN)` ignores
   read and precision marks.
2. For states outside the loop, all registers for states within the
   loop are marked as read and precise.

This approach led to an 86x regression on the `cond_break1` selftest.
In that test, one loop was followed by another, and a certain variable
was incremented in the second loop. This variable was marked as
precise due to rule (2), which hindered convergence in the first loop.

After some off-list discussion, it was decided that this might be a
typical case and such regressions are undesirable.

This patch-set avoids such eager precision markings.

Alternatives
------------

Another option is to associate a mask of read/written/precise stack
slots with each instruction. This mask can be populated during
verifier states exploration. Upon reaching an `EXIT` instruction or an
equivalent state, the accumulated masks can be used to propagate
read/written/precise bits across the program's control flow graph
using an analysis similar to use-def.

Unfortunately, a naive implementation of this approach [5] results in
a 10x regression in `veristat` for some `sched_ext` programs due to
the inability to express the must-write property. This issue requires
further investigation.

Changes in verification performance
-----------------------------------

There are some veristat regressions when comparing with master using
selftests and sched_ext BPF binaries. The comparison is done using
master from [6] and this patch-set from [7] where memory accounting
logic is added to veristat.

========= selftests: master vs patch-set =========

File                  Program                              Insns                           Peak memory (KiB)
--------------------- -----------------------------------  -----  -----  ----------------  ----  -----  ----------------
bpf_qdisc_fq.bpf.o    bpf_fq_dequeue                        1187   1645    +458 (+38.58%)   768   1240    +472 (+61.46%)
dynptr_success.bpf.o  test_copy_from_user_str_dynptr         208    279     +71 (+34.13%)   512   1024   +512 (+100.00%)
dynptr_success.bpf.o  test_copy_from_user_task_str_dynptr    205    263     +58 (+28.29%)   512   1024   +512 (+100.00%)
dynptr_success.bpf.o  test_probe_read_kernel_str_dynptr      686    857    +171 (+24.93%)   992   1724    +732 (+73.79%)
dynptr_success.bpf.o  test_probe_read_user_str_dynptr        689    860    +171 (+24.82%)  1016   1744    +728 (+71.65%)
iters.bpf.o           checkpoint_states_deletion            1211   1216       +5 (+0.41%)   512   1280   +768 (+150.00%)
pyperf600_iter.bpf.o  on_event                              2591   5929  +3338 (+128.83%)  4744  11176  +6432 (+135.58%)
verifier_gotol.bpf.o  gotol_large_imm                      40004  40004       +0 (+0.00%)  1024   1536    +512 (+50.00%)

Total progs: 3725
Old success: 2157
New success: 2157
total_insns diff min:    0.00%
total_insns diff max:  128.83%
0 -> value: 0
value -> 0: 0
total_insns abs max old: 837,487
total_insns abs max new: 837,487
   0 .. 5    %: 3710
   5 .. 15   %: 6
  20 .. 30   %: 6
  30 .. 40   %: 2
 125 .. 130  %: 1

mem_peak diff min:  -27.78%
mem_peak diff max:  198.44%
mem_peak abs max old: 269,312 KiB
mem_peak abs max new: 269,312 KiB
 -30 .. -20  %: 1
  -5 .. 0    %: 18
   0 .. 5    %: 3568
   5 .. 15   %: 4
  15 .. 25   %: 3
  45 .. 55   %: 4
  60 .. 70   %: 1
  70 .. 80   %: 2
 100 .. 110  %: 3
 135 .. 145  %: 1
 150 .. 160  %: 1
 195 .. 200  %: 1

========= scx: master vs patch-set =========

Program                   Insns                          Peak memory (KiB)
------------------------  -----  -----  ---------------  -----  -----  -----------------
arena_topology_node_init   2133   2395   +262 (+12.28%)    768    768        +0 (+0.00%)
chaos_dispatch             2835   2868     +33 (+1.16%)   1972   1720     -252 (-12.78%)
chaos_init                 4324   5210   +886 (+20.49%)   2528   3028     +500 (+19.78%)
lavd_cpu_offline           5107   5726   +619 (+12.12%)   4188   6304    +2116 (+50.53%)
lavd_cpu_online            5107   5726   +619 (+12.12%)   4188   6304    +2116 (+50.53%)
lavd_dispatch             41775  47601  +5826 (+13.95%)   6196  29192  +22996 (+371.14%)
lavd_enqueue              20238  24188  +3950 (+19.52%)  22084  42156   +20072 (+90.89%)
lavd_init                  6974   7685   +711 (+10.20%)   5428   6928    +1500 (+27.63%)
lavd_select_cpu           22138  26088  +3950 (+17.84%)  24448  43688   +19240 (+78.70%)
layered_dispatch          17847  26581  +8734 (+48.94%)  11728  28740  +17012 (+145.05%)
layered_dump               1891   2098   +207 (+10.95%)   2036   3048    +1012 (+49.71%)
layered_runnable           2606   2634     +28 (+1.07%)    748   1240     +492 (+65.78%)
p2dq_init                  3691   4554   +863 (+23.38%)   2016   2528     +512 (+25.40%)
rusty_enqueue             28853  28853      +0 (+0.00%)   2072   1824     -248 (-11.97%)
rusty_init_task           31128  31128      +0 (+0.00%)   2176   2560     +384 (+17.65%)

Total progs: 148
Old success: 135
New success: 135
total_insns diff min:    0.00%
total_insns diff max:   48.94%
0 -> value: 0
value -> 0: 0
total_insns abs max old: 41,775
total_insns abs max new: 47,601
   0 .. 5    %: 133
   5 .. 15   %: 7
  15 .. 25   %: 4
  35 .. 45   %: 3
  45 .. 50   %: 1

mem_peak diff min:  -12.78%
mem_peak diff max:  371.14%
mem_peak abs max old: 24,448 KiB
mem_peak abs max new: 43,688 KiB
 -15 .. -5   %: 2
  -5 .. 0    %: 2
   0 .. 5    %: 129
   5 .. 15   %: 1
  15 .. 25   %: 2
  25 .. 35   %: 2
  45 .. 55   %: 3
  65 .. 75   %: 1
  75 .. 85   %: 1
  90 .. 100  %: 1
 145 .. 155  %: 1
 195 .. 205  %: 1
 370 .. 375  %: 1

Changelog
---------

v1: https://lore.kernel.org/bpf/20250524191932.389444-1-eddyz87@gmail.com/
v1 -> v2:
- Rebase
- added mem_peak statistics (Alexei)
- selftests: fixed comments and removed useless r7 assignments (Yonghong)
v2: https://lore.kernel.org/bpf/20250606210352.1692944-1-eddyz87@gmail.com/
v2 -> v3:
- Rebase

Links
-----

[1] https://lore.kernel.org/bpf/20250312031344.3735498-1-eddyz87@gmail.com/
[2] commit 96a30e469c ("bpf: use common instruction history across all states")
[3] https://github.com/eddyz87/scc-test
[4] https://lore.kernel.org/bpf/20250426104634.744077-1-eddyz87@gmail.com/
[5] https://github.com/eddyz87/bpf/tree/propagate-read-and-precision-in-cfg
[6] https://github.com/eddyz87/bpf/tree/veristat-memory-accounting
[7] https://github.com/eddyz87/bpf/tree/scc-accumulate-backedges
====================

Link: https://patch.msgid.link/20250611200546.4120963-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit is contained in:
Alexei Starovoitov 2025-06-11 13:55:39 -07:00
commit e3f6660b78
3 changed files with 1001 additions and 329 deletions

View File

@ -344,7 +344,7 @@ struct bpf_func_state {
#define MAX_CALL_FRAMES 8
/* instruction history flags, used in bpf_insn_hist_entry.flags field */
/* instruction history flags, used in bpf_jmp_history_entry.flags field */
enum {
/* instruction references stack slot through PTR_TO_STACK register;
* we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
@ -366,7 +366,7 @@ enum {
static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
struct bpf_insn_hist_entry {
struct bpf_jmp_history_entry {
u32 idx;
/* insn idx can't be bigger than 1 million */
u32 prev_idx : 20;
@ -449,32 +449,20 @@ struct bpf_verifier_state {
/* first and last insn idx of this verifier state */
u32 first_insn_idx;
u32 last_insn_idx;
/* If this state is a part of states loop this field points to some
* parent of this state such that:
* - it is also a member of the same states loop;
* - DFS states traversal starting from initial state visits loop_entry
* state before this state.
* Used to compute topmost loop entry for state loops.
* State loops might appear because of open coded iterators logic.
* See get_loop_entry() for more information.
/* if this state is a backedge state then equal_state
* records cached state to which this state is equal.
*/
struct bpf_verifier_state *loop_entry;
/* Sub-range of env->insn_hist[] corresponding to this state's
* instruction history.
* Backtracking is using it to go from last to first.
* For most states instruction history is short, 0-3 instructions.
struct bpf_verifier_state *equal_state;
/* jmp history recorded from first to last.
* backtracking is using it to go from last to first.
* For most states jmp_history_cnt is [0-3].
* For loops can go up to ~40.
*/
u32 insn_hist_start;
u32 insn_hist_end;
struct bpf_jmp_history_entry *jmp_history;
u32 jmp_history_cnt;
u32 dfs_depth;
u32 callback_unroll_depth;
u32 may_goto_depth;
/* If this state was ever pointed-to by other state's loop_entry field
* this flag would be set to true. Used to avoid freeing such states
* while they are still in use.
*/
u32 used_as_loop_entry;
};
#define bpf_get_spilled_reg(slot, frame, mask) \
@ -610,6 +598,11 @@ struct bpf_insn_aux_data {
* accepts callback function as a parameter.
*/
bool calls_callback;
/*
* CFG strongly connected component this instruction belongs to,
* zero if it is a singleton SCC.
*/
u32 scc;
/* registers alive before this instruction. */
u16 live_regs_before;
};
@ -719,6 +712,38 @@ struct bpf_idset {
u32 ids[BPF_ID_MAP_SIZE];
};
/* see verifier.c:compute_scc_callchain() */
struct bpf_scc_callchain {
/* call sites from bpf_verifier_state->frame[*]->callsite leading to this SCC */
u32 callsites[MAX_CALL_FRAMES - 1];
/* last frame in a chain is identified by SCC id */
u32 scc;
};
/* verifier state waiting for propagate_backedges() */
struct bpf_scc_backedge {
struct bpf_scc_backedge *next;
struct bpf_verifier_state state;
};
struct bpf_scc_visit {
struct bpf_scc_callchain callchain;
/* first state in current verification path that entered SCC
* identified by the callchain
*/
struct bpf_verifier_state *entry_state;
struct bpf_scc_backedge *backedges; /* list of backedges */
u32 num_backedges;
};
/* An array of bpf_scc_visit structs sharing tht same bpf_scc_callchain->scc
* but having different bpf_scc_callchain->callsites.
*/
struct bpf_scc_info {
u32 num_visits;
struct bpf_scc_visit visits[];
};
/* single container for all structs
* one verifier_env per bpf_check() call
*/
@ -776,9 +801,7 @@ struct bpf_verifier_env {
int cur_postorder;
} cfg;
struct backtrack_state bt;
struct bpf_insn_hist_entry *insn_hist;
struct bpf_insn_hist_entry *cur_hist_ent;
u32 insn_hist_cap;
struct bpf_jmp_history_entry *cur_hist_ent;
u32 pass_cnt; /* number of times do_check() was called */
u32 subprog_cnt;
/* number of instructions analyzed by the verifier */
@ -800,6 +823,7 @@ struct bpf_verifier_env {
u32 longest_mark_read_walk;
u32 free_list_size;
u32 explored_states_size;
u32 num_backedges;
bpfptr_t fd_array;
/* bit mask to keep track of whether a register has been accessed
@ -817,6 +841,9 @@ struct bpf_verifier_env {
char tmp_str_buf[TMP_STR_BUF_LEN];
struct bpf_insn insn_buf[INSN_BUF_SIZE];
struct bpf_insn epilogue_buf[INSN_BUF_SIZE];
/* array of pointers to bpf_scc_info indexed by SCC id */
struct bpf_scc_info **scc_info;
u32 scc_cnt;
};
static inline struct bpf_func_info_aux *subprog_aux(struct bpf_verifier_env *env, int subprog)

File diff suppressed because it is too large Load Diff

View File

@ -1649,4 +1649,281 @@ int clean_live_states(const void *ctx)
return 0;
}
SEC("?raw_tp")
__flag(BPF_F_TEST_STATE_FREQ)
__failure __msg("misaligned stack access off 0+-31+0 size 8")
__naked int absent_mark_in_the_middle_state(void)
{
/* This is equivalent to C program below.
*
* r8 = bpf_get_prandom_u32();
* r6 = -32;
* bpf_iter_num_new(&fp[-8], 0, 10);
* if (unlikely(bpf_get_prandom_u32()))
* r6 = -31;
* while (bpf_iter_num_next(&fp[-8])) {
* if (unlikely(bpf_get_prandom_u32()))
* *(fp + r6) = 7;
* }
* bpf_iter_num_destroy(&fp[-8])
* return 0
*/
asm volatile (
"call %[bpf_get_prandom_u32];"
"r8 = r0;"
"r7 = 0;"
"r6 = -32;"
"r0 = 0;"
"*(u64 *)(r10 - 16) = r0;"
"r1 = r10;"
"r1 += -8;"
"r2 = 0;"
"r3 = 10;"
"call %[bpf_iter_num_new];"
"call %[bpf_get_prandom_u32];"
"if r0 == r8 goto change_r6_%=;"
"loop_%=:"
"call noop;"
"r1 = r10;"
"r1 += -8;"
"call %[bpf_iter_num_next];"
"if r0 == 0 goto loop_end_%=;"
"call %[bpf_get_prandom_u32];"
"if r0 == r8 goto use_r6_%=;"
"goto loop_%=;"
"loop_end_%=:"
"r1 = r10;"
"r1 += -8;"
"call %[bpf_iter_num_destroy];"
"r0 = 0;"
"exit;"
"use_r6_%=:"
"r0 = r10;"
"r0 += r6;"
"r1 = 7;"
"*(u64 *)(r0 + 0) = r1;"
"goto loop_%=;"
"change_r6_%=:"
"r6 = -31;"
"goto loop_%=;"
:
: __imm(bpf_iter_num_new),
__imm(bpf_iter_num_next),
__imm(bpf_iter_num_destroy),
__imm(bpf_get_prandom_u32)
: __clobber_all
);
}
__used __naked
static int noop(void)
{
asm volatile (
"r0 = 0;"
"exit;"
);
}
SEC("?raw_tp")
__flag(BPF_F_TEST_STATE_FREQ)
__failure __msg("misaligned stack access off 0+-31+0 size 8")
__naked int absent_mark_in_the_middle_state2(void)
{
/* This is equivalent to C program below.
*
* r8 = bpf_get_prandom_u32();
* r6 = -32;
* bpf_iter_num_new(&fp[-8], 0, 10);
* if (unlikely(bpf_get_prandom_u32())) {
* r6 = -31;
* jump_into_loop:
* goto +0;
* goto loop;
* }
* if (unlikely(bpf_get_prandom_u32()))
* goto jump_into_loop;
* loop:
* while (bpf_iter_num_next(&fp[-8])) {
* if (unlikely(bpf_get_prandom_u32()))
* *(fp + r6) = 7;
* }
* bpf_iter_num_destroy(&fp[-8])
* return 0
*/
asm volatile (
"call %[bpf_get_prandom_u32];"
"r8 = r0;"
"r7 = 0;"
"r6 = -32;"
"r0 = 0;"
"*(u64 *)(r10 - 16) = r0;"
"r1 = r10;"
"r1 += -8;"
"r2 = 0;"
"r3 = 10;"
"call %[bpf_iter_num_new];"
"call %[bpf_get_prandom_u32];"
"if r0 == r8 goto change_r6_%=;"
"call %[bpf_get_prandom_u32];"
"if r0 == r8 goto jump_into_loop_%=;"
"loop_%=:"
"r1 = r10;"
"r1 += -8;"
"call %[bpf_iter_num_next];"
"if r0 == 0 goto loop_end_%=;"
"call %[bpf_get_prandom_u32];"
"if r0 == r8 goto use_r6_%=;"
"goto loop_%=;"
"loop_end_%=:"
"r1 = r10;"
"r1 += -8;"
"call %[bpf_iter_num_destroy];"
"r0 = 0;"
"exit;"
"use_r6_%=:"
"r0 = r10;"
"r0 += r6;"
"r1 = 7;"
"*(u64 *)(r0 + 0) = r1;"
"goto loop_%=;"
"change_r6_%=:"
"r6 = -31;"
"jump_into_loop_%=: "
"goto +0;"
"goto loop_%=;"
:
: __imm(bpf_iter_num_new),
__imm(bpf_iter_num_next),
__imm(bpf_iter_num_destroy),
__imm(bpf_get_prandom_u32)
: __clobber_all
);
}
SEC("?raw_tp")
__flag(BPF_F_TEST_STATE_FREQ)
__failure __msg("misaligned stack access off 0+-31+0 size 8")
__naked int absent_mark_in_the_middle_state3(void)
{
/*
* bpf_iter_num_new(&fp[-8], 0, 10)
* loop1(-32, &fp[-8])
* loop1_wrapper(&fp[-8])
* bpf_iter_num_destroy(&fp[-8])
*/
asm volatile (
"r1 = r10;"
"r1 += -8;"
"r2 = 0;"
"r3 = 10;"
"call %[bpf_iter_num_new];"
/* call #1 */
"r1 = -32;"
"r2 = r10;"
"r2 += -8;"
"call loop1;"
"r1 = r10;"
"r1 += -8;"
"call %[bpf_iter_num_destroy];"
/* call #2 */
"r1 = r10;"
"r1 += -8;"
"r2 = 0;"
"r3 = 10;"
"call %[bpf_iter_num_new];"
"r1 = r10;"
"r1 += -8;"
"call loop1_wrapper;"
/* return */
"r1 = r10;"
"r1 += -8;"
"call %[bpf_iter_num_destroy];"
"r0 = 0;"
"exit;"
:
: __imm(bpf_iter_num_new),
__imm(bpf_iter_num_destroy),
__imm(bpf_get_prandom_u32)
: __clobber_all
);
}
__used __naked
static int loop1(void)
{
/*
* int loop1(num, iter) {
* r6 = num;
* r7 = iter;
* while (bpf_iter_num_next(r7)) {
* if (unlikely(bpf_get_prandom_u32()))
* *(fp + r6) = 7;
* }
* return 0
* }
*/
asm volatile (
"r6 = r1;"
"r7 = r2;"
"call %[bpf_get_prandom_u32];"
"r8 = r0;"
"loop_%=:"
"r1 = r7;"
"call %[bpf_iter_num_next];"
"if r0 == 0 goto loop_end_%=;"
"call %[bpf_get_prandom_u32];"
"if r0 == r8 goto use_r6_%=;"
"goto loop_%=;"
"loop_end_%=:"
"r0 = 0;"
"exit;"
"use_r6_%=:"
"r0 = r10;"
"r0 += r6;"
"r1 = 7;"
"*(u64 *)(r0 + 0) = r1;"
"goto loop_%=;"
:
: __imm(bpf_iter_num_next),
__imm(bpf_get_prandom_u32)
: __clobber_all
);
}
__used __naked
static int loop1_wrapper(void)
{
/*
* int loop1_wrapper(iter) {
* r6 = -32;
* r7 = iter;
* if (unlikely(bpf_get_prandom_u32()))
* r6 = -31;
* loop1(r6, r7);
* return 0;
* }
*/
asm volatile (
"r6 = -32;"
"r7 = r1;"
"call %[bpf_get_prandom_u32];"
"r8 = r0;"
"call %[bpf_get_prandom_u32];"
"if r0 == r8 goto change_r6_%=;"
"loop_%=:"
"r1 = r6;"
"r2 = r7;"
"call loop1;"
"r0 = 0;"
"exit;"
"change_r6_%=:"
"r6 = -31;"
"goto loop_%=;"
:
: __imm(bpf_iter_num_next),
__imm(bpf_get_prandom_u32)
: __clobber_all
);
}
char _license[] SEC("license") = "GPL";