linux/kernel
Daniel Borkmann 107e215c29 bpf, lru: avoid messing with eviction heuristics upon syscall lookup
commit 50b045a8c0 upstream.

One of the biggest issues we face right now with picking LRU map over
regular hash table is that a map walk out of user space, for example,
to just dump the existing entries or to remove certain ones, will
completely mess up LRU eviction heuristics and wrong entries such
as just created ones will get evicted instead. The reason for this
is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
system call lookup side as well. Thus upon walk, all entries are
being marked, so information of actual least recently used ones
are "lost".

In case of Cilium where it can be used (besides others) as a BPF
based connection tracker, this current behavior causes disruption
upon control plane changes that need to walk the map from user space
to evict certain entries. Discussion result from bpfconf [0] was that
we should simply just remove marking from system call side as no
good use case could be found where it's actually needed there.
Therefore this patch removes marking for regular LRU and per-CPU
flavor. If there ever should be a need in future, the behavior could
be selected via map creation flag, but due to mentioned reason we
avoid this here.

  [0] http://vger.kernel.org/bpfconf.html

Fixes: 29ba732acb ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
Fixes: 8f8449384e ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25 18:23:48 +02:00
..
bpf bpf, lru: avoid messing with eviction heuristics upon syscall lookup 2019-05-25 18:23:48 +02:00
cgroup cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting 2019-04-05 22:33:13 +02:00
configs
debug
dma
events perf/core: Fix perf_event_disable_inatomic() race 2019-05-10 17:54:09 +02:00
gcov
irq genirq: Prevent use-after-free and work list corruption 2019-05-10 17:54:10 +02:00
livepatch
locking locking/rwsem: Prevent decrement of reader count before increment 2019-05-22 07:37:34 +02:00
power
printk
rcu kprobes: Prohibit probing on RCU debug routine 2019-04-05 22:33:08 +02:00
sched sched/cpufreq: Fix kobject memleak 2019-05-25 18:23:45 +02:00
time timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() 2019-04-27 09:36:38 +02:00
trace tracing: Fix partial reading of trace event's id file 2019-05-25 18:23:32 +02:00
.gitignore
acct.c
async.c
audit_fsnotify.c
audit_tree.c
audit_watch.c
audit.c
audit.h
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
compat.c
configs.c
context_tracking.c
cpu_pm.c
cpu.c cpu/speculation: Add 'mitigations=' cmdline option 2019-05-14 19:17:59 +02:00
crash_core.c
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting 2019-04-05 22:33:13 +02:00
extable.c
fail_function.c
fork.c userfaultfd: use RCU to free the task struct when fork fails 2019-05-22 07:37:41 +02:00
freezer.c
futex_compat.c
futex.c locking/futex: Allow low-level atomic operations to return -EAGAIN 2019-05-10 17:54:11 +02:00
groups.c
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:16:02 +02:00
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kernel/kcov.c: mark write_comp_data() as notrace 2019-02-12 19:47:20 +01:00
kexec_core.c
kexec_file.c
kexec_internal.h
kexec.c
kmod.c
kprobes.c kprobes: Fix error check when reusing optimized probes 2019-04-27 09:36:37 +02:00
ksysfs.c
kthread.c
latencytop.c
Makefile
memremap.c
module_signing.c
module-internal.h
module.c
notifier.c
nsproxy.c
padata.c
panic.c
params.c
pid_namespace.c
pid.c
profile.c
ptrace.c ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK 2019-05-04 09:20:22 +02:00
range.c
reboot.c
relay.c relay: check return of create_buf_file() properly 2019-03-13 14:02:35 -07:00
resource.c
rseq.c
seccomp.c
signal.c signal: Restore the stop PTRACE_EVENT_EXIT 2019-02-20 10:25:48 +01:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:47:25 +01:00
smpboot.c
smpboot.h
softirq.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c
sysctl_binary.c
sysctl.c kernel/sysctl.c: fix out-of-bounds access when setting file-max 2019-04-27 09:36:41 +02:00
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user_namespace.c
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog_hld.c
watchdog.c watchdog: Respect watchdog cpumask on CPU hotplug 2019-04-03 06:26:29 +02:00
workqueue_internal.h
workqueue.c workqueue: Try to catch flush_work() without INIT_WORK(). 2019-05-02 09:58:56 +02:00