linux/kernel
David Hildenbrand a9e7b8d4f6 kernel/resource: disallow access to exclusive system RAM regions
virtio-mem dynamically exposes memory inside a device memory region as
system RAM to Linux, coordinating with the hypervisor which parts are
actually "plugged" and consequently usable/accessible.

On the one hand, the virtio-mem driver adds/removes whole memory blocks,
creating/removing busy IORESOURCE_SYSTEM_RAM resources, on the other
hand, it logically (un)plugs memory inside added memory blocks,
dynamically either exposing them to the buddy or hiding them from the
buddy and marking them PG_offline.

In contrast to physical devices, like a DIMM, the virtio-mem driver is
required to actually make use of any of the device-provided memory,
because it performs the handshake with the hypervisor.  virtio-mem
memory cannot simply be access via /dev/mem without a driver.

There is no safe way to:
a) Access plugged memory blocks via /dev/mem, as they might contain
   unplugged holes or might get silently unplugged by the virtio-mem
   driver and consequently turned inaccessible.
b) Access unplugged memory blocks via /dev/mem because the virtio-mem
   driver is required to make them actually accessible first.

The virtio-spec states that unplugged memory blocks MUST NOT be written,
and only selected unplugged memory blocks MAY be read.  We want to make
sure, this is the case in sane environments -- where the virtio-mem driver
was loaded.

We want to make sure that in a sane environment, nobody "accidentially"
accesses unplugged memory inside the device managed region.  For example,
a user might spot a memory region in /proc/iomem and try accessing it via
/dev/mem via gdb or dumping it via something else.  By the time the mmap()
happens, the memory might already have been removed by the virtio-mem
driver silently: the mmap() would succeeed and user space might
accidentially access unplugged memory.

So once the driver was loaded and detected the device along the
device-managed region, we just want to disallow any access via /dev/mem to
it.

In an ideal world, we would mark the whole region as busy ("owned by a
driver") and exclude it; however, that would be wrong, as we don't really
have actual system RAM at these ranges added to Linux ("busy system RAM").
Instead, we want to mark such ranges as "not actual busy system RAM but
still soft-reserved and prepared by a driver for future use."

Let's teach iomem_is_exclusive() to reject access to any range with
"IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE", even if not busy and even
if "iomem=relaxed" is set.  Introduce EXCLUSIVE_SYSTEM_RAM to make it
easier for applicable drivers to depend on this setting in their Kconfig.

For now, there are no applicable ranges and we'll modify virtio-mem next
to properly set IORESOURCE_EXCLUSIVE on the parent resource container it
creates to contain all actual busy system RAM added via
add_memory_driver_managed().

Link: https://lkml.kernel.org/r/20210920142856.17758-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09 10:02:52 -08:00
..
bpf bpf: Fix potential race in tail call compatibility check 2021-10-26 12:37:28 -07:00
cgroup mm/page_alloc: detect allocation forbidden by cpuset and bail out early 2021-11-06 13:30:38 -07:00
configs drivers/char: remove /dev/kmem for good 2021-05-07 00:26:34 -07:00
debug kgdb patches for 5.15 2021-09-07 12:08:04 -07:00
dma memblock: use memblock_free for freeing virtual pointers 2021-11-06 13:30:41 -07:00
entry entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume() 2021-09-22 10:24:01 -04:00
events perf/core: fix userpage->time_enabled of inactive events 2021-10-01 13:57:54 +02:00
gcov Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR 2021-06-22 11:07:18 -07:00
irq irqchip fixes for 5.15, take #1 2021-09-24 14:11:04 +02:00
kcsan LKMM updates: 2021-09-02 13:00:15 -07:00
livepatch livepatch: Replace deprecated CPU-hotplug functions. 2021-08-19 12:00:24 +02:00
locking kallsyms: remove arch specific text and data check 2021-11-09 10:02:50 -08:00
power Merge branches 'pm-pci', 'pm-sleep', 'pm-domains' and 'powercap' 2021-08-30 19:25:42 +02:00
printk memblock: use memblock_free for freeing virtual pointers 2021-11-06 13:30:41 -07:00
rcu Updates for locking and atomics: 2021-08-30 14:26:36 -07:00
sched mm: move node_reclaim_distance to fix NUMA without SMP 2021-11-06 13:30:38 -07:00
time posix-cpu-timers: Prevent spuriously armed 0-value itimer 2021-09-23 11:53:51 +02:00
trace sections: move and rename core_kernel_data() to is_kernel_core_data() 2021-11-09 10:02:50 -08:00
.gitignore .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
acct.c kernel/acct.c: use dedicated helper to access rlimit values 2021-09-08 11:50:26 -07:00
async.c kernel/async.c: remove async_unregister_domain() 2021-05-07 00:26:33 -07:00
audit_fsnotify.c audit_alloc_mark(): don't open-code ERR_CAST() 2021-02-23 10:25:27 -05:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-08-24 18:52:36 -04:00
audit_watch.c
audit.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
audit.h audit: add header protection to kernel/audit.h 2021-07-19 22:38:24 -04:00
auditfilter.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
auditsc.c audit: fix possible null-pointer dereference in audit_filter_rules 2021-10-18 18:27:47 -04:00
backtracetest.c
bounds.c
capability.c capability: handle idmapped mounts 2021-01-24 14:27:16 +01:00
cfi.c cfi: Use rcu_read_{un}lock_sched_notrace 2021-08-11 13:11:12 -07:00
compat.c arch: remove compat_alloc_user_space 2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
cpu.c cpu/hotplug: Add debug printks for hotplug callback failures 2021-08-10 18:31:32 +02:00
crash_core.c kdump: use vmlinux_build_id to simplify 2021-07-08 11:48:22 -07:00
crash_dump.c
cred.c ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring 2021-10-20 10:34:20 -05:00
delayacct.c delayacct: Add sysctl to enable at runtime 2021-05-12 11:43:25 +02:00
dma.c
exec_domain.c
exit.c io_uring: remove files pointer in cancellation functions 2021-08-23 13:10:37 -06:00
extable.c extable: use is_kernel_text() helper 2021-11-09 10:02:51 -08:00
fail_function.c
fork.c kernel/fork.c: unshare(): use swap() to make code cleaner 2021-11-09 10:02:52 -08:00
freezer.c sched: Add get_current_state() 2021-06-18 11:43:08 +02:00
futex.c futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic() 2021-09-03 23:00:22 +02:00
gen_kheaders.sh kbuild: clean up ${quiet} checks in shell scripts 2021-05-27 04:01:50 +09:00
groups.c groups: simplify struct group_info allocation 2021-02-26 09:41:03 -08:00
hung_task.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
iomem.c
irq_work.c irq_work: Make irq_work_queue() NMI-safe again 2021-06-10 10:00:08 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c module: add printk formats to add module build ID to stacktraces 2021-07-08 11:48:22 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt sched/core: Disable CONFIG_SCHED_CORE by default 2021-06-28 22:43:05 +02:00
kcov.c kcov: replace local_irq_save() with a local_lock_t 2021-11-09 10:02:52 -08:00
kexec_core.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
kexec_elf.c
kexec_file.c memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED 2021-11-06 13:30:42 -07:00
kexec_internal.h kexec: move machine_kexec_post_load() to public interface 2021-02-22 12:33:26 +00:00
kexec.c kexec: avoid compat_alloc_user_space 2021-09-08 15:32:34 -07:00
kheaders.c
kmod.c modules: add CONFIG_MODPROBE_PATH 2021-05-07 00:26:33 -07:00
kprobes.c Locking fixes: 2021-07-11 11:06:09 -07:00
ksysfs.c
kthread.c Merge branch 'akpm' (patches from Andrew) 2021-06-29 17:29:11 -07:00
latencytop.c
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-02 00:43:35 +09:00
module_signature.c
module_signing.c
module-internal.h
module.c module: fix clang CFI with MODULE_UNLOAD=n 2021-09-28 12:56:26 +02:00
notifier.c notifier: Remove atomic_notifier_call_chain_robust() 2021-08-16 18:55:32 +02:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Remove repeated verbose license text 2021-08-27 16:30:18 +08:00
panic.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
params.c params: lift param_set_uint_minmax to common code 2021-08-16 14:42:22 +02:00
pid_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2021-08-10 12:53:07 +02:00
profile.c profiling: fix shift-out-of-bounds bugs 2021-09-08 11:50:26 -07:00
ptrace.c sched: Change task_struct::state 2021-06-18 11:43:09 +02:00
range.c
reboot.c reboot: Add hardware protection power-off 2021-06-21 13:08:36 +01:00
regset.c
relay.c
resource_kunit.c
resource.c kernel/resource: disallow access to exclusive system RAM regions 2021-11-09 10:02:52 -08:00
rseq.c KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest 2021-09-22 10:24:01 -04:00
scftorture.c scftorture: Avoid NULL pointer exception on early exit 2021-07-27 11:39:30 -07:00
scs.c
seccomp.c Merge branch 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-09-01 14:52:05 -07:00
signal.c Merge branch 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-10-21 17:27:17 -10:00
smp.c smp: Fix all kernel-doc warnings 2021-08-11 14:47:16 +02:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c genirq: Change force_irqthreads to a static key 2021-08-10 22:50:07 +02:00
stackleak.c
stacktrace.c stacktrace: move filter_irq_stacks() to kernel/stacktrace.c 2021-11-06 13:30:43 -07:00
static_call.c static_call: Fix static_call_text_reserved() vs __init 2021-07-05 10:46:33 +02:00
stop_machine.c stop_machine: Add caller debug info to queue_stop_cpus_work 2021-03-23 16:01:58 +01:00
sys_ni.c compat: remove some compat entry points 2021-09-08 15:32:35 -07:00
sys.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
sysctl-test.c kernel/sysctl-test: Remove some casts which are no-longer required 2021-06-23 16:41:24 -06:00
sysctl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
task_work.c kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Replace deprecated CPU-hotplug functions. 2021-08-10 10:48:07 -07:00
tracepoint.c tracepoint: Fix kerneldoc comments 2021-08-16 11:39:51 -04:00
tsacct.c mm/mmap.c: fix a data race of mm->total_vm 2021-11-06 13:30:35 -07:00
ucount.c ucounts: Fix signal ucount refcounting 2021-10-18 16:02:30 -05:00
uid16.c
uid16.h
umh.c kernel/umh.c: fix some spelling mistakes 2021-05-07 00:26:34 -07:00
up.c A set of locking related fixes and updates: 2021-05-09 13:07:03 -07:00
user_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: rectify kernel-doc for init_watch() 2021-01-26 11:16:34 +00:00
watchdog_hld.c
watchdog.c kernel: watchdog: modify the explanation related to watchdog thread 2021-06-29 10:53:46 -07:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00
workqueue.c workqueue, kasan: avoid alloc_pages() when recording stack 2021-11-06 13:30:33 -07:00