From bb88f9695460bec25aa30ba9072595025cf6c8af Mon Sep 17 00:00:00 2001
From: Marco Elver
Date: Mon, 31 Oct 2022 10:35:13 +0100
Subject: [PATCH 1/4] perf: Improve missing SIGTRAP checking

To catch missing SIGTRAP we employ a WARN in __perf_event_overflow(),
which fires if pending_sigtrap was already set: returning to user space
without consuming pending_sigtrap, and then having the event fire again
would re-enter the kernel and trigger the WARN.

This, however, seemed to miss the case where some events not associated
with progress in the user space task can fire and the interrupt handler
runs before the IRQ work meant to consume pending_sigtrap (and generate
the SIGTRAP).

syzbot gifted us this stack trace:

 | WARNING: CPU: 0 PID: 3607 at kernel/events/core.c:9313 __perf_event_overflow
 | Modules linked in:
 | CPU: 0 PID: 3607 Comm: syz-executor100 Not tainted 6.1.0-rc2-syzkaller-00073-g88619e77b33d #0
 | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
 | RIP: 0010:__perf_event_overflow+0x498/0x540 kernel/events/core.c:9313
 | <...>
 | Call Trace:
 |
 |  perf_swevent_hrtimer+0x34f/0x3c0 kernel/events/core.c:10729
 |  __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
 |  __hrtimer_run_queues+0x1c6/0xfb0 kernel/time/hrtimer.c:1749
 |  hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
 |  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1096 [inline]
 |  __sysvec_apic_timer_interrupt+0x17c/0x640 arch/x86/kernel/apic/apic.c:1113
 |  sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1107
 |  asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649
 | <...>
 |

In this case, syzbot produced a program with event type
PERF_TYPE_SOFTWARE and config PERF_COUNT_SW_CPU_CLOCK. The hrtimer
manages to fire again before the IRQ work got a chance to run, all while
never having returned to user space.

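The failing sequence can be modeled in user space. The sketch below is a
hypothetical stand-in, not kernel code: struct model_event,
model_overflow_old_check(), model_run_irq_work() and the two
model_simulate_*() helpers are invented names, and a counter replaces
WARN_ON_ONCE(). It only shows why the pre-patch check fires when two
overflows arrive before the IRQ work runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_event {
	bool exclude_kernel;		/* mirrors attr.exclude_kernel */
	unsigned int pending_sigtrap;	/* non-zero while a SIGTRAP is pending */
	unsigned int warnings;		/* stands in for WARN_ON_ONCE() hits */
};

/* Pre-patch logic: warn whenever an overflow finds pending_sigtrap set. */
static void model_overflow_old_check(struct model_event *ev)
{
	assert(ev != NULL);

	if (ev->pending_sigtrap && ev->exclude_kernel)
		ev->warnings++;
	if (!ev->pending_sigtrap)
		ev->pending_sigtrap = 1;
}

/* The IRQ work consumes pending_sigtrap (and would generate the SIGTRAP). */
static void model_run_irq_work(struct model_event *ev)
{
	assert(ev != NULL);
	ev->pending_sigtrap = 0;
}

/* hrtimer fires twice before the IRQ work runs: the old check warns. */
static unsigned int model_simulate_back_to_back(void)
{
	struct model_event ev = { .exclude_kernel = true };

	model_overflow_old_check(&ev);	/* first expiry sets pending_sigtrap */
	model_overflow_old_check(&ev);	/* second expiry, IRQ work not run yet */
	model_run_irq_work(&ev);
	return ev.warnings;
}

/* IRQ work runs between the two overflows: no warning. */
static unsigned int model_simulate_in_order(void)
{
	struct model_event ev = { .exclude_kernel = true };

	model_overflow_old_check(&ev);
	model_run_irq_work(&ev);
	model_overflow_old_check(&ev);
	model_run_irq_work(&ev);
	return ev.warnings;
}
```

In this model the back-to-back case counts one warning even though user
space never ran, which is the false positive syzbot reported.
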
Improve the WARN to check for real progress in user space: approximate
this by storing a 32-bit hash of the current IP into pending_sigtrap,
and if an event fires while pending_sigtrap still matches the previous
IP, we assume no progress (false negatives are possible given we could
return to user space and trigger again on the same IP).

Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs")
Reported-by: syzbot+b8ded3e2e2c6adde4990@syzkaller.appspotmail.com
Signed-off-by: Marco Elver
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20221031093513.3032814-1-elver@google.com
---
 kernel/events/core.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4ec3717003d5..884871427a94 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9306,14 +9306,27 @@ static int __perf_event_overflow(struct perf_event *event,
 	}
 
 	if (event->attr.sigtrap) {
-		/*
-		 * Should not be able to return to user space without processing
-		 * pending_sigtrap (kernel events can overflow multiple times).
-		 */
-		WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel);
+		unsigned int pending_id = 1;
+
+		if (regs)
+			pending_id = hash32_ptr((void *)instruction_pointer(regs)) ?: 1;
 		if (!event->pending_sigtrap) {
-			event->pending_sigtrap = 1;
+			event->pending_sigtrap = pending_id;
 			local_inc(&event->ctx->nr_pending);
+		} else if (event->attr.exclude_kernel) {
+			/*
+			 * Should not be able to return to user space without
+			 * consuming pending_sigtrap; with exceptions:
+			 *
+			 *  1. Where !exclude_kernel, events can overflow again
+			 *     in the kernel without returning to user space.
+			 *
+			 *  2. Events that can overflow again before the IRQ-
+			 *     work without user space progress (e.g. hrtimer).
+			 *     To approximate progress (with false negatives),
+			 *     check 32-bit hash of the current IP.
+			 */
+			WARN_ON_ONCE(event->pending_sigtrap != pending_id);
 		}
 		event->pending_addr = data->addr;
 		irq_work_queue(&event->pending_irq);

From bdfe34597139cfcecd47a2eb97fea44d77157491 Mon Sep 17 00:00:00 2001
From: Sandipan Das
Date: Thu, 8 Sep 2022 10:33:15 +0530
Subject: [PATCH 2/4] perf/x86/amd/uncore: Fix memory leak for events array

When a CPU comes online, the per-CPU NB and LLC uncore contexts are
freed but not the events array within the context structure. This
causes a memory leak as identified by the kmemleak detector.

  [...]
  unreferenced object 0xffff8c5944b8e320 (size 32):
    comm "swapper/0", pid 1, jiffies 4294670387 (age 151.072s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<000000000759fb79>] amd_uncore_cpu_up_prepare+0xaf/0x230
      [<00000000ddc9e126>] cpuhp_invoke_callback+0x2cf/0x470
      [<0000000093e727d4>] cpuhp_issue_call+0x14d/0x170
      [<0000000045464d54>] __cpuhp_setup_state_cpuslocked+0x11e/0x330
      [<0000000069f67cbd>] __cpuhp_setup_state+0x6b/0x110
      [<0000000015365e0f>] amd_uncore_init+0x260/0x321
      [<00000000089152d2>] do_one_initcall+0x3f/0x1f0
      [<000000002d0bd18d>] kernel_init_freeable+0x1ca/0x212
      [<0000000030be8dde>] kernel_init+0x11/0x120
      [<0000000059709e59>] ret_from_fork+0x22/0x30
  unreferenced object 0xffff8c5944b8dd40 (size 64):
    comm "swapper/0", pid 1, jiffies 4294670387 (age 151.072s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000306efe8b>] amd_uncore_cpu_up_prepare+0x183/0x230
      [<00000000ddc9e126>] cpuhp_invoke_callback+0x2cf/0x470
      [<0000000093e727d4>] cpuhp_issue_call+0x14d/0x170
      [<0000000045464d54>] __cpuhp_setup_state_cpuslocked+0x11e/0x330
      [<0000000069f67cbd>] __cpuhp_setup_state+0x6b/0x110
      [<0000000015365e0f>] amd_uncore_init+0x260/0x321
      [<00000000089152d2>] do_one_initcall+0x3f/0x1f0
      [<000000002d0bd18d>] kernel_init_freeable+0x1ca/0x212
      [<0000000030be8dde>] kernel_init+0x11/0x120
      [<0000000059709e59>] ret_from_fork+0x22/0x30
  [...]

Fix the problem by freeing the events array before freeing the uncore
context.

Fixes: 39621c5808f5 ("perf/x86/amd/uncore: Use dynamic events array")
Reported-by: Ravi Bangoria
Signed-off-by: Sandipan Das
Signed-off-by: Borislav Petkov
Tested-by: Ravi Bangoria
Cc:
Link: https://lore.kernel.org/r/4fa9e5ac6d6e41fa889101e7af7e6ba372cfea52.1662613255.git.sandipan.das@amd.com
---
 arch/x86/events/amd/uncore.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index d568afc705d2..83f15fe411b3 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -553,6 +553,7 @@ static void uncore_clean_online(void)
 
 	hlist_for_each_entry_safe(uncore, n, &uncore_unused_list, node) {
 		hlist_del(&uncore->node);
+		kfree(uncore->events);
 		kfree(uncore);
 	}
 }

From baa014b9543c8e5e94f5d15b66abfe60750b8284 Mon Sep 17 00:00:00 2001
From: Ravi Bangoria
Date: Mon, 14 Nov 2022 10:10:29 +0530
Subject: [PATCH 3/4] perf/x86/amd: Fix crash due to race between
 amd_pmu_enable_all, perf NMI and throttling

amd_pmu_enable_all() does:

	if (!test_bit(idx, cpuc->active_mask))
		continue;

	amd_pmu_enable_event(cpuc->events[idx]);

A perf NMI of another event can come between these two steps. Perf NMI
handler internally disables and enables _all_ events, including the one
which nmi-intercepted amd_pmu_enable_all() was in process of enabling.
If that unintentionally enabled event has a very low sampling period and
causes immediate successive NMIs, causing the event to be throttled,
cpuc->events[idx] and cpuc->active_mask get cleared by x86_pmu_stop().
This will result in amd_pmu_enable_event() getting called with
event=NULL when amd_pmu_enable_all() resumes after handling the NMIs.
This causes a kernel crash:

  BUG: kernel NULL pointer dereference, address: 0000000000000198
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  [...]
  Call Trace:
   amd_pmu_enable_all+0x68/0xb0
   ctx_resched+0xd9/0x150
   event_function+0xb8/0x130
   ? hrtimer_start_range_ns+0x141/0x4a0
   ? perf_duration_warn+0x30/0x30
   remote_function+0x4d/0x60
   __flush_smp_call_function_queue+0xc4/0x500
   flush_smp_call_function_queue+0x11d/0x1b0
   do_idle+0x18f/0x2d0
   cpu_startup_entry+0x19/0x20
   start_secondary+0x121/0x160
   secondary_startup_64_no_verify+0xe5/0xeb

amd_pmu_disable_all()/amd_pmu_enable_all() calls inside perf NMI handler
were recently added as part of BRS enablement but I'm not sure whether
we really need them. We can just disable BRS in the beginning and enable
it back while returning from NMI. This will solve the issue by not
enabling those events whose active_masks are set but are not yet enabled
in hw pmu.

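The check-then-use window described above can be sketched as a small
user-space model. Everything here is hypothetical and for illustration
only: struct model_cpuc, model_throttle() and model_enable_one() are
invented names, and the nmi_in_window flag stands in for a real NMI
arriving between the active_mask test and the read of events[idx].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_COUNTERS 4

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct model_event {
	int id;
};

struct model_cpuc {
	unsigned long active_mask;
	struct model_event *events[MODEL_NUM_COUNTERS];
};

enum model_result {
	MODEL_SKIPPED,		/* active_mask bit was clear, nothing to do */
	MODEL_ENABLED,		/* a valid event would be enabled */
	MODEL_NULL_EVENT,	/* enable would be called with event == NULL */
};

/* Models throttling: both the mask bit and the event slot are cleared. */
static void model_throttle(struct model_cpuc *cpuc, int idx)
{
	cpuc->active_mask &= ~(1UL << idx);
	cpuc->events[idx] = NULL;
}

/*
 * Models one iteration of the enable loop. When nmi_in_window is true,
 * the throttle runs after the mask test but before events[idx] is read.
 */
static enum model_result model_enable_one(struct model_cpuc *cpuc, int idx,
					  bool nmi_in_window)
{
	assert(cpuc != NULL);
	assert(idx >= 0 && idx < MODEL_NUM_COUNTERS);

	if (!(cpuc->active_mask & (1UL << idx)))
		return MODEL_SKIPPED;

	if (nmi_in_window)
		model_throttle(cpuc, idx);

	return cpuc->events[idx] ? MODEL_ENABLED : MODEL_NULL_EVENT;
}
```

With an active event in a slot, calling model_enable_one() with
nmi_in_window set yields MODEL_NULL_EVENT, the modeled equivalent of the
NULL pointer dereference in the trace.
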
Fixes: ada543459cab ("perf/x86/amd: Add AMD Fam19h Branch Sampling support")
Reported-by: Linux Kernel Functional Testing
Signed-off-by: Ravi Bangoria
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20221114044029.373-1-ravi.bangoria@amd.com
---
 arch/x86/events/amd/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 8b70237c33f7..d6f3703e4119 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -861,8 +861,7 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)
 	pmu_enabled = cpuc->enabled;
 	cpuc->enabled = 0;
 
-	/* stop everything (includes BRS) */
-	amd_pmu_disable_all();
+	amd_brs_disable_all();
 
 	/* Drain BRS is in use (could be inactive) */
 	if (cpuc->lbr_users)
@@ -873,7 +872,7 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)
 
 	cpuc->enabled = pmu_enabled;
 	if (pmu_enabled)
-		amd_pmu_enable_all(0);
+		amd_brs_enable_all();
 
 	return amd_pmu_adjust_nmi_window(handled);
 }

From ce0d998be9274dd3a3d971cbeaa6fe28fd2c3062 Mon Sep 17 00:00:00 2001
From: Adrian Hunter
Date: Sat, 12 Nov 2022 17:15:08 +0200
Subject: [PATCH 4/4] perf/x86/intel/pt: Fix sampling using single range
 output

Deal with errata TGL052, ADL037 and RPL017 "Trace May Contain Incorrect
Data When Configured With Single Range Output Larger Than 4KB" by
disabling single range output whenever larger than 4KB.

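As a rough illustration of the size rule only, the sketch below assumes
4KB pages and uses invented names (MODEL_PAGE_SIZE,
model_single_range_allowed()). It does not reproduce the other
preconditions a real buffer has to meet before single range output is
considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: 4KB pages. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical helper: returns true only when the buffer is a single
 * page, i.e. not larger than 4KB, which is the case the errata leave
 * unaffected.
 */
static bool model_single_range_allowed(size_t buf_bytes)
{
	size_t nr_pages;

	assert(buf_bytes > 0 && buf_bytes % MODEL_PAGE_SIZE == 0);

	nr_pages = buf_bytes / MODEL_PAGE_SIZE;
	return nr_pages <= 1;
}
```
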
Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
Signed-off-by: Adrian Hunter
Signed-off-by: Peter Zijlstra (Intel)
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221112151508.13768-1-adrian.hunter@intel.com
---
 arch/x86/events/intel/pt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 82ef87e9a897..42a55794004a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
 	if (1 << order != nr_pages)
 		goto out;
 
+	/*
+	 * Some processors cannot always support single range for more than
+	 * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
+	 * also be affected, so for now rather than trying to keep track of
+	 * which ones, just disable it for all.
+	 */
+	if (nr_pages > 1)
+		goto out;
+
 	buf->single = true;
 	buf->nr_pages = nr_pages;
 	ret = 0;