linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-10 07:32:29 +02:00

Author	SHA1	Message	Date
Greg Kroah-Hartman	8b444656fa	This is the 5.10.56 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEKcCUACgkQONu9yGCS aT7sMw/7BNJDmX9w+p1lgTIJJzSuz8C/eNgbeZgK7CE4DovO+WL9oEm53vqYcDDo j5REnrRhxcBYxwG/GXl1Oniv1wHqw0SplV+5G2NH1RMy23eSFGCw+8G+YOEJnU3P 94hJuEs/43Py7eZV/VtyO2UMdDRnGI6MlNvu18YjnRJcdqIIl2gln1G8wbyySYVb wR1rudvtiEdrmTQr7qGxeIrZNKGwFl0KxEl8j9X/aqxvfe8PRVYKlmtwblf5rybe TElQxz2XGRgk8g2yWQmnNoU6rfFHdZ4lTnCwfpFA1XE6/HBA64/1p22QTJUZvyOU pbQc1MRaoUncGV9UFAMY1j38JFsVar7YHHOcpp9YIJOjoyiAw4aatGDcntdWDCiG X1mCSLs10/xGRPaJJXulp786MH4aTR5qIeoNg8mu3Z3In4ElbBW5xr0wa3N8gs3O lEnK/gT2MHiQ1boa+Qy3W+XZmOjWtL69JgbOyRcOYS6lkHL4DFlGL2Nn5u8qGfL4 hzohJzH36W5SUHDQiYTt1wLNu4iHpAECjxcnk9fCvlcHA5Yu1bqgyQ62i3C9RA6a /aO0B0yraHmvCAboemDsESwylxmpiRB3caqKtzlaZjoiOfPydcBwJM46ZfbzLNPh l+/YKK2tLOXWyRIhEv8183tVeu7mZ02xjsetPtLltZPJqR+SJKE= =8nLw -----END PGP SIGNATURE----- Merge 5.10.56 into android12-5.10-lts Changes in 5.10.56 selftest: fix build error in tools/testing/selftests/vm/userfaultfd.c io_uring: fix null-ptr-deref in io_sq_offload_start() x86/asm: Ensure asm/proto.h can be included stand-alone pipe: make pipe writes always wake up readers btrfs: fix rw device counting in __btrfs_free_extra_devids btrfs: mark compressed range uptodate only if all bio succeed Revert "ACPI: resources: Add checks for ACPI IRQ override" ACPI: DPTF: Fix reading of attributes x86/kvm: fix vcpu-id indexed array sizes KVM: add missing compat KVM_CLEAR_DIRTY_LOG ocfs2: fix zero out valid data ocfs2: issue zeroout to EOF blocks can: j1939: j1939_xtp_rx_dat_one(): fix rxtimer value between consecutive TP.DT to 750ms can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF can: peak_usb: pcan_usb_handle_bus_evt(): fix reading rxerr/txerr values can: mcba_usb_start(): add missing urb->transfer_dma initialization can: usb_8dev: fix memory leak can: ems_usb: fix memory leak can: esd_usb2: fix memory leak alpha: register early reserved memory in memblock HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT NIU: fix incorrect error return, missed in previous revert drm/amd/display: ensure dentist display clock update finished in DCN20 drm/amdgpu: Avoid printing of stack contents on firmware load error drm/amdgpu: Fix resource leak on probe error path blk-iocost: fix operation ordering in iocg_wake_fn() nfc: nfcsim: fix use after free during module unload cfg80211: Fix possible memory leak in function cfg80211_bss_update RDMA/bnxt_re: Fix stats counters bpf: Fix OOB read when printing XDP link fdinfo mac80211: fix enabling 4-address mode on a sta vif after assoc netfilter: conntrack: adjust stop timestamp to real expiry value netfilter: nft_nat: allow to specify layer 4 protocol NAT only i40e: Fix logic of disabling queues i40e: Fix firmware LLDP agent related warning i40e: Fix queue-to-TC mapping on Tx i40e: Fix log TC creation failure when max num of queues is exceeded tipc: fix implicit-connect for SYN+ tipc: fix sleeping in tipc accept routine net: Set true network header for ECN decapsulation net: qrtr: fix memory leaks ionic: remove intr coalesce update from napi ionic: fix up dim accounting for tx and rx ionic: count csum_none when offload enabled tipc: do not write skb_shinfo frags when doing decrytion octeontx2-pf: Fix interface down flag on error mlx4: Fix missing error code in mlx4_load_one() KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access net: llc: fix skb_over_panic drm/msm/dpu: Fix sm8250_mdp register length drm/msm/dp: Initialize the INTF_CONFIG register skmsg: Make sk_psock_destroy() static net/mlx5: Fix flow table chaining net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev() sctp: fix return value check in __sctp_rcv_asconf_lookup tulip: windbond-840: Fix missing pci_disable_device() in probe and remove sis900: Fix missing pci_disable_device() in probe and remove can: hi311x: fix a signedness bug in hi3110_cmd() bpf: Introduce BPF nospec instruction for mitigating Spectre v4 bpf: Fix leakage due to insufficient speculative store bypass mitigation bpf: Remove superfluous aux sanitation on subprog rejection bpf: verifier: Allocate idmap scratch in verifier env bpf: Fix pointer arithmetic mask tightening under state pruning SMB3: fix readpage for large swap cache powerpc/pseries: Fix regression while building external modules Revert "perf map: Fix dso->nsinfo refcounting" i40e: Add additional info to PHY type error can: j1939: j1939_session_deactivate(): clarify lifetime of session object Linux 5.10.56 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib3c9244afb7ee5d6ee8d3235efe8956898f486c4	2021-08-04 15:02:23 +02:00
Daniel Borkmann	be561c0154	bpf: Fix pointer arithmetic mask tightening under state pruning commit `e042aa532c` upstream. In `7fedb63a83` ("bpf: Tighten speculative pointer arithmetic mask") we narrowed the offset mask for unprivileged pointer arithmetic in order to mitigate a corner case where in the speculative domain it is possible to advance, for example, the map value pointer by up to value_size-1 out-of- bounds in order to leak kernel memory via side-channel to user space. The verifier's state pruning for scalars leaves one corner case open where in the first verification path R_x holds an unknown scalar with an aux->alu_limit of e.g. 7, and in a second verification path that same register R_x, here denoted as R_x', holds an unknown scalar which has tighter bounds and would thus satisfy range_within(R_x, R_x') as well as tnum_in(R_x, R_x') for state pruning, yielding an aux->alu_limit of 3: Given the second path fits the register constraints for pruning, the final generated mask from aux->alu_limit will remain at 7. While technically not wrong for the non-speculative domain, it would however be possible to craft similar cases where the mask would be too wide as in `7fedb63a83`. One way to fix it is to detect the presence of unknown scalar map pointer arithmetic and force a deeper search on unknown scalars to ensure that we do not run into a masking mismatch. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-04 12:46:45 +02:00
Lorenz Bauer	ffb9d5c48b	bpf: verifier: Allocate idmap scratch in verifier env commit `c9e73e3d2b` upstream. func_states_equal makes a very short lived allocation for idmap, probably because it's too large to fit on the stack. However the function is called quite often, leading to a lot of alloc / free churn. Replace the temporary allocation with dedicated scratch space in struct bpf_verifier_env. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/20210429134656.122225-4-lmb@cloudflare.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-04 12:46:45 +02:00
Daniel Borkmann	a11ca29c65	bpf: Remove superfluous aux sanitation on subprog rejection commit `59089a189e` upstream. Follow-up to `fe9a5ca7e3` ("bpf: Do not mark insn as seen under speculative path verification"). The sanitize_insn_aux_data() helper does not serve a particular purpose in today's code. The original intention for the helper was that if function-by-function verification fails, a given program would be cleared from temporary insn_aux_data[], and then its verification would be re-attempted in the context of the main program a second time. However, a failure in do_check_subprogs() will skip do_check_main() and propagate the error to the user instead, thus such situation can never occur. Given its interaction is not compatible to the Spectre v1 mitigation (due to comparing aux->seen with env->pass_cnt), just remove sanitize_insn_aux_data() to avoid future bugs in this area. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-08-04 12:46:44 +02:00
Daniel Borkmann	0e9280654a	bpf: Fix leakage due to insufficient speculative store bypass mitigation [ Upstream commit `2039f26f3a` ] Spectre v4 gadgets make use of memory disambiguation, which is a set of techniques that execute memory access instructions, that is, loads and stores, out of program order; Intel's optimization manual, section 2.4.4.5: A load instruction micro-op may depend on a preceding store. Many microarchitectures block loads until all preceding store addresses are known. The memory disambiguator predicts which loads will not depend on any previous stores. When the disambiguator predicts that a load does not have such a dependency, the load takes its data from the L1 data cache. Eventually, the prediction is verified. If an actual conflict is detected, the load and all succeeding instructions are re-executed. `af86ca4e30` ("bpf: Prevent memory disambiguation attack") tried to mitigate this attack by sanitizing the memory locations through preemptive "fast" (low latency) stores of zero prior to the actual "slow" (high latency) store of a pointer value such that upon dependency misprediction the CPU then speculatively executes the load of the pointer value and retrieves the zero value instead of the attacker controlled scalar value previously stored at that location, meaning, subsequent access in the speculative domain is then redirected to the "zero page". The sanitized preemptive store of zero prior to the actual "slow" store is done through a simple ST instruction based on r10 (frame pointer) with relative offset to the stack location that the verifier has been tracking on the original used register for STX, which does not have to be r10. Thus, there are no memory dependencies for this store, since it's only using r10 and immediate constant of zero; hence `af86ca4e30` /assumed/ a low latency operation. However, a recent attack demonstrated that this mitigation is not sufficient since the preemptive store of zero could also be turned into a "slow" store and is thus bypassed as well: [...] // r2 = oob address (e.g. scalar) // r7 = pointer to map value 31: (7b) (u64 )(r10 -16) = r2 // r9 will remain "fast" register, r10 will become "slow" register below 32: (bf) r9 = r10 // JIT maps BPF reg to x86 reg: // r9 -> r15 (callee saved) // r10 -> rbp // train store forward prediction to break dependency link between both r9 // and r10 by evicting them from the predictor's LRU table. 33: (61) r0 = (u32 )(r7 +24576) 34: (63) (u32 )(r7 +29696) = r0 35: (61) r0 = (u32 )(r7 +24580) 36: (63) (u32 )(r7 +29700) = r0 37: (61) r0 = (u32 )(r7 +24584) 38: (63) (u32 )(r7 +29704) = r0 39: (61) r0 = (u32 )(r7 +24588) 40: (63) (u32 )(r7 +29708) = r0 [...] 543: (61) r0 = (u32 )(r7 +25596) 544: (63) (u32 )(r7 +30716) = r0 // prepare call to bpf_ringbuf_output() helper. the latter will cause rbp // to spill to stack memory while r13/r14/r15 (all callee saved regs) remain // in hardware registers. rbp becomes slow due to push/pop latency. below is // disasm of bpf_ringbuf_output() helper for better visual context: // // ffffffff8117ee20: 41 54 push r12 // ffffffff8117ee22: 55 push rbp // ffffffff8117ee23: 53 push rbx // ffffffff8117ee24: 48 f7 c1 fc ff ff ff test rcx,0xfffffffffffffffc // ffffffff8117ee2b: 0f 85 af 00 00 00 jne ffffffff8117eee0 <-- jump taken // [...] // ffffffff8117eee0: 49 c7 c4 ea ff ff ff mov r12,0xffffffffffffffea // ffffffff8117eee7: 5b pop rbx // ffffffff8117eee8: 5d pop rbp // ffffffff8117eee9: 4c 89 e0 mov rax,r12 // ffffffff8117eeec: 41 5c pop r12 // ffffffff8117eeee: c3 ret 545: (18) r1 = map[id:4] 547: (bf) r2 = r7 548: (b7) r3 = 0 549: (b7) r4 = 4 550: (85) call bpf_ringbuf_output#194288 // instruction 551 inserted by verifier \ 551: (7a) (u64 )(r10 -16) = 0 \| /both/ are now slow stores here // storing map value pointer r7 at fp-16 \| since value of r10 is "slow". 552: (7b) (u64 )(r10 -16) = r7 / // following "fast" read to the same memory location, but due to dependency // misprediction it will speculatively execute before insn 551/552 completes. 553: (79) r2 = (u64 )(r9 -16) // in speculative domain contains attacker controlled r2. in non-speculative // domain this contains r7, and thus accesses r7 +0 below. 554: (71) r3 = (u8 )(r2 +0) // leak r3 As can be seen, the current speculative store bypass mitigation which the verifier inserts at line 551 is insufficient since /both/, the write of the zero sanitation as well as the map value pointer are a high latency instruction due to prior memory access via push/pop of r10 (rbp) in contrast to the low latency read in line 553 as r9 (r15) which stays in hardware registers. Thus, architecturally, fp-16 is r7, however, microarchitecturally, fp-16 can still be r2. Initial thoughts to address this issue was to track spilled pointer loads from stack and enforce their load via LDX through r10 as well so that /both/ the preemptive store of zero /as well as/ the load use the /same/ register such that a dependency is created between the store and load. However, this option is not sufficient either since it can be bypassed as well under speculation. An updated attack with pointer spill/fills now _all_ based on r10 would look as follows: [...] // r2 = oob address (e.g. scalar) // r7 = pointer to map value [...] // longer store forward prediction training sequence than before. 2062: (61) r0 = (u32 )(r7 +25588) 2063: (63) (u32 )(r7 +30708) = r0 2064: (61) r0 = (u32 )(r7 +25592) 2065: (63) (u32 )(r7 +30712) = r0 2066: (61) r0 = (u32 )(r7 +25596) 2067: (63) (u32 )(r7 +30716) = r0 // store the speculative load address (scalar) this time after the store // forward prediction training. 2068: (7b) (u64 )(r10 -16) = r2 // preoccupy the CPU store port by running sequence of dummy stores. 2069: (63) (u32 )(r7 +29696) = r0 2070: (63) (u32 )(r7 +29700) = r0 2071: (63) (u32 )(r7 +29704) = r0 2072: (63) (u32 )(r7 +29708) = r0 2073: (63) (u32 )(r7 +29712) = r0 2074: (63) (u32 )(r7 +29716) = r0 2075: (63) (u32 )(r7 +29720) = r0 2076: (63) (u32 )(r7 +29724) = r0 2077: (63) (u32 )(r7 +29728) = r0 2078: (63) (u32 )(r7 +29732) = r0 2079: (63) (u32 )(r7 +29736) = r0 2080: (63) (u32 )(r7 +29740) = r0 2081: (63) (u32 )(r7 +29744) = r0 2082: (63) (u32 )(r7 +29748) = r0 2083: (63) (u32 )(r7 +29752) = r0 2084: (63) (u32 )(r7 +29756) = r0 2085: (63) (u32 )(r7 +29760) = r0 2086: (63) (u32 )(r7 +29764) = r0 2087: (63) (u32 )(r7 +29768) = r0 2088: (63) (u32 )(r7 +29772) = r0 2089: (63) (u32 )(r7 +29776) = r0 2090: (63) (u32 )(r7 +29780) = r0 2091: (63) (u32 )(r7 +29784) = r0 2092: (63) (u32 )(r7 +29788) = r0 2093: (63) (u32 )(r7 +29792) = r0 2094: (63) (u32 )(r7 +29796) = r0 2095: (63) (u32 )(r7 +29800) = r0 2096: (63) (u32 )(r7 +29804) = r0 2097: (63) (u32 )(r7 +29808) = r0 2098: (63) (u32 )(r7 +29812) = r0 // overwrite scalar with dummy pointer; same as before, also including the // sanitation store with 0 from the current mitigation by the verifier. 2099: (7a) (u64 )(r10 -16) = 0 \| /both/ are now slow stores here 2100: (7b) (u64 )(r10 -16) = r7 \| since store unit is still busy. // load from stack intended to bypass stores. 2101: (79) r2 = (u64 )(r10 -16) 2102: (71) r3 = (u8 )(r2 +0) // leak r3 [...] Looking at the CPU microarchitecture, the scheduler might issue loads (such as seen in line 2101) before stores (line 2099,2100) because the load execution units become available while the store execution unit is still busy with the sequence of dummy stores (line 2069-2098). And so the load may use the prior stored scalar from r2 at address r10 -16 for speculation. The updated attack may work less reliable on CPU microarchitectures where loads and stores share execution resources. This concludes that the sanitizing with zero stores from `af86ca4e30` ("bpf: Prevent memory disambiguation attack") is insufficient. Moreover, the detection of stack reuse from `af86ca4e30` where previously data (STACK_MISC) has been written to a given stack slot where a pointer value is now to be stored does not have sufficient coverage as precondition for the mitigation either; for several reasons outlined as follows: 1) Stack content from prior program runs could still be preserved and is therefore not "random", best example is to split a speculative store bypass attack between tail calls, program A would prepare and store the oob address at a given stack slot and then tail call into program B which does the "slow" store of a pointer to the stack with subsequent "fast" read. From program B PoV such stack slot type is STACK_INVALID, and therefore also must be subject to mitigation. 2) The STACK_SPILL must not be coupled to register_is_const(&stack->spilled_ptr) condition, for example, the previous content of that memory location could also be a pointer to map or map value. Without the fix, a speculative store bypass is not mitigated in such precondition and can then lead to a type confusion in the speculative domain leaking kernel memory near these pointer types. While brainstorming on various alternative mitigation possibilities, we also stumbled upon a retrospective from Chrome developers [0]: [...] For variant 4, we implemented a mitigation to zero the unused memory of the heap prior to allocation, which cost about 1% when done concurrently and 4% for scavenging. Variant 4 defeats everything we could think of. We explored more mitigations for variant 4 but the threat proved to be more pervasive and dangerous than we anticipated. For example, stack slots used by the register allocator in the optimizing compiler could be subject to type confusion, leading to pointer crafting. Mitigating type confusion for stack slots alone would have required a complete redesign of the backend of the optimizing compiler, perhaps man years of work, without a guarantee of completeness. [...] From BPF side, the problem space is reduced, however, options are rather limited. One idea that has been explored was to xor-obfuscate pointer spills to the BPF stack: [...] // preoccupy the CPU store port by running sequence of dummy stores. [...] 2106: (63) (u32 )(r7 +29796) = r0 2107: (63) (u32 )(r7 +29800) = r0 2108: (63) (u32 )(r7 +29804) = r0 2109: (63) (u32 )(r7 +29808) = r0 2110: (63) (u32 )(r7 +29812) = r0 // overwrite scalar with dummy pointer; xored with random 'secret' value // of 943576462 before store ... 2111: (b4) w11 = 943576462 2112: (af) r11 ^= r7 2113: (7b) (u64 )(r10 -16) = r11 2114: (79) r11 = (u64 )(r10 -16) 2115: (b4) w2 = 943576462 2116: (af) r2 ^= r11 // ... and restored with the same 'secret' value with the help of AX reg. 2117: (71) r3 = (u8 )(r2 +0) [...] While the above would not prevent speculation, it would make data leakage infeasible by directing it to random locations. In order to be effective and prevent type confusion under speculation, such random secret would have to be regenerated for each store. The additional complexity involved for a tracking mechanism that prevents jumps such that restoring spilled pointers would not get corrupted is not worth the gain for unprivileged. Hence, the fix in here eventually opted for emitting a non-public BPF_ST \| BPF_NOSPEC instruction which the x86 JIT translates into a lfence opcode. Inserting the latter in between the store and load instruction is one of the mitigations options [1]. The x86 instruction manual notes: [...] An LFENCE that follows an instruction that stores to memory might complete before the data being stored have become globally visible. [...] The latter meaning that the preceding store instruction finished execution and the store is at minimum guaranteed to be in the CPU's store queue, but it's not guaranteed to be in that CPU's L1 cache at that point (globally visible). The latter would only be guaranteed via sfence. So the load which is guaranteed to execute after the lfence for that local CPU would have to rely on store-to-load forwarding. [2], in section 2.3 on store buffers says: [...] For every store operation that is added to the ROB, an entry is allocated in the store buffer. This entry requires both the virtual and physical address of the target. Only if there is no free entry in the store buffer, the frontend stalls until there is an empty slot available in the store buffer again. Otherwise, the CPU can immediately continue adding subsequent instructions to the ROB and execute them out of order. On Intel CPUs, the store buffer has up to 56 entries. [...] One small upside on the fix is that it lifts constraints from `af86ca4e30` where the sanitize_stack_off relative to r10 must be the same when coming from different paths. The BPF_ST \| BPF_NOSPEC gets emitted after a BPF_STX or BPF_ST instruction. This happens either when we store a pointer or data value to the BPF stack for the first time, or upon later pointer spills. The former needs to be enforced since otherwise stale stack data could be leaked under speculation as outlined earlier. For non-x86 JITs the BPF_ST \| BPF_NOSPEC mapping is currently optimized away, but others could emit a speculation barrier as well if necessary. For real-world unprivileged programs e.g. generated by LLVM, pointer spill/fill is only generated upon register pressure and LLVM only tries to do that for pointers which are not used often. The program main impact will be the initial BPF_ST \| BPF_NOSPEC sanitation for the STACK_INVALID case when the first write to a stack slot occurs e.g. upon map lookup. In future we might refine ways to mitigate the latter cost. [0] https://arxiv.org/pdf/1902.05178.pdf [1] https://msrc-blog.microsoft.com/2018/05/21/analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639/ [2] https://arxiv.org/pdf/1905.05725.pdf Fixes: `af86ca4e30` ("bpf: Prevent memory disambiguation attack") Fixes: `f7cf25b202` ("bpf: track spill/fill of constants") Co-developed-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-04 12:46:44 +02:00
Daniel Borkmann	bea9e2fd18	bpf: Introduce BPF nospec instruction for mitigating Spectre v4 [ Upstream commit `f5e81d1117` ] In case of JITs, each of the JIT backends compiles the BPF nospec instruction /either/ to a machine instruction which emits a speculation barrier /or/ to /no/ machine instruction in case the underlying architecture is not affected by Speculative Store Bypass or has different mitigations in place already. This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence' instruction for mitigation. In case of arm64, we rely on the firmware mitigation as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled, it works for all of the kernel code with no need to provide any additional instructions here (hence only comment in arm64 JIT). Other archs can follow as needed. The BPF nospec instruction is specifically targeting Spectre v4 since i) we don't use a serialization barrier for the Spectre v1 case, and ii) mitigation instructions for v1 and v4 might be different on some archs. The BPF nospec is required for a future commit, where the BPF verifier does annotate intermediate BPF programs with speculation barriers. Co-developed-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-08-04 12:46:44 +02:00
Greg Kroah-Hartman	1afedcdcf8	This is the 5.10.55 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEE6r8ACgkQONu9yGCS aT54gw/7B1RbIpcTJlhtiQVYvwMJTX/ZgoxkSsffz6G7TXaBbg/x29cuc3cI9IUJ fr99zp8gQbQMibXqCaIPBloetSdquX9G+gOmzVSIS+ac2Wx/pH29qRz1fslWq8Mj hOg8t2Ce+LoAgGZzSXswhvIL1Zep8HZDlRKKcolmiGEJFBr/4nPfEVOeFNpiTUYF cb5RtB1nRR0DJa8wWr18S7v2/TBdR9V7TgGuwRnpVviIQjjfE/S+lmbgtzHgm5Tf cBtP7Ta/eIGeLQIyQEoEWV2ViJRmSGk4s2dOcQ/kUMje5de+J03Hkp4jeJbB2ndx UUos2Xm4D1j4jsIBZqJKV+Fzf9RzdNXQp4VJIfzvbDbZn8y9xHRcXm/ChNBz/B5L 8oKcdM9Sf8DxiztlZ4c40IndYjwp0QxlubCYDfwYQfYB/6LLrffrgPZu/TgWw9lj lXOBUZq7yHuo9OSAIfGCCtKKnzA63wLv3zTmNmSO9fDoXUl0VHBrZC6apgB6AC9T rzvOOgRC6NfWViZl2E3Op7XfOg2jstpRXUr7a0HSaQp4QfmlLJ9HnDywQfNQeX4+ HyAxhXo8gHPDgh8DK7yRObBhu24yQcy+ukesQwkSS2m0j2ilfQszk0uP1bfFOuq2 G5Z8VMQPsIMsSijEyuonLRVUPfp8herhWMP0YsZZeEnxZvaSEaM= =li6t -----END PGP SIGNATURE----- Merge 5.10.55 into android12-5.10-lts Changes in 5.10.55 tools: Allow proper CC/CXX/... override with LLVM=1 in Makefile.include io_uring: fix link timeout refs KVM: x86: determine if an exception has an error code only when injecting it. af_unix: fix garbage collect vs MSG_PEEK workqueue: fix UAF in pwq_unbound_release_workfn() cgroup1: fix leaked context root causing sporadic NULL deref in LTP net/802/mrp: fix memleak in mrp_request_join() net/802/garp: fix memleak in garp_request_join() net: annotate data race around sk_ll_usec sctp: move 198 addresses from unusable to private scope rcu-tasks: Don't delete holdouts within trc_inspect_reader() rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader() ipv6: allocate enough headroom in ip6_finish_output2() drm/ttm: add a check against null pointer dereference hfs: add missing clean-up in hfs_fill_super hfs: fix high memory mapping in hfs_bnode_read hfs: add lock nesting notation to hfs_find_init firmware: arm_scmi: Fix possible scmi_linux_errmap buffer overflow firmware: arm_scmi: Fix range check for the maximum number of pending messages cifs: fix the out of range assignment to bit fields in parse_server_interfaces iomap: remove the length variable in iomap_seek_data iomap: remove the length variable in iomap_seek_hole ARM: dts: versatile: Fix up interrupt controller node names ipv6: ip6_finish_output2: set sk into newly allocated nskb Linux 5.10.55 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I2d673bdde784b3689af73289305091dbd4ead042	2021-07-31 08:51:04 +02:00
Paul E. McKenney	86cb49e731	rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader() [ Upstream commit `a9ab9cce93` ] Invoking trc_del_holdout() from within trc_wait_for_one_reader() is only a performance optimization because the RCU Tasks Trace grace-period kthread will eventually do this within check_all_holdout_tasks_trace(). But it is not a particularly important performance optimization because it only applies to the grace-period kthread, of which there is but one. This commit therefore removes this invocation of trc_del_holdout() in favor of the one in check_all_holdout_tasks_trace() in the grace-period kthread. Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-31 08:16:11 +02:00
Paul E. McKenney	55ddab2bfd	rcu-tasks: Don't delete holdouts within trc_inspect_reader() [ Upstream commit `1d10bf55d8` ] As Yanfei pointed out, although invoking trc_del_holdout() is safe from the viewpoint of the integrity of the holdout list itself, the put_task_struct() invoked by trc_del_holdout() can result in use-after-free errors due to later accesses to this task_struct structure by the RCU Tasks Trace grace-period kthread. This commit therefore removes this call to trc_del_holdout() from trc_inspect_reader() in favor of the grace-period thread's existing call to trc_del_holdout(), thus eliminating that particular class of use-after-free errors. Reported-by: "Xu, Yanfei" <yanfei.xu@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-31 08:16:11 +02:00
Paul Gortmaker	df34f88862	cgroup1: fix leaked context root causing sporadic NULL deref in LTP commit `1e7107c5ef` upstream. Richard reported sporadic (roughly one in 10 or so) null dereferences and other strange behaviour for a set of automated LTP tests. Things like: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:kernfs_sop_show_path+0x1b/0x60 ...or these others: RIP: 0010:do_mkdirat+0x6a/0xf0 RIP: 0010:d_alloc_parallel+0x98/0x510 RIP: 0010:do_readlinkat+0x86/0x120 There were other less common instances of some kind of a general scribble but the common theme was mount and cgroup and a dubious dentry triggering the NULL dereference. I was only able to reproduce it under qemu by replicating Richard's setup as closely as possible - I never did get it to happen on bare metal, even while keeping everything else the same. In commit `71d883c37e` ("cgroup_do_mount(): massage calling conventions") we see this as a part of the overall change: -------------- struct cgroup_subsys ss; - struct dentry dentry; [...] - dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root, - CGROUP_SUPER_MAGIC, ns); [...] - if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) { - struct super_block sb = dentry->d_sb; - dput(dentry); + ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns); + if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) { + struct super_block sb = fc->root->d_sb; + dput(fc->root); deactivate_locked_super(sb); msleep(10); return restart_syscall(); } -------------- In changing from the local "*dentry" variable to using fc->root, we now export/leave that dentry pointer in the file context after doing the dput() in the unlikely "is_dying" case. With LTP doing a crazy amount of back to back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely becomes slightly likely and then bad things happen. A fix would be to not leave the stale reference in fc->root as follows: -------------- dput(fc->root); + fc->root = NULL; deactivate_locked_super(sb); -------------- ...but then we are just open-coding a duplicate of fc_drop_locked() so we simply use that instead. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@vger.kernel.org # v5.1+ Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org> Fixes: `71d883c37e` ("cgroup_do_mount(): massage calling conventions") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-31 08:16:11 +02:00
Yang Yingliang	dcd00801f3	workqueue: fix UAF in pwq_unbound_release_workfn() commit `b42b0bddcb` upstream. I got a UAF report when doing fuzz test: [ 152.880091][ T8030] ================================================================== [ 152.881240][ T8030] BUG: KASAN: use-after-free in pwq_unbound_release_workfn+0x50/0x190 [ 152.882442][ T8030] Read of size 4 at addr ffff88810d31bd00 by task kworker/3:2/8030 [ 152.883578][ T8030] [ 152.883932][ T8030] CPU: 3 PID: 8030 Comm: kworker/3:2 Not tainted 5.13.0+ #249 [ 152.885014][ T8030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 152.886442][ T8030] Workqueue: events pwq_unbound_release_workfn [ 152.887358][ T8030] Call Trace: [ 152.887837][ T8030] dump_stack_lvl+0x75/0x9b [ 152.888525][ T8030] ? pwq_unbound_release_workfn+0x50/0x190 [ 152.889371][ T8030] print_address_description.constprop.10+0x48/0x70 [ 152.890326][ T8030] ? pwq_unbound_release_workfn+0x50/0x190 [ 152.891163][ T8030] ? pwq_unbound_release_workfn+0x50/0x190 [ 152.891999][ T8030] kasan_report.cold.15+0x82/0xdb [ 152.892740][ T8030] ? pwq_unbound_release_workfn+0x50/0x190 [ 152.893594][ T8030] __asan_load4+0x69/0x90 [ 152.894243][ T8030] pwq_unbound_release_workfn+0x50/0x190 [ 152.895057][ T8030] process_one_work+0x47b/0x890 [ 152.895778][ T8030] worker_thread+0x5c/0x790 [ 152.896439][ T8030] ? process_one_work+0x890/0x890 [ 152.897163][ T8030] kthread+0x223/0x250 [ 152.897747][ T8030] ? set_kthread_struct+0xb0/0xb0 [ 152.898471][ T8030] ret_from_fork+0x1f/0x30 [ 152.899114][ T8030] [ 152.899446][ T8030] Allocated by task 8884: [ 152.900084][ T8030] kasan_save_stack+0x21/0x50 [ 152.900769][ T8030] __kasan_kmalloc+0x88/0xb0 [ 152.901416][ T8030] __kmalloc+0x29c/0x460 [ 152.902014][ T8030] alloc_workqueue+0x111/0x8e0 [ 152.902690][ T8030] __btrfs_alloc_workqueue+0x11e/0x2a0 [ 152.903459][ T8030] btrfs_alloc_workqueue+0x6d/0x1d0 [ 152.904198][ T8030] scrub_workers_get+0x1e8/0x490 [ 152.904929][ T8030] btrfs_scrub_dev+0x1b9/0x9c0 [ 152.905599][ T8030] btrfs_ioctl+0x122c/0x4e50 [ 152.906247][ T8030] __x64_sys_ioctl+0x137/0x190 [ 152.906916][ T8030] do_syscall_64+0x34/0xb0 [ 152.907535][ T8030] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 152.908365][ T8030] [ 152.908688][ T8030] Freed by task 8884: [ 152.909243][ T8030] kasan_save_stack+0x21/0x50 [ 152.909893][ T8030] kasan_set_track+0x20/0x30 [ 152.910541][ T8030] kasan_set_free_info+0x24/0x40 [ 152.911265][ T8030] __kasan_slab_free+0xf7/0x140 [ 152.911964][ T8030] kfree+0x9e/0x3d0 [ 152.912501][ T8030] alloc_workqueue+0x7d7/0x8e0 [ 152.913182][ T8030] __btrfs_alloc_workqueue+0x11e/0x2a0 [ 152.913949][ T8030] btrfs_alloc_workqueue+0x6d/0x1d0 [ 152.914703][ T8030] scrub_workers_get+0x1e8/0x490 [ 152.915402][ T8030] btrfs_scrub_dev+0x1b9/0x9c0 [ 152.916077][ T8030] btrfs_ioctl+0x122c/0x4e50 [ 152.916729][ T8030] __x64_sys_ioctl+0x137/0x190 [ 152.917414][ T8030] do_syscall_64+0x34/0xb0 [ 152.918034][ T8030] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 152.918872][ T8030] [ 152.919203][ T8030] The buggy address belongs to the object at ffff88810d31bc00 [ 152.919203][ T8030] which belongs to the cache kmalloc-512 of size 512 [ 152.921155][ T8030] The buggy address is located 256 bytes inside of [ 152.921155][ T8030] 512-byte region [ffff88810d31bc00, ffff88810d31be00) [ 152.922993][ T8030] The buggy address belongs to the page: [ 152.923800][ T8030] page:ffffea000434c600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10d318 [ 152.925249][ T8030] head:ffffea000434c600 order:2 compound_mapcount:0 compound_pincount:0 [ 152.926399][ T8030] flags: 0x57ff00000010200(slab\|head\|node=1\|zone=2\|lastcpupid=0x7ff) [ 152.927515][ T8030] raw: 057ff00000010200 dead000000000100 dead000000000122 ffff888009c42c80 [ 152.928716][ T8030] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 152.929890][ T8030] page dumped because: kasan: bad access detected [ 152.930759][ T8030] [ 152.931076][ T8030] Memory state around the buggy address: [ 152.931851][ T8030] ffff88810d31bc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 152.932967][ T8030] ffff88810d31bc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 152.934068][ T8030] >ffff88810d31bd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 152.935189][ T8030] ^ [ 152.935763][ T8030] ffff88810d31bd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 152.936847][ T8030] ffff88810d31be00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 152.937940][ T8030] ================================================================== If apply_wqattrs_prepare() fails in alloc_workqueue(), it will call put_pwq() which invoke a work queue to call pwq_unbound_release_workfn() and use the 'wq'. The 'wq' allocated in alloc_workqueue() will be freed in error path when apply_wqattrs_prepare() fails. So it will lead a UAF. CPU0 CPU1 alloc_workqueue() alloc_and_link_pwqs() apply_wqattrs_prepare() fails apply_wqattrs_cleanup() schedule_work(&pwq->unbound_release_work) kfree(wq) worker_thread() pwq_unbound_release_workfn() <- trigger uaf here If apply_wqattrs_prepare() fails, the new pwq are not linked, it doesn't hold any reference to the 'wq', 'wq' is invalid to access in the worker, so add check pwq if linked to fix this. Fixes: `2d5f0764b5` ("workqueue: split apply_workqueue_attrs() into 3 stages") Cc: stable@vger.kernel.org # v4.2+ Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-31 08:16:11 +02:00
Greg Kroah-Hartman	57e177ea01	Merge branch 'android12-5.10' into `android12-5.10-lts` Sync up with android12-5.10 for the following commits: `2a8bfea53d` UPSTREAM: kernel/irq: export irq_gc_set_wake `36fbb55631` FROMGIT: procfs: prevent unpriveleged processes accessing fdinfo dir `fc2d64ec5d` FROMGIT: f2fs: don't sleep while grabing nat_tree_lock `cf1646cba3` FROMLIST: scsi: ufs: Allow async suspend/resume callbacks `98afdd197f` ANDROID: ABI: update generic symbol list and ABI XML `e2e063f507` ANDROID: scsi: ufs: add vendor hook to override key reprogramming `198e728044` ANDROID: GKI: Add rockchip symbol list Change-Id: Ib1c25472ab1be58611ee2fa61047ceef48f00e7e Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2021-07-30 13:34:01 +02:00
Jianqun Xu	2a8bfea53d	UPSTREAM: kernel/irq: export irq_gc_set_wake Module driver may use irq_gc_set_wake. Bug: 194515348 Change-Id: I52f43e1dff15d987532395e5151e65419b5904b2 Signed-off-by: Jianqun Xu <jay.xu@rock-chips.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210305080658.2422114-1-jay.xu@rock-chips.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> (cherry picked from commit `024c79520f`) Signed-off-by: Kever Yang <kever.yang@rock-chips.com>	2021-07-30 06:41:28 +00:00
Greg Kroah-Hartman	e4cac2c332	This is the 5.10.54 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmEBT00ACgkQONu9yGCS aT7svA/9HCRwW+pK3UpK1+0FK7gGH8DA3jSONj775zVEKhboDZNIwZsDqG0Ly+jm /JejWXPKZlekaDXgzBfZY3H59xgij/VwYHe8p7cdxfi1TlhmAQwFjLNZnWav8as6 IyNkpsDJn8fMXmfHDi2u3cb8wrVi/aQDzTlwu88cUtREyZCaaYlo0Fdv9MJyhww/ p6LWPYQoZ8TmFY+Y/2ORVxFos2UVuU0hhhMdGt9LrX2WNEGRNUZUqmbhcXYfdsX0 ckSHbijIcWdcka3nQ6yOvdxw75rTqd8c/bP0y+yAteeJ0CykjVnI2cdK+M2ZEi4j /JqpGJrRWhsZf5MiO8b3k+I1K62JDa1GYBQ9Amp8FKKzjYLPTNeFAP9IsMyDc4oi oW98XM7XzoSEU9t/FSAIGT0hYK9k+lnPxw623LhxD6x3VPynnNAnQsLr+HirOgG6 mZ79L4ZFu3lUvVsCuCgKn/uxwDopUNlhqo5B2/4M2kSWwe2Xu5bExpGc2bT9xCOP 6fF9DmvmpG1UPGCXrOqaxemyEPmHqmyjKJpxDt6vZqlOL9vqHez4WmEEM1C+E2NZ 5VKKbBk/KZDxNX9EiFOtI2HRFb1cghoI2Hcb/QjRoB9Dv3a6cHgjxDl0eKm8SiDN +1ytV0IFH3fT4aRiXJ7I3GBwkjKcDaX0sjYwtnCx9s5XZmm9PRQ= =HAyL -----END PGP SIGNATURE----- Merge 5.10.54 into android12-5.10-lts Changes in 5.10.54 igc: Fix use-after-free error during reset igb: Fix use-after-free error during reset igc: change default return of igc_read_phy_reg() ixgbe: Fix an error handling path in 'ixgbe_probe()' igc: Fix an error handling path in 'igc_probe()' igb: Fix an error handling path in 'igb_probe()' fm10k: Fix an error handling path in 'fm10k_probe()' e1000e: Fix an error handling path in 'e1000_probe()' iavf: Fix an error handling path in 'iavf_probe()' igb: Check if num of q_vectors is smaller than max before array access igb: Fix position of assignment to *ring gve: Fix an error handling path in 'gve_probe()' net: add kcov handle to skb extensions bonding: fix suspicious RCU usage in bond_ipsec_add_sa() bonding: fix null dereference in bond_ipsec_add_sa() ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops bonding: fix suspicious RCU usage in bond_ipsec_del_sa() bonding: disallow setting nested bonding + ipsec offload bonding: Add struct bond_ipesc to manage SA bonding: fix suspicious RCU usage in bond_ipsec_offload_ok() bonding: fix incorrect return value of bond_ipsec_offload_ok() ipv6: fix 'disable_policy' for fwd packets stmmac: platform: Fix signedness bug in stmmac_probe_config_dt() selftests: icmp_redirect: remove from checking for IPv6 route get selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped cxgb4: fix IRQ free race during driver unload mptcp: fix warning in __skb_flow_dissect() when do syn cookie for subflow join nvme-pci: do not call nvme_dev_remove_admin from nvme_remove KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM perf inject: Fix dso->nsinfo refcounting perf map: Fix dso->nsinfo refcounting perf probe: Fix dso->nsinfo refcounting perf env: Fix sibling_dies memory leak perf test session_topology: Delete session->evlist perf test event_update: Fix memory leak of evlist perf dso: Fix memory leak in dso__new_map() perf test maps__merge_in: Fix memory leak of maps perf env: Fix memory leak of cpu_pmu_caps perf report: Free generated help strings for sort option perf script: Fix memory 'threads' and 'cpus' leaks on exit perf lzma: Close lzma stream on exit perf probe-file: Delete namelist in del_events() on the error path perf data: Close all files in close_dir() perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set ASoC: wm_adsp: Correct wm_coeff_tlv_get handling spi: imx: add a check for speed_hz before calculating the clock spi: stm32: fixes pm_runtime calls in probe/remove regulator: hi6421: Use correct variable type for regmap api val argument regulator: hi6421: Fix getting wrong drvdata spi: mediatek: fix fifo rx mode ASoC: rt5631: Fix regcache sync errors on resume bpf, test: fix NULL pointer dereference on invalid expected_attach_type bpf: Fix tail_call_reachable rejection for interpreter when jit failed xdp, net: Fix use-after-free in bpf_xdp_link_release timers: Fix get_next_timer_interrupt() with no timers pending liquidio: Fix unintentional sign extension issue on left shift of u16 s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1] bpf, sockmap: Fix potential memory leak on unlikely error case bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats bpftool: Check malloc return value in mount_bpffs_for_pin net: fix uninit-value in caif_seqpkt_sendmsg usb: hso: fix error handling code of hso_create_net_device dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable} efi/tpm: Differentiate missing and invalid final event log table. net: decnet: Fix sleeping inside in af_decnet KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak net: sched: fix memory leak in tcindex_partial_destroy_work sctp: trim optlen when it's a huge value in sctp_setsockopt netrom: Decrease sock refcount when sock timers expire scsi: iscsi: Fix iface sysfs attr detection scsi: target: Fix protect handling in WRITE SAME(32) spi: cadence: Correct initialisation of runtime PM again ACPI: Kconfig: Fix table override from built-in initrd bnxt_en: don't disable an already disabled PCI device bnxt_en: Refresh RoCE capabilities in bnxt_ulp_probe() bnxt_en: Add missing check for BNXT_STATE_ABORT_ERR in bnxt_fw_rset_task() bnxt_en: Validate vlan protocol ID on RX packets bnxt_en: Check abort error state in bnxt_half_open_nic() net: hisilicon: rename CACHE_LINE_MASK to avoid redefinition net/tcp_fastopen: fix data races around tfo_active_disable_stamp ALSA: hda: intel-dsp-cfg: add missing ElkhartLake PCI ID net: hns3: fix possible mismatches resp of mailbox net: hns3: fix rx VLAN offload state inconsistent issue spi: spi-bcm2835: Fix deadlock net/sched: act_skbmod: Skip non-Ethernet packets ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions ceph: don't WARN if we're still opening a session to an MDS nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem" afs: Fix tracepoint string placement with built-in AFS r8169: Avoid duplicate sysfs entry creation error nvme: set the PRACT bit when using Write Zeroes with T10 PI sctp: update active_key for asoc when old key is being replaced tcp: disable TFO blackhole logic by default net: dsa: sja1105: make VID 4095 a bridge VLAN too net: sched: cls_api: Fix the the wrong parameter drm/panel: raspberrypi-touchscreen: Prevent double-free cifs: only write 64kb at a time when fallocating a small region of a file cifs: fix fallocate when trying to allocate a hole. proc: Avoid mixing integer types in mem_rw() mmc: core: Don't allocate IDA for OF aliases s390/ftrace: fix ftrace_update_ftrace_func implementation s390/boot: fix use of expolines in the DMA code ALSA: usb-audio: Add missing proc text entry for BESPOKEN type ALSA: usb-audio: Add registration quirk for JBL Quantum headsets ALSA: sb: Fix potential ABBA deadlock in CSP driver ALSA: hda/realtek: Fix pop noise and 2 Front Mic issues on a machine ALSA: hdmi: Expose all pins on MSI MS-7C94 board ALSA: pcm: Call substream ack() method upon compat mmap commit ALSA: pcm: Fix mmap capability check Revert "usb: renesas-xhci: Fix handling of unknown ROM state" usb: xhci: avoid renesas_usb_fw.mem when it's unusable xhci: Fix lost USB 2 remote wake KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state usb: hub: Disable USB 3 device initiated lpm if exit latency is too high usb: hub: Fix link power management max exit latency (MEL) calculations USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS usb: max-3421: Prevent corruption of freed memory usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop() USB: serial: option: add support for u-blox LARA-R6 family USB: serial: cp210x: fix comments for GE CS1000 USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick usb: gadget: Fix Unbalanced pm_runtime_enable in tegra_xudc_probe usb: dwc2: gadget: Fix GOUTNAK flow for Slave mode. usb: dwc2: gadget: Fix sending zero length packet in DDMA mode. usb: typec: stusb160x: register role switch before interrupt registration firmware/efi: Tell memblock about EFI iomem reservations tracepoints: Update static_call before tp_funcs when adding a tracepoint tracing/histogram: Rename "cpu" to "common_cpu" tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop. tracing: Synthetic event field_pos is an index not a boolean btrfs: check for missing device in btrfs_trim_fs media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf() ixgbe: Fix packet corruption due to missing DMA sync bus: mhi: core: Validate channel ID when processing command completions posix-cpu-timers: Fix rearm racing against process tick selftest: use mmap instead of posix_memalign to allocate memory io_uring: explicitly count entries for poll reqs io_uring: remove double poll entry on arm failure userfaultfd: do not untag user pointers memblock: make for_each_mem_range() traverse MEMBLOCK_HOTPLUG regions hugetlbfs: fix mount mode command line processing rbd: don't hold lock_rwsem while running_list is being drained rbd: always kick acquire on "acquired" and "released" notifications misc: eeprom: at24: Always append device id even if label property is set. nds32: fix up stack guard gap driver core: Prevent warning when removing a device link from unregistered consumer drm: Return -ENOTTY for non-drm ioctls drm/amdgpu: update golden setting for sienna_cichlid net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz PCI: Mark AMD Navi14 GPU ATS as broken bonding: fix build issue skbuff: Release nfct refcount on napi stolen or re-used skbs Documentation: Fix intiramfs script name perf inject: Close inject.output on exit usb: ehci: Prevent missed ehci interrupts with edge-triggered MSI drm/i915/gvt: Clear d3_entered on elsp cmd submission. sfc: ensure correct number of XDP queues xhci: add xhci_get_virt_ep() helper skbuff: Fix build with SKB extensions disabled Linux 5.10.54 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifd2823b47ab1544cd1f168b138624ffe060a471e	2021-07-28 15:23:47 +02:00
Frederic Weisbecker	6e81e2c38a	posix-cpu-timers: Fix rearm racing against process tick commit `1a3402d93c` upstream. Since the process wide cputime counter is started locklessly from posix_cpu_timer_rearm(), it can be concurrently stopped by operations on other timers from the same thread group, such as in the following unlucky scenario: CPU 0 CPU 1 ----- ----- timer_settime(TIMER B) posix_cpu_timer_rearm(TIMER A) cpu_clock_sample_group() (pct->timers_active already true) handle_posix_cpu_timers() check_process_timers() stop_process_timers() pct->timers_active = false arm_timer(TIMER A) tick -> run_posix_cpu_timers() // sees !pct->timers_active, ignore // our TIMER A Fix this with simply locking process wide cputime counting start and timer arm in the same block. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Fixes: `60f2ceaa81` ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group") Cc: stable@vger.kernel.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-28 14:35:45 +02:00
Steven Rostedt (VMware)	552b053f1a	tracing: Synthetic event field_pos is an index not a boolean commit `3b13911a2f` upstream. Performing the following: ># echo 'wakeup_lat s32 pid; u64 delta; char wake_comm[]' > synthetic_events ># echo 'hist:keys=pid:__arg__1=common_timestamp.usecs' > events/sched/sched_waking/trigger ># echo 'hist:keys=next_pid:pid=next_pid,delta=common_timestamp.usecs-$__arg__1:onmatch(sched.sched_waking).trace(wakeup_lat,$pid,$delta,prev_comm)'\ > events/sched/sched_switch/trigger ># echo 1 > events/synthetic/enable Crashed the kernel: BUG: kernel NULL pointer dereference, address: 000000000000001b #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.13.0-rc5-test+ #104 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:strlen+0x0/0x20 Code: f6 82 80 2b 0b bc 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2b 0b bc 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 9 f8 c3 31 RSP: 0018:ffffaa75000d79d0 EFLAGS: 00010046 RAX: 0000000000000002 RBX: ffff9cdb55575270 RCX: 0000000000000000 RDX: ffff9cdb58c7a320 RSI: ffffaa75000d7b40 RDI: 000000000000001b RBP: ffffaa75000d7b40 R08: ffff9cdb40a4f010 R09: ffffaa75000d7ab8 R10: ffff9cdb4398c700 R11: 0000000000000008 R12: ffff9cdb58c7a320 R13: ffff9cdb55575270 R14: ffff9cdb58c7a000 R15: 0000000000000018 FS: 0000000000000000(0000) GS:ffff9cdb5aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000001b CR3: 00000000c0612006 CR4: 00000000001706e0 Call Trace: trace_event_raw_event_synth+0x90/0x1d0 action_trace+0x5b/0x70 event_hist_trigger+0x4bd/0x4e0 ? cpumask_next_and+0x20/0x30 ? update_sd_lb_stats.constprop.0+0xf6/0x840 ? __lock_acquire.constprop.0+0x125/0x550 ? find_held_lock+0x32/0x90 ? sched_clock_cpu+0xe/0xd0 ? lock_release+0x155/0x440 ? update_load_avg+0x8c/0x6f0 ? enqueue_entity+0x18a/0x920 ? __rb_reserve_next+0xe5/0x460 ? ring_buffer_lock_reserve+0x12a/0x3f0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x1ae/0x240 trace_event_raw_event_sched_switch+0x114/0x170 __traceiter_sched_switch+0x39/0x50 __schedule+0x431/0xb00 schedule_idle+0x28/0x40 do_idle+0x198/0x2e0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb The reason is that the dynamic events array keeps track of the field position of the fields array, via the field_pos variable in the synth_field structure. Unfortunately, that field is a boolean for some reason, which means any field_pos greater than 1 will be a bug (in this case it was 2). Link: https://lkml.kernel.org/r/20210721191008.638bce34@oasis.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: `bd82631d7c` ("tracing: Add support for dynamic strings to synthetic events") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-28 14:35:45 +02:00
Haoran Luo	757bdba802	tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop. commit `67f0d6d988` upstream. The "rb_per_cpu_empty()" misinterpret the condition (as not-empty) when "head_page" and "commit_page" of "struct ring_buffer_per_cpu" points to the same buffer page, whose "buffer_data_page" is empty and "read" field is non-zero. An error scenario could be constructed as followed (kernel perspective): 1. All pages in the buffer has been accessed by reader(s) so that all of them will have non-zero "read" field. 2. Read and clear all buffer pages so that "rb_num_of_entries()" will return 0 rendering there's no more data to read. It is also required that the "read_page", "commit_page" and "tail_page" points to the same page, while "head_page" is the next page of them. 3. Invoke "ring_buffer_lock_reserve()" with large enough "length" so that it shot pass the end of current tail buffer page. Now the "head_page", "commit_page" and "tail_page" points to the same page. 4. Discard current event with "ring_buffer_discard_commit()", so that "head_page", "commit_page" and "tail_page" points to a page whose buffer data page is now empty. When the error scenario has been constructed, "tracing_read_pipe" will be trapped inside a deadloop: "trace_empty()" returns 0 since "rb_per_cpu_empty()" returns 0 when it hits the CPU containing such constructed ring buffer. Then "trace_find_next_entry_inc()" always return NULL since "rb_num_of_entries()" reports there's no more entry to read. Finally "trace_seq_to_user()" returns "-EBUSY" spanking "tracing_read_pipe" back to the start of the "waitagain" loop. I've also written a proof-of-concept script to construct the scenario and trigger the bug automatically, you can use it to trace and validate my reasoning above: https://github.com/aegistudio/RingBufferDetonator.git Tests has been carried out on linux kernel 5.14-rc2 (`2734d6c1b1`), my fixed version of kernel (for testing whether my update fixes the bug) and some older kernels (for range of affected kernels). Test result is also attached to the proof-of-concept repository. Link: https://lore.kernel.org/linux-trace-devel/YPaNxsIlb2yjSi5Y@aegistudio/ Link: https://lore.kernel.org/linux-trace-devel/YPgrN85WL9VyrZ55@aegistudio Cc: stable@vger.kernel.org Fixes: `bf41a158ca` ("ring-buffer: make reentrant") Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Haoran Luo <www@aegistudio.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-28 14:35:45 +02:00
Steven Rostedt (VMware)	a5e1aff589	tracing/histogram: Rename "cpu" to "common_cpu" commit `1e3bac71c5` upstream. Currently the histogram logic allows the user to write "cpu" in as an event field, and it will record the CPU that the event happened on. The problem with this is that there's a lot of events that have "cpu" as a real field, and using "cpu" as the CPU it ran on, makes it impossible to run histograms on the "cpu" field of events. For example, if I want to have a histogram on the count of the workqueue_queue_work event on its cpu field, running: ># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger Gives a misleading and wrong result. Change the command to "common_cpu" as no event should have "common_*" fields as that's a reserved name for fields used by all events. And this makes sense here as common_cpu would be a field used by all events. Now we can even do: ># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger ># cat events/workqueue/workqueue_queue_work/hist # event histogram # # trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active] # { common_cpu: 0, cpu: 2 } hitcount: 1 { common_cpu: 0, cpu: 4 } hitcount: 1 { common_cpu: 7, cpu: 7 } hitcount: 1 { common_cpu: 0, cpu: 7 } hitcount: 1 { common_cpu: 0, cpu: 1 } hitcount: 1 { common_cpu: 0, cpu: 6 } hitcount: 2 { common_cpu: 0, cpu: 5 } hitcount: 2 { common_cpu: 1, cpu: 1 } hitcount: 4 { common_cpu: 6, cpu: 6 } hitcount: 4 { common_cpu: 5, cpu: 5 } hitcount: 14 { common_cpu: 4, cpu: 4 } hitcount: 26 { common_cpu: 0, cpu: 0 } hitcount: 39 { common_cpu: 2, cpu: 2 } hitcount: 184 Now for backward compatibility, I added a trick. If "cpu" is used, and the field is not found, it will fall back to "common_cpu" and work as it did before. This way, it will still work for old programs that use "cpu" to get the actual CPU, but if the event has a "cpu" as a field, it will get that event's "cpu" field, which is probably what it wants anyway. I updated the tracefs/README to include documentation about both the common_timestamp and the common_cpu. This way, if that text is present in the README, then an application can know that common_cpu is supported over just plain "cpu". Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: `8b7622bf94` ("tracing: Add cpu field for hist triggers") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-28 14:35:45 +02:00
Steven Rostedt (VMware)	0edad8b9f6	tracepoints: Update static_call before tp_funcs when adding a tracepoint commit `352384d5c8` upstream. Because of the significant overhead that retpolines pose on indirect calls, the tracepoint code was updated to use the new "static_calls" that can modify the running code to directly call a function instead of using an indirect caller, and this function can be changed at runtime. In the tracepoint code that calls all the registered callbacks that are attached to a tracepoint, the following is done: it_func_ptr = rcu_dereference_raw((&__tracepoint_##name)->funcs); if (it_func_ptr) { __data = (it_func_ptr)->data; static_call(tp_func_##name)(__data, args); } If there's just a single callback, the static_call is updated to just call that callback directly. Once another handler is added, then the static caller is updated to call the iterator, that simply loops over all the funcs in the array and calls each of the callbacks like the old method using indirect calling. The issue was discovered with a race between updating the funcs array and updating the static_call. The funcs array was updated first and then the static_call was updated. This is not an issue as long as the first element in the old array is the same as the first element in the new array. But that assumption is incorrect, because callbacks also have a priority field, and if there's a callback added that has a higher priority than the callback on the old array, then it will become the first callback in the new array. This means that it is possible to call the old callback with the new callback data element, which can cause a kernel panic. static_call = callback1() funcs[] = {callback1,data1}; callback2 has higher priority than callback1 CPU 1 CPU 2 ----- ----- new_funcs = {callback2,data2}, {callback1,data1} rcu_assign_pointer(tp->funcs, new_funcs); /* * Now tp->funcs has the new array * but the static_call still calls callback1 / it_func_ptr = tp->funcs [ new_funcs ] data = it_func_ptr->data [ data2 ] static_call(callback1, data); / Now callback1 is called with * callback2's data */ [ KERNEL PANIC ] update_static_call(iterator); To prevent this from happening, always switch the static_call to the iterator before assigning the tp->funcs to the new array. The iterator will always properly match the callback with its data. To trigger this bug: In one terminal: while :; do hackbench 50; done In another terminal echo 1 > /sys/kernel/tracing/events/sched/sched_waking/enable while :; do echo 1 > /sys/kernel/tracing/set_event_pid; sleep 0.5 echo 0 > /sys/kernel/tracing/set_event_pid; sleep 0.5 done And it doesn't take long to crash. This is because the set_event_pid adds a callback to the sched_waking tracepoint with a high priority, which will be called before the sched_waking trace event callback is called. Note, the removal to a single callback updates the array first, before changing the static_call to single callback, which is the proper order as the first element in the array is the same as what the static_call is being changed to. Link: https://lore.kernel.org/io-uring/4ebea8f0-58c9-e571-fd30-0ce4f6f09c70@samba.org/ Cc: stable@vger.kernel.org Fixes: `d25e37d89d` ("tracepoint: Optimize using static_call()") Reported-by: Stefan Metzmacher <metze@samba.org> tested-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-28 14:35:45 +02:00
Roman Skakun	8983766903	dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable} [ Upstream commit `40ac971eab` ] xen-swiotlb can use vmalloc backed addresses for dma coherent allocations and uses the common helpers. Properly handle them to unbreak Xen on ARM platforms. Fixes: `1b65c4e5a9` ("swiotlb-xen: use xen_alloc/free_coherent_pages") Signed-off-by: Roman Skakun <roman_skakun@epam.com> Reviewed-by: Andrii Anisov <andrii_anisov@epam.com> [hch: split the patch, renamed the helpers] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-28 14:35:38 +02:00
Nicolas Saenz Julienne	0ff2ea9d8f	timers: Fix get_next_timer_interrupt() with no timers pending [ Upstream commit `aebacb7f6c` ] `31cd0e119d` ("timers: Recalculate next timer interrupt only when necessary") subtly altered get_next_timer_interrupt()'s behaviour. The function no longer consistently returns KTIME_MAX with no timers pending. In order to decide if there are any timers pending we check whether the next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now. Unfortunately, the next expiry time and the timer base clock are no longer updated in unison. The former changes upon certain timer operations (enqueue, expire, detach), whereas the latter keeps track of jiffies as they move forward. Ultimately breaking the logic above. A simplified example: - Upon entering get_next_timer_interrupt() with: jiffies = 1 base->clk = 0; base->next_expiry = NEXT_TIMER_MAX_DELTA; 'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function returns KTIME_MAX. - 'base->clk' is updated to the jiffies value. - The next time we enter get_next_timer_interrupt(), taking into account no timer operations happened: base->clk = 1; base->next_expiry = NEXT_TIMER_MAX_DELTA; 'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function returns a valid expire time, which is incorrect. This ultimately might unnecessarily rearm sched's timer on nohz_full setups, and add latency to the system[1]. So, introduce 'base->timers_pending'[2], update it every time 'base->next_expiry' changes, and use it in get_next_timer_interrupt(). [1] See tick_nohz_stop_tick(). [2] A quick pahole check on x86_64 and arm64 shows it doesn't make 'struct timer_base' any bigger. Fixes: `31cd0e119d` ("timers: Recalculate next timer interrupt only when necessary") Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-28 14:35:37 +02:00
Daniel Borkmann	39f1735c81	bpf: Fix tail_call_reachable rejection for interpreter when jit failed [ Upstream commit `5dd0a6b858` ] During testing of `f263a81451` ("bpf: Track subprog poke descriptors correctly and fix use-after-free") under various failure conditions, for example, when jit_subprogs() fails and tries to clean up the program to be run under the interpreter, we ran into the following freeze: [...] #127/8 tailcall_bpf2bpf_3:FAIL [...] [ 92.041251] BUG: KASAN: slab-out-of-bounds in ___bpf_prog_run+0x1b9d/0x2e20 [ 92.042408] Read of size 8 at addr ffff88800da67f68 by task test_progs/682 [ 92.043707] [ 92.044030] CPU: 1 PID: 682 Comm: test_progs Tainted: G O 5.13.0-53301-ge6c08cb33a30-dirty #87 [ 92.045542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 92.046785] Call Trace: [ 92.047171] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.047773] ? __bpf_prog_run_args32+0x8b/0xb0 [ 92.048389] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.049019] ? ktime_get+0x117/0x130 [...] // few hundred [similar] lines more [ 92.659025] ? ktime_get+0x117/0x130 [ 92.659845] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.660738] ? __bpf_prog_run_args32+0x8b/0xb0 [ 92.661528] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.662378] ? print_usage_bug+0x50/0x50 [ 92.663221] ? print_usage_bug+0x50/0x50 [ 92.664077] ? bpf_ksym_find+0x9c/0xe0 [ 92.664887] ? ktime_get+0x117/0x130 [ 92.665624] ? kernel_text_address+0xf5/0x100 [ 92.666529] ? __kernel_text_address+0xe/0x30 [ 92.667725] ? unwind_get_return_address+0x2f/0x50 [ 92.668854] ? ___bpf_prog_run+0x15d4/0x2e20 [ 92.670185] ? ktime_get+0x117/0x130 [ 92.671130] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.672020] ? __bpf_prog_run_args32+0x8b/0xb0 [ 92.672860] ? __bpf_prog_run_args64+0xc0/0xc0 [ 92.675159] ? ktime_get+0x117/0x130 [ 92.677074] ? lock_is_held_type+0xd5/0x130 [ 92.678662] ? ___bpf_prog_run+0x15d4/0x2e20 [ 92.680046] ? ktime_get+0x117/0x130 [ 92.681285] ? __bpf_prog_run32+0x6b/0x90 [ 92.682601] ? __bpf_prog_run64+0x90/0x90 [ 92.683636] ? lock_downgrade+0x370/0x370 [ 92.684647] ? mark_held_locks+0x44/0x90 [ 92.685652] ? ktime_get+0x117/0x130 [ 92.686752] ? lockdep_hardirqs_on+0x79/0x100 [ 92.688004] ? ktime_get+0x117/0x130 [ 92.688573] ? __cant_migrate+0x2b/0x80 [ 92.689192] ? bpf_test_run+0x2f4/0x510 [ 92.689869] ? bpf_test_timer_continue+0x1c0/0x1c0 [ 92.690856] ? rcu_read_lock_bh_held+0x90/0x90 [ 92.691506] ? __kasan_slab_alloc+0x61/0x80 [ 92.692128] ? eth_type_trans+0x128/0x240 [ 92.692737] ? __build_skb+0x46/0x50 [ 92.693252] ? bpf_prog_test_run_skb+0x65e/0xc50 [ 92.693954] ? bpf_prog_test_run_raw_tp+0x2d0/0x2d0 [ 92.694639] ? __fget_light+0xa1/0x100 [ 92.695162] ? bpf_prog_inc+0x23/0x30 [ 92.695685] ? __sys_bpf+0xb40/0x2c80 [ 92.696324] ? bpf_link_get_from_fd+0x90/0x90 [ 92.697150] ? mark_held_locks+0x24/0x90 [ 92.698007] ? lockdep_hardirqs_on_prepare+0x124/0x220 [ 92.699045] ? finish_task_switch+0xe6/0x370 [ 92.700072] ? lockdep_hardirqs_on+0x79/0x100 [ 92.701233] ? finish_task_switch+0x11d/0x370 [ 92.702264] ? __switch_to+0x2c0/0x740 [ 92.703148] ? mark_held_locks+0x24/0x90 [ 92.704155] ? __x64_sys_bpf+0x45/0x50 [ 92.705146] ? do_syscall_64+0x35/0x80 [ 92.706953] ? entry_SYSCALL_64_after_hwframe+0x44/0xae [...] Turns out that the program rejection from `e411901c0b` ("bpf: allow for tailcalls in BPF subprograms for x64 JIT") is buggy since env->prog->aux->tail_call_reachable is never true. Commit `ebf7d1f508` ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") added a tracker into check_max_stack_depth() which propagates the tail_call_reachable condition throughout the subprograms. This info is then assigned to the subprogram's func[i]->aux->tail_call_reachable. However, in the case of the rejection check upon JIT failure, env->prog->aux->tail_call_reachable is used. func[0]->aux->tail_call_reachable which represents the main program's information did not propagate this to the outer env->prog->aux, though. Add this propagation into check_max_stack_depth() where it needs to belong so that the check can be done reliably. Fixes: `ebf7d1f508` ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") Fixes: `e411901c0b` ("bpf: allow for tailcalls in BPF subprograms for x64 JIT") Co-developed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/618c34e3163ad1a36b1e82377576a6081e182f25.1626123173.git.daniel@iogearbox.net Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-28 14:35:37 +02:00
Greg Kroah-Hartman	67e686fc73	Revert "bpf: Track subprog poke descriptors correctly and fix use-after-free" This reverts commit `a9f36bf361` which is commit `f263a81451` upstream. It breaks the Android KABI and is not needed for Android devices at this point in time. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Id5d1b6407f9bd5228630b9a813e9799dbc448b96	2021-07-26 11:53:25 +02:00
Greg Kroah-Hartman	afe9ed0e13	This is the 5.10.53 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmD9WtYACgkQONu9yGCS aT6IkRAAnW25XGt27ELqFNo2vCXI38YFEyNpe0aN8y2A4X1MXu8EKNunQNFKYVt9 5sGrSjt30qy0A9pbeajqXfnGmCNe5R825lRCQia08/+b/qyfaF358DdAI59q5zd1 0IhInfBo+TJyLLxltt0f2yGQJfo2DfmRuOpr0EXrE7Mo7lTqDqrpmDbLl+fZ8Ugd SaXRZMHLsH18Q8XuFbbBnU57tgGsIys9ie397jf3jYZBpfnzWjfsKbM4BnBz3fAv bUnZ5P1G8Bwps30ES8Qpc6ZoWMYl6MkEPNc+4f57+amvJME45MePUvgDRrQV3Z0P bveovBal8c13aO46DIa+bW0Wevy1pR/OcNQ78r8I1Sq4gp6rIp0oj6Q3rEPNfJqf 8s1/9FNbZVRkz/TA2vtILGhIlJfMm7QcESM8WA1kA5EN4er35DGlvN69kpspxhoh FBrgmAdMQQgmiCtLR1QllP5PVDaTNQlsXR3yeASc54UrEwuyFe/ELE5YEb7vvXIQ u7JF+zYbxVFqyUCtyoBEy6eoAe/jGAI/kiCvifitR33289lOItz5n64CQzCex54R WR1wX+rl9Jf/SWHuBh/JgCqS6x8O2+1WfpABmEWwT6m1leC6v0LG0CJoywGdHf1I +IlIsThsoPIMd0xFzd7GlV4+ot2ZHsxyhY7dDnkIfCZtFLKubuU= =PNDL -----END PGP SIGNATURE----- Merge 5.10.53 into android12-5.10-lts Changes in 5.10.53 ARM: dts: gemini: rename mdio to the right name ARM: dts: gemini: add device_type on pci ARM: dts: rockchip: Fix thermal sensor cells o rk322x ARM: dts: rockchip: fix pinctrl sleep nodename for rk3036-kylin and rk3288 arm64: dts: rockchip: fix pinctrl sleep nodename for rk3399.dtsi ARM: dts: rockchip: Fix the timer clocks order ARM: dts: rockchip: Fix IOMMU nodes properties on rk322x ARM: dts: rockchip: Fix power-controller node names for rk3066a ARM: dts: rockchip: Fix power-controller node names for rk3188 ARM: dts: rockchip: Fix power-controller node names for rk3288 arm64: dts: rockchip: Fix power-controller node names for px30 arm64: dts: rockchip: Fix power-controller node names for rk3328 arm64: dts: rockchip: Fix power-controller node names for rk3399 reset: ti-syscon: fix to_ti_syscon_reset_data macro ARM: brcmstb: dts: fix NAND nodes names ARM: Cygnus: dts: fix NAND nodes names ARM: NSP: dts: fix NAND nodes names ARM: dts: BCM63xx: Fix NAND nodes names ARM: dts: Hurricane 2: Fix NAND nodes names ARM: dts: imx6: phyFLEX: Fix UART hardware flow control ARM: imx: pm-imx5: Fix references to imx5_cpu_suspend_info arm64: dts: rockchip: fix regulator-gpio states array ARM: dts: ux500: Fix interrupt cells ARM: dts: ux500: Rename gpio-controller node ARM: dts: ux500: Fix orientation of accelerometer ARM: dts: imx6dl-riotboard: configure PHY clock and set proper EEE value rtc: mxc_v2: add missing MODULE_DEVICE_TABLE kbuild: sink stdout from cmd for silent build ARM: dts: am57xx-cl-som-am57x: fix ti,no-reset-on-init flag for gpios ARM: dts: am437x-gp-evm: fix ti,no-reset-on-init flag for gpios ARM: dts: am335x: fix ti,no-reset-on-init flag for gpios ARM: dts: OMAP2+: Replace underscores in sub-mailbox node names arm64: dts: ti: k3-am654x/j721e/j7200-common-proc-board: Fix MCU_RGMII1_TXC direction ARM: tegra: wm8903: Fix polarity of headphones-detection GPIO in device-trees ARM: tegra: nexus7: Correct 3v3 regulator GPIO of PM269 variant arm64: dts: qcom: sc7180: Move rmtfs memory region ARM: dts: stm32: Remove extra size-cells on dhcom-pdk2 ARM: dts: stm32: Fix touchscreen node on dhcom-pdk2 ARM: dts: stm32: fix stm32mp157c-odyssey card detect pin ARM: dts: stm32: fix gpio-keys node on STM32 MCU boards ARM: dts: stm32: fix RCC node name on stm32f429 MCU ARM: dts: stm32: fix timer nodes on STM32 MCU to prevent warnings memory: tegra: Fix compilation warnings on 64bit platforms firmware: arm_scmi: Add SMCCC discovery dependency in Kconfig firmware: arm_scmi: Fix the build when CONFIG_MAILBOX is not selected ARM: dts: bcm283x: Fix up MMC node names ARM: dts: bcm283x: Fix up GPIO LED node names arm64: dts: juno: Update SCPI nodes as per the YAML schema ARM: dts: rockchip: fix supply properties in io-domains nodes ARM: dts: stm32: fix i2c node name on stm32f746 to prevent warnings ARM: dts: stm32: move stmmac axi config in ethernet node on stm32mp15 ARM: dts: stm32: fix the Odyssey SoM eMMC VQMMC supply ARM: dts: stm32: Drop unused linux,wakeup from touchscreen node on DHCOM SoM ARM: dts: stm32: Rename spi-flash/mx66l51235l@N to flash@N on DHCOM SoM ARM: dts: stm32: fix stpmic node for stm32mp1 boards ARM: OMAP2+: Block suspend for am3 and am4 if PM is not configured soc/tegra: fuse: Fix Tegra234-only builds firmware: tegra: bpmp: Fix Tegra234-only builds arm64: dts: ls208xa: remove bus-num from dspi node arm64: dts: imx8mq: assign PCIe clocks thermal/core: Correct function name thermal_zone_device_unregister() thermal/drivers/rcar_gen3_thermal: Do not shadow rcar_gen3_ths_tj_1 thermal/drivers/imx_sc: Add missing of_node_put for loop iteration thermal/drivers/sprd: Add missing of_node_put for loop iteration kbuild: mkcompile_h: consider timestamp if KBUILD_BUILD_TIMESTAMP is set arch/arm64/boot/dts/marvell: fix NAND partitioning scheme rtc: max77686: Do not enforce (incorrect) interrupt trigger type scsi: aic7xxx: Fix unintentional sign extension issue on left shift of u8 scsi: libsas: Add LUN number check in .slave_alloc callback scsi: libfc: Fix array index out of bound exception scsi: qedf: Add check to synchronize abort and flush sched/fair: Fix CFS bandwidth hrtimer expiry type perf/x86/intel/uncore: Clean up error handling path of iio mapping thermal/core/thermal_of: Stop zone device before unregistering it s390/traps: do not test MONITOR CALL without CONFIG_BUG s390: introduce proper type handling call_on_stack() macro cifs: prevent NULL deref in cifs_compose_mount_options() firmware: turris-mox-rwtm: add marvell,armada-3700-rwtm-firmware compatible string arm64: dts: marvell: armada-37xx: move firmware node to generic dtsi file Revert "swap: fix do_swap_page() race with swapoff" f2fs: Show casefolding support only when supported mm/thp: simplify copying of huge zero page pmd when fork mm/userfaultfd: fix uffd-wp special cases for fork() mm/page_alloc: fix memory map initialization for descending nodes usb: cdns3: Enable TDL_CHK only for OUT ep net: bcmgenet: ensure EXT_ENERGY_DET_MASK is clear net: dsa: mv88e6xxx: enable .port_set_policy() on Topaz net: dsa: mv88e6xxx: use correct .stats_set_histogram() on Topaz net: dsa: mv88e6xxx: enable .rmu_disable() on Topaz net: dsa: mv88e6xxx: enable devlink ATU hash param for Topaz net: ipv6: fix return value of ip6_skb_dst_mtu netfilter: ctnetlink: suspicious RCU usage in ctnetlink_dump_helpinfo net/sched: act_ct: fix err check for nf_conntrack_confirm vmxnet3: fix cksum offload issues for tunnels with non-default udp ports net/sched: act_ct: remove and free nf_table callbacks net: bridge: sync fdb to new unicast-filtering ports net: netdevsim: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops net: bcmgenet: Ensure all TX/RX queues DMAs are disabled net: ip_tunnel: fix mtu calculation for ETHER tunnel devices net: moxa: fix UAF in moxart_mac_probe net: qcom/emac: fix UAF in emac_remove net: ti: fix UAF in tlan_remove_one net: send SYNACK packet with accepted fwmark net: validate lwtstate->data before returning from skb_tunnel_info() Revert "mm/shmem: fix shmem_swapin() race with swapoff" net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() net: fddi: fix UAF in fza_probe dma-buf/sync_file: Don't leak fences on merge failure kbuild: do not suppress Kconfig prompts for silent build ARM: dts: aspeed: Fix AST2600 machines line names ARM: dts: tacoma: Add phase corrections for eMMC tcp: consistently disable header prediction for mptcp tcp: annotate data races around tp->mtu_info tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized ipv6: tcp: drop silly ICMPv6 packet too big messages tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path tools: bpf: Fix error in 'make -C tools/ bpf_install' bpftool: Properly close va_list 'ap' by va_end() on error bpf: Track subprog poke descriptors correctly and fix use-after-free perf test bpf: Free obj_buf drm/panel: nt35510: Do not fail if DSI read fails udp: annotate data races around unix_sk(sk)->gso_size Linux 5.10.53 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Iac8fe9cd2abb2d8dd993967205a97c89f01f3647	2021-07-25 15:37:14 +02:00
John Fastabend	a9f36bf361	bpf: Track subprog poke descriptors correctly and fix use-after-free commit `f263a81451` upstream. Subprograms are calling map_poke_track(), but on program release there is no hook to call map_poke_untrack(). However, on program release, the aux memory (and poke descriptor table) is freed even though we still have a reference to it in the element list of the map aux data. When we run map_poke_run(), we then end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run(): [...] [ 402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e [ 402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337 [ 402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G I 5.12.0+ #399 [ 402.824715] Call Trace: [ 402.824719] dump_stack+0x93/0xc2 [ 402.824727] print_address_description.constprop.0+0x1a/0x140 [ 402.824736] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824740] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824744] kasan_report.cold+0x7c/0xd8 [ 402.824752] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824757] prog_array_map_poke_run+0xc2/0x34e [ 402.824765] bpf_fd_array_map_update_elem+0x124/0x1a0 [...] The elements concerned are walked as follows: for (i = 0; i < elem->aux->size_poke_tab; i++) { poke = &elem->aux->poke_tab[i]; [...] The access to size_poke_tab is a 4 byte read, verified by checking offsets in the KASAN dump: [ 402.825004] The buggy address belongs to the object at ffff8881905a7800 which belongs to the cache kmalloc-1k of size 1024 [ 402.825008] The buggy address is located 320 bytes inside of 1024-byte region [ffff8881905a7800, ffff8881905a7c00) The pahole output of bpf_prog_aux: struct bpf_prog_aux { [...] /* --- cacheline 5 boundary (320 bytes) --- / u32 size_poke_tab; / 320 4 */ [...] In general, subprograms do not necessarily manage their own data structures. For example, BTF func_info and linfo are just pointers to the main program structure. This allows reference counting and cleanup to be done on the latter which simplifies their management a bit. The aux->poke_tab struct, however, did not follow this logic. The initial proposed fix for this use-after-free bug further embedded poke data tracking into the subprogram with proper reference counting. However, Daniel and Alexei questioned why we were treating these objects special; I agree, its unnecessary. The fix here removes the per subprogram poke table allocation and map tracking and instead simply points the aux->poke_tab pointer at the main programs poke table. This way, map tracking is simplified to the main program and we do not need to manage them per subprogram. This also means, bpf_prog_free_deferred(), which unwinds the program reference counting and kfrees objects, needs to ensure that we don't try to double free the poke_tab when free'ing the subprog structures. This is easily solved by NULL'ing the poke_tab pointer. The second detail is to ensure that per subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do this, we add a pointer in the poke structure to point at the subprogram value so JITs can easily check while walking the poke_tab structure if the current entry belongs to the current program. The aux pointer is stable and therefore suitable for such comparison. On the jit_subprogs() error path, we omit cleaning up the poke->aux field because these are only ever referenced from the JIT side, but on error we will never make it to the JIT, so its fine to leave them dangling. Removing these pointers would complicate the error path for no reason. However, we do need to untrack all poke descriptors from the main program as otherwise they could race with the freeing of JIT memory from the subprograms. Lastly, `a748c6975d` ("bpf: propagate poke descriptors to subprograms") had an off-by-one on the subprogram instruction index range check as it was testing 'insn_idx >= subprog_start && insn_idx <= subprog_end'. However, subprog_end is the next subprogram's start instruction. Fixes: `a748c6975d` ("bpf: propagate poke descriptors to subprograms") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210707223848.14580-2-john.fastabend@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-25 14:36:21 +02:00
Odin Ugedal	892387e761	sched/fair: Fix CFS bandwidth hrtimer expiry type [ Upstream commit `72d0ad7cb5` ] The time remaining until expiry of the refresh_timer can be negative. Casting the type to an unsigned 64-bit value will cause integer underflow, making the runtime_refresh_within return false instead of true. These situations are rare, but they do happen. This does not cause user-facing issues or errors; other than possibly unthrottling cfs_rq's using runtime from the previous period(s), making the CFS bandwidth enforcement less strict in those (special) situations. Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Link: https://lore.kernel.org/r/20210629121452.18429-1-odin@uged.al Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-25 14:36:17 +02:00
Greg Kroah-Hartman	c0dd8de281	Merge branch 'android12-5.10' into `android12-5.10-lts` Sync up with android12-5.10 for the following commits: `90732d426e` UPSTREAM: mm/compaction: correct deferral logic for proactive compaction `d0a88ae479` ANDROID: Enable GKI Dr. No Enforcement `e6a59da61e` ANDROID: media: v4l2-core: Fix deadlock in vendor hook `fdb1cfe2d3` FROMGIT: dma_buf: remove dmabuf sysfs teardown before release `1dc0dd2573` ANDROID: update mtk symbol list `a669748346` UPSTREAM: mfd: syscon: Free the allocated name field of struct regmap_config `cffe67a351` ANDROID: Give UIC cmd timeout a larger value `3bca0b5344` ANDROID: binder: retry security_secid_to_secctx() `9fdfeda4c9` ANDROID: update new gki symbol for mtk `1dd167be9f` ANDROID: mm, kasan: fix for "integrate page_alloc init with HW_TAGS" `f215850ea0` UPSTREAM: kasan: fix conflict with page poisoning `a3da917661` ANDROID: GKI: Export two more mm symbols for GKI `0497b9601b` ANDROID: Update symbol list for mtk `0f27e1d317` FROMLIST: kfence: skip all GFP_ZONEMASK allocations `8478d8dc53` FROMLIST: kfence: move the size check to the beginning of __kfence_alloc() `ddabf14bea` ANDROID: ABI: update allowed list for galaxy `11cec52238` FROMGIT: f2fs: add sysfs nodes to get GC info for each GC mode `fdc46110cb` ANDROID: abi_gki_aarch64_qcom: Add android_debug_for_each_module `0bb433e014` ANDROID: debug_symbols: Add android_debug_for_each_module `a0429aa1d0` ANDROID: ABI: Update ABI for symbol list updates `c68e5ca9f8` ANDROID: GKI: Update symbols to symbol list `aac5a77959` ANDROID: Update symbol list for mtk `42fc1faf44` UPSTREAM: block, bfq: set next_rq to waker_bfqq->next_rq in waker injection `e37cc8a09f` UPSTREAM: mm/mremap: hold the rmap lock in write mode when moving page table entries. `9b136eab76` ANDROID: pstore/ram: Add backward compatibility for ramoops reserved region `df97297651` ANDROID: Update symbol list for mtk `4322bb07e8` ANDROID: vendor_hooks: Modify the function name `007213b1d0` BACKPORT: FROMLIST: kasan: add memzero int for unaligned size at DEBUG `7acbce0bf4` BACKPORT: FROMLIST: mm: move helper to check slub_debug_enabled `d6eeae59b5` ANDROID: ABI: initial update allowed list for galaxy Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I6d28f1be3cbd8a1fed86d2d647e6c2bb55221b0a	2021-07-22 13:32:27 +02:00
Matthias Maennich	d0a88ae479	ANDROID: Enable GKI Dr. No Enforcement This effectively locks down OWNERS approval to a small group to guard the code base against unintentional breakages. Bug: 194314089 Signed-off-by: Matthias Maennich <maennich@google.com> Change-Id: Ifd1ea97639a622320ea83f901f6451e2e52b38d4	2021-07-21 20:51:47 +01:00
Greg Kroah-Hartman	51ab149d5f	This is the 5.10.52 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmD22M4ACgkQONu9yGCS aT6MbxAAmAmm5sj9fMOk4ijcRz+CbpTZZdSS5VT3BbJJrTrZ1qk9KO0/uAhY1SXe uuP8Q7BD2llWQ6hQiJj7vZnXKiJsBrngH//4QFa72TjXxa1ENPcw/UceLxMgT6e1 0r74mcikr4f2A6kFoAexyteHfzm0D+8ZfgzJtTKPbBPqCqRIaZFO84xyTclvNJ6h 0Xx2dSNWDU1dXvUe+43ggaZUFNe4SHGupgsc3GSsxPkyKTzrXMGsRh4m3p94t4py WQULkq/57JaeJwQcxWMOPqLIHF/IWtZSfr+YHx+q1zvThK9uUsAd3e8B8r6FJswV xmDIDkCw3VIRxTiYhYKFqiDZOanlxYOtWXCXAyI+YV/JIT4NGHapg91qEq5PR9Ti tkSmAbwaON9LzshzWuKjDHMDJO6x/i3YJmaXuVgBciD56xIvCMKiiardpey2574o M5m8fLcXQXQBPY/WrusAn312/PanUxgstn6EJgInq0M4GoMj1aUb+W8luQfqq0G5 gYNjzTCEQErTITAyha9SLQvBVH3snfHpKoDL669HK6woq2GH2K48YIKxCam5trlH 9P9LCXBhSHe19/r2qLotcMW7XmuSsu77FetZtxFJQeWVqRokmyHzZnTbeoQAxhpa xGkvnbnyLettCCtm/JAmwnASqIYnUxuvgj7zcPb5PGnhwk1rfXU= =b0QD -----END PGP SIGNATURE----- Merge 5.10.52 into android12-5.10-lts Changes in 5.10.52 certs: add 'x509_revocation_list' to gitignore cifs: handle reconnect of tcon when there is no cached dfs referral KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run() scsi: core: Fix bad pointer dereference when ehandler kthread is invalid scsi: zfcp: Report port fc_security as unknown early during remote cable pull tracing: Do not reference char * as a string in histograms drm/i915/gtt: drop the page table optimisation drm/i915/gt: Fix -EDEADLK handling regression cgroup: verify that source is a string fbmem: Do not delete the mode that is still in use drm/dp_mst: Do not set proposed vcpi directly drm/dp_mst: Avoid to mess up payload table by ports in stale topology drm/dp_mst: Add missing drm parameters to recently added call to drm_dbg_kms() drm/ingenic: Fix non-OSD mode drm/ingenic: Switch IPU plane to type OVERLAY Revert "drm/ast: Remove reference to struct drm_device.pdev" net: bridge: multicast: fix PIM hello router port marking race net: bridge: multicast: fix MRD advertisement router port marking race leds: tlc591xx: fix return value check in tlc591xx_probe() ASoC: Intel: sof_sdw: add mutual exclusion between PCH DMIC and RT715 dmaengine: fsl-qdma: check dma_set_mask return value scsi: arcmsr: Fix the wrong CDB payload report to IOP srcu: Fix broken node geometry after early ssp init rcu: Reject RCU_LOCKDEP_WARN() false positives tty: serial: fsl_lpuart: fix the potential risk of division or modulo by zero serial: fsl_lpuart: disable DMA for console and fix sysrq misc/libmasm/module: Fix two use after free in ibmasm_init_one misc: alcor_pci: fix null-ptr-deref when there is no PCI bridge ASoC: intel/boards: add missing MODULE_DEVICE_TABLE partitions: msdos: fix one-byte get_unaligned() iio: gyro: fxa21002c: Balance runtime pm + use pm_runtime_resume_and_get(). iio: magn: bmc150: Balance runtime pm + use pm_runtime_resume_and_get() ALSA: usx2y: Avoid camelCase ALSA: usx2y: Don't call free_pages_exact() with NULL address Revert "ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro" usb: common: usb-conn-gpio: fix NULL pointer dereference of charger w1: ds2438: fixing bug that would always get page0 scsi: arcmsr: Fix doorbell status being updated late on ARC-1886 scsi: hisi_sas: Propagate errors in interrupt_init_v1_hw() scsi: lpfc: Fix "Unexpected timeout" error in direct attach topology scsi: lpfc: Fix crash when lpfc_sli4_hba_setup() fails to initialize the SGLs scsi: core: Cap scsi_host cmd_per_lun at can_queue ALSA: ac97: fix PM reference leak in ac97_bus_remove() tty: serial: 8250: serial_cs: Fix a memory leak in error handling path scsi: mpt3sas: Fix deadlock while cancelling the running firmware event scsi: core: Fixup calling convention for scsi_mode_sense() scsi: scsi_dh_alua: Check for negative result value fs/jfs: Fix missing error code in lmLogInit() scsi: megaraid_sas: Fix resource leak in case of probe failure scsi: megaraid_sas: Early detection of VD deletion through RaidMap update scsi: megaraid_sas: Handle missing interrupts while re-enabling IRQs scsi: iscsi: Add iscsi_cls_conn refcount helpers scsi: iscsi: Fix conn use after free during resets scsi: iscsi: Fix shost->max_id use scsi: qedi: Fix null ref during abort handling scsi: qedi: Fix race during abort timeouts scsi: qedi: Fix TMF session block/unblock use scsi: qedi: Fix cleanup session block/unblock use mfd: da9052/stmpe: Add and modify MODULE_DEVICE_TABLE mfd: cpcap: Fix cpcap dmamask not set warnings ASoC: img: Fix PM reference leak in img_i2s_in_probe() fsi: Add missing MODULE_DEVICE_TABLE serial: tty: uartlite: fix console setup s390/sclp_vt220: fix console name to match device s390: disable SSP when needed selftests: timers: rtcpie: skip test if default RTC device does not exist ALSA: sb: Fix potential double-free of CSP mixer elements powerpc/ps3: Add dma_mask to ps3_dma_region iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translation ASoC: soc-pcm: fix the return value in dpcm_apply_symmetry() gpio: zynq: Check return value of pm_runtime_get_sync gpio: zynq: Check return value of irq_get_irq_data scsi: storvsc: Correctly handle multiple flags in srb_status ALSA: ppc: fix error return code in snd_pmac_probe() selftests/powerpc: Fix "no_handler" EBB selftest gpio: pca953x: Add support for the On Semi pca9655 powerpc/mm/book3s64: Fix possible build error ASoC: soc-core: Fix the error return code in snd_soc_of_parse_audio_routing() habanalabs/gaudi: set the correct cpu_id on MME2_QM failure habanalabs: remove node from list before freeing the node s390/processor: always inline stap() and __load_psw_mask() s390/ipl_parm: fix program check new psw handling s390/mem_detect: fix diag260() program check new psw handling s390/mem_detect: fix tprot() program check new psw handling Input: hideep - fix the uninitialized use in hideep_nvm_unlock() ALSA: bebob: add support for ToneWeal FW66 ALSA: usb-audio: scarlett2: Fix 18i8 Gen 2 PCM Input count ALSA: usb-audio: scarlett2: Fix data_mutex lock ALSA: usb-audio: scarlett2: Fix scarlett2_*_ctl_put() return values usb: gadget: f_hid: fix endianness issue with descriptors usb: gadget: hid: fix error return code in hid_bind() powerpc/boot: Fixup device-tree on little endian ASoC: Intel: kbl_da7219_max98357a: shrink platform_id below 20 characters backlight: lm3630a: Fix return code of .update_status() callback ALSA: hda: Add IRQ check for platform_get_irq() ALSA: usb-audio: scarlett2: Fix 6i6 Gen 2 line out descriptions ALSA: firewire-motu: fix detection for S/PDIF source on optical interface in v2 protocol leds: turris-omnia: add missing MODULE_DEVICE_TABLE staging: rtl8723bs: fix macro value for 2.4Ghz only device intel_th: Wait until port is in reset before programming it i2c: core: Disable client irq on reboot/shutdown phy: intel: Fix for warnings due to EMMC clock 175Mhz change in FIP lib/decompress_unlz4.c: correctly handle zero-padding around initrds. kcov: add __no_sanitize_coverage to fix noinstr for all architectures power: supply: sc27xx: Add missing MODULE_DEVICE_TABLE power: supply: sc2731_charger: Add missing MODULE_DEVICE_TABLE pwm: spear: Don't modify HW state in .remove callback PCI: ftpci100: Rename macro name collision power: supply: ab8500: Avoid NULL pointers PCI: hv: Fix a race condition when removing the device power: supply: max17042: Do not enforce (incorrect) interrupt trigger type power: reset: gpio-poweroff: add missing MODULE_DEVICE_TABLE ARM: 9087/1: kprobes: test-thumb: fix for LLVM_IAS=1 PCI/P2PDMA: Avoid pci_get_slot(), which may sleep NFSv4: Fix delegation return in cases where we have to retry PCI: pciehp: Ignore Link Down/Up caused by DPC watchdog: Fix possible use-after-free in wdt_startup() watchdog: sc520_wdt: Fix possible use-after-free in wdt_turnoff() watchdog: Fix possible use-after-free by calling del_timer_sync() watchdog: imx_sc_wdt: fix pretimeout watchdog: iTCO_wdt: Account for rebooting on second timeout x86/fpu: Return proper error codes from user access functions remoteproc: core: Fix cdev remove and rproc del PCI: tegra: Add missing MODULE_DEVICE_TABLE orangefs: fix orangefs df output. ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty drm/gma500: Add the missed drm_gem_object_put() in psb_user_framebuffer_create() NFS: nfs_find_open_context() may only select open files power: supply: charger-manager: add missing MODULE_DEVICE_TABLE power: supply: ab8500: add missing MODULE_DEVICE_TABLE drm/amdkfd: fix sysfs kobj leak pwm: img: Fix PM reference leak in img_pwm_enable() pwm: tegra: Don't modify HW state in .remove callback ACPI: AMBA: Fix resource name in /proc/iomem ACPI: video: Add quirk for the Dell Vostro 3350 PCI: rockchip: Register IRQ handlers after device and data are ready virtio-blk: Fix memory leak among suspend/resume procedure virtio_net: Fix error handling in virtnet_restore() virtio_console: Assure used length from device is limited f2fs: atgc: fix to set default age threshold NFSD: Fix TP_printk() format specifier in nfsd_clid_class x86/signal: Detect and prevent an alternate signal stack overflow f2fs: add MODULE_SOFTDEP to ensure crc32 is included in the initramfs f2fs: compress: fix to disallow temp extension remoteproc: k3-r5: Fix an error message PCI/sysfs: Fix dsm_label_utf16s_to_utf8s() buffer overrun power: supply: rt5033_battery: Fix device tree enumeration NFSv4: Initialise connection to the server in nfs4_alloc_client() NFSv4: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT misc: alcor_pci: fix inverted branch condition um: fix error return code in slip_open() um: fix error return code in winch_tramp() ubifs: Fix off-by-one error ubifs: journal: Fix error return code in ubifs_jnl_write_inode() watchdog: aspeed: fix hardware timeout calculation watchdog: jz4740: Fix return value check in jz4740_wdt_probe() SUNRPC: prevent port reuse on transports which don't request it. nfs: fix acl memory leak of posix_acl_create() ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode PCI: iproc: Fix multi-MSI base vector number allocation PCI: iproc: Support multi-MSI only on uniprocessor kernel f2fs: fix to avoid adding tab before doc section x86/fpu: Fix copy_xstate_to_kernel() gap handling x86/fpu: Limit xstate copy size in xstateregs_set() PCI: intel-gw: Fix INTx enable pwm: imx1: Don't disable clocks at device remove time PCI: tegra194: Fix tegra_pcie_ep_raise_msi_irq() ill-defined shift vdpa/mlx5: Fix umem sizes assignments on VQ create vdpa/mlx5: Fix possible failure in umem size calculation virtio_net: move tx vq operation under tx queue lock nvme-tcp: can't set sk_user_data without write_lock nfsd: Reduce contention for the nfsd_file nf_rwsem ALSA: isa: Fix error return code in snd_cmi8330_probe() vdpa/mlx5: Clear vq ready indication upon device reset NFSv4/pnfs: Fix the layout barrier update NFSv4/pnfs: Fix layoutget behaviour after invalidation NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times hexagon: handle {,SOFT}IRQENTRY_TEXT in linker script hexagon: use common DISCARDS macro ARM: dts: gemini-rut1xx: remove duplicate ethernet node reset: RESET_BRCMSTB_RESCAL should depend on ARCH_BRCMSTB reset: RESET_INTEL_GW should depend on X86 reset: a10sr: add missing of_match_table reference ARM: exynos: add missing of_node_put for loop iteration ARM: dts: exynos: fix PWM LED max brightness on Odroid XU/XU3 ARM: dts: exynos: fix PWM LED max brightness on Odroid HC1 ARM: dts: exynos: fix PWM LED max brightness on Odroid XU4 memory: stm32-fmc2-ebi: add missing of_node_put for loop iteration memory: atmel-ebi: add missing of_node_put for loop iteration reset: brcmstb: Add missing MODULE_DEVICE_TABLE memory: pl353: Fix error return code in pl353_smc_probe() ARM: dts: sun8i: h3: orangepi-plus: Fix ethernet phy-mode rtc: fix snprintf() checking in is_rtc_hctosys() arm64: dts: renesas: v3msk: Fix memory size ARM: dts: r8a7779, marzen: Fix DU clock names arm64: dts: ti: j7200-main: Enable USB2 PHY RX sensitivity workaround arm64: dts: renesas: Add missing opp-suspend properties arm64: dts: renesas: r8a7796[01]: Fix OPP table entry voltages ARM: dts: stm32: Connect PHY IRQ line on DH STM32MP1 SoM ARM: dts: stm32: Rework LAN8710Ai PHY reset on DHCOM SoM arm64: dts: qcom: trogdor: Add no-hpd to DSI bridge node firmware: tegra: Fix error return code in tegra210_bpmp_init() firmware: arm_scmi: Reset Rx buffer to max size during async commands dt-bindings: i2c: at91: fix example for scl-gpios ARM: dts: BCM5301X: Fixup SPI binding reset: bail if try_module_get() fails arm64: dts: renesas: r8a779a0: Drop power-domains property from GIC node arm64: dts: ti: k3-j721e-main: Fix external refclk input to SERDES memory: fsl_ifc: fix leak of IO mapping on probe failure memory: fsl_ifc: fix leak of private memory on probe failure arm64: dts: allwinner: a64-sopine-baseboard: change RGMII mode to TXID ARM: dts: dra7: Fix duplicate USB4 target module node ARM: dts: am335x: align ti,pindir-d0-out-d1-in property with dt-shema ARM: dts: am437x: align ti,pindir-d0-out-d1-in property with dt-shema thermal/drivers/sprd: Add missing MODULE_DEVICE_TABLE ARM: dts: imx6q-dhcom: Fix ethernet reset time properties ARM: dts: imx6q-dhcom: Fix ethernet plugin detection problems ARM: dts: imx6q-dhcom: Add gpios pinctrl for i2c bus recovery thermal/drivers/rcar_gen3_thermal: Fix coefficient calculations firmware: turris-mox-rwtm: fix reply status decoding function firmware: turris-mox-rwtm: report failures better firmware: turris-mox-rwtm: fail probing when firmware does not support hwrng firmware: turris-mox-rwtm: show message about HWRNG registration arm64: dts: rockchip: Re-add regulator-boot-on, regulator-always-on for vdd_gpu on rk3399-roc-pc arm64: dts: rockchip: Re-add regulator-always-on for vcc_sdio for rk3399-roc-pc scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe() sched/uclamp: Ignore max aggregation if rq is idle jump_label: Fix jump_label_text_reserved() vs __init static_call: Fix static_call_text_reserved() vs __init mips: always link byteswap helpers into decompressor mips: disable branch profiling in boot/decompress.o MIPS: vdso: Invalid GIC access through VDSO scsi: scsi_dh_alua: Fix signedness bug in alua_rtpg() seq_file: disallow extremely large seq buffer allocations Linux 5.10.52 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic1b04661728db8b0e060ca6935783e15a22210da	2021-07-20 16:36:53 +02:00
Peter Zijlstra	53c5c2496f	static_call: Fix static_call_text_reserved() vs __init [ Upstream commit `2bee6d16e4` ] It turns out that static_call_text_reserved() was reporting __init text as being reserved past the time when the __init text was freed and re-used. This is mostly harmless and will at worst result in refusing a kprobe. Fixes: `6333e8f73b` ("static_call: Avoid kprobes on inline static_call()s") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210628113045.106211657@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:05:58 +02:00
Peter Zijlstra	59ae35884c	jump_label: Fix jump_label_text_reserved() vs __init [ Upstream commit `9e667624c2` ] It turns out that jump_label_text_reserved() was reporting __init text as being reserved past the time when the __init text was freed and re-used. For a long time, this resulted in, at worst, not being able to kprobe text that happened to land at the re-used address. However a recent commit `e7bf1ba97a` ("jump_label, x86: Emit short JMP") made it a fatal mistake because it now needs to read the instruction in order to determine the conflict -- an instruction that's no longer there. Fixes: `4c3ef6d793` ("jump label: Add jump_label_text_reserved() to reserve jump points") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210628113045.045141693@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:05:58 +02:00
Xuewen Yan	143a6b8ec5	sched/uclamp: Ignore max aggregation if rq is idle [ Upstream commit `3e1493f463` ] When a task wakes up on an idle rq, uclamp_rq_util_with() would max aggregate with rq value. But since there is no task enqueued yet, the values are stale based on the last task that was running. When the new task actually wakes up and enqueued, then the rq uclamp values should reflect that of the newly woken up task effective uclamp values. This is a problem particularly for uclamp_max because it default to 1024. If a task p with uclamp_max = 512 wakes up, then max aggregation would ignore the capping that should apply when this task is enqueued, which is wrong. Fix that by ignoring max aggregation if the rq is idle since in that case the effective uclamp value of the rq will be the ones of the task that will wake up. Fixes: `9d20ad7dfc` ("sched/uclamp: Add uclamp_util_with()") Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> [qias: Changelog] Reviewed-by: Qais Yousef <qais.yousef@arm.com> Link: https://lore.kernel.org/r/20210630141204.8197-1-xuewen.yan94@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:05:58 +02:00
Paul E. McKenney	35a35909ec	rcu: Reject RCU_LOCKDEP_WARN() false positives [ Upstream commit `3066820034` ] If another lockdep report runs concurrently with an RCU lockdep report from RCU_LOCKDEP_WARN(), the following sequence of events can occur: 1. debug_lockdep_rcu_enabled() sees that lockdep is enabled when called from (say) synchronize_rcu(). 2. Lockdep is disabled by a concurrent lockdep report. 3. debug_lockdep_rcu_enabled() evaluates its lockdep-expression argument, for example, lock_is_held(&rcu_bh_lock_map). 4. Because lockdep is now disabled, lock_is_held() plays it safe and returns the constant 1. 5. But in this case, the constant 1 is not safe, because invoking synchronize_rcu() under rcu_read_lock_bh() is disallowed. 6. debug_lockdep_rcu_enabled() wrongly invokes lockdep_rcu_suspicious(), resulting in a false-positive splat. This commit therefore changes RCU_LOCKDEP_WARN() to check debug_lockdep_rcu_enabled() after checking the lockdep expression, so that any "safe" returns from lock_is_held() are rejected by debug_lockdep_rcu_enabled(). This requires memory ordering, which is supplied by READ_ONCE(debug_locks). The resulting volatile accesses prevent the compiler from reordering and the fact that only one variable is being accessed prevents the underlying hardware from reordering. The combination works for IA64, which can reorder reads to the same location, but this is defeated by the volatile accesses, which compile to load instructions that provide ordering. Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com Reported-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:05:38 +02:00
Frederic Weisbecker	23597afbe0	srcu: Fix broken node geometry after early ssp init [ Upstream commit `b5befe842e` ] An srcu_struct structure that is initialized before rcu_init_geometry() will have its srcu_node hierarchy based on CONFIG_NR_CPUS. Once rcu_init_geometry() is called, this hierarchy is compressed as needed for the actual maximum number of CPUs for this system. Later on, that srcu_struct structure is confused, sometimes referring to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead to the new num_possible_cpus() hierarchy. For example, each of its ->mynode fields continues to reference the original leaf rcu_node structures, some of which might no longer exist. On the other hand, srcu_for_each_node_breadth_first() traverses to the new node hierarchy. There are at least two bad possible outcomes to this: 1) a) A callback enqueued early on an srcu_data structure (call it *sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf (say 3 levels). b) The grace period ends after rcu_init_geometry() shrinks the nodes level to a single one. srcu_gp_end() walks through the new srcu_node hierarchy without ever reaching the old leaves so the callback is never executed. This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests verification never completes and the boot hangs: [ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds. [ 5413.147564] Not tainted 5.12.0-rc4+ #28 [ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5413.159753] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000 [ 5413.168099] Call Trace: [ 5413.170555] __schedule+0x36c/0x930 [ 5413.174057] ? wait_for_completion+0x88/0x110 [ 5413.178423] schedule+0x46/0xf0 [ 5413.181575] schedule_timeout+0x284/0x380 [ 5413.185591] ? wait_for_completion+0x88/0x110 [ 5413.189957] ? mark_held_locks+0x61/0x80 [ 5413.193882] ? mark_held_locks+0x61/0x80 [ 5413.197809] ? _raw_spin_unlock_irq+0x24/0x50 [ 5413.202173] ? wait_for_completion+0x88/0x110 [ 5413.206535] wait_for_completion+0xb4/0x110 [ 5413.210724] ? srcu_torture_stats_print+0x110/0x110 [ 5413.215610] srcu_barrier+0x187/0x200 [ 5413.219277] ? rcu_tasks_verify_self_tests+0x50/0x50 [ 5413.224244] ? rdinit_setup+0x2b/0x2b [ 5413.227907] rcu_verify_early_boot_tests+0x2d/0x40 [ 5413.232700] do_one_initcall+0x63/0x310 [ 5413.236541] ? rdinit_setup+0x2b/0x2b [ 5413.240207] ? rcu_read_lock_sched_held+0x52/0x80 [ 5413.244912] kernel_init_freeable+0x253/0x28f [ 5413.249273] ? rest_init+0x250/0x250 [ 5413.252846] kernel_init+0xa/0x110 [ 5413.256257] ret_from_fork+0x22/0x30 2) An srcu_struct structure that is initialized before rcu_init_geometry() and used afterward will always have stale rdp->mynode references, resulting in callbacks to be missed in srcu_gp_end(), just like in the previous scenario. This commit therefore causes init_srcu_struct_nodes to initialize the geometry, if needed. This ensures that the srcu_node hierarchy is properly built and distributed from the get-go. Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-20 16:05:38 +02:00
Christian Brauner	811763e3be	cgroup: verify that source is a string commit `3b0462726e` upstream. The following sequence can be used to trigger a UAF: int fscontext_fd = fsopen("cgroup"); int fd_null = open("/dev/null, O_RDONLY); int fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", fd_null); close_range(3, ~0U, 0); The cgroup v1 specific fs parser expects a string for the "source" parameter. However, it is perfectly legitimate to e.g. specify a file descriptor for the "source" parameter. The fs parser doesn't know what a filesystem allows there. So it's a bug to assume that "source" is always of type fs_value_is_string when it can reasonably also be fs_value_is_file. This assumption in the cgroup code causes a UAF because struct fs_parameter uses a union for the actual value. Access to that union is guarded by the param->type member. Since the cgroup paramter parser didn't check param->type but unconditionally moved param->string into fc->source a close on the fscontext_fd would trigger a UAF during put_fs_context() which frees fc->source thereby freeing the file stashed in param->file causing a UAF during a close of the fd_null. Fix this by verifying that param->type is actually a string and report an error if not. In follow up patches I'll add a new generic helper that can be used here and by other filesystems instead of this error-prone copy-pasta fix. But fixing it in here first makes backporting a it to stable a lot easier. Fixes: `8d2451f499` ("cgroup1: switch to option-by-option parsing") Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@kernel.org> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:05:36 +02:00
Steven Rostedt (VMware)	905169794d	tracing: Do not reference char * as a string in histograms commit `704adfb5a9` upstream. The histogram logic was allowing events with char * pointers to be used as normal strings. But it was easy to crash the kernel with: # echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger And open some files, and boom! BUG: unable to handle page fault for address: 00007f2ced0c3280 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:strlen+0x0/0x20 Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246 RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0 RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280 RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98 R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074 R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280 FS: 00007f2ced0f8580(0000) GS:ffff93825a800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0 Call Trace: event_hist_trigger+0x463/0x5f0 ? find_held_lock+0x32/0x90 ? sched_clock_cpu+0xe/0xd0 ? lock_release+0x155/0x440 ? kernel_init_free_pages+0x6d/0x90 ? preempt_count_sub+0x9b/0xd0 ? kernel_init_free_pages+0x6d/0x90 ? get_page_from_freelist+0x12c4/0x1680 ? __rb_reserve_next+0xe5/0x460 ? ring_buffer_lock_reserve+0x12a/0x3f0 event_triggers_call+0x52/0xe0 ftrace_syscall_enter+0x264/0x2c0 syscall_trace_enter.constprop.0+0x1ee/0x210 do_syscall_64+0x1c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Where it triggered a fault on strlen(key) where key was the filename. The reason is that filename is a char * to user space, and the histogram code just blindly dereferenced it, with obvious bad results. I originally tried to use strncpy_from_user/kernel_nofault() but found that there's other places that its dereferenced and not worth the effort. Just do not allow "char *" to act like strings. Link: https://lkml.kernel.org/r/20210715000206.025df9d2@rorschach.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Tom Zanussi <zanussi@kernel.org> Fixes: `79e577cbce` ("tracing: Support string type key properly") Fixes: `5967bd5c42` ("tracing: Let filter_assign_type() detect FILTER_PTR_STRING") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-20 16:05:36 +02:00
Greg Kroah-Hartman	8db62be3c3	This is the 5.10.51 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmD1LZUACgkQONu9yGCS aT7gERAArjYemBSkD4/nq5HvxoVu7ueEyqI2orJCyB6b5npPrBZjlKna3SuYuNUF CmcX5Y2Ynxd3gWJvYFEJdAAQAEtKAzPdPv0QJ+KiLSP2bEZ4Q5grEJXfzcgxcB4L fUfaCZnUwjoII5xzW1+U2zCJ7YtGx5hZySLKMxSEc0IzawDlfMx4HdwlohXzczsA Zq3/sTJcW30PWSp6MSuMOH//lPh7sAoCnksAv4Yb8MZYZjC8JNnKFn+IwRUGWEMZ sFtNbq51sMgGq4TjYBIdO6wBElP4dgWJhYc4cO0667cDvgp6iod/bKlOLJSiwIVX uPWkyPihH9ZUvNVY0TxjjnxS1rnM0QhH+cEXNGn+SE7KaNmzsI2MaR+DVA2yWcFr 9edqTq5x2CJJ0R/oXHP4nYFtsvlV/QcirlrF0OZHYwz84b16f37Ac7tHOQpr3kNO N29AW0l5XmpxbfHgo1Iaoi02seouLC47vRkvjTpS9mValGUC0ciXTb97CK4FremE 34rskxIRWBU8HECYioFOeHTAi0+xb/9tOj87BnB5CJ28CD2Md27TTdorsISnqPb0 /ER89QfVtlJLi17wGB0rlAm8fDF3Cy6BnA57QIql1z1NbJGc1cxenEAFiIOnbQ5G t6SLs3mgqpQizsKUFYimCWa04ZzY4Bg8H9bAI+M+9w7J6yujQzI= =dGMa -----END PGP SIGNATURE----- Merge 5.10.51 into android12-5.10-lts Changes in 5.10.51 drm/mxsfb: Don't select DRM_KMS_FB_HELPER drm/zte: Don't select DRM_KMS_FB_HELPER drm/ast: Fixed CVE for DP501 drm/amd/display: fix HDCP reset sequence on reinitialize drm/amd/amdgpu/sriov disable all ip hw status by default drm/vc4: fix argument ordering in vc4_crtc_get_margins() drm/bridge: nwl-dsi: Force a full modeset when crtc_state->active is changed to be true net: pch_gbe: Use proper accessors to BE data in pch_ptp_match() drm/amd/display: fix use_max_lb flag for 420 pixel formats clk: renesas: rcar-usb2-clock-sel: Fix error handling in .probe() hugetlb: clear huge pte during flush function on mips platform atm: iphase: fix possible use-after-free in ia_module_exit() mISDN: fix possible use-after-free in HFC_cleanup() atm: nicstar: Fix possible use-after-free in nicstar_cleanup() net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT drm/mediatek: Fix PM reference leak in mtk_crtc_ddp_hw_init() net: mdio: ipq8064: add regmap config to disable REGCACHE drm/bridge: lt9611: Add missing MODULE_DEVICE_TABLE reiserfs: add check for invalid 1st journal block drm/virtio: Fix double free on probe failure net: mdio: provide shim implementation of devm_of_mdiobus_register net/sched: cls_api: increase max_reclassify_loop pinctrl: equilibrium: Add missing MODULE_DEVICE_TABLE drm/scheduler: Fix hang when sched_entity released drm/sched: Avoid data corruptions udf: Fix NULL pointer dereference in udf_symlink function drm/vc4: Fix clock source for VEC PixelValve on BCM2711 drm/vc4: hdmi: Fix PM reference leak in vc4_hdmi_encoder_pre_crtc_co() e100: handle eeprom as little endian igb: handle vlan types with checker enabled igb: fix assignment on big endian machines drm/bridge: cdns: Fix PM reference leak in cdns_dsi_transfer() clk: renesas: r8a77995: Add ZA2 clock net/mlx5e: IPsec/rep_tc: Fix rep_tc_update_skb drops IPsec packet net/mlx5: Fix lag port remapping logic drm: rockchip: add missing registers for RK3188 drm: rockchip: add missing registers for RK3066 net: stmmac: the XPCS obscures a potential "PHY not found" error RDMA/rtrs: Change MAX_SESS_QUEUE_DEPTH clk: tegra: Fix refcounting of gate clocks clk: tegra: Ensure that PLLU configuration is applied properly drm: bridge: cdns-mhdp8546: Fix PM reference leak in virtio-net: Add validation for used length ipv6: use prandom_u32() for ID generation MIPS: cpu-probe: Fix FPU detection on Ingenic JZ4760(B) MIPS: ingenic: Select CPU_SUPPORTS_CPUFREQ && MIPS_EXTERNAL_TIMER drm/amd/display: Avoid HDCP over-read and corruption drm/amdgpu: remove unsafe optimization to drop preamble ib net: tcp better handling of reordering then loss cases RDMA/cxgb4: Fix missing error code in create_qp() dm space maps: don't reset space map allocation cursor when committing dm writecache: don't split bios when overwriting contiguous cache content dm: Fix dm_accept_partial_bio() relative to zone management commands net: bridge: mrp: Update ring transitions. pinctrl: mcp23s08: fix race condition in irq handler ice: set the value of global config lock timeout longer ice: fix clang warning regarding deadcode.DeadStores virtio_net: Remove BUG() to avoid machine dead net: mscc: ocelot: check return value after calling platform_get_resource() net: bcmgenet: check return value after calling platform_get_resource() net: mvpp2: check return value after calling platform_get_resource() net: micrel: check return value after calling platform_get_resource() net: moxa: Use devm_platform_get_and_ioremap_resource() drm/amd/display: Fix DCN 3.01 DSCCLK validation drm/amd/display: Update scaling settings on modeset drm/amd/display: Release MST resources on switch from MST to SST drm/amd/display: Set DISPCLK_MAX_ERRDET_CYCLES to 7 drm/amd/display: Fix off-by-one error in DML net: phy: realtek: add delay to fix RXC generation issue selftests: Clean forgotten resources as part of cleanup() net: sgi: ioc3-eth: check return value after calling platform_get_resource() drm/amdkfd: use allowed domain for vmbo validation fjes: check return value after calling platform_get_resource() selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC r8169: avoid link-up interrupt issue on RTL8106e if user enables ASPM drm/amd/display: Verify Gamma & Degamma LUT sizes in amdgpu_dm_atomic_check xfrm: Fix error reporting in xfrm_state_construct. dm writecache: commit just one block, not a full page wlcore/wl12xx: Fix wl12xx get_mac error if device is in ELP wl1251: Fix possible buffer overflow in wl1251_cmd_scan cw1200: add missing MODULE_DEVICE_TABLE drm/amdkfd: fix circular locking on get_wave_state drm/amdkfd: Fix circular lock in nocpsch path bpf: Fix up register-based shifts in interpreter to silence KUBSAN ice: fix incorrect payload indicator on PTYPE ice: mark PTYPE 2 as reserved mt76: mt7615: fix fixed-rate tx status reporting net: fix mistake path for netdev_features_strings net: ipa: Add missing of_node_put() in ipa_firmware_load() net: sched: fix error return code in tcf_del_walker() io_uring: fix false WARN_ONCE drm/amdgpu: fix bad address translation for sienna_cichlid drm/amdkfd: Walk through list with dqm lock hold mt76: mt7915: fix IEEE80211_HE_PHY_CAP7_MAX_NC for station mode rtl8xxxu: Fix device info for RTL8192EU devices MIPS: add PMD table accounting into MIPS'pmd_alloc_one net: fec: add ndo_select_queue to fix TX bandwidth fluctuations atm: nicstar: use 'dma_free_coherent' instead of 'kfree' atm: nicstar: register the interrupt handler in the right place vsock: notify server to shutdown when client has pending signal RDMA/rxe: Don't overwrite errno from ib_umem_get() iwlwifi: mvm: don't change band on bound PHY contexts iwlwifi: mvm: fix error print when session protection ends iwlwifi: pcie: free IML DMA memory allocation iwlwifi: pcie: fix context info freeing sfc: avoid double pci_remove of VFs sfc: error code if SRIOV cannot be disabled wireless: wext-spy: Fix out-of-bounds warning cfg80211: fix default HE tx bitrate mask in 2G band mac80211: consider per-CPU statistics if present mac80211_hwsim: add concurrent channels scanning support over virtio IB/isert: Align target max I/O size to initiator size media, bpf: Do not copy more entries than user space requested net: ip: avoid OOM kills with large UDP sends over loopback RDMA/cma: Fix rdma_resolve_route() memory leak Bluetooth: btusb: Fixed too many in-token issue for Mediatek Chip. Bluetooth: Fix the HCI to MGMT status conversion table Bluetooth: Fix alt settings for incoming SCO with transparent coding format Bluetooth: Shutdown controller after workqueues are flushed or cancelled Bluetooth: btusb: Add a new QCA_ROME device (0cf3:e500) Bluetooth: L2CAP: Fix invalid access if ECRED Reconfigure fails Bluetooth: L2CAP: Fix invalid access on ECRED Connection response Bluetooth: btusb: Add support USB ALT 3 for WBS Bluetooth: mgmt: Fix the command returns garbage parameter value Bluetooth: btusb: fix bt fiwmare downloading failure issue for qca btsoc. sched/fair: Ensure _sum and _avg values stay consistent bpf: Fix false positive kmemleak report in bpf_ringbuf_area_alloc() flow_offload: action should not be NULL when it is referenced sctp: validate from_addr_param return sctp: add size validation when walking chunks MIPS: loongsoon64: Reserve memory below starting pfn to prevent Oops MIPS: set mips32r5 for virt extensions selftests/resctrl: Fix incorrect parsing of option "-t" MIPS: MT extensions are not available on MIPS32r1 ath11k: unlock on error path in ath11k_mac_op_add_interface() arm64: dts: rockchip: add rk3328 dwc3 usb controller node arm64: dts: rockchip: Enable USB3 for rk3328 Rock64 loop: fix I/O error on fsync() in detached loop devices mm,hwpoison: return -EBUSY when migration fails io_uring: simplify io_remove_personalities() io_uring: Convert personality_idr to XArray io_uring: convert io_buffer_idr to XArray scsi: iscsi: Fix race condition between login and sync thread scsi: iscsi: Fix iSCSI cls conn state powerpc/mm: Fix lockup on kernel exec fault powerpc/barrier: Avoid collision with clang's __lwsync macro powerpc/powernv/vas: Release reference to tgid during window close drm/amdgpu: Update NV SIMD-per-CU to 2 drm/amdgpu: enable sdma0 tmz for Raven/Renoir(V2) drm/radeon: Add the missed drm_gem_object_put() in radeon_user_framebuffer_create() drm/radeon: Call radeon_suspend_kms() in radeon_pci_shutdown() for Loongson64 drm/vc4: txp: Properly set the possible_crtcs mask drm/vc4: crtc: Skip the TXP drm/vc4: hdmi: Prevent clock unbalance drm/dp: Handle zeroed port counts in drm_dp_read_downstream_info() drm/rockchip: dsi: remove extra component_del() call drm/amd/display: fix incorrrect valid irq check pinctrl/amd: Add device HID for new AMD GPIO controller drm/amd/display: Reject non-zero src_y and src_x for video planes drm/tegra: Don't set allow_fb_modifiers explicitly drm/msm/mdp4: Fix modifier support enabling drm/arm/malidp: Always list modifiers drm/nouveau: Don't set allow_fb_modifiers explicitly drm/i915/display: Do not zero past infoframes.vsc mmc: sdhci-acpi: Disable write protect detection on Toshiba Encore 2 WT8-B mmc: sdhci: Fix warning message when accessing RPMB in HS400 mode mmc: core: clear flags before allowing to retune mmc: core: Allow UHS-I voltage switch for SDSC cards if supported ata: ahci_sunxi: Disable DIPM arm64: tlb: fix the TTL value of tlb_get_level cpu/hotplug: Cure the cpusets trainwreck clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround fpga: stratix10-soc: Add missing fpga_mgr_free() call ASoC: tegra: Set driver_name=tegra for all machine drivers i40e: fix PTP on 5Gb links qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute ipmi/watchdog: Stop watchdog timer when the current action is 'none' thermal/drivers/int340x/processor_thermal: Fix tcc setting ubifs: Fix races between xattr_{set\|get} and listxattr operations power: supply: ab8500: Fix an old bug mfd: syscon: Free the allocated name field of struct regmap_config nvmem: core: add a missing of_node_put lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITE selftests/lkdtm: Fix expected text for CR4 pinning extcon: intel-mrfld: Sync hardware and software state on init seq_buf: Fix overflow in seq_buf_putmem_hex() rq-qos: fix missed wake-ups in rq_qos_throttle try two tracing: Simplify & fix saved_tgids logic tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe coresight: Propagate symlink failure coresight: tmc-etf: Fix global-out-of-bounds in tmc_update_etf_buffer() dm zoned: check zone capacity dm writecache: flush origin device when writing and cache is full dm btree remove: assign new_root only when removal succeeds PCI: Leave Apple Thunderbolt controllers on for s2idle or standby PCI: aardvark: Fix checking for PIO Non-posted Request PCI: aardvark: Implement workaround for the readback value of VEND_ID media: subdev: disallow ioctl for saa6588/davinci media: dtv5100: fix control-request directions media: zr364xx: fix memory leak in zr364xx_start_readpipe media: gspca/sq905: fix control-request direction media: gspca/sunplus: fix zero-length control requests media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K io_uring: fix clear IORING_SETUP_R_DISABLED in wrong function dm writecache: write at least 4k when committing pinctrl: mcp23s08: Fix missing unlock on error in mcp23s08_irq() drm/ast: Remove reference to struct drm_device.pdev jfs: fix GPF in diFree smackfs: restrict bytes count in smk_set_cipso() ext4: fix memory leak in ext4_fill_super f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instances Linux 5.10.51 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Icb10fed733a0050848ecc23db13ae3d134895acd	2021-07-19 17:29:53 +02:00
Paul Burton	eb81b5a37d	tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT commit `4030a6e6a6` upstream. Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that on systems where pid_max is configured higher than PID_MAX_DEFAULT the ftrace record-tgid option doesn't work so well. Any tasks with PIDs higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and don't show up in the saved_tgids file. In particular since systemd v243 & above configure pid_max to its highest possible 1<<22 value by default on 64 bit systems this renders the record-tgids option of little use. Increase the size of tgid_map to the configured pid_max instead, allowing it to cover the full range of PIDs up to the maximum value of PID_MAX_LIMIT if the system is configured that way. On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in memory overhead sounds significant 64 bit systems are presumably best placed to accommodate it, and since tgid_map is only allocated when the record-tgid option is actually used presumably the user would rather it spends sufficient memory to actually record the tgids they expect. The size of tgid_map could also increase for CONFIG_BASE_SMALL=y configurations, but these seem unlikely to be systems upon which people are both configuring a large pid_max and running ftrace with record-tgid anyway. Of note is that we only allocate tgid_map once, the first time that the record-tgid option is enabled. Therefore its size is only set once, to the value of pid_max at the time the record-tgid option is first enabled. If a user increases pid_max after that point, the saved_tgids file will not contain entries for any tasks with pids beyond the earlier value of pid_max. Link: https://lkml.kernel.org/r/20210701172407.889626-2-paulburton@google.com Fixes: `d914ba37d7` ("tracing: Add support for recording tgid of tasks") Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joelaf@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Paul Burton <paulburton@google.com> [ Fixed comment coding style ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-19 09:45:01 +02:00
Paul Burton	3cda5b7f4e	tracing: Simplify & fix saved_tgids logic commit `b81b3e959a` upstream. The tgid_map array records a mapping from pid to tgid, where the index of an entry within the array is the pid & the value stored at that index is the tgid. The saved_tgids_next() function iterates over pointers into the tgid_map array & dereferences the pointers which results in the tgid, but then it passes that dereferenced value to trace_find_tgid() which treats it as a pid & does a further lookup within the tgid_map array. It seems likely that the intent here was to skip over entries in tgid_map for which the recorded tgid is zero, but instead we end up skipping over entries for which the thread group leader hasn't yet had its own tgid recorded in tgid_map. A minimal fix would be to remove the call to trace_find_tgid, turning: if (trace_find_tgid(ptr)) into: if (ptr) ..but it seems like this logic can be much simpler if we simply let seq_read() iterate over the whole tgid_map array & filter out empty entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that approach, removing the incorrect logic here entirely. Link: https://lkml.kernel.org/r/20210630003406.4013668-1-paulburton@google.com Fixes: `d914ba37d7` ("tracing: Add support for recording tgid of tasks") Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joelaf@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Paul Burton <paulburton@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-19 09:45:00 +02:00
Jan Kara	8cc58a6e2c	rq-qos: fix missed wake-ups in rq_qos_throttle try two commit `11c7aa0dde` upstream. Commit `545fbd0775` ("rq-qos: fix missed wake-ups in rq_qos_throttle") tried to fix a problem that a process could be sleeping in rq_qos_wait() without anyone to wake it up. However the fix is not complete and the following can still happen: CPU1 (waiter1) CPU2 (waiter2) CPU3 (waker) rq_qos_wait() rq_qos_wait() acquire_inflight_cb() -> fails acquire_inflight_cb() -> fails completes IOs, inflight decreased prepare_to_wait_exclusive() prepare_to_wait_exclusive() has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers has_sleeper = !wq_has_single_sleeper() -> true io_schedule() io_schedule() Deadlock as now there's nobody to wakeup the two waiters. The logic automatically blocking when there are already sleepers is really subtle and the only way to make it work reliably is that we check whether there are some waiters in the queue when adding ourselves there. That way, we are guaranteed that at least the first process to enter the wait queue will recheck the waiting condition before going to sleep and thus guarantee forward progress. Fixes: `545fbd0775` ("rq-qos: fix missed wake-ups in rq_qos_throttle") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-19 09:45:00 +02:00
Thomas Gleixner	b5e26be407	cpu/hotplug: Cure the cpusets trainwreck commit `b22afcdf04` upstream. Alexey and Joshua tried to solve a cpusets related hotplug problem which is user space visible and results in unexpected behaviour for some time after a CPU has been plugged in and the corresponding uevent was delivered. cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a workqueue. This is done because the cpusets code has already a lock nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or waiting for the work to finish with cpu_hotplug_lock held can and will deadlock because that results in the reverse lock order. As a consequence the uevent can be delivered before cpusets have consistent state which means that a user space invocation of sched_setaffinity() to move a task to the plugged CPU fails up to the point where the scheduled work has been processed. The same is true for CPU unplug, but that does not create user observable failure (yet). It's still inconsistent to claim that an operation is finished before it actually is and that's the real issue at hand. uevents just make it reliably observable. Obviously the problem should be fixed in cpusets/cgroups, but untangling that is pretty much impossible because according to the changelog of the commit which introduced this 8 years ago: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()") the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and the whole code is built around that. So bite the bullet and invoke the relevant cpuset function, which waits for the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and only when tasks are not frozen by suspend/hibernate because that would obviously wait forever. Waiting there with cpu_add_remove_lock, which is protecting the present and possible CPU maps, held is not a problem at all because neither work queues nor cpusets/cgroups have any lockchains related to that lock. Waiting in the hotplug machinery is not problematic either because there are already state callbacks which wait for hardware queues to drain. It makes the operations slightly slower, but hotplug is slow anyway. This ensures that state is consistent before returning from a hotplug up/down operation. It's still inconsistent during the operation, but that's a different story. Add a large comment which explains why this is done and why this is not a dump ground for the hack of the day to work around half thought out locking schemes. Document also the implications vs. hotplug operations and serialization or the lack of it. Thanks to Alexy and Joshua for analyzing why this temporary sched_setaffinity() failure happened. Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()") Reported-by: Alexey Klimov <aklimov@redhat.com> Reported-by: Joshua Baker <jobaker@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexey Klimov <aklimov@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-07-19 09:44:59 +02:00
Rustam Kovhaev	a61af01141	bpf: Fix false positive kmemleak report in bpf_ringbuf_area_alloc() [ Upstream commit `ccff81e1d0` ] kmemleak scans struct page, but it does not scan the page content. If we allocate some memory with kmalloc(), then allocate page with alloc_page(), and if we put kmalloc pointer somewhere inside that page, kmemleak will report kmalloc pointer as a false positive. We can instruct kmemleak to scan the memory area by calling kmemleak_alloc() and kmemleak_free(), but part of struct bpf_ringbuf is mmaped to user space, and if struct bpf_ringbuf changes we would have to revisit and review size argument in kmemleak_alloc(), because we do not want kmemleak to scan the user space memory. Let's simplify things and use kmemleak_not_leak() here. For posterity, also adding additional prior analysis from Andrii: I think either kmemleak or syzbot are misreporting this. I've added a bunch of printks around all allocations performed by BPF ringbuf. [...] On repro side I get these two warnings: [vmuser@archvm bpf]$ sudo ./repro BUG: memory leak unreferenced object 0xffff88810d538c00 (size 64): comm "repro", pid 2140, jiffies 4294692933 (age 14.540s) hex dump (first 32 bytes): 00 af 19 04 00 ea ff ff c0 ae 19 04 00 ea ff ff ................ 80 ae 19 04 00 ea ff ff c0 29 2e 04 00 ea ff ff .........)...... backtrace: [<0000000077bfbfbd>] __bpf_map_area_alloc+0x31/0xc0 [<00000000587fa522>] ringbuf_map_alloc.cold.4+0x48/0x218 [<0000000044d49e96>] __do_sys_bpf+0x359/0x1d90 [<00000000f601d565>] do_syscall_64+0x2d/0x40 [<0000000043d3112a>] entry_SYSCALL_64_after_hwframe+0x44/0xae BUG: memory leak unreferenced object 0xffff88810d538c80 (size 64): comm "repro", pid 2143, jiffies 4294699025 (age 8.448s) hex dump (first 32 bytes): 80 aa 19 04 00 ea ff ff 00 ab 19 04 00 ea ff ff ................ c0 ab 19 04 00 ea ff ff 80 44 28 04 00 ea ff ff .........D(..... backtrace: [<0000000077bfbfbd>] __bpf_map_area_alloc+0x31/0xc0 [<00000000587fa522>] ringbuf_map_alloc.cold.4+0x48/0x218 [<0000000044d49e96>] __do_sys_bpf+0x359/0x1d90 [<00000000f601d565>] do_syscall_64+0x2d/0x40 [<0000000043d3112a>] entry_SYSCALL_64_after_hwframe+0x44/0xae Note that both reported leaks (ffff88810d538c80 and ffff88810d538c00) correspond to pages array bpf_ringbuf is allocating and tracking properly internally. Note also that syzbot repro doesn't close FD of created BPF ringbufs, and even when ./repro itself exits with error, there are still two forked processes hanging around in my system. So clearly ringbuf maps are alive at that point. So reporting any memory leak looks weird at that point, because that memory is being used by active referenced BPF ringbuf. It's also a question why repro doesn't clean up its forks. But if I do a `pkill repro`, I do see that all the allocated memory is /properly/ cleaned up [and the] "leaks" are deallocated properly. BTW, if I add close() right after bpf() syscall in syzbot repro, I see that everything is immediately deallocated, like designed. And no memory leak is reported. So I don't think the problem is anywhere in bpf_ringbuf code, rather in the leak detection and/or repro itself. Reported-by: syzbot+5d895828587f49e7fe9b@syzkaller.appspotmail.com Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> [ Daniel: also included analysis from Andrii to the commit log ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+5d895828587f49e7fe9b@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4BzYk+dqs+jwu6VKXP-RttcTEGFe+ySTGWT9CRNkagDiJVA@mail.gmail.com Link: https://lore.kernel.org/lkml/YNTAqiE7CWJhOK2M@nuc10 Link: https://lore.kernel.org/lkml/20210615101515.GC26027@arm.com Link: https://syzkaller.appspot.com/bug?extid=5d895828587f49e7fe9b Link: https://lore.kernel.org/bpf/20210626181156.1873604-1-rkovhaev@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-19 09:44:54 +02:00
Odin Ugedal	20285dc271	sched/fair: Ensure _sum and _avg values stay consistent [ Upstream commit `1c35b07e6d` ] The _sum and _avg values are in general sync together with the PELT divider. They are however not always completely in perfect sync, resulting in situations where _sum gets to zero while _avg stays positive. Such situations are undesirable. This comes from the fact that PELT will increase period_contrib, also increasing the PELT divider, without updating _sum and _avg values to stay in perfect sync where (_sum == _avg * divider). However, such PELT change will never lower _sum, making it impossible to end up in a situation where _sum is zero and _avg is not. Therefore, we need to ensure that when subtracting load outside PELT, that when _sum is zero, _avg is also set to zero. This occurs when (_sum < _avg * divider), and the subtracted (_avg * divider) is bigger or equal to the current _sum, while the subtracted _avg is smaller than the current _avg. Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20210624111815.57937-1-odin@uged.al Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-19 09:44:54 +02:00
Daniel Borkmann	2e66c36f13	bpf: Fix up register-based shifts in interpreter to silence KUBSAN [ Upstream commit `28131e9d93` ] syzbot reported a shift-out-of-bounds that KUBSAN observed in the interpreter: [...] UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2 shift exponent 255 is too large for 64-bit type 'long long unsigned int' CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 ___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420 __bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735 bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline] bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline] bpf_prog_run_clear_cb include/linux/filter.h:755 [inline] run_filter+0x1a1/0x470 net/packet/af_packet.c:2031 packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104 dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387 xmit_one net/core/dev.c:3588 [inline] dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609 __dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182 __bpf_tx_skb net/core/filter.c:2116 [inline] __bpf_redirect_no_mac net/core/filter.c:2141 [inline] __bpf_redirect+0x548/0xc80 net/core/filter.c:2164 ____bpf_clone_redirect net/core/filter.c:2448 [inline] bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420 ___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523 __bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737 bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline] bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50 bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582 bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline] __do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae [...] Generally speaking, KUBSAN reports from the kernel should be fixed. However, in case of BPF, this particular report caused concerns since the large shift is not wrong from BPF point of view, just undefined. In the verifier, K-based shifts that are >= {64,32} (depending on the bitwidth of the instruction) are already rejected. The register-based cases were not given their content might not be known at verification time. Ideas such as verifier instruction rewrite with an additional AND instruction for the source register were brought up, but regularly rejected due to the additional runtime overhead they incur. As Edward Cree rightly put it: Shifts by more than insn bitness are legal in the BPF ISA; they are implementation-defined behaviour [of the underlying architecture], rather than UB, and have been made legal for performance reasons. Each of the JIT backends compiles the BPF shift operations to machine instructions which produce implementation-defined results in such a case; the resulting contents of the register may be arbitrary but program behaviour as a whole remains defined. Guard checks in the fast path (i.e. affecting JITted code) will thus not be accepted. The case of division by zero is not truly analogous here, as division instructions on many of the JIT-targeted architectures will raise a machine exception / fault on division by zero, whereas (to the best of my knowledge) none will do so on an out-of-bounds shift. Given the KUBSAN report only affects the BPF interpreter, but not JITs, one solution is to add the ANDs with 63 or 31 into ___bpf_prog_run(). That would make the shifts defined, and thus shuts up KUBSAN, and the compiler would optimize out the AND on any CPU that interprets the shift amounts modulo the width anyway (e.g., confirmed from disassembly that on x86-64 and arm64 the generated interpreter code is the same before and after this fix). The BPF interpreter is slow path, and most likely compiled out anyway as distros select BPF_JIT_ALWAYS_ON to avoid speculative execution of BPF instructions by the interpreter. Given the main argument was to avoid sacrificing performance, the fact that the AND is optimized away from compiler for mainstream archs helps as well as a solution moving forward. Also add a comment on LSH/RSH/ARSH translation for JIT authors to provide guidance when they see the ___bpf_prog_run() interpreter code and use it as a model for a new JIT backend. Reported-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com Reported-by: Kurt Manucredo <fuzzybritches0@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Co-developed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com Cc: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-19 09:44:50 +02:00
Elliot Berman	0bb433e014	ANDROID: debug_symbols: Add android_debug_for_each_module Allow vendors to obtain a list of modules loaded at given time. Vendor modules are able to register on part of notifier chain (register_module_notifer), but a vendor module would never see modules which are loaded before the one which registers on the notifier chain. The kernel doesn't offer load order control, so a hook is necessary to iterate through currently loaded kernel modules. Bug: 193552324 Change-Id: I3b01cc1b90f8c0c7c21a37992cc7d607316efc7b Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>	2021-07-15 13:59:25 -07:00
Greg Kroah-Hartman	1328352dcd	Merge branch 'android12-5.10' into `android12-5.10-lts` Sync up with android12-5.10 for the following commits: `870488eb07` ANDROID: GKI: 7/14/2021 KMI update `e9742a9ea5` ANDROID: Update the ABI symbol list `ec2190fd3f` FROMLIST: arm64: avoid double ISB on kernel entry `98b2c1dd1c` FROMLIST: arm64: mte: optimize GCR_EL1 modification on kernel entry/exit `a20103c331` BACKPORT: FROMLIST: arm64: mte: avoid TFSR related operations unless in async mode `3972be647a` FROMLIST: Documentation: document the preferred tag checking mode feature `5adf29adb5` FROMLIST: arm64: mte: introduce a per-CPU tag checking mode preference `ce5ba15abc` FROMLIST: arm64: move preemption disablement to prctl handlers `6c08feaa27` FROMLIST: arm64: mte: change ASYNC and SYNC TCF settings into bitfields `f438cf16cd` FROMLIST: arm64: mte: rename gcr_user_excl to mte_ctrl `a4c9e551b6` BACKPORT: arm64: pac: Optimize kernel entry/exit key installation code paths `50829b8901` BACKPORT: arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) `6119d18df7` ANDROID: cleancache: add oem data to cleancache_ops `a0c429e8e1` ANDROID: blkdev: add oem data to block_device_operations `26cd2564e1` FROMLIST: psi: stop relying on timer_pending for poll_work rescheduling `e85b291d7d` ANDROID: GKI: Enable CONFIG_MEMCG `0ed7424fa0` ANDROID: GKI: net: add vendor hooks for 'struct sock' lifecycle `4d30956478` ANDROID: GKI: net: add vendor hooks for 'struct nf_conn' lifecycle `7786463e48` ANDROID: GKI: add vendor padding variable in struct sock `280c9b98aa` ANDROID: GKI: add vendor padding variable in struct nf_conn `9d1b55d20a` ANDROID: vendor_hooks: add a field in mem_cgroup `65115fdbf8` ANDROID: vendor_hooks: add a field in pglist_data `26920e0f3a` FROMLIST: usb: dwc3: avoid NULL access of usb_gadget_driver `5bb2dd8d39` FROMGIT: usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by default `79274dbb00` FROMGIT: usb: dwc3: Resize TX FIFOs to meet EP bursting requirements `1e11f36199` FROMGIT: usb: gadget: configfs: Check USB configuration before adding `6da5e7afbf` FROMGIT: usb: gadget: udc: core: Introduce check_config to verify USB configuration `2ed5fbf261` ANDROID: GKI: fscrypt: add OEM data to struct fscrypt_operations `194fd9239a` ANDROID: GKI: fscrypt: add ABI padding to struct fscrypt_operations `8011eb2215` ANDROID: mm: provision to add shmem pages to inactive file lru head `9bb1247653` ANDROID: GKI: Enable CONFIG_CGROUP_NET_PRIO `a1ce719ca7` ANDROID: Delete the DMA-BUF attachment sysfs statistics `a2b3afb2f7` ANDROID: android: Add symbols to debug_symbols driver `914a7b14a0` UPSTREAM: USB: UDC core: Add udc_async_callbacks gadget op `9af9ef8dfa` ANDROID: vendor_hooks: Add oem data to file struct `37485a3025` ANDROID: add kabi padding for structures for the android12 release `429c78f9b0` ANDROID: GKI: device.h: add Android ABI padding to some structures `aea5e1c230` ANDROID: GKI: elevator: add Android ABI padding to some structures `1b79ef2754` ANDROID: GKI: scsi: add Android ABI padding to some structures `33175403b9` ANDROID: GKI: workqueue.h: add Android ABI padding to some structures `d5c344a498` ANDROID: GKI: sched: add Android ABI padding to some structures `9c4854fa5a` ANDROID: GKI: phy: add Android ABI padding to some structures `f4872b2353` ANDROID: GKI: fs.h: add Android ABI padding to some structures `48cddc7c42` ANDROID: GKI: dentry: add Android ABI padding to some structures `b9081a2925` ANDROID: GKI: bio: add Android ABI padding to some structures `99bf8cf8fa` ANDROID: GKI: ufs: add Android ABI padding to some structures `9df147298f` ANDROID: Update the generic symbol list `12f48605e8` ANDROID: mm: cma do not sleep for __GFP_NORETRY `0e688e972d` ANDROID: mm: cma: skip problematic pageblock `9938b82be1` ANDROID: mm: bail out tlb free batching on page zapping when cma is going on `c8578a3e90` ANDROID: mm: lru_cache_disable skips lru cache drainnig `c01ce3b5ef` ANDROID: mm: do not try test_page_isoalte if migration fails `675e504598` ANDROID: mm: add cma allocation statistics `b1e4543c27` UPSTREAM: mm, page_alloc: move draining pcplists to page isolation users `13bc06efd9` ANDROID: ALSA: compress: add vendor hook to support pause in draining `2faed77792` ANDROID: vendor_hooks: add vendor hook in blk_mq_rq_ctx_init() `292baba45a` ANDROID: abi_gki_aarch64_qcom: Add I3C core symbols to qcom tree `eecc725a8e` ANDROID: vendor_hooks: add vendor hook in blk_mq_alloc_rqs() `9c2958f454` ANDROID: GKI: Export put_task_stack symbol `288805c86a` ANDROID: abi_gki_aarch64_qcom: Add idr_alloc_u32 `e8516fd3af` ANDROID: sound: usb: add vendor hook for cpu suspend support `d820d22b5d` ANDROID: mm: page_pinner: use EXPORT_SYMBOL_GPL `efc09793ea` ANDROID: GKI: update allowed GKI symbol for Exynosauto SoC `67e3e39eb1` ANDROID: GKI: sync allowed list for exynosauto SoC `d25e256373` ANDROID: ABI: add new symbols required by fips140.ko `50661975be` ANDROID: fips140: add/update module help text `b7397e89db` ANDROID: fips140: add power-up cryptographic self-tests `bd7d13c36e` ANDROID: arm64: disable LSE when building the FIPS140 module `1061ef0493` ANDROID: jump_label: disable jump labels in fips140.ko `dcf509fea7` ANDROID: ipv6: add vendor hook for gen ipv6 link-local addr `018332e871` ANDROID: Revert "scsi: block: Do not accept any requests while suspended" `2ad2c3a25b` ANDROID: abi_gki_aarch64_qcom: whitelist vm_event_states `7bcfde2601` ANDROID: ashmem: Export is_ashmem_file Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I7d60121fa9f25f007dab97dd666adcdb1964afc8	2021-07-15 17:17:09 +02:00
Peter Collingbourne	50829b8901	BACKPORT: arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) This change introduces a prctl that allows the user program to control which PAC keys are enabled in a particular task. The main reason why this is useful is to enable a userspace ABI that uses PAC to sign and authenticate function pointers and other pointers exposed outside of the function, while still allowing binaries conforming to the ABI to interoperate with legacy binaries that do not sign or authenticate pointers. The idea is that a dynamic loader or early startup code would issue this prctl very early after establishing that a process may load legacy binaries, but before executing any PAC instructions. This change adds a small amount of overhead to kernel entry and exit due to additional required instruction sequences. On a DragonBoard 845c (Cortex-A75) with the powersave governor, the overhead of similar instruction sequences was measured as 4.9ns when simulating the common case where IA is left enabled, or 43.7ns when simulating the uncommon case where IA is disabled. These numbers can be seen as the worst case scenario, since in more realistic scenarios a better performing governor would be used and a newer chip would be used that would support PAC unlike Cortex-A75 and would be expected to be faster than Cortex-A75. On an Apple M1 under a hypervisor, the overhead of the entry/exit instruction sequences introduced by this patch was measured as 0.3ns in the case where IA is left enabled, and 33.0ns in the case where IA is disabled. Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Link: https://linux-review.googlesource.com/id/Ibc41a5e6a76b275efbaa126b31119dc197b927a5 Link: https://lore.kernel.org/r/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Bug: 192536783 (cherry picked from commit `201698626f`) Change-Id: Ic0a21c92a22575f9ec3599fb67bd2931a50b9f04 [quic_eberman@quicinc.com: Resolved merge conflict in arch/arm64/kernel/process.c] Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Signed-off-by: Peter Collingbourne <pcc@google.com>	2021-07-14 20:52:05 -07:00
Suren Baghdasaryan	26cd2564e1	FROMLIST: psi: stop relying on timer_pending for poll_work rescheduling Psi polling mechanism is trying to minimize the number of wakeups to run psi_poll_work and is currently relying on timer_pending() to detect when this work is already scheduled. This provides a window of opportunity for psi_group_change to schedule an immediate psi_poll_work after poll_timer_fn got called but before psi_poll_work could reschedule itself. Below is the depiction of this entire window: poll_timer_fn wake_up_interruptible(&group->poll_wait); psi_poll_worker wait_event_interruptible(group->poll_wait, ...) psi_poll_work psi_schedule_poll_work if (timer_pending(&group->poll_timer)) return; ... mod_timer(&group->poll_timer, jiffies + delay); Prior to `461daba06b` we used to rely on poll_scheduled atomic which was reset and set back inside psi_poll_work and therefore this race window was much smaller. The larger window causes increased number of wakeups and our partners report visible power regression of ~10mA after applying `461daba06b`. Bring back the poll_scheduled atomic and make this race window even narrower by resetting poll_scheduled only when we reach polling expiration time. This does not completely eliminate the possibility of extra wakeups caused by a race with psi_group_change however it will limit it to the worst case scenario of one extra wakeup per every tracking window (0.5s in the worst case). This patch also ensures correct ordering between clearing poll_scheduled flag and obtaining changed_states using memory barrier. Correct ordering between updating changed_states and setting poll_scheduled is ensured by atomic_xchg operation. By tracing the number of immediate rescheduling attempts performed by psi_group_change and the number of these attempts being blocked due to psi monitor being already active, we can assess the effects of this change: Before the patch: Run#1 Run#2 Run#3 Immediate reschedules attempted: 684365 1385156 1261240 Immediate reschedules blocked: 682846 1381654 1258682 Immediate reschedules (delta): 1519 3502 2558 Immediate reschedules (% of attempted): 0.22% 0.25% 0.20% After the patch: Run#1 Run#2 Run#3 Immediate reschedules attempted: 882244 770298 426218 Immediate reschedules blocked: 881996 769796 426074 Immediate reschedules (delta): 248 502 144 Immediate reschedules (% of attempted): 0.03% 0.07% 0.03% The number of non-blocked immediate reschedules dropped from 0.22-0.25% to 0.03-0.07%. The drop is attributed to the decrease in the race window size and the fact that we allow this race only when psi monitors reach polling window expiration time. Fixes: `461daba06b` ("psi: eliminate kthread_worker from psi trigger scheduling mechanism") Reported-by: Kathleen Chang <yt.chang@mediatek.com> Reported-by: Wenju Xu <wenju.xu@mediatek.com> Reported-by: Jonathan Chen <jonathan.jmchen@mediatek.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: SH Chen <show-hong.chen@mediatek.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/patchwork/patch/1455172/ Bug: 191127654 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ie61547ca043e702442a9c6db1468cfb60ff2e729	2021-07-14 20:52:04 -07:00
Greg Kroah-Hartman	11b396dfd9	Revert "Add a reference to ucounts for each cred" This reverts commit `b2c4d9a33c`. It breaks the abi and is not needed on Android kernels. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I295a6c8088e4297400c1feebd78abd2f6c5edd7e	2021-07-14 19:46:17 +02:00
Greg Kroah-Hartman	049c7d395d	Revert "cred: add missing return error code when set_cred_ucounts() failed" This reverts commit `0855952ed4`. It breaks the abi and is not needed on Android kernels. Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I65001e80f3776ae4ab56be5da555dba394278c68	2021-07-14 19:46:05 +02:00

1 2 3 4 5 ...

35699 Commits